EDA Optimization Tuning (Phase 2)

Overview

The performance of the full screen map's use-cases should be improved to provide a smooth experience for large datasets.

Discussion 01/27/2023

Discussed possibility of combining merging, subsetting and data service to avoid extra serialization. We decided this was possibly too big of an undertaking.
Decided to add a mode to merging/subsetting where tabular results are output as application/octet-stream of serialized primitive types.
- [ ] TODO: Create an issue for this
Discussed prioritizing the issue to write vocabulary files. I think this one can probably wait until the above is done, since the above is a larger bottleneck.
Need to test where the bottleneck is for multi-entity requests. So far I've mainly been testing with the default UMSP map request.
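
To make the binary output mode discussed above more concrete, here is a minimal sketch of writing tabular rows as serialized primitives and reading them back without string parsing. The class name, the method names, and the row layout (one long ID plus one double value per row) are assumptions for illustration only; the actual format for merging/subsetting has not been decided here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch: not the actual merging/subsetting service API.
class BinaryRowSketch {

    // Assumed row layout for illustration: a long entity ID followed by a double value.
    static byte[] encode(long[] ids, double[] values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int i = 0; i < ids.length; i++) {
                out.writeLong(ids[i]);      // 8 bytes, big-endian
                out.writeDouble(values[i]); // 8 bytes, IEEE 754
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the rows back with no string parsing and returns the sum of the values.
    static double sumValues(byte[] payload) {
        double sum = 0;
        int rows = payload.length / (Long.BYTES + Double.BYTES);
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(payload))) {
            for (int i = 0; i < rows; i++) {
                in.readLong();           // skip the ID column
                sum += in.readDouble();  // consume the value column
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] payload = encode(new long[] {1L, 2L}, new double[] {0.5, 1.5});
        System.out.println(payload.length + " bytes, sum = " + sumValues(payload));
    }
}
```

A payload like this would be returned with Content-Type application/octet-stream. The consumer has to know the column types and their order up front, since the stream carries no header in this sketch.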

1/30/2023: Currently identified bottlenecks:

Single-Entity Requests

A lot of work has been done to streamline this case. Because we can short-circuit merging logic, the main bottleneck is in data service.

Requests with overlay
- The overlay plugin calls out to R, which currently adds overhead compared to requests without the overlay.
- Difference in duration: 17 seconds with overlay, 10 seconds without overlay.
- Possible courses of action:
  - Implement overlay plugin in Java.
  - Implement option for merging service to output binary.

Requests without overlay
- These are the fastest map requests we currently make, but they still appear to be bottlenecked by data service.
- This may be due to the string parsing required in the data plugin, but we need to verify (TODO).
  - Currently running profiler on data service in this case.
  - Using Scanner to read lines proved to be very inefficient (VEuPathDB/EdaDataService#233).
- This was recently improved by VEuPathDB/EdaDataService#230.
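
As an illustration of the Scanner finding above, the following sketch reads tab-delimited lines with a BufferedReader and splits them with indexOf. This is not the code changed in VEuPathDB/EdaDataService#233; the class and method names are hypothetical, and tab-delimited input is an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: not the code changed in EdaDataService#233.
class LineReadingSketch {

    // Splits one tab-delimited line with indexOf rather than a tokenizer.
    static List<String> splitTabs(String line) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        int tab;
        while ((tab = line.indexOf('\t', start)) >= 0) {
            fields.add(line.substring(start, tab));
            start = tab + 1;
        }
        fields.add(line.substring(start)); // last field (may be empty)
        return fields;
    }

    // Reads every line with BufferedReader and counts the fields seen.
    static int countFields(String tabular) {
        int count = 0;
        try (BufferedReader reader = new BufferedReader(new StringReader(tabular))) {
            String line;
            while ((line = reader.readLine()) != null) {
                count += splitTabs(line).size();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }
}
```

Whether string parsing is the remaining bottleneck still needs to be confirmed by the profiler run; the sketch only shows one way to read lines without Scanner.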

Multi-Entity Requests

The bottleneck for multi-entity requests is primarily merging service.
- See
- TODO: Measure whether overlay is also a bottleneck in this case. This may involve running a non-containerized Rserve.

Below is a profiler run of merging service against a multi-entity request.
- I'm experimenting with reducing usage of more complex data structures to speed things up.
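
One possible reading of "reducing usage of more complex data structures" is to resolve column names to array positions once and keep each row as a plain array, instead of doing a map lookup per row. The sketch below is an assumption about what such a change could look like, not the merging service's code; the class, method, and column names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: column names and row shapes are assumptions, not merging service code.
class RowLayoutSketch {

    // Resolves each column name to an array index once, up front.
    static Map<String, Integer> indexColumns(String[] header) {
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i < header.length; i++) {
            index.put(header[i], i);
        }
        return index;
    }

    // Sums one column across rows stored as plain String arrays,
    // so the per-row work is an array access rather than a map lookup.
    static double sumColumn(String[][] rows, int column) {
        double sum = 0;
        for (String[] row : rows) {
            sum += Double.parseDouble(row[column]);
        }
        return sum;
    }
}
```

The map is consulted once per request rather than once per row. Whether this helps in practice depends on what the profiler run shows.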

Subtasks

90
VEuPathDB/tool-eda-file-dumper#10
VEuPathDB/lib-eda-subsetting#20
VEuPathDB/tool-eda-file-dumper#11
VEuPathDB/EdaSubsettingService#92
VEuPathDB/EdaDataService#230
VEuPathDB/EdaDataService#233
Implement Java plugin for map overlay (EdaDataService)
Implement vocabulary files for categorical variables

VEuPathDB / EdaSubsettingService

EDA Optimization Tuning (Phase 2) #91
