The performance of the full-screen map's use cases should be improved to provide a smooth experience with large datasets.
Discussion 01/27/2023
Discussed the possibility of combining the merging, subsetting, and data services to avoid extra serialization. We decided this was possibly too big an undertaking.
Decided to add a mode to merging/subsetting where tabular results are output as application/octet-stream of serialized primitive types.
[ ] TODO: Create an issue for this
Discussed prioritizing the issue to write vocabulary files. I think this one can probably wait until the binary output mode above is done, since serialization is the larger bottleneck.
Need to test where the bottleneck is for multi-entity requests. So far I've mainly been testing with the default UMSP map request.
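A minimal sketch of what the binary output mode decided on above could look like, assuming a fixed row layout of one long ID followed by one double value. The class name `BinaryRowCodec`, its methods, and the row layout are hypothetical and only illustrate the idea of streaming serialized primitives instead of delimited text; they are not the services' actual wire format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical illustration of an application/octet-stream payload of primitives. */
class BinaryRowCodec {

  /** Assumed layout: each row is an 8-byte long ID followed by an 8-byte double value. */
  static final int ROW_BYTES = Long.BYTES + Double.BYTES;

  /** Producer side: writes rows as fixed-width big-endian primitives. */
  static byte[] encode(long[] ids, double[] values) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream(ids.length * ROW_BYTES);
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      for (int i = 0; i < ids.length; i++) {
        out.writeLong(ids[i]);
        out.writeDouble(values[i]);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  /** Consumer side: reads the values back with no string parsing. */
  static double[] decodeValues(byte[] payload) {
    int rowCount = payload.length / ROW_BYTES;
    double[] values = new double[rowCount];
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(payload))) {
      for (int i = 0; i < rowCount; i++) {
        in.readLong(); // skip the ID column in this sketch
        values[i] = in.readDouble();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return values;
  }
}
```

With this layout, two rows encode to 32 bytes, and the consumer recovers the doubles exactly, since no text formatting or parsing happens in between.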
1/30/2023: Currently identified bottlenecks:
Single-Entity Requests
A lot of work has been done to streamline this case. Because we can short-circuit merging logic, the main bottleneck comes in data service.
Requests with overlay
The overlay plugin calls out to R, which currently adds overhead compared to requests without the overlay.
Difference in duration: 17 seconds with overlay, 10 seconds without overlay.
Possible course of action:
Implement overlay plugin in Java.
Implement option for merging service to output binary
Requests without overlay
These are the fastest map requests we currently make, but they still appear to be bottlenecked by data service.
This may be due to the string parsing required in the data plugin, but we need to verify (TODO)
Currently running a profiler on data service in this case.
Using Scanner to read lines proved to be very inefficient (VEuPathDB/EdaDataService#233)
This was recently improved by VEuPathDB/EdaDataService#230
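For context on the Scanner finding above, here is a rough sketch contrasting the two ways of reading a tabular stream line by line. `java.util.Scanner` matches its input against regular expressions, while `BufferedReader.readLine` only scans for line terminators, so the latter is generally cheaper for large streams. The class and method names are hypothetical, and whether VEuPathDB/EdaDataService#230 made exactly this change is an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

/** Hypothetical comparison; not code from the data service. */
class LineReading {

  /** Scanner-based reading: convenient, but regex-driven and slow on large inputs. */
  static long countLinesWithScanner(InputStream in) {
    long count = 0;
    try (Scanner scanner = new Scanner(in, StandardCharsets.UTF_8.name())) {
      while (scanner.hasNextLine()) {
        scanner.nextLine();
        count++;
      }
    }
    return count;
  }

  /** BufferedReader-based reading: a plain buffered scan for line terminators. */
  static long countLinesWithBufferedReader(InputStream in) {
    long count = 0;
    try (BufferedReader reader =
        new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
      while (reader.readLine() != null) {
        count++;
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return count;
  }
}
```

Both methods return the same line count for the same input, so timing them against a real map response would show whether line reading is still a meaningful share of the data service's time.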
Multi-Entity Requests
The bottleneck for multi-entity requests is primarily merging service.
See
TODO: measure whether the overlay is also a bottleneck in this case. This may involve running a non-containerized Rserve.
Below is a profiler run of merging service against multi-entity request
I'm experimenting with reducing usage of more complex data structures to speed things up.
Subtasks
90
VEuPathDB/tool-eda-file-dumper#10
VEuPathDB/lib-eda-subsetting#20
VEuPathDB/tool-eda-file-dumper#11
VEuPathDB/EdaSubsettingService#92
VEuPathDB/EdaDataService#230
VEuPathDB/EdaDataService#233
Implement Java plugin for map overlay (EdaDataService)
Implement vocabulary files for categorical variables