VEuPathDB / EdaSubsettingService

A REST service to provide data and subsetting in the Exploratory Data Analysis Workspace
Apache License 2.0

Introduce concurrency to file-based subsetting #90

Closed dmgaldi closed 1 year ago

dmgaldi commented 1 year ago

Overview

Currently, generating map-markers visualizations for our largest studies is slow. Two parts of the algorithm can be parallelized, and these are the ones I want to target initially:

  1. Reading data and de-serializing it from binary to Java objects
  2. Merging filtered data streams and mapping them up the tree (and everything else)

The reason for targeting number 1 is that concurrency can easily be added there without modifying any of our interfaces. FilteredValueIterator can asynchronously read (ID, value) pairs and buffer them in memory before upstream iterators request them.
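To make the buffering idea concrete, here is a minimal sketch of a read-ahead wrapper, not the actual FilteredValueIterator code. The class name `PrefetchingIterator`, the `bufferSize` parameter, and the single background thread are assumptions chosen for illustration. It keeps the plain `Iterator` contract, so callers would not need to change.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/**
 * Hypothetical sketch: drains a source iterator on a background thread into a
 * bounded in-memory buffer, so reading and de-serialization can overlap with
 * the consumer's work. Null elements are not supported (the queue rejects them).
 */
final class PrefetchingIterator<T> implements Iterator<T> {
  // Sentinel placed on the queue once the source is exhausted (or has failed).
  private static final Object END = new Object();

  private final BlockingQueue<Object> buffer;
  private volatile Throwable failure; // set by the reader thread on error
  private Object next;                // element fetched by hasNext(), if any

  PrefetchingIterator(Iterator<T> source, int bufferSize) {
    this.buffer = new ArrayBlockingQueue<>(bufferSize);
    Thread reader = new Thread(() -> {
      try {
        while (source.hasNext()) {
          buffer.put(source.next()); // blocks while the buffer is full
        }
      } catch (Throwable t) {
        failure = t;
      } finally {
        try {
          buffer.put(END);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }
    }, "prefetch-reader");
    reader.setDaemon(true); // an abandoned iterator must not keep the JVM alive
    reader.start();
  }

  @Override
  public boolean hasNext() {
    if (next == null) {
      try {
        next = buffer.take(); // blocks until the reader supplies an element
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("Interrupted while waiting for data", e);
      }
    }
    if (next == END) {
      if (failure != null) {
        throw new IllegalStateException("Background read failed", failure);
      }
      return false;
    }
    return true;
  }

  @Override
  public T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    @SuppressWarnings("unchecked")
    T value = (T) next;
    next = null;
    return value;
  }
}
```

Wrapping an existing iterator would look like `new PrefetchingIterator<>(sourceIterator, 1024)`. The bounded queue caps memory use and applies back-pressure to the reader. A real implementation would probably also need a way to stop the reader thread when the consumer closes the stream early, which this sketch leaves out.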

Acceptance Criteria