esteinig / cerebro

Metagenomic diagnostics stack for low abundance sample types and clinical reporting
GNU General Public License v3.0

Refactor aggregate taxon model #5

Closed esteinig closed 1 month ago

esteinig commented 12 months ago

State

Currently, the cerebro.taxa section of the Cerebro data model is a HashMap<taxid, Taxon>, where type taxid = String. This is a consequence of the aggregation function, which uses sequential HashMaps to group taxa by their taxid.
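A minimal sketch of how this kind of grouping can produce the map, for illustration only. The `Taxon` fields, the `TaxId` alias spelling, and the `aggregate` function are assumptions rather than the actual Cerebro code, and the sketch collapses the sequential HashMaps into a single pass:

```rust
use std::collections::HashMap;

// Assumed alias mirroring `type taxid = String` from the issue,
// capitalised here to follow Rust naming conventions.
type TaxId = String;

// Hypothetical, simplified taxon record; the real `Taxon` will differ.
#[derive(Debug, Clone, PartialEq)]
struct Taxon {
    taxid: TaxId,
    name: String,
    reads: u64,
}

// Group records by taxid, summing the read evidence of records
// that share the same taxid.
fn aggregate(records: Vec<Taxon>) -> HashMap<TaxId, Taxon> {
    let mut taxa: HashMap<TaxId, Taxon> = HashMap::new();
    for record in records {
        // Copy the count first so the closure does not borrow `record`.
        let reads = record.reads;
        taxa.entry(record.taxid.clone())
            .and_modify(|existing| existing.reads += reads)
            .or_insert(record);
    }
    taxa
}
```

Keying by taxid makes the grouping step simple, which is why the map ends up in the data model.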

Problem

A HashMap cannot be queried efficiently with MongoDB aggregation pipelines. Downstream applications, particularly the API endpoints, eventually use a Vec<Taxon> anyway.

Refactoring to a Vec is necessary, but at this stage it may affect a number of dependent subsystems.
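One possible shape for the refactor, sketched under assumptions: the aggregation keeps its internal HashMap and converts to a Vec<Taxon> only at the boundary where the model is stored. The `Taxon` fields and the `taxa_to_vec` helper are hypothetical and not part of Cerebro:

```rust
use std::collections::HashMap;

// Hypothetical, simplified taxon record; the real `Taxon` will differ.
#[derive(Debug, Clone, PartialEq)]
struct Taxon {
    taxid: String,
    name: String,
}

// Consume the taxid-keyed map and return its values as a Vec.
// HashMap iteration order is unspecified, so the result is sorted
// (lexicographically by taxid) to keep the output deterministic.
fn taxa_to_vec(taxa: HashMap<String, Taxon>) -> Vec<Taxon> {
    let mut taxa: Vec<Taxon> = taxa.into_values().collect();
    taxa.sort_by(|a, b| a.taxid.cmp(&b.taxid));
    taxa
}
```

Because each `Taxon` in this sketch carries its own taxid, no information is lost when the keys are dropped. Converting at the boundary would also limit the changes needed in subsystems that still rely on the map during aggregation.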