DataBiosphere / data-explorer-indexers

BSD 3-Clause "New" or "Revised" License

Willyn/bulk at end #130

Closed wnojopra closed 5 years ago

wnojopra commented 5 years ago

Fixes #127.

Previously, we pushed each table's indexed data up to Elasticsearch as soon as that table was processed. Unfortunately, each additional table seems to get slower as it re-indexes. The idea of this fix is to keep all table data in memory and push it up to Elasticsearch once, at the end. We do this for both samples and participants.
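As a rough illustration of the approach described above (not the actual code in this PR), a minimal sketch might look like the following. The function names, the `tables` input shape, and the `table.column` field naming are all hypothetical, chosen only for the example. `send_bulk` is a placeholder for whatever performs the upload; with the elasticsearch-py client it could presumably wrap `elasticsearch.helpers.bulk(client, actions)`.

```python
from collections import defaultdict


def merge_tables(tables):
    """Merge rows from every table into one in-memory document per ID.

    `tables` maps a table name to a list of (doc_id, row_dict) pairs.
    This input shape is an assumption made for illustration only.
    """
    docs = defaultdict(dict)
    for table_name, rows in tables.items():
        for doc_id, row in rows:
            for column, value in row.items():
                # Prefix each field with its table name so columns from
                # different tables cannot collide.
                docs[doc_id]["%s.%s" % (table_name, column)] = value
    return dict(docs)


def to_bulk_actions(docs, index_name):
    """Yield one index action per merged document."""
    for doc_id, source in docs.items():
        yield {
            "_op_type": "index",
            "_index": index_name,
            "_id": doc_id,
            "_source": source,
        }


def index_at_end(tables, index_name, send_bulk):
    """Build all documents first, then push them in a single bulk call."""
    docs = merge_tables(tables)
    send_bulk(to_bulk_actions(docs, index_name))
    return len(docs)
```

The trade-off is the one noted in this description: every document is written once instead of being updated once per table, at the cost of holding the whole merged dataset in memory.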

Testing with the baseline CDR data finished fairly quickly (~30-40 mins) and used roughly 1.1 GB of additional memory.

Testing with the 1000 genomes data (to test samples) was also successful.

wnojopra commented 5 years ago

@melissachang Thanks for the reviews, I added the updates.

wnojopra commented 5 years ago

Thanks @melissachang, made the changes.

wnojopra commented 5 years ago

@melissachang I made the changes we discussed regarding how the samples data is structured.

wnojopra commented 5 years ago

@melissachang Thanks, I've included the updates.