biocore / q2-greengenes2

A QIIME 2 plugin for interaction with the Greengenes2 database
BSD 3-Clause "New" or "Revised" License

taxonomy-from-table memory usage #12

Open colinbrislawn opened 1 year ago

colinbrislawn commented 1 year ago

I'm testing out the new Greengenes! 💚

The taxonomy-from-table command is currently being killed once it reaches about 50% memory usage on a 16 GB VM.

The introduction post warned about this:

NOTE: Just like filter-features, this command right now will require around 8-10GB of memory.

It's possible my system is overly aggressive with memory management, but either way I'm interested in tracking this issue for all the folks with potatoes.

wasade commented 1 year ago

Thanks, @colinbrislawn!

Yeah, at the moment the method is quite burdensome, even with the shortcuts we've already implemented. Its original implementation was much worse on memory... What I'm considering is representing the taxonomic data in a SQLite3 database, which would avoid the resident memory overhead and likely would not greatly impact performance. My hope is to have this in the next release, for which I'm currently working on the upstream pieces.
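A rough sketch of the SQLite3 idea, assuming a simple two-column schema. This is an illustration only, not the plugin's actual implementation: the table name `taxonomy` and the functions `build_taxonomy_db` and `lookup_lineages` are hypothetical. The full taxonomy lives in an on-disk database, and only the lineages for feature IDs that actually appear in the user's feature table are loaded into memory.

```python
import sqlite3


def build_taxonomy_db(path, records):
    """Store (feature_id, lineage) pairs in a SQLite table.

    Hypothetical schema. `records` may be any iterable (e.g. a generator
    reading a taxonomy file line by line), so the full taxonomy never has
    to be held in memory while the database is built.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS taxonomy ("
        "feature_id TEXT PRIMARY KEY, lineage TEXT NOT NULL)"
    )
    conn.executemany("INSERT OR REPLACE INTO taxonomy VALUES (?, ?)", records)
    conn.commit()
    return conn


def lookup_lineages(conn, feature_ids, batch_size=500):
    """Return {feature_id: lineage} for only the requested IDs.

    IDs are queried in batches because SQLite caps the number of bound
    parameters per statement. IDs missing from the database are skipped.
    """
    ids = list(feature_ids)
    found = {}
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        placeholders = ",".join("?" * len(batch))
        rows = conn.execute(
            "SELECT feature_id, lineage FROM taxonomy "
            f"WHERE feature_id IN ({placeholders})",
            batch,
        )
        found.update(rows)  # each row is a (feature_id, lineage) tuple
    return found


if __name__ == "__main__":
    # ":memory:" keeps the demo self-contained; a real tool would pass a
    # file path so the reference taxonomy stays on disk.
    conn = build_taxonomy_db(":memory:", [
        ("feature-1", "d__Bacteria; p__Firmicutes"),
        ("feature-2", "d__Archaea; p__Halobacteriota"),
    ])
    print(lookup_lineages(conn, ["feature-2", "not-in-db"]))
```

The batch size of 500 is a conservative choice: older SQLite builds default to a limit of 999 bound parameters per statement. With a file-backed database, peak memory scales with the number of features in the table being annotated rather than with the size of the whole reference taxonomy.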

Out of curiosity, is this something you'd have time and interest in working on?

colinbrislawn commented 1 year ago

Maybe. I've not worked with SQLite3 before so I'm not sure I'm the best fit.

I can help with testing or docs, but those are mostly done for this plugin.

wasade commented 1 year ago

Okay, no worries, totally understand!