grahame closed this 6 years ago.
@schangccg let's take a look at this together. It might make sense to leave this until I've finished the contextual metadata refactoring.
Slow query reproduced here. On my system, exporting this data (~900K) takes about 3:33:
curl 'http://localhost:8000/private/api/v1/export_biom?token=&q=%7B%22taxonomy_filters%22%3A%5B%7B%22value%22%3A%222%22%2C%22operator%22%3A%22is%22%7D%2C%7B%22value%22%3A%2218%22%2C%22operator%22%3A%22is%22%7D%2C%7B%22value%22%3A%22%22%2C%22operator%22%3A%22is%22%7D%2C%7B%22value%22%3A%22%22%2C%22operator%22%3A%22is%22%7D%2C%7B%22value%22%3A%22%22%2C%22operator%22%3A%22is%22%7D%2C%7B%22value%22%3A%22%22%2C%22operator%22%3A%22is%22%7D%2C%7B%22value%22%3A%22%22%2C%22operator%22%3A%22is%22%7D%5D%2C%22contextual_filters%22%3A%7B%22filters%22%3A%5B%5D%2C%22environment%22%3A%7B%22value%22%3A%221%22%2C%22operator%22%3A%22is%22%7D%2C%22mode%22%3A%22and%22%7D%2C%22amplicon_filter%22%3A%7B%22value%22%3A%223%22%2C%22operator%22%3A%22is%22%7D%7D' > thing.zip
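The `q` parameter in that request is URL-encoded JSON, which is hard to read or tweak by hand. As a minimal sketch, here is the same query written out as a Python dict and re-encoded with the standard library. The helper name `export_url` is hypothetical, and the filter semantics are only what can be read off the encoded string.

```python
import json
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8000/private/api/v1/export_biom"

# The query decoded from the `q` parameter of the curl command above.
query = {
    "taxonomy_filters": [
        {"value": "2", "operator": "is"},
        {"value": "18", "operator": "is"},
    ] + [{"value": "", "operator": "is"} for _ in range(5)],
    "contextual_filters": {
        "filters": [],
        "environment": {"value": "1", "operator": "is"},
        "mode": "and",
    },
    "amplicon_filter": {"value": "3", "operator": "is"},
}


def export_url(query: dict, token: str = "") -> str:
    """Hypothetical helper: build the export_biom URL for a query dict."""
    # Compact separators reproduce the space-free JSON used in the request.
    q = json.dumps(query, separators=(",", ":"))
    # quote (rather than the default quote_plus) percent-encodes every
    # reserved character, matching the %7B/%22/%3A style above.
    return BASE_URL + "?" + urlencode({"token": token, "q": q}, quote_via=quote)
```

`export_url(query)` should reproduce the URL in the curl command, which makes it easy to vary one filter at a time when benchmarking.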
Slow function confirmed: it's `otu_rows`, which accounts for about 3:10 of the ~3:30 runtime for the full export.
Slightly surprising; I'm chasing it down now.
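One way to attribute runtime to a single function like this is the standard library's `cProfile`. The sketch below is illustrative only: `slow_rows` and `export` are hypothetical stand-ins for `otu_rows` and the export view, not the project's code, and `slow_rows` is deliberately quadratic so it dominates the profile.

```python
import cProfile
import io
import pstats


def slow_rows(n):
    """Hypothetical stand-in for otu_rows: quadratic because of list.index()."""
    ids = [f"otu{i}" for i in range(n)]
    return [ids.index(otu_id) for otu_id in ids]


def export(n):
    """Hypothetical stand-in for the full export."""
    return len(slow_rows(n))


def profile_export(n=2000):
    profiler = cProfile.Profile()
    profiler.enable()
    export(n)
    profiler.disable()
    out = io.StringIO()
    # Sort by cumulative time so the function that dominates the export,
    # including everything it calls, rises to the top of the report.
    stats = pstats.Stats(profiler, stream=out).sort_stats("cumulative")
    stats.print_stats(10)
    return out.getvalue()
```

`print(profile_export())` prints the ten most expensive call paths. In a report like this, `slow_rows` and the `index` calls it makes should account for almost all of the time spent inside `export`.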
Landed.
The BIOM export is working, but is a little slow.
Investigate performance improvements.
Most of the data output should come from the `abundance_tbl` function, so that is the function we should optimise most heavily. I suspect the call to `format()` at line 96 of `biom.py` is more expensive than we'd like. We could also replace the calls to `.index()`, each of which is an O(N) scan, with an O(1) lookup (e.g. a dict mapping each ID to its position).
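A minimal sketch of the `.index()` change, under the assumption that `abundance_tbl` maps OTU and sample IDs to row and column positions. The function names and the `(otu_id, sample_id, count)` record shape are hypothetical and are not taken from `biom.py`. The dict version builds each position map once, in O(N), so every lookup afterwards is O(1) on average instead of a linear scan.

```python
def abundance_rows_slow(otu_ids, sample_ids, records):
    """Current-style approach: list.index() rescans the list for every record."""
    rows = []
    for otu_id, sample_id, count in records:
        rows.append((otu_ids.index(otu_id), sample_ids.index(sample_id), count))
    return rows


def abundance_rows_fast(otu_ids, sample_ids, records):
    """Same output, but with precomputed ID -> position dicts.

    Assumes IDs are unique. With duplicates, list.index() returns the first
    position, while this dict comprehension would keep the last one.
    """
    otu_pos = {otu_id: i for i, otu_id in enumerate(otu_ids)}
    sample_pos = {sample_id: j for j, sample_id in enumerate(sample_ids)}
    return [(otu_pos[o], sample_pos[s], count) for o, s, count in records]
```

For example, `abundance_rows_fast(["a", "b", "c"], ["s1", "s2"], [("c", "s2", 5), ("a", "s1", 1)])` returns `[(2, 1, 5), (0, 0, 1)]`, the same as the slow version. With R records and N IDs, the total cost drops from roughly O(R·N) to O(R + N).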