BioplatformsAustralia / bpaotu

OTU database access for the Australian Microbiome
GNU Affero General Public License v3.0

Speed up BIOM export #41

Closed. grahame closed this issue 6 years ago

grahame commented 6 years ago

The BIOM export is working, but is a little slow.

Investigate performance improvements.

Most of the data output should be coming from the abundance_tbl function, so that is the function to optimise heavily. I suspect the call to format() at line 96 of biom.py is more expensive than we'd like. We could also replace the calls to .index(), which scan the list in O(N), with an O(1) lookup table (for example a dict mapping each value to its position).
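A minimal sketch of the .index() change suggested above. The names `sample_ids`, `observations` and both functions are hypothetical stand-ins, not the actual code in biom.py; the point is only the difference between scanning a list on every lookup and building a position map once.

```python
# Hypothetical stand-ins; the real data structures in biom.py may differ.
sample_ids = ["s1", "s2", "s3", "s4"]
observations = [("s3", 12), ("s1", 7), ("s4", 3)]


def rows_with_list_index(sample_ids, observations):
    # list.index() scans from the start of the list on every call: O(N) per lookup.
    return [(sample_ids.index(sid), count) for sid, count in observations]


def rows_with_dict_lookup(sample_ids, observations):
    # Build the id -> position map once (O(N)); each lookup is then O(1) on average.
    position = {sid: i for i, sid in enumerate(sample_ids)}
    return [(position[sid], count) for sid, count in observations]


# Both approaches produce the same rows when the ids are unique.
assert rows_with_list_index(sample_ids, observations) == rows_with_dict_lookup(
    sample_ids, observations
)
```

One caveat: if the list could contain duplicates, .index() returns the first position while the dict comprehension keeps the last, so the two are only interchangeable when the ids are unique.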

grahame commented 6 years ago

@schangccg let's take a look at this together. It might make sense to leave this until I've got the contextual metadata refactoring done.

grahame commented 6 years ago

Slow query reproduced here. On my system it takes ~3:33 to export this data (~900K):

curl 'http://localhost:8000/private/api/v1/export_biom?token=&q=%7B%22taxonomy_filters%22%3A%5B%7B%22value%22%3A%222%22%2C%22operator%22%3A%22is%22%7D%2C%7B%22value%22%3A%2218%22%2C%22operator%22%3A%22is%22%7D%2C%7B%22value%22%3A%22%22%2C%22operator%22%3A%22is%22%7D%2C%7B%22value%22%3A%22%22%2C%22operator%22%3A%22is%22%7D%2C%7B%22value%22%3A%22%22%2C%22operator%22%3A%22is%22%7D%2C%7B%22value%22%3A%22%22%2C%22operator%22%3A%22is%22%7D%2C%7B%22value%22%3A%22%22%2C%22operator%22%3A%22is%22%7D%5D%2C%22contextual_filters%22%3A%7B%22filters%22%3A%5B%5D%2C%22environment%22%3A%7B%22value%22%3A%221%22%2C%22operator%22%3A%22is%22%7D%2C%22mode%22%3A%22and%22%7D%2C%22amplicon_filter%22%3A%7B%22value%22%3A%223%22%2C%22operator%22%3A%22is%22%7D%7D' > thing.zip
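For readability, the q parameter in the URL above is percent-encoded JSON. The sketch below rebuilds it from a plain dict, with values transcribed from that URL. The helper `is_filter` is hypothetical, and the thread does not say what the numeric ids (2, 18, 1, 3) refer to.

```python
import json
from urllib.parse import quote, unquote


def is_filter(value):
    # Every filter in the URL has the shape {"value": ..., "operator": "is"}.
    return {"value": value, "operator": "is"}


query = {
    # Two taxonomy levels are set; the remaining five are left empty.
    "taxonomy_filters": [is_filter(v) for v in ["2", "18", "", "", "", "", ""]],
    "contextual_filters": {
        "filters": [],
        "environment": is_filter("1"),
        "mode": "and",
    },
    "amplicon_filter": is_filter("3"),
}

# Compact JSON, then percent-encode every reserved character (safe="").
encoded_q = quote(json.dumps(query, separators=(",", ":")), safe="")
url = "http://localhost:8000/private/api/v1/export_biom?token=&q=" + encoded_q

# Decoding the parameter gives back the same structure.
assert json.loads(unquote(encoded_q)) == query
```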

grahame commented 6 years ago

Slow function confirmed: otu_rows accounts for about 3:10 of the ~3:30 runtime of the full export. Slightly surprising; I'm chasing it down now.
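The thread does not say how the hot spot was found. One common way to get this kind of per-function breakdown is the standard-library cProfile module, sketched below. Both `otu_rows` and `export_biom` here are hypothetical stand-ins that only do busy work, not the real bpaotu functions.

```python
import cProfile
import io
import pstats


def otu_rows():
    # Hypothetical stand-in for the slow function: just burns some CPU.
    total = 0
    for i in range(200_000):
        total += i * i
    return total


def export_biom():
    # Hypothetical stand-in for the full export path.
    return otu_rows()


profiler = cProfile.Profile()
profiler.enable()
export_biom()
profiler.disable()

# Sort by cumulative time so the functions dominating the run appear first.
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
report = buf.getvalue()
print(report)
```

The cumulative column shows how much of the total run is spent inside each function and its callees, which is the kind of figure quoted above (3:10 of 3:30).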

grahame commented 6 years ago

Landed.