Identify and document scalability benchmarks

mortonjt commented 6 years ago

Empress needs to be run against a huge tree (> 1 million tips)

antgonza commented 4 years ago

Just wondering if there are any updates on this issue; thank you.

antgonza commented 4 years ago

I installed the latest version of Empress and ran it, within a 2020.2 Qiime2 conda environment, on one of the large trees generated in Qiita. The mapping file, feature table, and taxonomies are from the moving pictures dataset (only one dataset).

Note that this tree was created over a year ago (we could generate an even larger one today); it is the 100bp fragment insertion tree and has ~8.8M tips:

In [1]: from skbio import TreeNode
In [2]: tree = TreeNode.read('../insertion_tree.relabelled.tre')
In [3]: print(tree.count(tips=True))     
8830174
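
For anyone who wants to benchmark at a similar scale without the Qiita tree, a synthetic tree can stand in. The sketch below is an illustration under stated assumptions, not the procedure used above: `random_newick` and `count_tips` are hypothetical helpers built on the standard library only, the tip names and branch lengths are made up, and the tip count assumes labels contain no commas.

```python
import random


def _pop_random(items, rng):
    """Remove and return a random element in O(1) by swapping it with the last."""
    i = rng.randrange(len(items))
    items[i], items[-1] = items[-1], items[i]
    return items.pop()


def random_newick(n_tips, seed=0):
    """Return a random binary tree with n_tips tips as a Newick string.

    Hypothetical benchmarking helper: tips are named t0, t1, ... and
    branch lengths are random numbers with no biological meaning.
    """
    if n_tips < 1:
        raise ValueError("n_tips must be at least 1")
    rng = random.Random(seed)
    subtrees = [f"t{i}" for i in range(n_tips)]
    # Repeatedly join two random subtrees under a new internal node
    # until a single tree remains.
    while len(subtrees) > 1:
        left = _pop_random(subtrees, rng)
        right = _pop_random(subtrees, rng)
        subtrees.append(
            f"({left}:{rng.random():.4f},{right}:{rng.random():.4f})"
        )
    return subtrees[0] + ";"


def count_tips(newick):
    """Count tips in a Newick string, assuming no commas inside labels.

    Each internal node with k children contributes k - 1 commas,
    so the number of tips is the number of commas plus one.
    """
    return newick.count(",") + 1
```

Writing the output of `random_newick(1_000_000)` to a file gives a tree at the > 1 million tip scale mentioned at the top of this issue. The topology is random, so it will not reproduce the shape of a real insertion tree.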

I generated three Empress qzvs to test: one with no taxonomy, one with GG added, and one with Silva added. Each takes ~3hrs to generate, and generation works just fine (no error messages). However, when I try to open them in https://view.qiime2.org/, the browser fails with:

and if I unzip the qzv and try to open the index.html or empress.html, I get:

Anyway, here are the testing files.

cc: @ElDeveloper

kwcantrell commented 4 years ago

@antgonza I'm looking into this

fedarko commented 4 years ago

Once we identify upper bounds for what sorts of data sizes Empress can comfortably visualize, we should document this clearly in the README so that e.g. users with billion-tip trees know that they probably want to consult another tool and/or a priest ._.
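
One way to look for those upper bounds is to run the same step over increasing tree sizes and record time and memory for each. A minimal sketch, assuming a hypothetical `benchmark` helper and a placeholder workload; none of this is Empress code, and the sizes are arbitrary:

```python
import time
import tracemalloc


def benchmark(workload, sizes):
    """Run workload(n) for each n; record wall time and peak Python memory."""
    rows = []
    for n in sizes:
        tracemalloc.start()
        start = time.perf_counter()
        workload(n)
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        rows.append(
            {"tips": n, "seconds": elapsed, "peak_mib": peak / 2**20}
        )
    return rows


def format_table(rows):
    """Render the results as a tab-separated table for pasting into docs."""
    lines = ["tips\tseconds\tpeak_mib"]
    for r in rows:
        lines.append(f"{r['tips']}\t{r['seconds']:.3f}\t{r['peak_mib']:.1f}")
    return "\n".join(lines)


def stand_in_workload(n):
    """Placeholder: swap in the real step being measured (hypothetical)."""
    return [f"t{i}" for i in range(n)]


if __name__ == "__main__":
    print(format_table(benchmark(stand_in_workload, [1_000, 10_000, 100_000])))
```

Note that `tracemalloc` only sees Python allocations. The failures reported above happen in the browser, so the rendering side would need to be measured separately.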

biocore / empress

Identify and document scalability benchmarks #74