The test cases use just three viruses which, together with their parent nodes for the full lineage, need only 10 taxonomy identifiers:
```
$ python filter_taxonomy.py 137758 946046 12227
Filtering NCBI taxonomy files nodes.dmp and names.dmp
Will create nodes_.dmp and names_.dmp using just the given
3 entries and their parent nodes.
Loaded 1692822 entries from nodes.dmp
Expanded 3 given TaxID to a list of 10 including ancestors
Created nodes_.dmp
Created names_.dmp
```
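For anyone curious how this kind of filtering works, here is a minimal Python sketch of the idea. It is not the actual `filter_taxonomy.py`: the function names are hypothetical, and it assumes the standard NCBI `.dmp` layout where fields are separated by `\t|\t`, the first field is the TaxID, and (in `nodes.dmp`) the second field is the parent TaxID, with the root (TaxID 1) being its own parent.

```python
# Minimal sketch, NOT the real filter_taxonomy.py: the function names
# below are hypothetical. Assumes the standard NCBI .dmp layout where
# fields are separated by "\t|\t" and the first field is the TaxID.


def parse_parents(lines):
    """Map each TaxID to its parent TaxID using nodes.dmp lines."""
    parents = {}
    for line in lines:
        fields = line.split("\t|\t")
        # Field 1 is the TaxID, field 2 is the parent TaxID.
        parents[int(fields[0])] = int(fields[1])
    return parents


def expand_ancestors(taxids, parents):
    """Return the given TaxIDs plus all their ancestors up to the root."""
    wanted = set()
    for taxid in taxids:
        # Climb the tree, stopping early once we reach a node already
        # collected (its ancestors are then already in the set too).
        while taxid not in wanted:
            wanted.add(taxid)
            parent = parents[taxid]
            if parent == taxid:  # the root (TaxID 1) is its own parent
                break
            taxid = parent
    return wanted


def filter_lines(lines, wanted):
    """Keep only the .dmp lines whose TaxID (first field) is wanted."""
    return [line for line in lines if int(line.split("\t|\t", 1)[0]) in wanted]


def filter_taxonomy_files(taxids, nodes="nodes.dmp", names="names.dmp"):
    """Write nodes_.dmp and names_.dmp restricted to taxids + ancestors."""
    with open(nodes) as handle:
        node_lines = handle.readlines()
    wanted = expand_ancestors(taxids, parse_parents(node_lines))
    with open("nodes_.dmp", "w") as handle:
        handle.writelines(filter_lines(node_lines, wanted))
    # names.dmp can have several lines per TaxID (synonyms etc.);
    # every line for a wanted TaxID is kept.
    with open(names) as handle:
        name_lines = filter_lines(handle, wanted)
    with open("names_.dmp", "w") as handle:
        handle.writelines(name_lines)
    return wanted
```

Calling `filter_taxonomy_files([137758, 946046, 12227])` in a folder containing the full dump should, in principle, collect the same ancestor-expanded set the script reports above. Walking up the parent links and stopping at the first already-seen node keeps the expansion cheap even when the given TaxIDs share most of their lineage.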
With these changes, TravisCI will no longer download the full taxonomy; instead, we provide this mini ten-entry taxonomy under version control.
As a bonus, the TravisCI runs are now much faster. I presume that, on top of avoiding the download and unzip, the smaller taxonomy also speeds up Kraken and Kaiju.