Closed Sann5 closed 10 months ago
All modified and coverable lines are covered by tests :white_check_mark:
Comparison is base (3b993ed) 96.77% compared to head (ac2141c) 96.91%.
Hey @VinzentRisch, can you give this one a review? Cheers!
Hey @Sann5, everything looks good to me. The tests run and the data can be imported without any issues.
I just had some problems getting the right env. You added some new formats to q2-types and those formats are not in the 2023.9 distribution of QIIME 2, so I had to install q2-types directly from GitHub.
And there is a typo in your import command. It should be ReferenceDB[NCBITaxonomy] and not ReferenceDB[TaxonomyNCBI].
But once I figured those two things out, everything went smoothly.
@VinzentRisch
> I just had some problems with getting the right env. You added some new formats to q2-types and those formats are not in the 2023.9 distribution of QIIME2 so I had to install q2-types directly from GitHub. And there is a typo in your import command. It should be ReferenceDB[NCBITaxonomy] and not ReferenceDB[TaxonomyNCBI].
Crap! Thank you for checking and thanks for the review :). I'll update the PR message accordingly.
@misialq do you want to take a quick look before I squash-merge it?
Yup, thanks, I'll check it out and ping you.
We have decided not to go forward with this extension of the semantic type because the files (names.dmp and nodes.dmp) are updated very frequently, and the last-modified-date information is already contained in the artifact without explicitly making a new file for it.
However, this branch will be pushed upstream just in case we wish to recycle some of the code further on.
About this repo
q2-types-genomics is a qiime2 plugin that defines semantic types for other plugins.
What's new
NCBITaxonomyDirFmt now contains 4 files instead of 3:
nodes.dmp
names.dmp
prot.accession2taxid.gz
version.tsv
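As a rough illustration of the four-file layout listed above, the following sketch checks whether a directory holds all of them. The helper name `missing_files` is hypothetical and is not part of q2-types-genomics; only the file names come from this PR description.

```python
from pathlib import Path

# File names taken from the PR description. The helper below is a
# hypothetical illustration, not an API of q2-types-genomics.
EXPECTED_FILES = (
    "nodes.dmp",
    "names.dmp",
    "prot.accession2taxid.gz",
    "version.tsv",
)


def missing_files(directory, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from `directory`."""
    directory = Path(directory)
    return [name for name in expected if not (directory / name).is_file()]
```

Calling `missing_files("ncbi_tax_data")` should return an empty list once all four files are in place.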
With the version.tsv file, the last-modified date and time of each file are effectively recorded in the provenance of downstream artifacts.
Set up an environment
Run it locally
First, clone the repo and check out the PR branch:
Let's get you some data to play with:
Download prot.accession2taxid.gz
wget ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.gz -P ncbi_tax_data
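The downloaded file is large, so it can help to look at only the first few rows before importing. The sketch below assumes the file is a gzipped, tab-separated table with a header row; the function name `peek_accession2taxid` is made up for this example.

```python
import csv
import gzip
from itertools import islice


def peek_accession2taxid(path, n=5):
    """Return the header and the first `n` data rows of a gzipped TSV file.

    Assumes a tab-separated file whose first line is a header.
    """
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        header = next(reader)
        # islice stops after n rows, so the rest of the file is never read.
        rows = list(islice(reader, n))
    return header, rows
```

For example, `peek_accession2taxid("ncbi_tax_data/prot.accession2taxid.gz", n=3)` reads only the start of the file rather than decompressing all of it.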
Make the version.tsv file
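# Write the header row
echo -e "file_name\tdate\ttime" > ncbi_tax_data/version.tsv
# Append name, date and time for each file (the -D date-format flag is specific to BSD/macOS ls)
ls -l -D "%d/%m/%Y %H:%M:%S" ncbi_tax_data | awk '{print $8, $6, $7}' | grep -E '(nodes.dmp|names.dmp|prot.accession2taxid.gz)' | tr ' ' '\t' >> ncbi_tax_data/version.tsv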
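Because the `ls -D` date-format flag is a BSD/macOS option, here is a minimal Python sketch that produces the same three columns on any platform. The function name `write_version_tsv` is hypothetical; the column names and date/time formats mirror the shell command above. Rows follow the order of `TRACKED_FILES` rather than the alphabetical order `ls` would give.

```python
from datetime import datetime
from pathlib import Path

# The three data files named in the PR description; version.tsv itself
# is not listed inside version.tsv.
TRACKED_FILES = ("nodes.dmp", "names.dmp", "prot.accession2taxid.gz")


def write_version_tsv(data_dir, tracked=TRACKED_FILES):
    """Write <data_dir>/version.tsv with one row per tracked file found."""
    data_dir = Path(data_dir)
    rows = ["file_name\tdate\ttime"]
    for name in tracked:
        path = data_dir / name
        if not path.is_file():
            continue  # skip files that have not been downloaded yet
        # Modification time in local time, as `ls -l` would display it.
        mtime = datetime.fromtimestamp(path.stat().st_mtime)
        date = mtime.strftime("%d/%m/%Y")
        time = mtime.strftime("%H:%M:%S")
        rows.append(f"{name}\t{date}\t{time}")
    out = data_dir / "version.tsv"
    out.write_text("\n".join(rows) + "\n")
    return out
```

Running `write_version_tsv("ncbi_tax_data")` after the downloads would create the file in place.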
Running the tests