Closed misialq closed 1 year ago
This is now ready for review. One (potential) issue here is that the q2-types-genomics version required by this new action is only available in the 2023.5 QIIME 2 conda channel: using this additional channel while creating a new environment (as it is described in the README - I updated the install command) results in an environment which is entirely 2023.5-dev. I have not found a way to only fetch q2-types-genomics from that additional channel (I think there is a bug/feature in conda which unfortunately prevents me from explicitly indicating this channel only for that package - installation fails in that case). Some solutions to this could be:
Thanks @misialq ! I am excited to review this 😁
I propose the following steps somewhere between 1 + 2:
Step 3 should be a separate PR to make it easier to revert (if desired later on). But for sure if this PR introduces a dependency on 2023.5 then we might want to have easy install instructions for those using older envs after 2023.5 anyway...
Thanks @nbokulich, that sounds perfect!
On your last point: yes, it also occurred to me today - it's not ideal and I'm thinking of making q2-types-genomics not always depend on the latest QIIME 2 version for this exact reason but we can discuss that elsewhere.
Hi @misialq, just a quick drive-by comment. I've been able to download genomes, etc. Works great! I think printing a more informative error statement would be useful here. Something like: "No genomes were found that fit the query terms 'taxon', 'assembly-source', 'assembly level', 'reference', 'exact-match'."
Or if you are able to specifically determine which of these parameters did or did not return anything and print those? 🤷♂️
It took me a few tries to get the genomes I was interested in playing with, e.g. this failed:
qiime rescript get-ncbi-genomes \
--p-taxon '147802' \
--output-dir ~/Downloads/lacto-iners-ref-only \
--verbose
But this worked:
qiime rescript get-ncbi-genomes \
--p-taxon '147802' \
--p-no-only-reference \
--output-dir ~/Downloads/lacto-iners-any \
--verbose
This is so awesome!
Hey @mikerobeson, thanks for giving it a try! I'll look into it and see whether I can provide a more informative message 👌
Hey @mikerobeson, I have updated the error message a little, according to your suggestion. It is, unfortunately, not possible to find out which of the parameters should be updated so the message had to be a bit more generic... Let me know what you think 🙏
This PR adds a new
get-ncbi-genomes
action which can be used to fetch any genomes from NCBI by taxon using their new Datasets API. The action would return:FeatureData[Sequence]
with all the requested genomes at the indicated assembly level (it's also possible to indicate whether only RefSeq or all genomes should be considered)GenomeData[Loci|Proteins]
artifacts with respective annotationsFeatureData[Taxonomy]
containing corresponding NCBI taxonomy.