bokulich-lab / RESCRIPt

REference Sequence annotation and CuRatIon Pipeline
BSD 3-Clause "New" or "Revised" License

ENH: add support for fetching protein seqs from NCBI #92

Closed: misialq closed this pull request 3 years ago

misialq commented 3 years ago

Adds a new get-ncbi-data-protein action for fetching protein sequences and their associated taxonomies from NCBI. The PR also includes a small refactor of the existing get-ncbi-data action and its related functions, making them more general so that they apply to sequence types other than DNA.
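The generalization described above (fetching code that is no longer tied to DNA) can be illustrated with a minimal sketch. This is not RESCRIPt's actual implementation: the helpers build_efetch_url and parse_fasta and the ENTREZ_DB mapping are hypothetical names. The sketch assumes records are retrieved through NCBI's E-utilities efetch endpoint, where switching the db parameter between nuccore and protein is enough to request nucleotide or protein FASTA records.

```python
from urllib.parse import urlencode

# NCBI E-utilities efetch endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Hypothetical mapping from a sequence type to the Entrez database to query.
# "nuccore" and "protein" are Entrez database names.
ENTREZ_DB = {"dna": "nuccore", "protein": "protein"}


def build_efetch_url(seq_type: str, ids: list[str]) -> str:
    """Build an efetch URL that returns FASTA records for the given accessions.

    Hypothetical helper: the database is selected from the sequence type, so
    one code path serves both DNA and protein queries.
    """
    if seq_type not in ENTREZ_DB:
        raise ValueError(f"unsupported sequence type: {seq_type!r}")
    params = {
        "db": ENTREZ_DB[seq_type],
        "id": ",".join(ids),  # comma-separated accession list
        "rettype": "fasta",
        "retmode": "text",
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {accession: sequence}.

    The accession is taken as the first whitespace-separated token of each
    header line; sequence lines are concatenated. This works the same way for
    nucleotide and amino-acid records.
    """
    records: dict[str, str] = {}
    acc = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            if acc is not None:
                records[acc] = "".join(chunks)
            acc = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if acc is not None:
        records[acc] = "".join(chunks)
    return records
```

Keying the database on the sequence type keeps a single fetch-and-parse path for both actions. Taxonomy retrieval, which the real action also performs, is left out of this sketch.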

The new action requires protein types that will only become available once https://github.com/qiime2/q2-types/pull/252 is merged.

Tested with a small (10 sequences) and a not-so-small (~1000 sequences) dataset; all looked OK.

Note to self: merge #91 first to get the updated license headers, then clean up this PR.

Sample IDs: sample-metadata.tsv.zip