bokulich-lab / RESCRIPt

REference Sequence annotation and CuRatIon Pipeline
BSD 3-Clause "New" or "Revised" License

Download RefSeq Genomes and associated Taxonomy #125

Closed: mikerobeson closed this 1 year ago

mikerobeson commented 2 years ago

Provide the ability for users to download RefSeq Genome Assemblies, along with their associated taxonomy. See this forum thread for more details.

Some notes:

misialq commented 2 years ago

I think this can also be achieved with NCBI Datasets (see #96), which I started working on a while ago (https://github.com/misialq/RESCRIPt/blob/ncbi-datasets/rescript/ncbi_datasets.py). Using a taxon ID as a query, one can pull all assembled genomes together with their metadata and taxonomies. As far as I remember, everything goes through the Datasets API, so not much additional parsing is needed. It does, however, require some new semantic types defined in https://github.com/bokulich-lab/q2-types-genomics (for storing the genome annotations).
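
To make the taxon-ID query idea concrete, here is a rough sketch of what such a request could look like. It is not the code from the linked branch. The base URL, the `dataset_report` path, the `filters.assembly_source` parameter and the response field names (`reports`, `accession`, `organism`) are assumptions about the NCBI Datasets v2 REST API and should be checked against the current API reference. The helper names `genome_report_url` and `accessions_with_taxonomy` are hypothetical.

```python
from urllib.parse import quote, urlencode

# Assumed base URL and path layout of the NCBI Datasets v2 REST API.
BASE_URL = "https://api.ncbi.nlm.nih.gov/datasets/v2"


def genome_report_url(taxon, refseq_only=True, page_size=100):
    """Build a URL requesting genome assembly reports for one taxon."""
    params = {"page_size": page_size}
    if refseq_only:
        # Assumed filter name; meant to restrict results to RefSeq assemblies.
        params["filters.assembly_source"] = "refseq"
    path = f"/genome/taxon/{quote(str(taxon))}/dataset_report"
    return f"{BASE_URL}{path}?{urlencode(params)}"


def accessions_with_taxonomy(report):
    """Map assembly accession -> (tax ID, organism name) from a parsed JSON report.

    The field names used here are assumptions about the response layout.
    """
    result = {}
    for entry in report.get("reports", []):
        organism = entry.get("organism", {})
        result[entry["accession"]] = (
            organism.get("tax_id"),
            organism.get("organism_name"),
        )
    return result
```

The URL can be fetched with any HTTP client, and the decoded JSON passed to `accessions_with_taxonomy` to get each accession paired with its taxonomy. Paging through large result sets and downloading the genome sequences themselves are left out here.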

nbokulich commented 1 year ago

@misialq is this issue closed by #153?

misialq commented 1 year ago

Yes, it is!