Rfam / rfam-production

Rfam production pipeline
Apache License 2.0

Download sequences using the NCBI Datasets CLI tool #118

Closed. AntonPetrov closed this issue 2 years ago

AntonPetrov commented 2 years ago

NCBI provides a new CLI tool for downloading genomes.

For example, run the following commands:

datasets download genome accession GCA_000005845.2 --filename GCA_000005845.zip --exclude-gff3 --exclude-protein --exclude-rna --exclude-genomic-cds
unzip -d GCA_000005845 GCA_000005845.zip

to get a FASTA file ending in _genomic.fna that will be used to build Rfamseq.
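
Since the FASTA file ends up somewhere inside the unzipped directory, a small helper can locate it by name rather than by a hard-coded path. The following is only a sketch: `find_genomic_fasta` is a hypothetical helper name, not part of the `datasets` tool or of rfam-production, and it assumes nothing about the archive layout beyond the `_genomic.fna` suffix mentioned above.

```shell
# Hypothetical helper (not part of the NCBI datasets CLI):
# print the path of the first *_genomic.fna file found under a directory.
find_genomic_fasta() {
    dir="$1"
    # Search recursively, so the exact nesting inside the archive does not matter.
    fasta=$(find "$dir" -type f -name '*_genomic.fna' 2>/dev/null | sort | head -n 1)
    if [ -z "$fasta" ]; then
        echo "No *_genomic.fna file found under $dir" >&2
        return 1
    fi
    printf '%s\n' "$fasta"
}
```

After the unzip step, it could be called as `find_genomic_fasta GCA_000005845` and the printed path passed on to whatever builds Rfamseq.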

blakesweeney commented 2 years ago

This is no longer needed. It turns out we cannot use that binary: some of the UniProt genomes (old versions, etc.) are not found by the tool. Instead, we have developed our own tool to download from NCBI.