oatesa opened 5 years ago
I'd suggest going for the whole shebang and building an NCBI index if you want all that! Using recentrifuge's rextract command, you can then easily filter out the taxa you don't want to see turn up, based on taxID. For example, -x 33630 -x 554915 -x 554296 -x 1401294 -x 193537 -x 3027 -x 33682 -x 207245 -x 38254 -x 2830 -x 2489521 -x 5752 -x 556282 -x 339960 -x 136087 -x 66288 -x 5719 -x 543769 -x 2763 -x 33634 -x 33090 -x 42452 -x 61964 takes out everything but fungi.
It's overkill for sure, but it's so easy to use.
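Rather than typing the exclusion list by hand, one option is to keep the taxIDs in a bash array and expand them into repeated -x options. This is a minimal sketch: the variable names are hypothetical, and it only builds the -x options quoted above; rextract's other arguments (input files and so on) are not shown and should be taken from the recentrifuge documentation.

```shell
#!/usr/bin/env bash

# TaxIDs from the comment above; per that comment, excluding all of them
# leaves only fungi.
exclude_taxids=(
  33630 554915 554296 1401294 193537 3027 33682 207245 38254 2830
  2489521 5752 556282 339960 136087 66288 5719 543769 2763 33634
  33090 42452 61964
)

# Build one "-x <taxID>" pair per taxID; keeping them in an array means each
# option and each value stays a separate, correctly quoted argument.
rextract_excludes=()
for taxid in "${exclude_taxids[@]}"; do
  rextract_excludes+=(-x "$taxid")
done

# Print the options as one line, e.g. to check them before use.
printf '%s\n' "${rextract_excludes[*]}"
```

The array can then be expanded into a call such as rextract <other options> "${rextract_excludes[@]}", where <other options> is a placeholder for whatever else your rextract run needs.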
I recommend the following steps to build a seqid2taxid.map file that will work with any RefSeq sequences you download:
wget https://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid/nucl_gb.accession2taxid.gz
gunzip nucl_gb.accession2taxid.gz
cut -d $'\t' -f 2,3 nucl_gb.accession2taxid > seqid2taxid.map
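The same steps can be wrapped in a small script that streams the compressed table instead of writing an uncompressed copy to disk. This is a sketch, not the commenter's exact commands: the function names are hypothetical, and it assumes the first line of the accession2taxid file is a column header (accession, accession.version, taxid, gi), which it skips with tail so that the header does not end up in the map.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Convert a gzip-compressed NCBI accession2taxid table into Centrifuge's
# two-column "seqid<TAB>taxid" map: keep columns 2 (accession.version) and
# 3 (taxid). cut splits on tabs by default. Assumes line 1 is a header.
accession2taxid_to_map() {
  local gz_in="$1"
  gzip -dc "$gz_in" | tail -n +2 | cut -f 2,3
}

# Hypothetical wrapper for the full download-and-convert step.
# It is only defined here; call it explicitly (see the usage note).
build_seqid2taxid_map() {
  local url="https://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid/nucl_gb.accession2taxid.gz"
  wget -q -O nucl_gb.accession2taxid.gz "$url"
  accession2taxid_to_map nucl_gb.accession2taxid.gz
}
```

Usage: build_seqid2taxid_map > seqid2taxid.map. Streaming through gzip -dc means the large table never has to exist uncompressed on disk.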
Fairly new to this and working through the manual, so a quick check: I'm interested in archaea, bacteria, viruses and fungi. I use
centrifuge-download -o library -m -d "archaea,bacteria,viral,fungi" refseq > seqid2taxid.map
to download the reference sequences. The next step says to concatenate all downloaded sequences into a single file:
cat //.fna > input-sequences.fna
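A sketch of the concatenation step, assuming centrifuge-download -o library put the .fna files in per-domain subfolders under library/ (the function name is hypothetical). Using find picks up every .fna file below the directory in one pass, whichever domains were downloaded.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: concatenate every .fna file below a library directory
# into a single FASTA file. Paths are sorted so the output order is
# reproducible. Write the output outside library_dir, otherwise a rerun
# could pick up the previous output file as input.
concat_library() {
  local library_dir="$1" out_fasta="$2"
  find "$library_dir" -type f -name '*.fna' -print0 \
    | sort -z \
    | xargs -0 cat > "$out_fasta"
}
```

For example: concat_library library input-sequences.fna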
(1) Would I do this for archaea, bacteria, viral and fungi all together, or individually for each one?
If I also wanted to include vertebrate_mammalian, using
centrifuge-download -o library -m -d "vertebrate_mammalian" -a "Chromosome" -t 9606 -c 'reference genome' >> seqid2taxid.map
(2) Would this overwrite the contents of seqid2taxid.map or add to it?
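For reference, in POSIX shells > truncates the target file before writing, while >> appends to it (creating it if needed). A minimal demo on a throwaway file, mirroring the > then >> pattern of the two centrifuge-download commands above; the function name and file contents are made up for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical demo: first write with '>', then with '>>', then show the file.
demo_redirection() {
  local map_file="$1"
  printf 'seqA\t2\n' > "$map_file"     # '>' truncates the file, then writes
  printf 'seqB\t9606\n' >> "$map_file" # '>>' appends after the existing line
  cat "$map_file"
}
```

Calling demo_redirection twice on the same file still leaves exactly two lines, because the first write in each call uses > and truncates whatever was there.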
Thanks in advance for any help