danielpodlesny / samestr

SameStr identifies shared strains between pairs of metagenomic samples based on the similarity of SNV profiles.
GNU Affero General Public License v3.0

reference genomes in samestr extract command #11

Closed qinglong89 closed 1 year ago

qinglong89 commented 1 year ago

The module samestr extract obtains MetaPhlAn marker sequences from reference genomes by using BLASTN.

samestr extract \
  --input-files reference_genomes/*.fasta \
  --marker-dir marker_db/ \
  --nprocs 30 \
  --output-dir out_extract/

My question: where should we download the reference genomes from? This is not clear to me.

Thanks!

qinglong89 commented 1 year ago

NCBI RefSeq genomes?

ncbi-genome-download --section refseq --format fasta --parallel 80 bacteria

danielpodlesny commented 1 year ago

Yes, for example from NCBI or proGenomes. samestr extract extracts the marker regions from each genome, which can then be compared and studied across genomes and metagenomes. Use samestr convert for dealing with metagenomes.
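
As a rough sketch of how the two commands in this thread could be connected: ncbi-genome-download writes gzipped FASTA files into per-assembly subfolders, while the samestr extract call above globs reference_genomes/*.fasta. The helper below, collect_genomes, is a hypothetical name and not part of either tool. It assumes the downloads end in _genomic.fna.gz and that samestr extract should be given uncompressed .fasta files; neither assumption is confirmed in this thread.

```shell
#!/usr/bin/env bash
# Sketch: flatten an ncbi-genome-download tree into reference_genomes/*.fasta.
# Assumption: genomes are gzipped FASTA files named "*_genomic.fna.gz"
# somewhere below the source directory.

collect_genomes() {
    # collect_genomes is a hypothetical helper, not a samestr command.
    local src_dir="$1" dest_dir="$2" gz name
    mkdir -p "$dest_dir"
    # Skip CDS/RNA files (*_from_genomic.fna.gz) in case they were downloaded too.
    find "$src_dir" -type f -name '*_genomic.fna.gz' \
        ! -name '*_from_genomic.fna.gz' | while IFS= read -r gz; do
        name="$(basename "$gz" _genomic.fna.gz)"
        # Decompress to <assembly>.fasta so the *.fasta glob picks it up.
        gzip -dc "$gz" > "$dest_dir/$name.fasta"
    done
}

# Example usage (commented out; the download needs network access):
#   ncbi-genome-download --section refseq --format fasta --parallel 80 bacteria
#   collect_genomes refseq/bacteria reference_genomes
#   samestr extract --input-files reference_genomes/*.fasta \
#       --marker-dir marker_db/ --nprocs 30 --output-dir out_extract/
```

The refseq/bacteria path in the usage comment is an assumption about where the download lands by default; adjust it to wherever the files actually end up.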

Hope this helps!