Open lipumpkin opened 2 years ago
Hi, the -n parameter is an "up to" limit for each individual species. As an example, let's assume you specify (as you reported above):
phylophlan_get_reference -g g__Acinetobacter -o input_genomes/ -n 5
then up to 5 genomes for each species listed under g__Acinetobacter will be downloaded.
Now, again for the sake of the example, assume that there are only 3 species, listed here with the number of genomes available for each:
g__Acinetobacter|s__species_1 3
g__Acinetobacter|s__species_2 15
g__Acinetobacter|s__species_3 6
In total there are 24 genomes, but you end up downloading 13 (3 + 5 + 5), since s__species_1 has only 3 genomes and the other two species are capped at 5.
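The arithmetic in this example can be sketched in a few lines of Python. This only illustrates the "up to -n per species" rule described above and is not PhyloPhlAn's actual code; the function name genomes_downloaded and the dictionary of counts are made up for the illustration.

```python
def genomes_downloaded(available, n):
    """Total genomes retrieved when at most n are taken per species."""
    return sum(min(count, n) for count in available.values())


# Hypothetical counts from the example above: species -> available genomes
available = {
    "s__species_1": 3,
    "s__species_2": 15,
    "s__species_3": 6,
}

total_available = sum(available.values())
total_with_n5 = genomes_downloaded(available, 5)

print(total_available)  # 24
print(total_with_n5)    # 13
```

Each species contributes min(available, n) genomes, so a species with fewer than n genomes lowers the total below n times the number of species.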
Now, if you check phylophlan_get_reference -l | grep "g__Acinetobacter" | less -S you'll find:
k__Bacteria|p__Proteobacteria|[..]|f__Moraxellaceae|g__Acinetobacter 227 2984
The above means that there are 227 species listed under g__Acinetobacter and that 2984 genomes can be retrieved in total. So it makes sense that you downloaded 227 genomes with -n 1 (one per species) and 806 with -n 300, as s__Acinetobacter_baumannii alone has 2478 genomes and is therefore capped at 300.
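As a rough consistency check, the numbers quoted in this thread can be combined as below. This is only a sketch using the figures reported here, and it assumes every requested download succeeded; the variable names are illustrative.

```python
# Figures quoted in this thread
n_species = 227          # species listed under g__Acinetobacter
total_genomes = 2984     # genomes retrievable for the genus
baumannii = 2478         # genomes for s__Acinetobacter_baumannii
downloaded_n1 = 227      # genomes obtained with -n 1
downloaded_n300 = 806    # genomes obtained with -n 300

# With -n 1, each species contributes exactly one genome.
matches_n1 = downloaded_n1 == n_species

# With -n 300, s__Acinetobacter_baumannii is capped at 300 genomes.
others_available = total_genomes - baumannii                    # 506
others_downloaded = downloaded_n300 - min(baumannii, 300)       # 506

print(matches_n1)                             # True
print(others_available, others_downloaded)    # 506 506
```

Since the remaining 226 species have 506 genomes between them and 506 were downloaded, the 806 total is consistent with all of their genomes being retrieved in full and only s__Acinetobacter_baumannii hitting the -n 300 cap.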
I hope this helps.
Thanks, Francesco
Hi, thank you very much.
I now fully understand the meaning of the -n parameter.
There is no doubt that your answers help me understand this code better.
Thanks, Zikun
Hi, Professor fasnicar. I have a question about the -g option of phylophlan_get_reference. I downloaded reference genomes for the genus Acinetobacter with this command (phylophlan_get_reference -g g__Acinetobacter -o input_genomes/ -n 1 --verbose 2>&1 | tee logs/phylophlan_get_reference.log), and I ended up with 227 genomes for this genus. The file assembly_summary_genbank.txt shows that over 10,000 species belong to the genus Acinetobacter. I then tried the same command with -n 300, but got 806 genomes. On what basis were these 227 or 806 species selected? And do they include all child taxa (species) of the genus with validly published names?
Thanks