AstrobioMike / GToTree

A user-friendly workflow for phylogenomics
GNU General Public License v3.0

Save downloaded genomes from NCBI #95

Open jmtsuji opened 3 months ago

jmtsuji commented 3 months ago

Thanks so much for your continued work on this really helpful workflow, @AstrobioMike !

I have a feature request (so low priority) for GToTree. Currently, when a list of NCBI assembly accession numbers is provided as input (via -a), GToTree automatically downloads the genome for each accession, predicts amino acid sequences when amino acid files don't already exist, and then runs the SCG search/alignment workflow. Being able to download genomes from NCBI like this is extremely helpful. However, I sometimes want to work with the amino acid sequence files for the analyzed genomes after GToTree has finished. It seems that GToTree deletes these files, and does not keep them in the tmp directory even in debug mode (-d). Might it be possible to add a flag to keep these files, or to preserve them when debug mode (-d) is set?

Again, this is not urgent, because I can just download the genomes again myself if needed. Thanks so much in advance, and again, I have so appreciated this useful tool!

AstrobioMike commented 3 months ago

Hi there, @jmtsuji!

Thanks for the kind words!

I will look into adding an option for this when I can. At the very least, there’s certainly no reason they shouldn’t be saved with the debug flag, like you tried!

You mentioned you could download them yourself, but I’ll also note that I have the same NCBI download functionality packaged in my bit package for this very purpose. It takes input assembly accessions, just like GToTree.

The conda install steps are here: https://github.com/AstrobioMike/bit?tab=readme-ov-file#conda-install

Then you’d want the program bit-dl-ncbi-assemblies. Passing -f protein along with the wanted input accessions would download the amino acid files, if they are available, in case that’s helpful to you.
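
For reference, the download described above might look roughly like the sketch below. Only the program name and -f protein come from this thread. The -w flag for the accessions file, the file name wanted-accessions.txt, and the two example RefSeq accessions are assumptions for illustration, so check bit-dl-ncbi-assemblies --help before relying on them.

```shell
# Write the wanted assembly accessions to a file, one per line.
# The two accessions are illustrative only (E. coli K-12 MG1655, B. subtilis 168).
printf '%s\n' GCF_000005845.2 GCF_000009045.1 > wanted-accessions.txt

# Only attempt the download if bit is installed and on the PATH.
if command -v bit-dl-ncbi-assemblies >/dev/null 2>&1; then
    # -f protein is from the comment above; -w as the input flag is an assumption.
    bit-dl-ncbi-assemblies -w wanted-accessions.txt -f protein \
        || echo "download failed; check the flags with --help"
else
    echo "bit-dl-ncbi-assemblies not found; install bit first (see the conda link above)"
fi
```

The same accessions file could then be passed to GToTree via -a, so both tools work from one list.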

Thanks for the suggestion!

jmtsuji commented 3 months ago

@AstrobioMike Thanks for the quick response! Also, good to know about bit; bit-dl-ncbi-assemblies could potentially be quite useful. All the best!