AstrobioMike / GToTree

A user-friendly workflow for phylogenomics
GNU General Public License v3.0

Alphaproteobacteria (0 targets) #56

Closed by ebervilla 2 years ago

ebervilla commented 2 years ago

Hi Mike

I hope you are doing well

I have been trying to construct a phylogenomic tree of rhizobia strains in GToTree using the Alphaproteobacteria gene set; however, GToTree apparently does not find any genes in it (see the attached picture of my console). Do you know what I could do to construct the tree using the Alphaproteobacteria genes?

Thank you very much for your help and for all the effort you put into your awesome tutorials

Best

Eber

AstrobioMike commented 2 years ago

Hi there, Eber!

Thanks for the kind words :) Sorry it’s giving you trouble right now

I recently changed things so that GToTree downloads the specified SCG-set the first time it’s used and then stores it (to save a ton of space for those who don’t need all of them). It sounds like maybe something is not working there. But I think you forgot to attach the image (GitHub needs a reminder like Gmail's, which saves me regularly from that, ha). Can you put that in or copy/paste the output text for me?

And also the output for the following:

    GToTree -v
    gtt-data-locations check
    gtt-hmms
    ls ${GToTree_HMM_dir}

Thanks!

ebervilla commented 2 years ago

Hi Mike

Yes, that reminder is definitely necessary. I have attached the outputs here:

output_gtotree_alphaproteobacteria.txt

GToTree-v_gtt-data-locations_check.txt

Thank you very much in advance

Best

AstrobioMike commented 2 years ago

Thanks, Eber!

This will help me implement a check that catches this before the rest of the run starts, and gives an actually useful error message...

I'm wondering if the file started to download but didn't finish successfully, and now GToTree isn't trying to download it again because it finds the partial file already in place. Check its size with:

    ls -lh ${GToTree_HMM_dir}Alphaproteobacteria.hmm

If it's fully there, it should be ~8.3M.

Either way, try removing that with:

    rm ${GToTree_HMM_dir}Alphaproteobacteria.hmm

Then run your main GToTree command again, the one where you're specifying -H Alphaproteobacteria, and see if you get the same result now that the file is removed. I'm hoping it will download it again, and then you'll see this as the program starts, showing the expected number of targets:

    HMM source to be used:
      - Alphaproteobacteria.hmm (117 targets)

🤞
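The target count can also be checked directly against the stored file, without starting a full run. The following is a minimal sketch, not part of GToTree: it assumes the file is in the standard HMMER3 text format, where each profile has exactly one NAME line, and that the "targets" GToTree reports correspond to those profiles. The function name count_hmm_targets is made up for illustration.

```shell
# Count the profiles in an HMMER3-format file by counting its "NAME" lines.
# Assumption: one NAME line per profile, as in standard HMMER3 text files.
count_hmm_targets() {
    # grep -c prints 0 but exits with status 1 when nothing matches;
    # "|| true" keeps that from aborting a script run with "set -e".
    grep -c '^NAME ' "$1" || true
}
```

Based on the startup message above, running `count_hmm_targets "${GToTree_HMM_dir}Alphaproteobacteria.hmm"` on a complete download would be expected to print 117; a noticeably smaller number (or 0) would point to an incomplete file.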

ebervilla commented 2 years ago

Hi Mike,

It worked 👍

Thank you very much!!

    -------------------------------- RUN INFO ---------------------------------

    Input genome sources include:
      - NCBI accessions listed in Rhizobium_refseq_accessions.txt (38 genomes)
      - Fasta files listed in fasta_files.txt (2 genomes)

                             Total input genomes: 40

    HMM source to be used:
      - Alphaproteobacteria.hmm (117 targets)

    Options set:
      - The output directory has been set to "Syn-GtoTree-out_3/"
      - Taxonkit will be used to add NCBI taxonomy info to labels where possible
      - Lineage information added to labels will be Species
      - Number of jobs to run during parallelizable steps has been set to 4

AstrobioMike commented 2 years ago

Great!

I'm going to leave this open until I implement a check to prevent this in the future

Thanks, Eber!
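
For anyone who lands here with the same symptom, a pre-run check along these lines could catch an incomplete HMM file before the main run starts. This is only a sketch under stated assumptions, not the check that GToTree implements: the function name validate_hmm_file and its messages are hypothetical, and it relies on the standard HMMER3 text format, in which every profile has one NAME line and ends with a line containing only "//".

```shell
# Hypothetical pre-run check (not GToTree's actual code).
# Usage: validate_hmm_file <path> [expected_target_count]
# Returns 0 if the file looks complete, 1 (with a message on stderr) otherwise.
validate_hmm_file() {
    hmm_path="$1"
    hmm_expected="${2:-}"

    # -s is true only if the file exists and is not empty
    if [ ! -s "$hmm_path" ]; then
        echo "HMM file '$hmm_path' is missing or empty." >&2
        return 1
    fi

    # One NAME line per profile, one "//" line closing each profile
    hmm_names=$(grep -c '^NAME ' "$hmm_path" || true)
    hmm_ends=$(grep -c '^//$' "$hmm_path" || true)
    # Last non-blank line; a complete file should end with "//"
    hmm_last=$(grep -v '^[[:space:]]*$' "$hmm_path" | tail -n 1)

    if [ "$hmm_names" -eq 0 ]; then
        echo "HMM file '$hmm_path' contains 0 targets." >&2
        return 1
    fi

    if [ "$hmm_last" != "//" ] || [ "$hmm_names" -ne "$hmm_ends" ]; then
        echo "HMM file '$hmm_path' looks truncated; remove it and re-run so it is downloaded again." >&2
        return 1
    fi

    if [ -n "$hmm_expected" ] && [ "$hmm_names" -ne "$hmm_expected" ]; then
        echo "HMM file '$hmm_path' has $hmm_names targets, expected $hmm_expected." >&2
        return 1
    fi

    return 0
}
```

With the numbers from this thread, `validate_hmm_file "${GToTree_HMM_dir}Alphaproteobacteria.hmm" 117` would be expected to succeed on a complete download and fail with a message on a partial one.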