AstrobioMike / GToTree

A user-friendly workflow for phylogenomics
GNU General Public License v3.0

Problem using -D to add GTDB lineage info to genome labels #62

Closed gabriellerocap closed 2 years ago

gabriellerocap commented 2 years ago

Hi Mike! Thanks for all your work on GToTree. I have v1.6.36 installed in its own conda environment on a Mac (OS 10.13.6). It is working great so far, except when I try to use the -D flag to add GTDB labels to my tree. In that case it runs through everything, but not only does it fail to add the labels, at the very end everything that was created is deleted (both the tmp dir and the run dir). Only the log file remains, and the relevant part says:

    Downloading GTDB archaea and bacteria info tables...

    Download of GTDB taxonomy info failed :( Is the internet connection weak maybe?
    Continuing, because we've come this far, but labels won't have GTDB lineages incorporated.

Running the exact same input files without any label requests or when using the -t flag to get GenBank taxonomy labels works just fine.

I know it is not a weak internet connection, but on closer inspection of the terminal output it looks like there is some sort of curl issue when downloading the GTDB info:

    Downloading GTDB archaea and bacteria info tables...

      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
      0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
    curl: (60) SSL certificate problem: certificate has expired
    More details here: https://curl.haxx.se/docs/sslcerts.html
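The `(60)` in the curl output above is curl's exit status, and it points at certificate verification rather than a slow or dropped connection. Below is a minimal sketch of how a wrapper script could turn that status into a more specific message. The function name `explain_curl_exit` is hypothetical and is not part of GToTree; only the numeric codes come from curl's documented exit codes.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GToTree): map a curl exit status to a
# short diagnosis. The numeric codes are curl's documented exit codes.
explain_curl_exit() {
    case "$1" in
        0)  echo "ok" ;;
        6)  echo "could not resolve host (DNS problem or offline)" ;;
        7)  echo "failed to connect to host" ;;
        28) echo "operation timed out (slow or stalled connection)" ;;
        60) echo "SSL certificate could not be verified (expired or untrusted)" ;;
        *)  echo "curl failed with exit code $1" ;;
    esac
}
```

Called as `explain_curl_exit $?` immediately after a `curl` command, this would report a certificate problem for the failure shown here instead of suggesting a weak connection.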

I guess the curl problem may be on my end? But in terms of GToTree behavior:

1) It should probably just continue as if there were no -D flag and give me the unlabeled tree and other output, rather than deleting everything?
2) I am not sure why it is trying to download the GTDB info tables again at this stage, when they already seem to live within the conda env folder at '/Users/Gabrielle/miniconda3/envs/gtotree/share/gtotree/gtdb_tax_info/GTDB-arc-and-bac-metadata.tsv' (gtt-data-locations check confirms that GTDB_dir is set to this dir).

I can use the gtt-get-accessions-from-GTDB command successfully, which I think draws from this file rather than re-downloading from GTDB each time. I'm wondering if it is possible for the -D flag to try that as well (or to provide an option to point to an info file)?
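The first request above, continuing with an unlabeled tree when labeling fails, can be sketched as follows. This is only an illustration under stated assumptions: `label_or_keep` and the labeling command passed to it are hypothetical names, not GToTree's actual code.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical names, not GToTree's actual implementation.
# label_or_keep TREE_IN TREE_OUT LABEL_CMD...
#   Runs LABEL_CMD with TREE_IN and TREE_OUT appended as arguments.
#   If it fails, the unlabeled tree is copied to TREE_OUT so the run
#   still produces a usable result instead of losing everything.
label_or_keep() {
    local tree_in="$1" tree_out="$2"
    shift 2
    if "$@" "$tree_in" "$tree_out"; then
        echo "labeled"
    else
        echo "Labeling failed; keeping unlabeled tree." >&2
        cp "$tree_in" "$tree_out" && echo "unlabeled"
    fi
}
```

The point of the design is that an optional step is wrapped so that its failure downgrades the output rather than aborting the run or triggering cleanup of results that were already produced.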

Thanks for any help! Gabrielle

AstrobioMike commented 2 years ago

@gabriellerocap I am so sorry! My GitHub emails started getting flagged as spam and I haven't seen things here :(

I'm sure you're way past this being a problem and/or have moved on, but I'm only seeing this right now.

I'm not sure what the curl problem is either; it looks like the GTDB certificate may have been temporarily expired. And you're right that the main GToTree program should be pulling from the stored DB, but I added that functionality later and still haven't tied them together yet :(

Thanks for pushing me to finally tie these together properly. And yes, it definitely should be able to just keep going; I'm not sure why it's not, actually. I will be digging into it this weekend.

Sorry again for the ridiculously slow response!

AstrobioMike commented 2 years ago

Heya!

So this is updated as of v1.7.00 (updated in conda). The main GToTree program now finally uses the already stored GTDB reference files if they exist, or sets them up if they don't. And most importantly, it checks right at the start of the run and, if there is a problem, exits then and tells us, rather than not checking until a lot of work has already been done and then possibly failing.
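The "use the stored file if it exists, otherwise set it up, and check before doing any work" pattern described here might look roughly like the sketch below. It is not the code from v1.7.00; `ensure_ref_table` and the fetch command passed to it are hypothetical names used only for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: every name here is hypothetical, not GToTree's real code.
# ensure_ref_table PATH FETCH_CMD...
#   Succeeds if PATH already exists and is non-empty; otherwise runs
#   FETCH_CMD with a temporary output path appended as its last argument
#   and moves the result into place only if the fetch succeeded.
ensure_ref_table() {
    local table="$1"
    shift
    if [ -s "$table" ]; then
        return 0                      # stored copy found, no download needed
    fi
    mkdir -p "$(dirname "$table")" || return 1
    # Write to a temporary name so a failed fetch never leaves a partial
    # file that a later run would mistake for a valid table.
    if "$@" "$table.part" && [ -s "$table.part" ]; then
        mv "$table.part" "$table"
    else
        rm -f "$table.part"
        return 1
    fi
}
```

A script would call this before starting any expensive steps, for example `ensure_ref_table "$GTDB_dir/GTDB-arc-and-bac-metadata.tsv" fetch_gtdb_table || exit 1`, where `fetch_gtdb_table` stands in for whatever download command is actually used. Failing here costs seconds, whereas failing at the labeling stage costs the whole run.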

Thanks for the note about this and sorry for the annoyance! -Mike

gabriellerocap commented 2 years ago

Thanks Mike! I have downloaded v1.7.00, and the -D flag to add GTDB labels is working for me now. Gabrielle