AstrobioMike / GToTree

A user-friendly workflow for phylogenomics
GNU General Public License v3.0

GTDB Download - Latest path file extensions changed #82

Closed: gdangelo8 closed this issue 6 months ago

gdangelo8 commented 7 months ago

The file extension of the GTDB metadata files at the latest download path has changed from metadata.tar.gz to metadata.tsv.gz, so any GToTree script that fetches these files from GTDB now fails.
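A minimal sketch of how a script could cope with the rename: build the new .tsv.gz URL and fall back to the old .tar.gz name only if the new one returns a 404. The function names, the injectable `opener` parameter, and the `bac120` prefix for bacteria are assumptions for illustration; this is not GToTree's actual code.

```python
import urllib.error
import urllib.request

# Base URL quoted in the issue; "latest" points at the newest GTDB release.
GTDB_LATEST = "https://data.gtdb.ecogenomic.org/releases/latest/"


def candidate_metadata_urls(prefix):
    """Metadata URLs to try for one domain, newest file naming first.

    prefix: "ar53" for archaea (as in the issue); "bac120" for bacteria is assumed.
    """
    return [
        f"{GTDB_LATEST}{prefix}_metadata.tsv.gz",  # current name, per this issue
        f"{GTDB_LATEST}{prefix}_metadata.tar.gz",  # old name, now an HTTP 404
    ]


def open_first_available(urls, opener=urllib.request.urlopen):
    """Return opener(url) for the first URL that is not a 404.

    Any HTTP error other than 404 is re-raised immediately; if every URL
    is a 404, the last 404 is re-raised.
    """
    last_404 = None
    for url in urls:
        try:
            return opener(url)
        except urllib.error.HTTPError as err:
            if err.code != 404:
                raise
            last_404 = err
    if last_404 is None:
        raise ValueError("no URLs given")
    raise last_404
```

Usage would be something like `open_first_available(candidate_metadata_urls("ar53"))`. The `opener` parameter exists only so the fallback logic can be exercised without network access.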

I was receiving an HTTP 404 error when running any of the scripts involving GTDB. One example is the gtt-get-accessions-from-GTDB script, where the lines below call this URL: https://data.gtdb.ecogenomic.org/releases/latest/

import urllib.request

import pandas as pd

def gen_gtdb_tab(location):

    # getting archaea
    # NOTE: this .tar.gz URL now returns HTTP 404; the file is now ar53_metadata.tsv.gz
    arc_tar_gz = urllib.request.urlopen("https://data.gtdb.ecogenomic.org/releases/latest/ar53_metadata.tar.gz")
    arc_tab = pd.read_csv(arc_tar_gz, sep="\t", compression="gzip", on_bad_lines = 'skip', header=0, low_memory=False)
    arc_tab.rename(columns={arc_tab.columns[0]:"accession"}, inplace=True)
    arc_tab.dropna(inplace=True, how="all")
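In the script itself, pointing the existing `pd.read_csv` call at the .tsv.gz URL is presumably the smaller change (it is what the reporter did). For readers without pandas, here is a standard-library sketch of the same steps on a gzipped TSV. The function name is hypothetical, and the skipping rules only approximate pandas' `on_bad_lines="skip"` and `dropna(how="all")`.

```python
import csv
import gzip


def read_gtdb_metadata(raw):
    """Parse a gzipped GTDB metadata TSV into a list of dicts.

    raw: a binary file-like object, e.g. what urllib.request.urlopen()
    returns for the .tsv.gz URL.
    """
    with gzip.open(raw, mode="rt", newline="") as text:
        reader = csv.reader(text, delimiter="\t")
        header = next(reader)        # first row is the header (header=0)
        header[0] = "accession"      # same rename as arc_tab.rename(...)
        rows = []
        for fields in reader:
            if len(fields) != len(header):
                continue  # malformed row: skip, roughly like on_bad_lines="skip"
            if all(field == "" for field in fields):
                continue  # all-empty row: drop, roughly like dropna(how="all")
            rows.append(dict(zip(header, fields)))
    return rows
```

Unlike pandas, this skips rows with too few fields as well as too many, and it keeps every value as a string.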

I tried uninstalling the conda environment and reinstalling it, but ran into the same issue while initializing the GTDB database for the first time. So I ended up manually changing the URLs (from .tar.gz to .tsv.gz) in all the GTDB scripts, and everything ran without issue. A colleague was able to run his installation (with the .tar.gz URLs) without issue, but maybe that is because his GTDB database was initialized before the file extension changed in September.
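The manual workaround described above amounts to a text substitution. Below is a minimal sketch, assuming the old names all follow the `<prefix>_metadata.tar.gz` pattern. The helper names are hypothetical, and which script files to patch is left to the user. The maintainer's reply in this thread recommends upgrading GToTree instead of patching.

```python
import re
from pathlib import Path

# Matches old GTDB metadata file names such as "ar53_metadata.tar.gz".
OLD_NAME = re.compile(r"(\w+_metadata)\.tar\.gz")


def patch_metadata_urls(text):
    """Return (new_text, count) with *_metadata.tar.gz changed to *_metadata.tsv.gz."""
    return OLD_NAME.subn(r"\1.tsv.gz", text)


def patch_script(path):
    """Rewrite one script file in place; return how many names were changed."""
    script = Path(path)
    new_text, count = patch_metadata_urls(script.read_text())
    if count:
        script.write_text(new_text)  # only touch the file if something changed
    return count
```

Returning the substitution count makes it easy to confirm that each script actually contained an old URL.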

AstrobioMike commented 7 months ago

Hey there, @gdangelo8 :)

Thanks for writing in about this, but I think I covered this in the 1.8.3 update last month. Can you check (with GToTree -v) whether the version giving you trouble is older than that, and if it is, see whether installing 1.8.3 solves the problem?

mamba create -y -n gtotree -c astrobiomike -c conda-forge -c bioconda -c defaults gtotree==1.8.3

And let me know what happens?
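The version check suggested above can be scripted. This is a sketch that assumes the output of `GToTree -v` contains a version number in x.y.z form. The exact output format is not shown in this thread, and the helper names are hypothetical.

```python
import re
import subprocess

MIN_VERSION = (1, 8, 3)  # release the maintainer says includes the GTDB URL fix


def parse_version(text):
    """Pull the first x.y.z version number out of text, e.g. "GToTree v1.8.3"."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def needs_upgrade(version_text, minimum=MIN_VERSION):
    """True if the reported version is older than `minimum`."""
    return parse_version(version_text) < minimum


def installed_gtotree_version():
    """Run `GToTree -v` and return stdout plus stderr (output format assumed)."""
    result = subprocess.run(["GToTree", "-v"], capture_output=True, text=True)
    return result.stdout + result.stderr
```

Usage: `needs_upgrade(installed_gtotree_version())`. Comparing integer tuples rather than strings avoids treating "1.8.10" as older than "1.8.3".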

MicroPat007 commented 7 months ago

Hi Mike, I had the same issue and could solve it by installing the latest version of GToTree; at least I don't get an error message now when running gtt-get-accessions-from-GTDB. However, when I now try to build a tree with the new rep-accs.txt file, none of the accessions are found, and the resulting tree contains only "my" MAGs. Any idea whether this has to do with the same underlying problem?

Thanks a lot for all your help, hope you are well!

Pat

AstrobioMike commented 7 months ago

Hey there, Pat :)

Thanks for confirming the above!

And sorry it's giving you trouble! But na, I don't think that'd be related. Once we have the accessions list, the genomes are pulled from NCBI. I just tried and can't recreate the problem. Maybe you can pass me the accessions file that gave you trouble, and I can try with the exact same file? You can email it to me if you'd prefer: MikeLeebmsisorg
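Since the accessions are fetched from NCBI, a quick first check before sending the file is whether every line looks like a bare NCBI assembly accession (GCA_ or GCF_, nine digits, a version). This is a hypothetical diagnostic sketch. It assumes GToTree expects one bare accession per line, which this thread does not state.

```python
import re

# NCBI assembly accessions look like GCF_000005845.2 or GCA_000001405.29.
ACCESSION = re.compile(r"^GC[AF]_\d{9}\.\d+$")


def check_accessions(lines):
    """Split accession lines into (valid, problems); blank lines are ignored.

    Each problem is (line_number, line) for anything that is not a bare
    accession, e.g. a GTDB-style "RS_"/"GB_" prefix or trailing spaces.
    Unix and Windows line endings are both stripped before checking.
    """
    valid, problems = [], []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue
        if ACCESSION.match(line):
            valid.append(line)
        else:
            problems.append((number, line))
    return valid, problems
```

For example, `with open("rep-accs.txt") as fh: valid, problems = check_accessions(fh)`. An empty `problems` list rules out formatting, though not other causes.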

Hope all is well in your world!

gdangelo8 commented 6 months ago

Thanks for the response! I think my issue came from trying to update the existing installation. I ended up uninstalling my conda environment completely and running the line you suggested, and everything runs fine with the newest version now. Thank you :)

AstrobioMike commented 6 months ago

Ah great, thanks!