AstrobioMike / GToTree

A user-friendly workflow for phylogenomics
GNU General Public License v3.0

Is this gzip step necessary? #52

Closed: joshuakirsch closed this issue 2 years ago

joshuakirsch commented 2 years ago

Hi,

Thanks for creating this great tool! Because of the way our Linux system is set up, gzip is frustrating to run without sudo privileges, and even with them it's inconsistent. I am able to run the GToTree pipeline, but on each genome I get these "warnings":

curl: /usr/local/MATLAB/MATLAB_Runtime/v911/bin/glnxa64/libldap_r-2.4.so.2: no version information available (required by /usr/lib/x86_64-linux-gnu/libcurl.so.4)
gzip: 1648060085.gtotree.tmpdir/GCF_000007785.1_genes2.tmp: Operation not permitted
gzip: 1648060085.gtotree.tmpdir/GCF_000007785.1_genes2.tmp: Operation not permitted
      Performing HMM search...
        Found 118 of the targeted 119 genes.
        Est. % comp: 99.16; Est. % redund: 1.68

curl seems to be working correctly, but gzip fails at this step. Is this step critical for the pipeline? I could also download the genomes and decompress them separately from the pipeline, but I'd prefer to be able to use the full pipeline.

Thanks, Josh

joshuakirsch commented 2 years ago

I ran a smaller batch of my genomes and it seemed to work: I recovered a tree that I think is accurate.

AstrobioMike commented 2 years ago

Hey there, Josh :)

Sorry for the odd gzip issue. I'm confused as to why it says the operation is not permitted, yet the file still gets decompressed somehow and the run moves forward ¯\_(ツ)_/¯

Unfortunately, the step is essential when downloading from NCBI, because all of their files are gzipped and need to be decompressed for many of the downstream programs. So I don't think there is anything I can do on GToTree's end.

If you do end up needing to download the genomes separately, decompress them yourself, and then provide them to GToTree as local fasta inputs, the download code from GToTree is available as a standalone program in my bit package. You can conda install that, then give the bit-dl-ncbi-assemblies program the same accession list you were giving GToTree.
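For that do-it-yourself route, the decompression step does not have to go through the gzip binary at all. The Python sketch below is hypothetical and not part of GToTree or bit: it assumes the downloaded files end in `.gz` and sit together in one directory. It writes only the decompressed contents (it never tries to restore timestamps or permissions) and then writes one absolute path per line to a list file, on the assumption that this is the shape of GToTree's local-fasta input (check `GToTree -h` for the exact option).

```python
"""Decompress downloaded .gz genome files and list the results.

A minimal sketch: directory names, file suffixes, and the list-file
format are assumptions, not part of GToTree or bit.
"""
import gzip
import shutil
from pathlib import Path


def decompress_dir(src_dir: Path, out_dir: Path) -> list[Path]:
    """Decompress every *.gz file in src_dir into out_dir.

    Only file contents are copied; timestamps and permission bits are
    not restored, so no utime/chmod calls are made on the outputs.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for gz_path in sorted(src_dir.glob("*.gz")):
        out_path = out_dir / gz_path.stem  # "x.fna.gz" -> "x.fna"
        with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)  # stream, so large genomes are fine
        outputs.append(out_path)
    return outputs


def write_path_list(paths: list[Path], list_file: Path) -> None:
    """Write one absolute path per line (assumed list-of-fasta format)."""
    list_file.write_text("".join(f"{p.resolve()}\n" for p in paths))
```

For example, `write_path_list(decompress_dir(Path("downloaded_genomes"), Path("fasta_genomes")), Path("fasta_files.txt"))`, where all three names are placeholders for wherever bit-dl-ncbi-assemblies put its output.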

Sorry I don't have anything better to suggest!

joshuakirsch commented 2 years ago

Hi, thanks for the supportive words. I tested this more this morning and found that even though gzip gives this error, the file is still extracted. Other extraction programs (gunzip, tar, etc.) have trouble setting file timestamps (utime) on our system, so this might be where the error comes from. The rest of the pipeline finished and no errors were raised. Thanks again!
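Josh's explanation fits a two-stage picture of decompression: first the data are written, then the tool tries to copy the original file's timestamps and permissions onto the output. The Python sketch below illustrates that hypothesis; it is not gzip's actual source, and the function name `gunzip_keep_going` is hypothetical. If the metadata step is refused with "Operation not permitted" (a `PermissionError`), the decompressed file is already complete, which would match a warning that does not stop the pipeline.

```python
"""Split gzip-style decompression into a data step and a metadata step.

Illustrative sketch of the utime hypothesis only, not gzip's real code.
"""
import gzip
import shutil
import sys
from pathlib import Path


def gunzip_keep_going(gz_path: Path) -> Path:
    """Decompress gz_path next to itself; tolerate metadata failures."""
    out_path = gz_path.with_suffix("")  # drop the trailing ".gz"
    # Step 1: write the decompressed data (this is the part that matters).
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    # Step 2: try to copy permission bits and timestamps (chmod/utime).
    try:
        shutil.copystat(gz_path, out_path)
    except PermissionError as err:
        # EPERM ("Operation not permitted") lands here; the data are intact.
        print(f"warning: could not copy metadata to {out_path}: {err}",
              file=sys.stderr)
    return out_path
```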