AstrobioMike / GToTree

A user-friendly workflow for phylogenomics
GNU General Public License v3.0

Test run error #67

Closed asuria25 closed 1 year ago

asuria25 commented 1 year ago

Hi, I just installed GToTree following the Conda quickstart instructions and had errors with the test run script. I am using an iMac with macOS Mojave v10.14.5 (Intel Core i5 processor) and Miniconda2. Any help getting this running would be greatly appreciated!

I have attached the output from gtt-test.sh. The first error I get is:

Error: File format problem in trying to open HMM file 1665671572.gtotree.tmpdir/all_pfam_targets.hmm.
File exists, but appears to be empty?

cat: 1665671572.gtotree.tmpdir/GCA_900473895.1_N32_genomic_hit_counts.tmp: No such file or directory

When I tried to visualize the tree in iTOL, I got an error that something was wrong with the tree file (there were zero branch lengths). This is the Aligned_SCGs_mod_names.faa file:

>Bacteroides_fragilis_YCH46
-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-
>GCA_000012825.1_ASM1282v1_genomic
-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-
>Alteromonas_macleodii_SPECIAL
-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-
>ROOT
-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-
>GCF_900473895.1_Bacteria_Cyanobacteria_Cyanobacteriia_Synechococcus_E_sp002724845
-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-
>GCF_000012505.1_Bacteria_Cyanobacteria_Cyanobacteriia_Synechococcus_E_sp000012505
-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-
>GCF_000013045.1_Bacteria_Bacteroidota_Rhodothermia_Salinibacter_ruber
-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-
>GCF_000020585.3_ASM2058v3_protein
-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-
>GCF_000153765.1_Bacteria_Proteobacteria_Zetaproteobacteria_Mariprofundus_ferrooxydans
-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-
>SPECIAL
-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-
>GCF_001886455.1_ASM188645v1_protein
-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-
>GCF_900162675.1_Bacteria_Proteobacteria_Gammaproteobacteria_Halospina_utahensis_A
-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-
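A quick sanity check for an alignment like this is to list every record whose sequence has no residues other than X and gap characters, since such records give a tree builder nothing to compute branch lengths from. The sketch below is not part of GToTree; the function name `uninformative_seqs` is made up for illustration, and it assumes a plain FASTA file with Unix line endings.

```shell
# Print the header of every FASTA record whose sequence consists only of
# "X"/"x" and "-" characters (or is empty), i.e. has no informative residues.
# Hypothetical helper, not a GToTree command.
uninformative_seqs() {
    awk '
        # Report the record collected so far if it has no informative residue.
        function report() { if (hdr != "" && seq !~ /[^Xx-]/) print hdr }
        /^>/ { report(); hdr = substr($0, 2); seq = ""; next }
        { seq = seq $0 }          # join wrapped sequence lines
        END { report() }
    ' "$1"
}

# Example use:
#   uninformative_seqs Aligned_SCGs_mod_names.faa
```

If every header in the file is printed, the problem is upstream of tree building, in the alignment step or earlier.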

20221012_gtotree_test_run.txt

AstrobioMike commented 1 year ago

Hi there, @asuria25 :)

Sorry you're having trouble, thanks for writing in!

The first problem, the pfam error, is related to the Pfam site changing very recently as it is being passed off to InterPro to host. I have that fixed in later versions of GToTree (it's up to 1.7.05 at this point), but your installed version is 1.6.34, so it makes sense that part is failing at least.

I can't seem to reproduce the empty tree file though, and I unfortunately haven't yet seen those types of errors in the log file (thanks for attaching that by the way!).

Can you try removing that environment and reinstalling this way, which specifically requests the latest version, so we can first check whether the second problem still happens on your system?

conda env remove -n gtotree

conda create -n gtotree -c conda-forge -c bioconda -c defaults -c astrobiomike gtotree=1.7.05

Once the latest version is installed, can you run the test again and let me know what happens?

If it does the same thing, it would probably help me track down what's going on if you could re-run the test in the same location (before removing anything) with this command, which just adds the -d flag to keep most of the intermediate files and renames the output:

GToTree -a GToTree-test-data/ncbi_accessions.txt \
               -g GToTree-test-data/genbank_files.txt \
               -f GToTree-test-data/fasta_files.txt \
               -A GToTree-test-data/amino_acid_files.txt \
               -m GToTree-test-data/genome_to_id_map.tsv \
               -p GToTree-test-data/pfam_targets.txt \
               -H Universal -t -D -j 4 -d -o GToTree-test-output-with-debug

And then after that finishes, tar or zip up that GToTree-test-output-with-debug/ output directory, and there will be a directory that looks like 1*.gtotree.tmpdir/, and if you could tar or zip up that too and attach it here that would hopefully help me track down what's going on.
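The archiving step described above could be sketched like this, assuming it is run from the directory where the test was run. The function name `archive_debug_dirs` is made up for illustration; the directory names are the ones mentioned in this thread.

```shell
# Make one .tar.gz per directory: the debug output directory and any
# temporary directory matching 1*.gtotree.tmpdir in the current directory.
# Hypothetical helper, not a GToTree command.
archive_debug_dirs() {
    for dir in GToTree-test-output-with-debug 1*.gtotree.tmpdir; do
        [ -d "$dir" ] || continue        # skip if missing or the glob matched nothing
        tar -czf "${dir}.tar.gz" "$dir"  # -c create, -z gzip, -f output file
        printf 'wrote %s.tar.gz\n' "$dir"
    done
}

# Example use:
#   archive_debug_dirs
```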

asuria25 commented 1 year ago

Hi @AstrobioMike,

Thanks for the quick reply! I removed the old environment and successfully installed the newer GToTree version. The pfam error is gone, but I'm still having the same issue with the alignment. I've attached the debug directory and script output, but the 1665680208.gtotree.tmpdir.tar.gz file is too large to attach here (70 MB). Is there another way I should send the files, or are there specific files from that folder I should attach?

Thanks for taking a look at this! GToTree-test-output-with-debug.tar.gz 20221013_gtotree_test_run2.txt

AstrobioMike commented 1 year ago

thanks, @asuria25!

I should have thought of that, ha. Could you please email it to me at MikeLee@bmsis.org?

AstrobioMike commented 1 year ago

Update on this while closing:

Turns out a different version of muscle was in front of the conda-installed one in the system PATH. I'm not sure why the conda activate gtotree wasn't putting its bin in at the front of the PATH though ¯_(ツ)_/¯
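For anyone hitting something similar, one way to spot this kind of shadowing is to list every copy of a program on PATH in search order. The function below is just a sketch, and the name `list_on_path` is made up for illustration; in bash, the built-in `type -a muscle` gives similar information.

```shell
# List every executable file named "$1" found on PATH, in search order.
# The first line printed is the copy the shell would run.
# Hypothetical helper; the body runs in a subshell so IFS and set -f
# do not leak into the calling shell.
list_on_path() (
    name=$1
    IFS=:
    set -f                       # no glob expansion while splitting PATH
    for dir in $PATH; do
        [ -n "$dir" ] || dir=.   # an empty PATH entry means the current directory
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
)

# Example use, with the gtotree environment activated:
#   list_on_path muscle
#   echo "$CONDA_PREFIX/bin/muscle"   # CONDA_PREFIX is set by conda activate
```

If the first line from `list_on_path muscle` is not the path under `$CONDA_PREFIX/bin`, a different muscle is taking precedence over the conda-installed one.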