gjeunen / reference_database_creator

creating reference databases for amplicon sequencing
MIT License

Alignment file or phylogenetic tree not generated when running visualization method phylo #29

Closed andremicc closed 1 month ago

andremicc commented 1 year ago

Hello,

I have run into an issue when exploring the option visualization, method phylo. My command was:

crabs visualization --method phylo --input test-tax.tsv --level genus --species species.txt --taxid nodes.dmp --name names.dmp

test-tax.tsv is the taxonomy-assigned output I received from insilico_pcr, while species.txt includes 1 species of interest I defined. The output of the command is:

found 1 species of interest in species.txt: ['Lithophaga_lithophaga']
generating taxonomic lineage for 1 species
converting names.dmp to dictionary
converting nodes.dmp to dictionary
gathering data for 1 species
2 sequences in database that share the genus taxonomic rank with Lithophaga_lithophaga
generating phylogenetic tree for Lithophaga_lithophaga
Traceback (most recent call last):
  File "/home/andrea/CRABS/crabs", line 1430, in <module>
    main()
  File "/home/andrea/CRABS/crabs", line 1427, in main
    args.func(args)
  File "/home/andrea/CRABS/crabs", line 946, in visualization
    muscle_cline()
  File "/home/andrea/miniconda3/envs/CRABS/lib/python3.6/site-packages/Bio/Application/__init__.py", line 569, in __call__
    raise ApplicationError(return_code, str(self), stdout_str, stderr_str)
Bio.Application.ApplicationError: Non-zero return code 1 from 'muscle -in phylo_visualization/Lithophaga_lithophaga/genus/Lithophaga_lithophaga_phylo.fasta -out phylo_visualization/Lithophaga_lithophaga/genus/Lithophaga_lithophaga_align.clw -diags -log phylo_visualization/Lithophaga_lithophaga/genus/Lithophaga_lithophaga_align_log.txt -maxiters 1 -clw', message 'Invalid command line'

The only output produced is a "phylo_visualization" folder containing the sequences extracted from the reference database; no alignment file or phylogenetic tree was generated.

Thanks

gjeunen commented 1 year ago

Hello @andremicc,

Can you please let me know which version of CRABS you are using (crabs --version) and which platform you used to download the software, i.e., GitHub, conda, or Docker?

In the meantime, have you tried running this visualisation with more than 2 sequences? Did the same error occur?

Thanks, Gert-Jan

andremicc commented 1 year ago

Hi @gjeunen, I've downloaded CRABS v. 0.1.5 from GitHub and am running it on an Ubuntu WSL 2 machine.

I have tried aligning and visualizing more than 2 sequences, but still got the same error. From the "Invalid command line" message I suspect MUSCLE is the issue. I'm using the latest MUSCLE version, 5.1, whose command-line syntax differs from the one scripted in CRABS. However, when downgrading to v. 3.8.31 I get a "segmentation fault" error instead.

Hope this helps. Thanks A
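
For readers hitting the same "Invalid command line" message: a minimal sketch of how a wrapper could pick the MUSCLE syntax from the installed major version. This is not CRABS code. The function names and the sample version banners are assumptions made for illustration, and the MUSCLE 5 form shown (-align/-output) writes aligned FASTA rather than the Clustal format requested by -clw in the failing command.

```python
import re
from typing import List


def parse_muscle_major_version(version_text: str) -> int:
    """Extract the major version number from a MUSCLE version banner."""
    match = re.search(r"(\d+)\.\d+", version_text)
    if match is None:
        raise ValueError(f"could not find a version number in: {version_text!r}")
    return int(match.group(1))


def build_muscle_command(major_version: int, fasta_in: str, align_out: str) -> List[str]:
    """Return an argument list matching the CLI of the given MUSCLE major version."""
    if major_version >= 5:
        # MUSCLE 5 takes -align/-output; the v3 flags below are rejected,
        # which is consistent with the 'Invalid command line' error above.
        return ["muscle", "-align", fasta_in, "-output", align_out]
    # MUSCLE 3.x syntax, mirroring the command shown in the traceback
    # (the -log option is left out to keep the sketch short).
    return ["muscle", "-in", fasta_in, "-out", align_out,
            "-diags", "-maxiters", "1", "-clw"]


if __name__ == "__main__":
    # Hypothetical banner text; a real wrapper would read it from `muscle -version`.
    major = parse_muscle_major_version("muscle 5.1.linux64")
    print(build_muscle_command(major, "input.fasta", "aligned.afa"))
```

Returning an argument list instead of a single string lets the command be passed directly to subprocess.run without shell quoting problems.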

gjeunen commented 1 year ago

Hello @andremicc,

Indeed, it seems there is some issue with MUSCLE. Please give me some time to go over their code base to try and solve the issue.

Thanks, Gert-Jan

andremicc commented 1 year ago

Thank you for looking into this.

A

rturba commented 1 year ago

Hi Gert-Jan (sorry, me again! 😅)

I've updated CRABS in my conda env to v.0.1.7 on an HPC Linux v5.4.0-146-generic (buildd@lcy02-amd64-026) (gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.1)) #163-Ubuntu.

I'm trying to run the phylo module using the code:

crabs visualization --method phylo --input 12S-V5_insilico_tax_derep_clean.tsv --level family --species species_list.txt --taxid taxonomy/nodes.dmp --name taxonomy/names.dmp --output 12S-V5_insilico_tax_derep_clean_phylo.pdf

And it's not working, but it's giving me a different error than the one shared here:

gathering data for 34 species
Traceback (most recent call last):
  File "/home/rachel.turba/envs/CRABS/bin/crabs", line 1458, in <module>
    main()
  File "/home/rachel.turba/envs/CRABS/bin/crabs", line 1455, in main
    args.func(args)
  File "/home/rachel.turba/envs/CRABS/bin/crabs", line 961, in visualization
    if abort == 'less':
UnboundLocalError: local variable 'abort' referenced before assignment

I've also checked the MUSCLE version, and it is also 5.1. I'm not sure whether this error is related to that, though.

Thank you so much for this package! :)

gjeunen commented 1 year ago

Hello Rachel,

Apologies for the errors you are encountering!

This one is a bit strange. The error is not the same as the one mentioned before. It seems that a variable that should be assigned in a previous step of the code is not assigned in your case, hence the UnboundLocalError. It shouldn't matter, but can you leave out the --output parameter and use something like the command below?

crabs visualization --method phylo --input input.tsv --level family --species species.txt --taxid nodes.dmp --name names.dmp

Given the issues encountered with this function, I'll try and rework the code to make it more flexible and hopefully less error prone. This might take a while though.

Best regards, Gert-Jan
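
To illustrate the kind of failure described above, here is a minimal sketch of how an UnboundLocalError arises when a variable is only assigned inside a branch or loop that never runs. It is not the CRABS source: the function names and the 'less'/'ok' logic are hypothetical stand-ins for whatever the visualization function does around its abort variable.

```python
from typing import List


def classify_unsafe(sequences: List[str]) -> str:
    """Buggy pattern: 'abort' is only bound inside the loop body."""
    for _ in sequences:
        abort = "less" if len(sequences) < 2 else "ok"
    # With an empty list the loop never runs, so 'abort' was never assigned
    # and the comparison below raises UnboundLocalError.
    if abort == "less":
        return "skip"
    return "align"


def classify_safe(sequences: List[str]) -> str:
    """Same logic, but 'abort' gets a default before any conditional path."""
    abort = "less"  # default covers the case where no sequence was found
    for _ in sequences:
        abort = "less" if len(sequences) < 2 else "ok"
    if abort == "less":
        return "skip"
    return "align"


if __name__ == "__main__":
    print(classify_safe([]))
    try:
        classify_unsafe([])
    except UnboundLocalError as err:
        print(f"unsafe version failed: {err}")
```

If this is what happens in practice, an input file in which no record matches the species of interest would trigger the error, which fits the suggestion elsewhere in this thread that the structure of the input file may be the cause.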

rturba commented 1 year ago

I've left the --output option out, but I'm still seeing the same error. Also, I'm not sure if this would help in my case, since I'm working on a cluster and I'm having trouble getting the visualization window to work. I'll keep looking into that with the tech support and I'll get back to you if I figure it out :)

gjeunen commented 1 year ago

This function shouldn't output a new window, but instead save the figures immediately without the output parameter. Are you using the same output file as in your other query? The structure of the file might be causing the issue.

Best regards, Gert-Jan

rturba commented 1 year ago

Ah, I see! Yeah, it's the same input file .tsv. OK, I guess I'll look into that and let you know of any updates.

gjeunen commented 1 year ago

Please see the other issue's response for a correctly formatted document, which you can try running for this issue as well.

rturba commented 1 year ago

Hi Gert-Jan,

Thank you so much for being so responsive with these issues, and sorry for not going away 😅 But after fixing my database and all, I'm still running into this abort issue at the phylo step. Though it seems to run a bit longer, it still creates an empty folder inside phylo_visualization. I'm attaching my .log file.

crabs_phylo.log

All the other visualization steps ran without issues.

Thanks, Rachel

gjeunen commented 1 year ago

Hello Rachel,

I might need to rework this function, as others have had issues with this visualisation as well. Would it be essential for your work to get this going right now or would it be okay if I completely rewrite the function in a couple of weeks? Apologies for the inconvenience this may cause.

Best, Gert-Jan

rturba commented 1 year ago

oh, my! please, take your time with this. i already have a lot to work with and i can try to find a way to do this for at least a few species of interest. if there is anything i can help with troubleshooting this, let me know! 👍

gjeunen commented 1 month ago

Hello @rturba and @andremicc,

The latest version of CRABS (version 1.0.0) uses clustalw2 and FastTree for building the phylogenetic trees, rather than Muscle. This should resolve the issue, but please reopen the thread if the issue persists.

Best wishes, Gert-Jan
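
For anyone who wants to reproduce the alignment and tree steps by hand, a rough sketch of a clustalw2 followed by FastTree pipeline. How CRABS 1.0.0 actually invokes these tools is not shown in this thread, so the function names, the file names, and the choice of options here are assumptions. The flags themselves are standard ones from each tool's command-line interface.

```python
from typing import List


def clustalw2_command(fasta_in: str, align_out: str) -> List[str]:
    """Build a clustalw2 call that writes a FASTA-formatted alignment."""
    return [
        "clustalw2",
        f"-INFILE={fasta_in}",
        "-ALIGN",
        "-OUTPUT=FASTA",
        f"-OUTFILE={align_out}",
    ]


def fasttree_command(align_in: str, tree_out: str) -> List[str]:
    """Build a FastTree call for a nucleotide alignment, writing a Newick tree."""
    # -nt selects the nucleotide model; options must precede the alignment file.
    return ["FastTree", "-nt", "-out", tree_out, align_in]


if __name__ == "__main__":
    # Hypothetical file names; pass each list to subprocess.run(..., check=True)
    # on a machine where both tools are installed.
    print(" ".join(clustalw2_command("species_phylo.fasta", "species_align.fasta")))
    print(" ".join(fasttree_command("species_align.fasta", "species_tree.nwk")))
```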