EBI-Metagenomics / EukCC

Tool to estimate genome quality of microbial eukaryotes
GNU General Public License v3.0

stop codon (*) causes failure #14

Open halexand opened 3 years ago

halexand commented 3 years ago

Hi, to start I have really enjoyed eukcc, so thank you!

I have recently discovered that if my protein files include stop codon marks (*) my job fails with the following error message:

01/02/2021 11:31:54:  Launching EukCC in debug mode
01/02/2021 11:31:54:  Starting EukCC
01/02/2021 11:31:54:  Set pplacer cores to the same as all others (8)
01/02/2021 11:31:54:  Testing if I need to run this step
01/02/2021 11:31:54:  Preparing pplacer
01/02/2021 11:31:54:  Testing if I need to run this step
01/02/2021 11:31:54:  Need to run because of file: test/workfiles/pplacer/placement.jplace
01/02/2021 11:31:54:  Preparing alignments
01/02/2021 11:32:04:  Placing proteins in tree
Uncaught exception: Failure("* is not a known base in PTHR21068_SPO-SPSG-SRF-5-20-00_k119_17585037_gene11356")
Fatal error: exception Failure("* is not a known base in PTHR21068_SPO-SPSG-SRF-5-20-00_k119_17585037_gene11356")
an error occured while executing pplacer
01/02/2021 11:32:28:  Pplacer could not finish. Exiting now
01/02/2021 11:32:28:  No estimates were written

If I remove all the * characters with sed, eukcc works fine.
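
For anyone who prefers to do the same cleanup in Python, here is a minimal sketch, assuming a plain-text FASTA protein file. The function names `strip_stop_codons` and `clean_faa` are hypothetical helpers written for this example and are not part of EukCC or pplacer. Unlike a blanket sed substitution, this version leaves header lines untouched and only removes * from sequence lines.

```python
def strip_stop_codons(lines):
    """Yield FASTA lines with '*' removed from sequence lines only."""
    for line in lines:
        if line.startswith(">"):
            # Header line: keep as-is, even if it contains '*'
            yield line
        else:
            # Sequence line: drop every stop codon mark
            yield line.replace("*", "")


def clean_faa(src_path, dst_path):
    """Write a copy of the FASTA file at src_path without '*' in sequences."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(strip_stop_codons(src))
```

Calling `clean_faa("proteins.faa", "proteins.clean.faa")` (both paths are placeholders) would write a cleaned copy that can then be passed to eukcc instead of the original file.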

I guess it would be nice if eukcc could handle stop codons being included in .faa files. That said, this is more a request for an improvement than an immediate bug. I am just going to remove all the * characters for now.

Thanks!

openpaul commented 3 years ago

Thank you for reporting this. I have addressed this in the documentation, but will leave the bug open until EukCC removes stop codons itself.

For now I assume removing them ahead of submission is fine and should not break any workflow.