Gaius-Augustus / GALBA

GALBA is a pipeline for fully automated prediction of protein coding gene structures with AUGUSTUS in novel eukaryotic genomes for the scenario where high quality proteins from one or several closely related species are available.

failed to execute: /usr/bin//etraining #45

Closed: jyw-atgithub closed this issue 6 months ago

jyw-atgithub commented 7 months ago

Dear @KatharinaHoff, I tried the latest Singularity image of GALBA on two different servers and encountered the same error on both.

ERROR in file /opt/GALBA/scripts/galba.pl at line 4801
failed to execute: /usr/bin//etraining --species=Phytichthys_chirus --CRF=1 --AUGUSTUS_CONFIG_PATH=/home/jenyuw/.augustus /home/jenyuw/Fish-project/result/annotation/train.gb.train 1>/home/jenyuw/Fish-project/result/annotation/crftraining.stdout 2>/home/jenyuw/Fish-project/result/annotation/errors/crftraining.stderr

The last message in GALBA.log is:

# Mon Jan 22 00:53:15 2024: The accuracy in round third is 0.753
# Mon Jan 22 00:53:15 2024: Third etraining - now with CRF
/usr/bin//etraining --species=Phytichthys_chirus --CRF=1 --AUGUSTUS_CONFIG_PATH=/home/jenyuw/.augustus /home/jenyuw/Fish-project/result/annotation/train.gb.train 1>/home/jenyuw/Fish-project/result/annotation/crftraining.stdout 2>/home/jenyuw/Fish-project/result/annotation/errors/crftraining.stderr

The environment is: singularity-ce version 3.11.0-jammy, GALBA v1.0.11, Ubuntu 22.04.2 LTS.

May I know how to resolve this? Thank you!

jyw-atgithub commented 7 months ago

Hi @KatharinaHoff, for more information, here are the last few lines of the crftraining.stderr file. On one machine:


gene 4574_XP_040020291.1 transcr. 1 in sequence ntLink_27_15031492-15045686: in-frame stop codon
ExonModel::processInternalExon: in-frame stop codon
gene 4574_XP_040020291.1 transcr. 1 in sequence ntLink_27_15031492-15045686: in-frame stop codon
gene 4574_XP_040020291.1 transcr. 1 in sequence ntLink_27_15031492-15045686: in-frame stop codon
Segmentation fault (core dumped)

On the other machine:

gene 4574_XP_040020291.1 transcr. 1 in sequence ntLink_27_15031492-15045686: in-frame stop codon
ExonModel::processInternalExon: in-frame stop codon
gene 4574_XP_040020291.1 transcr. 1 in sequence ntLink_27_15031492-15045686: in-frame stop codon

/usr/bin//etraining: ERROR
        FeatureCollection::esource: invalid source key: RM
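
Because the same gene is reported several times before each crash, it can help to condense crftraining.stderr into one entry per gene and keep the remaining lines (such as the segmentation fault or the `invalid source key` error) separate. The following is only a minimal sketch, assuming the stderr line format shown in the excerpts above; the function name `summarize_stderr` is hypothetical and not part of GALBA or AUGUSTUS.

```python
import re
from collections import Counter

# Matches per-gene messages of the form seen in crftraining.stderr, e.g.
# "gene <id> transcr. <n> in sequence <locus>: in-frame stop codon".
# The format is an assumption taken from the excerpts in this thread.
GENE_LINE = re.compile(
    r"^gene (?P<gene>\S+) transcr\. (?P<transcript>\d+) "
    r"in sequence (?P<locus>\S+): (?P<message>.+)$"
)


def summarize_stderr(lines):
    """Count per-gene messages and collect every other non-empty line.

    Returns (counts, other), where counts maps (gene, locus, message)
    to the number of occurrences and other is a list of unmatched lines.
    """
    counts = Counter()
    other = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = GENE_LINE.match(line)
        if match:
            counts[(match["gene"], match["locus"], match["message"])] += 1
        else:
            other.append(line)
    return counts, other
```

To use it, pass an open file handle for crftraining.stderr to `summarize_stderr`; the unmatched lines in `other` are the ones most likely to point at the actual crash.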

KatharinaHoff commented 6 months ago

I pushed a new Docker container today. We used to use the Debian augustus package in our containers; I have now changed the GALBA container to build the latest etraining from GitHub, which should cause fewer problems. I hope this solves your problem, too. I tested it with the GALBA test data (adding the --CRF=1 flag to the test1 script), and it worked. (There is a small bug in GALBA that I just noticed and will fix soon, but it only affects parsing the accuracy results, not running etraining with CRF.)

If the problem persists, please open an issue in the Augustus repository with a small data set to reproduce the etraining crash. That would be something that I cannot fix in GALBA.
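
One way to prepare such a small data set is to copy only the affected records out of train.gb.train. The sketch below is an illustration under two assumptions that should be checked against the actual file: that records end with a `//` line, as in GenBank flat files, and that the second field of each `LOCUS` line equals the sequence name printed in crftraining.stderr (for example `ntLink_27_15031492-15045686`). All function names are hypothetical.

```python
def split_genbank_records(text):
    """Yield one string per record; a record ends with a '//' line."""
    record = []
    for line in text.splitlines(keepends=True):
        record.append(line)
        if line.strip() == "//":
            yield "".join(record)
            record = []
    # Keep a trailing record that lacks a terminator, if it has content.
    if any(part.strip() for part in record):
        yield "".join(record)


def locus_name(record):
    """Return the second field of the LOCUS line, or None if absent."""
    for line in record.splitlines():
        if line.startswith("LOCUS"):
            fields = line.split()
            return fields[1] if len(fields) > 1 else None
    return None


def select_records(text, loci, keep=True):
    """Return the records whose locus is in loci (or not, if keep=False)."""
    wanted = set(loci)
    return "".join(
        record
        for record in split_genbank_records(text)
        if (locus_name(record) in wanted) == keep
    )
```

Calling `select_records(text, loci)` yields a reduced file for a bug report, while `keep=False` produces a copy of the training set without those records, which may help to test whether they trigger the crash.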

This is the GALBA bug in CRF mode:

Use of uninitialized value $target_3 in numeric gt (>) at /opt/GALBA/scripts/galba.pl line 4811, <AUGOUT> line 11110.
Use of uninitialized value $target_3 in numeric gt (>) at /opt/GALBA/scripts/galba.pl line 4845, <AUGOUT> line 11110.