Gaius-Augustus / BRAKER

BRAKER is a pipeline for fully automated prediction of protein coding gene structures with GeneMark-ES/ET/EP/ETP and AUGUSTUS in novel eukaryotic genomes

BRAKER fails with GeneMark-ES #385

Open stefankusch opened 3 years ago

stefankusch commented 3 years ago

Hi, I am at a loss. I've been trying for some time to get BRAKER2 to run (latest version downloaded and installed). I followed the installation instructions at https://github.com/Gaius-Augustus/BRAKER and set the environment variables and paths as needed. Since I do not have admin rights, I set up the environment through conda as instructed. GeneMark's gmes_petap.pl keeps failing with this error:

ERROR in file /home/sk893857/utilities/BRAKER/scripts/braker.pl at line 6666
Failed to execute: perl /home/sk893857/utilities/gmes_linux_64/gmes_petap.pl --verbose --sequence=/rwthfs/rz/cluster/home/sk893857/genomes/sckit/genome_resolved/braker/genome.fa --ET=/rwthfs/rz/cluster/home/sk893857/genomes/sckit/genome_resolved/braker/genemark_hintsfile.gff --cores=8 --gc_donor 0.001 --fungus 1>/rwthfs/rz/cluster/home/sk893857/genomes/sckit/genome_resolved/braker/GeneMark-ET.stdout 2>/rwthfs/rz/cluster/home/sk893857/genomes/sckit/genome_resolved/braker/errors/GeneMark-ET.stderr
Failed to execute: perl /home/sk893857/utilities/gmes_linux_64/gmes_petap.pl --verbose --sequence=/rwthfs/rz/cluster/home/sk893857/genomes/sckit/genome_resolved/braker/genome.fa --ET=/rwthfs/rz/cluster/home/sk893857/genomes/sckit/genome_resolved/braker/genemark_hintsfile.gff --cores=8 --gc_donor 0.001 --fungus 1>/rwthfs/rz/cluster/home/sk893857/genomes/sckit/genome_resolved/braker/GeneMark-ET.stdout 2>/rwthfs/rz/cluster/home/sk893857/genomes/sckit/genome_resolved/braker/errors/GeneMark-ET.stderr
The most common problem is an expired or not present file ~/.gm_key!

The .gm_key is valid (newly downloaded; I downloaded and unpacked GeneMark-EX, GeneMark-ES Suite version 4.65_lic). Running check_install.bash in the conda env confirms that everything is in order:

(braker) sk893857@x86_64-conda_cos6-linux-gnu:~/utilities/gmes_linux_64[598]$ check_install.bash
Checking GeneMark-ES installation
Checking Perl setup
All required Perl modules were found
Checking GeneMark.hmm setup
GeneMark.hmm was found
GeneMark.hmm is set
GeneMark.hmm is executable
Performing GeneMark.hmm test run
All required components for GeneMark-ES were found

Looking further, the braker.log ends with this error:

# Fri Jun  4 11:57:51 2021: Executing gmes_petap.pl
perl /home/sk893857/utilities/gmes_linux_64/gmes_petap.pl --verbose --sequence=/rwthfs/rz/cluster/home/sk893857/genomes/sckit/genome_resolved/braker/genome.fa --ET=/rwthfs/rz/cluster/home/sk893857/genomes/sckit/genome_resolved/braker/genemark_hintsfile.gff --cores=8 --gc_donor 0.001 --fungus 1>/rwthfs/rz/cluster/home/sk893857/genomes/sckit/genome_resolved/braker/GeneMark-ET.stdout 2>/rwthfs/rz/cluster/home/sk893857/genomes/sckit/genome_resolved/braker/errors/GeneMark-ET.stderr

GeneMark-ET.stderr points to reformat_gff.pl:

Can't exec "/rwthfs/rz/cluster/home/sk893857/utilities/gmes_linux_64/reformat_gff.pl": Permission denied at /home/sk893857/utilities/gmes_linux_64/gmes_petap.pl line 1473.

GeneMark-ET.stdout:

check before run
create directories
commit input data
soft_mask is in the 'auto' mode. soft_mask was set to: 1000
error on call: /rwthfs/rz/cluster/home/sk893857/utilities/gmes_linux_64/reformat_gff.pl --out data/et.gff  --trace info/dna.trace  --in /rwthfs/rz/cluster/home/sk893857/genomes/sckit/genome_resolved/braker/genemark_hintsfile.gff  --quiet

The permission denied error makes no sense to me. GENEMARK_PATH is set and points to the gmes_linux_64 directory where gmes_petap.pl resides, and gmes_petap.pl works fine outside of BRAKER2. I've even used chmod 755 *.py *.pl to give global read and execute permissions (in the braker directory as instructed, and also in the gmes_linux_64 directory), to no avail. Calling reformat_gff.pl separately gives no error:

Usage: reformat_gff.pl --out [filename] --trace [filename] --in [filename] 
version 1.5
  --in     [filename] input
  --out    [filename] output
  --trace  [filename] read new sequence ID from this file
  --back   change order (new-old) in trace file 
  --quiet  no warning messages
  --v      verbose

Also, the perl paths are correctly assigned for all GeneMark scripts: I ran perl change_path_in_perl_scripts.pl '/home/sk893857/miniconda3/envs/braker/bin perl', which changed the perl path to the conda env where all the required packages are.
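A "Permission denied" on exec does not always mean the script itself lacks the execute bit. According to execve(2), the same error is returned when the interpreter named on the shebang line is not an executable regular file. The kernel takes the first whitespace-delimited word after `#!` as the interpreter, so if the argument shown above (with a space between `bin` and `perl`) was written into the shebang lines as is, the interpreter would be the `bin` directory. Whether that is what happened here is an assumption. The sketch below, with a hypothetical helper name `check_script_exec`, reports which of these conditions applies to a given script.

```shell
# Report why the kernel might refuse to exec a script.
# check_script_exec is a hypothetical helper, not part of BRAKER or GeneMark.
# Usage: for f in "$GENEMARK_PATH"/*.pl; do check_script_exec "$f"; done
check_script_exec() {
    script=$1
    if [ ! -f "$script" ]; then
        echo "missing: $script"
        return 1
    fi
    if [ ! -x "$script" ]; then
        echo "not executable for this user: $script"
        return 1
    fi
    first_line=$(head -n 1 "$script")
    case $first_line in
        '#!'*) ;;
        *) echo "no shebang line: $script"
           return 1 ;;
    esac
    # The interpreter is the first whitespace-delimited word after "#!".
    interp=$(printf '%s\n' "$first_line" | sed 's/^#![[:space:]]*//' | awk '{print $1}')
    if [ ! -f "$interp" ] || [ ! -x "$interp" ]; then
        echo "interpreter is not an executable file: '$interp' (from $script)"
        return 1
    fi
    echo "ok: $script -> $interp"
}
```

Running it over every `.pl` file in the GeneMark directory shows at a glance whether any script still points at an interpreter that cannot be executed.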

I've run the code as follows (with conda braker environment activated and my binaries in ~/miniconda3/envs/braker/bin):

export PATH=$PATH:/home/sk893857/utilities/BRAKER/scripts
export AUGUSTUS_BIN_PATH=/home/sk893857/miniconda3/envs/braker/bin
export AUGUSTUS_CONFIG_PATH=/home/sk893857/miniconda3/envs/braker/config
export AUGUSTUS_SCRIPTS_PATH=/home/sk893857/miniconda3/envs/braker/bin
export GENEMARK_PATH=/home/sk893857/utilities/gmes_linux_64
export PYTHON3_PATH=/home/sk893857/miniconda3/envs/braker/bin
export PROTHINT_PATH=/home/sk893857/utilities/ProtHint/bin
# run script
perl ~/utilities/BRAKER/scripts/braker.pl --cores=8 --fungus --BAMTOOLS_PATH=/home/sk893857/miniconda3/envs/braker/bin --SAMTOOLS_PATH=/home/sk893857/miniconda3/envs/braker/bin --CDBTOOLS_PATH=/home/sk893857/miniconda3/envs/braker/bin --overwrite --useexisting --species=Ssclerotiorum --genome=/home/sk893857/genomes/sckit/genome_resolved/RepeatMasker/Sckit_Skita01_v1.fa.masked --bam=/hpcwork/sk893857/RNA-seq/mapped/AOPH-15.aligned.bam --workingdir=/home/sk893857/genomes/sckit/genome_resolved/braker

I'm running my scripts on the university HPC, which runs CentOS Linux 7.9.
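On shared HPC file systems, one more thing that may be worth ruling out is a `noexec` mount: execve(2) also returns "Permission denied" for files on a file system mounted with that option, regardless of chmod. The error above names the script under /rwthfs/rz/cluster/home while GENEMARK_PATH uses /home, so checking both paths seems reasonable. This is only a sketch: it assumes `findmnt` from util-linux is available on the cluster, and both function names are hypothetical.

```shell
# Return 0 if a comma-separated mount option string contains "noexec".
opts_have_noexec() {
    case ",$1," in
        *,noexec,*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the mount options of the file system holding a path and flag noexec.
# Assumes findmnt (util-linux) is installed; returns 2 if the lookup fails.
# Usage: check_noexec /rwthfs/rz/cluster/home/sk893857/utilities/gmes_linux_64
check_noexec() {
    path=$1
    opts=$(findmnt -n -o OPTIONS --target "$path") || return 2
    if opts_have_noexec "$opts"; then
        echo "noexec: $path ($opts)"
    else
        echo "exec allowed: $path ($opts)"
    fi
}
```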

stefankusch commented 3 years ago

The problem persists. Is there any solution? Anything I overlooked? Any help appreciated!

tomasbruna commented 3 years ago

Hi @stefankusch,

sorry for the late response. It seems that the problem is somehow related to the conda environment, but I do not see what is wrong. All your steps look good.

The easiest way to proceed might be a workaround. Since GeneMark somehow works outside of BRAKER2, please try running GeneMark-ET separately, using the following command line:

gmes_petap.pl --verbose --sequence=genome.fa --ET=genemark_hintsfile.gff --cores=8 --gc_donor 0.001 --fungus

The genemark_hintsfile.gff should be present in the folder with your failed BRAKER run (in /rwthfs/rz/cluster/home/sk893857/genomes/sckit/genome_resolved/braker/genemark_hintsfile.gff for example).

Once GeneMark finishes, you can pass the genemark.gtf output to BRAKER with the --geneMarkGtf=genemark.gtf option. This is illustrated in test1_restart2.sh.
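The two steps of this workaround could be scripted roughly as follows. This is a sketch under assumptions: the `GeneMark-ET-manual` directory name and both function names are hypothetical, genemark.gtf is assumed to be written into the directory where gmes_petap.pl is started, and the remaining braker.pl flags are simply carried over from the original command line. Compare with test1_restart2.sh before relying on the exact flag combination.

```shell
# Set DRY_RUN=1 to print each command instead of executing it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# $1: absolute path of the failed BRAKER working directory (holds genome.fa
#     and genemark_hintsfile.gff); $2: masked genome; $3: BAM; $4: species
genemark_then_braker() {
    workdir=$1 genome=$2 bam=$3 species=$4
    gm_dir=$workdir/GeneMark-ET-manual   # hypothetical output directory
    mkdir -p "$gm_dir" || return 1
    # Step 1: run GeneMark-ET by hand, outside of BRAKER.
    ( cd "$gm_dir" && run gmes_petap.pl --verbose \
        --sequence="$workdir/genome.fa" \
        --ET="$workdir/genemark_hintsfile.gff" \
        --cores=8 --gc_donor 0.001 --fungus ) || return 1
    # Step 2: hand the GeneMark result to BRAKER so it skips that step.
    run braker.pl --cores=8 --fungus --species="$species" \
        --genome="$genome" --bam="$bam" --workingdir="$workdir" \
        --geneMarkGtf="$gm_dir/genemark.gtf"
}
```

Calling it once with `DRY_RUN=1` prints both command lines for review before anything is executed.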

Please let me know whether this worked. If not, I'll try to help further.

Best, Tomas

norfarhan commented 3 years ago

Hi @tomasbruna, I am encountering the same problem as @stefankusch. Everything works perfectly fine when executed separately. I tried running GeneMark-ET separately as suggested, but it still comes back with an execution error on reformat_gff.pl:

Can't exec "/storage/rahmada/farhan/braker-augustus/gmes_linux_64/reformat_gff.pl": Permission denied at /storage/rahmada/farhan/braker-augustus/gmes_linux_64/gmes_petap.pl line 1881.
error on call: /storage/rahmada/farhan/braker-augustus/gmes_linux_64/reformat_gff.pl --out data/et.gff --trace info/dna.trace --in /storage/rahmada/farhan/braker-augustus/example/braker/genemark_hintsfile.gff --quiet

mlcossette9224 commented 3 years ago

I am having the same issue, did anyone figure out how to make it work?

norfarhan commented 3 years ago

Hi @mlcossette9224, yes, I did. I changed the shebang of all GeneMark scripts to the perl I am running, as described in the user guide:

cd gm_et_linux_64/gmes_petap/
for f in bed_to_gff.pl bp_seq_select.pl build_mod.pl calc_introns_from_gtf.pl \
    change_path_in_perl_scripts.pl gc_distr.pl get_sequence_from_GTF.pl \
    gmes_petap.pl histogram.pl hmm_to_gtf.pl make_nt_freq_mat.pl \
    parse_by_introns.pl parse_ET.pl parse_gibbs.pl parse_set.pl predict_genes.pl \
    reformat_fasta.pl reformat_gff.pl rescale_gff.pl rnaseq_introns_to_gff.pl \
    run_es.pl run_hmm_pbs.pl scan_for_bp.pl star_to_gff.pl verify_evidence_gmhmm.pl
do
    cat $f | perl -pe 's/\/usr\/bin\/perl/\/usr\/bin\/env perl/' > $f.tmp
    mv $f.tmp $f
    chmod u+x $f
done

Hope this works for you.

tomasbruna commented 3 years ago

Another way (also taken from the BRAKER readme) is to run the following from the GeneMark folder:

perl change_path_in_perl_scripts.pl "/usr/bin/env perl"
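To confirm that either rewrite took effect, the first line of every script in the GeneMark folder can be listed. The helper name `list_shebangs` below is hypothetical; it only reads the files and changes nothing.

```shell
# Print "<script name>: <first line>" for every .pl file in a directory.
# Usage: list_shebangs "$GENEMARK_PATH"
list_shebangs() {
    dir=$1
    for f in "$dir"/*.pl; do
        [ -f "$f" ] || continue          # skip if the glob matched nothing
        printf '%s: %s\n' "$(basename "$f")" "$(head -n 1 "$f")"
    done
}
```

Any line that does not end in the expected interpreter (for example `#!/usr/bin/env perl`) marks a script that was missed.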

yzliu01 commented 5 months ago

Hi, in 2024 I am also running into this issue. I would appreciate any help.

I installed BRAKER3 with conda and ran perl change_path_in_perl_scripts.pl "/usr/bin/env perl". Strangely, it did not work even though gmetp_linux_64/bin/gmes was added to the environment PATH:

perl change_path_in_perl_scripts.pl "/usr/bin/env perl"
Can't open perl script "change_path_in_perl_scripts.pl": No such file or directory

I checked that all the perl scripts in the gmes and gmst folders already have the shebang #!/usr/bin/env perl.

I also specified the absolute path to change_path_in_perl_scripts.pl, which gave another problem:

perl /home/sofwtare/gmetp_linux_64/bin/gmes/change_path_in_perl_scripts.pl "/usr/bin/env perl"
error, required file not found: bed_to_gff.pl

The braker3 env was activated before I ran the code below. I have braker3-3.0.8 and augustus-3.5.0.

braker.pl --genome="$Andrena_marginata_softmask_simple_header_genome" --hints="$prothint_augustus_gff" \
    --workingdir=$braker_output_dir --threads 4 --PROTHINT_PATH=/home/user/sofwtare/gmetp_linux_64/bin/gmes/ProtHint/bin --GENEMARK_PATH=/home/user/sofwtare/gmetp_linux_64/bin
# Wed Apr 17 23:44:41 2024: Log information is stored in file /proj_data/gene_annotation/braker_results/braker.log
#*********
# WARNING: Detected | in fasta header of file /home/user/data/ref_genome/Andrena_marginata_GCA_963932335.1-softmasked.simple_header.fa. This may later on cause problems! The pipeline will create a new file without spaces or "|" characters and a genome_header.map file to look up the old and new headers. This message will be suppressed from now on!
#*********
ERROR in file /home/yzliu/miniforge3/envs/braker3/bin/braker.pl at line 5414
Failed to execute: /home/yzliu/miniforge3/envs/braker3/bin/perl /home/user/sofwtare/gmetp_linux_64/bin/gmes/gmes_petap.pl --verbose --seq /proj_data/gene_annotation/braker_results/genome.fa --EP /proj_data/gene_annotation/braker_results/genemark_hintsfile.gff --cores=4  --gc_donor 0.001 --evidence /proj_data/gene_annotation/braker_results/genemark_evidence.gff  --soft_mask auto 1>/proj_data/gene_annotation/braker_results/GeneMark-EP.stdout 2>/proj_data/gene_annotation/braker_results/errors/GeneMark-EP.stderr
Failed to execute: /home/yzliu/miniforge3/envs/braker3/bin/perl /home/user/sofwtare/gmetp_linux_64/bin/gmes/gmes_petap.pl --verbose --seq /proj_data/gene_annotation/braker_results/genome.fa --EP /proj_data/gene_annotation/braker_results/genemark_hintsfile.gff --cores=4  --gc_donor 0.001 --evidence /proj_data/gene_annotation/braker_results/genemark_evidence.gff  --soft_mask auto 1>/proj_data/gene_annotation/braker_results/GeneMark-EP.stdout 2>/proj_data/gene_annotation/braker_results/errors/GeneMark-EP.stderr !

Lines 5411-5415 of braker.pl are shown below; I don't know what the problem is.

            system("$perlCmdString") == 0
                or clean_abort("$AUGUSTUS_CONFIG_PATH/species/$species",
                    $useexisting, "ERROR in file " . __FILE__ ." at line "
                    . __LINE__ ."\nFailed to execute: $perlCmdString\n"
                    . "Failed to execute: $perlCmdString !\n");
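This braker.pl snippet only reports that the external command returned a non-zero exit status, which is also why the "Failed to execute" line appears twice. The underlying GeneMark message ends up in the files the command line redirects to, as seen in the logs quoted in this thread (errors/GeneMark-*.stderr and GeneMark-*.stdout inside the working directory). A small helper to print their tails might look like this; the function name `show_genemark_logs` is hypothetical.

```shell
# Print the last lines of the GeneMark stderr/stdout files in a BRAKER
# working directory. File locations follow the redirections shown above.
# Usage: show_genemark_logs /proj_data/gene_annotation/braker_results
show_genemark_logs() {
    workdir=$1
    for log in "$workdir"/errors/GeneMark-*.stderr "$workdir"/GeneMark-*.stdout; do
        [ -f "$log" ] || continue        # skip patterns that matched nothing
        echo "== $log"
        tail -n 20 "$log"
    done
}
```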

I also ran gmes_petap.pl separately, as suggested above, and got a different error:

gmes_petap.pl --verbose --seq /proj_data/gene_annotation/braker_results/genome.fa --EP /proj_data/gene_annotation/braker_results/genemark_hintsfile.gff --cores=4 --gc_donor 0.001

# check before the run
# hard_mask is in the 'auto' mode. hard_mask was set to: 100
# creat directories
# commit input data
error, output file is empty data/ep.gff
error on call: /home/user/sofwtare/gmetp_linux_64/bin/gmes/reformat_gff.pl --out data/ep.gff  --trace info/dna.trace  --in /proj_data/gene_annotation/braker_results/genemark_hintsfile.gff  --quiet

yzliu01 commented 5 months ago

I have run my data successfully after simplifying the header lines of the genome FASTA file, e.g. replacing "|" with "_", so that >ENA|OZ010661|OZ010661.1 becomes >ENA_OZ010661_OZ010661.1.
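A minimal sketch of that header clean-up, restricted to header lines so sequence data is never touched. Both function names are hypothetical, and any hints file produced earlier would need to use the same renamed sequence IDs.

```shell
# Write the FASTA to stdout with "|" replaced by "_" in header lines only.
# Usage: simplify_fasta_headers genome.fa > genome.simple_header.fa
simplify_fasta_headers() {
    sed '/^>/ s/|/_/g' "$1"
}

# Count header lines that still contain "|" (prints 0 if there are none).
count_pipe_headers() {
    grep -c '^>.*|' "$1" || true
}
```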

See the warning in the output:

# WARNING: Detected | in fasta header of file /home/user/data/ref_genome/Andrena_marginata_GCA_963932335.1-softmasked.simple_header.fa. This may later on cause problems! The pipeline will create a new file without spaces or "|" characters and a genome_header.map file to look up the old and new headers. This message will be suppressed from now on!

What I do not understand: if the pipeline generated a new file without " " and "|", why could the subsequent run not continue with that new file? Or did the pipeline keep using the old FASTA file?
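One way to probe this is to compare the sequence IDs used in the hints file with the headers of the genome.fa that GeneMark actually received. If the hints were generated against the original headers while genome.fa carries the renamed ones, no hint would match any sequence, which could explain the empty data/ep.gff. That explanation is an assumption, not something confirmed in this thread. The sketch below uses a hypothetical function name and assumes the sequence ID is the first whitespace-delimited word of each FASTA header.

```shell
# Print each sequence ID that appears in column 1 of a GFF hints file but
# not among the FASTA headers (first word after ">"). Assumes the FASTA
# file is non-empty.
# Usage: hint_ids_missing_from_fasta genome.fa genemark_hintsfile.gff
hint_ids_missing_from_fasta() {
    fasta=$1
    hints=$2
    awk '
        NR == FNR { if (/^>/) seen[substr($1, 2)] = 1; next }
        /^#/ || NF == 0 { next }
        !($1 in seen) && !reported[$1]++ { print $1 }
    ' "$fasta" "$hints"
}
```

An empty result means every hint refers to a sequence present in the FASTA file; any printed ID points to a naming mismatch between the two inputs.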