Braker3/GeneMark-ETP: file not found: complete.gtf, complete.id, complete_uniq.gtf

JohnUrban commented 1 year ago

Hello,

Thank you for all the great tools coming from this team.

I gave Braker3 a shot, but am running into an error at the moment. I will report below how I installed Braker3, and how I used it in case it helps reproduce the error.

I would be grateful for any guidance you can provide, and am eager to get Braker3 working at some point in the near future, but fully understand that you are busy. I am mainly reporting this issue in case it helps your development.

First, here was the command used.

braker.pl --genome=${ASM} --UTR=on --stranded=+,- --bam=${FWD},${REV} --prot_seq=${PROTEINS} --workingdir=braker3 --threads=16

Second, here are the errors as reported.

This was reported to stdout/stderr.

# Fri Feb 24 08:59:35 2023: Creating directory /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3.
# Fri Feb 24 08:59:35 2023:Both protein and RNA-Seq libraries in input detected. BRAKER will be executed in ETP mode.
#*********
# Fri Feb 24 08:59:38 2023: Log information is stored in file /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/braker.log
#*********
# WARNING: Detected whitespace in fasta header of file /central/groups/carnegie_poc/jurban/software/braker2/protein/gfas1-and-hexacorallia-and-metazoan-proteins-orthoDB.fasta. This may later on cause problems! The pipeline will create a new file without spaces or "|" characters and a genome_header.map file to look up the old and new headers. This message will be suppressed from now on!
#*********
ERROR in file /home/jurban/software/braker2/braker3/BRAKER/scripts/braker.pl at line 5486
Failed to execute: /central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/bin/perl /central/groups/carnegie_poc/jurban/software/braker2/braker3/deps/genemark-etp/GeneMark-ETP/bin/gmetp.pl --cfg /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/GeneMark-ETP/etp_config.yaml --workdir /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/GeneMark-ETP --bam /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/GeneMark-ETP/etp_data/ --cores 16 --softmask 1>/central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/errors/GeneMark-ETP.stdout 2>/central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/errors/GeneMark-ETP.stderr
Failed to execute: /central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/bin/perl /central/groups/carnegie_poc/jurban/software/braker2/braker3/deps/genemark-etp/GeneMark-ETP/bin/gmetp.pl --cfg /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/GeneMark-ETP/etp_config.yaml --workdir /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/GeneMark-ETP --bam /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/GeneMark-ETP/etp_data/ --cores 16 --softmask 1>/central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/errors/GeneMark-ETP.stdout 2>/central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/errors/GeneMark-ETP.stderr
The most common problem is an expired or not present file ~/.gm_key!
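
The fasta header warning in the output above can be checked outside of BRAKER. The following is a minimal sketch, assuming a plain-text (uncompressed) FASTA file; `count_problem_headers` is a hypothetical helper name and not part of BRAKER or GeneMark-ETP. It only counts header lines that contain whitespace or a "|" character, which is what the warning describes.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of BRAKER): count FASTA header lines
# that contain whitespace or a "|" character.
count_problem_headers() {
    local fasta="$1"
    # grep -c prints the number of matching lines. It exits with status 1
    # when that number is 0, so "|| true" keeps the function usable under "set -e".
    grep -c -E '^>.*[[:space:]|]' "$fasta" || true
}

# Example call (path is whatever was passed to --prot_seq):
# count_problem_headers "${PROTEINS}"
```

A count above zero only confirms what BRAKER already reported; according to the warning text, BRAKER creates a cleaned copy itself, so this is for inspection rather than a required fix.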

This is from braker.log


#**********************************************************************************
#                               BRAKER CONFIGURATION                               
#**********************************************************************************
# BRAKER CALL: /home/jurban/software/braker2/braker3/BRAKER/scripts/braker.pl --genome=/central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/data/toy/longest.fa.masked --UTR=on --stranded=+,- --bam=/central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/data/toy/forward.bam,/central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/data/toy/reverse.bam --prot_seq=/central/groups/carnegie_poc/jurban/software/braker2/protein/gfas1-and-hexacorallia-and-metazoan-proteins-orthoDB.fasta --workingdir=braker3 --threads=16
# Fri Feb 24 08:59:35 2023: braker.pl version 3.0.0
# Fri Feb 24 08:59:35 2023: Creating directory /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3.
# Fri Feb 24 08:59:35 2023: Creating directory /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3.
# Fri Feb 24 08:59:35 2023:Both protein and RNA-Seq libraries in input detected. BRAKER will be executed in ETP mode.
#*********
# Fri Feb 24 08:59:35 2023: Configuring of BRAKER for using external tools...
# Fri Feb 24 08:59:35 2023: Trying to set $AUGUSTUS_CONFIG_PATH...
# Fri Feb 24 08:59:35 2023: Found environment variable $AUGUSTUS_CONFIG_PATH.
# Fri Feb 24 08:59:35 2023: Checking /central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/config/ as potential path for $AUGUSTUS_CONFIG_PATH.
# Fri Feb 24 08:59:35 2023: Success! Setting $AUGUSTUS_CONFIG_PATH to /central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/config/!
# Fri Feb 24 08:59:35 2023: Trying to set $AUGUSTUS_BIN_PATH...
# Fri Feb 24 08:59:35 2023: Found environment variable $AUGUSTUS_BIN_PATH.
# Fri Feb 24 08:59:35 2023: Checking /central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/bin/ as potential path for $AUGUSTUS_BIN_PATH.
# Fri Feb 24 08:59:35 2023: Success! Setting $AUGUSTUS_BIN_PATH to /central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/bin/!
# Fri Feb 24 08:59:35 2023: Trying to set $AUGUSTUS_SCRIPTS_PATH...
# Fri Feb 24 08:59:35 2023: Found environment variable $AUGUSTUS_SCRIPTS_PATH.
# Fri Feb 24 08:59:35 2023: Checking /central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/bin/ as potential path for $AUGUSTUS_SCRIPTS_PATH.
# Fri Feb 24 08:59:35 2023: Success! Setting $AUGUSTUS_SCRIPTS_PATH to /central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/bin/!
# Fri Feb 24 08:59:35 2023: Trying to set $PYTHON3_PATH...
# Fri Feb 24 08:59:35 2023: Did not find environment variable $PYTHON3_PATH.
# Fri Feb 24 08:59:35 2023: Trying to guess PYTHON3_PATH from location of python3 executable that is available in your $PATH
# Fri Feb 24 08:59:35 2023: Checking /central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/bin as potential path for $PYTHON3_PATH.
# Fri Feb 24 08:59:35 2023: Success! Setting $PYTHON3_PATH to /central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/bin!
# Fri Feb 24 08:59:35 2023: Trying to set $JAVA_PATH...
# Fri Feb 24 08:59:35 2023: Did not find environment variable $JAVA_PATH.
# Fri Feb 24 08:59:35 2023: Trying to guess JAVA_PATH from location of java executable that is available in your $PATH
# Fri Feb 24 08:59:35 2023: Checking /central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/bin as potential path for $JAVA_PATH.
# Fri Feb 24 08:59:35 2023: Success! Setting $JAVA_PATH to /central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/bin!
# Fri Feb 24 08:59:36 2023: Trying to set $GUSHR_PATH...
# Fri Feb 24 08:59:36 2023: Did not find environment variable $GUSHR_PATH.
# Fri Feb 24 08:59:36 2023: Trying to guess GUSHR_PATH from location of gushr.py executable that is available in your $PATH
# Fri Feb 24 08:59:36 2023: Checking /central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/bin as potential path for $GUSHR_PATH.
# Fri Feb 24 08:59:36 2023: Success! Setting $GUSHR_PATH to /central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/bin!
# Fri Feb 24 08:59:36 2023: Trying to set $GENEMARK_PATH...
# Fri Feb 24 08:59:36 2023: Did not find environment variable $GENEMARK_PATH.
# Fri Feb 24 08:59:36 2023: Trying to guess GENEMARK_PATH from location of gmetp.pl executable that is available in your $PATH
# Fri Feb 24 08:59:36 2023: Checking /central/groups/carnegie_poc/jurban/software/braker2/braker3/deps/genemark-etp/GeneMark-ETP/bin as potential path for $GENEMARK_PATH.
# Fri Feb 24 08:59:36 2023: Success! Setting $GENEMARK_PATH to /central/groups/carnegie_poc/jurban/software/braker2/braker3/deps/genemark-etp/GeneMark-ETP/bin!
# Fri Feb 24 08:59:36 2023: Trying to set $BAMTOOLS_PATH...
# Fri Feb 24 08:59:36 2023: Did not find environment variable $BAMTOOLS_PATH.
# Fri Feb 24 08:59:36 2023: Trying to guess BAMTOOLS_PATH from location of bamtools executable that is available in your $PATH
# Fri Feb 24 08:59:36 2023: Checking /central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/bin as potential path for $BAMTOOLS_PATH.
# Fri Feb 24 08:59:36 2023: Success! Setting $BAMTOOLS_PATH to /central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/bin!
# Fri Feb 24 08:59:36 2023: Trying to set $SAMTOOLS_PATH...
# Fri Feb 24 08:59:36 2023: Did not find environment variable $SAMTOOLS_PATH.
# Fri Feb 24 08:59:36 2023: Trying to guess SAMTOOLS_PATH from location of samtools executable that is available in your $PATH
# Fri Feb 24 08:59:36 2023: Checking /central/groups/carnegie_poc/jurban/software/braker2/braker3/deps/genemark-etp/GeneMark-ETP/tools as potential path for $SAMTOOLS_PATH.
# Fri Feb 24 08:59:36 2023: Success! Setting $SAMTOOLS_PATH to /central/groups/carnegie_poc/jurban/software/braker2/braker3/deps/genemark-etp/GeneMark-ETP/tools!
# Fri Feb 24 08:59:36 2023: Trying to set $DIAMOND_PATH...
# Fri Feb 24 08:59:36 2023: Did not find environment variable $DIAMOND_PATH.
# Fri Feb 24 08:59:36 2023: Trying to guess DIAMOND_PATH from location of diamond executable that is available in your $PATH
# Fri Feb 24 08:59:36 2023: Checking /central/groups/carnegie_poc/jurban/software/braker2/braker3/deps/genemark-etp/GeneMark-ETP/tools as potential path for $DIAMOND_PATH.
# Fri Feb 24 08:59:36 2023: Success! Setting $DIAMOND_PATH to /central/groups/carnegie_poc/jurban/software/braker2/braker3/deps/genemark-etp/GeneMark-ETP/tools!
# Fri Feb 24 08:59:36 2023: Trying to set $PROTHINT_PATH...
# Fri Feb 24 08:59:36 2023: Did not find environment variable $PROTHINT_PATH.
# Fri Feb 24 08:59:36 2023: Trying to guess PROTHINT_PATH from location of prothint.py executable that is available in your $PATH
# Fri Feb 24 08:59:36 2023: Checking /central/groups/carnegie_poc/jurban/software/braker2/deps/prothint/ProtHint-2.6.0/bin as potential path for $PROTHINT_PATH.
# Fri Feb 24 08:59:36 2023: Success! Setting $PROTHINT_PATH to /central/groups/carnegie_poc/jurban/software/braker2/deps/prothint/ProtHint-2.6.0/bin!
# Fri Feb 24 08:59:36 2023: Trying to set $TSEBRA_PATH...
# Fri Feb 24 08:59:36 2023: Did not find environment variable $TSEBRA_PATH.
# Fri Feb 24 08:59:36 2023: Trying to guess TSEBRA_PATH from location of tsebra.py executable that is available in your $PATH
# Fri Feb 24 08:59:36 2023: Checking /central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/bin as potential path for $TSEBRA_PATH.
# Fri Feb 24 08:59:36 2023: Success! Setting $TSEBRA_PATH to /central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/bin!
# Fri Feb 24 08:59:36 2023: Trying to set $CDBTOOLS_PATH...
# Fri Feb 24 08:59:36 2023: Did not find environment variable $CDBTOOLS_PATH.
# Fri Feb 24 08:59:36 2023: Trying to guess CDBTOOLS_PATH from location of cdbfasta executable that is available in your $PATH
# Fri Feb 24 08:59:36 2023: Checking /central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/bin as potential path for $CDBTOOLS_PATH.
# Fri Feb 24 08:59:36 2023: Success! Setting $CDBTOOLS_PATH to /central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/bin!
#*********
# IMPORTANT INFORMATION: no species for identifying the AUGUSTUS  parameter set that will arise from this BRAKER run was set. BRAKER will create an AUGUSTUS parameter set with name Sp_1. This parameter set can be used for future BRAKER/AUGUSTUS prediction runs for the same species. It is usually not necessary to retrain AUGUSTUS with novel extrinsic data if a high quality parameter set already exists.
#*********
#**********************************************************************************
#                               CREATING DIRECTORY STRUCTURE                       
#**********************************************************************************
# Fri Feb 24 08:59:38 2023: creating file that contains citations for this BRAKER run at /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/what-to-cite.txt...
# Fri Feb 24 08:59:38 2023: create working directory /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/GeneMark-ETP.
mkdir /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/GeneMark-ETP
# Fri Feb 24 08:59:38 2023: create working directory /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/species
mkdir /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/species
# Fri Feb 24 08:59:38 2023: create working directory /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/errors
mkdir /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/errors
# Fri Feb 24 08:59:38 2023: changing into working directory /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3
cd /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3
# Fri Feb 24 08:59:38 2023: getting GC content of the genome
/central/groups/carnegie_poc/jurban/software/braker2/braker3/BRAKER/scripts/get_gc_content.py --sequences /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/data/toy/longest.fa.masked --print_sequence_length 1> /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/gc_content.out 2> /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/errors/gc_content.stderr
# Fri Feb 24 08:59:40 2023: Creating parameter template files for AUGUSTUS with new_species.pl
# Fri Feb 24 08:59:40 2023: new_species.pl will create parameter files for species Sp_1 in /central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/config//species/Sp_1
/central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/bin/perl /central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/bin/new_species.pl --species=Sp_1 --AUGUSTUS_CONFIG_PATH=/central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/config/ 1> /dev/null 2>/central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/errors/new_species.stderr
# Fri Feb 24 08:59:40 2023: check_fasta_headers(): Checking fasta headers of file /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/data/toy/longest.fa.masked
# Fri Feb 24 08:59:40 2023: check_fasta_headers(): Checking fasta headers of file /central/groups/carnegie_poc/jurban/software/braker2/protein/gfas1-and-hexacorallia-and-metazoan-proteins-orthoDB.fasta
# Fri Feb 24 08:59:40 2023: Assuming that this is not a DNA fasta file because other characters than A, T, G, C, N, a, t, g, c, n were contained. If this is supposed to be a DNA fasta file, check the content of your file! If this is supposed to be a protein fasta file, please ignore this message!
#*********
# WARNING: Detected whitespace in fasta header of file /central/groups/carnegie_poc/jurban/software/braker2/protein/gfas1-and-hexacorallia-and-metazoan-proteins-orthoDB.fasta. This may later on cause problems! The pipeline will create a new file without spaces or "|" characters and a genome_header.map file to look up the old and new headers. This message will be suppressed from now on!
#*********
# Fri Feb 24 08:59:44 2023: Assuming that this is not a protein fasta file because other characters than AaRrNnDdCcEeQqGgHhIiLlKkMmFfPpSsTtWwYyVvBbZzJjOoUuXx were contained. If this is supposed to be DNA fasta file, please ignore this message.
#**********************************************************************************
#                               PROCESSING HINTS                                   
#**********************************************************************************
#**********************************************************************************
#                              RUNNING GENEMARK-EX                                 
#**********************************************************************************
# Fri Feb 24 09:00:15 2023: Preparing genemark_evidence file hints from manual hints...
# Fri Feb 24 09:00:15 2023: Running GeneMark-ETP
# Fri Feb 24 09:00:15 2023: changing into GeneMark-ETP directory /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/GeneMark-ETP
cd /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/GeneMark-ETP
# Fri Feb 24 09:00:16 2023: sorting RNA-Seq BAM files
samtools sort /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/data/toy/forward.bam -o /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/GeneMark-ETP/etp_data/forward.bam 1> /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/errors/samtools.sort.forward.stdout 2> /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/errors/samtools.sort.forward.stderr
samtools sort /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/data/toy/reverse.bam -o /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/GeneMark-ETP/etp_data/reverse.bam 1> /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/errors/samtools.sort.reverse.stdout 2> /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/errors/samtools.sort.reverse.stderr
# Fri Feb 24 09:00:32 2023: Running gmetp.pl
/central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/bin/perl /central/groups/carnegie_poc/jurban/software/braker2/braker3/deps/genemark-etp/GeneMark-ETP/bin/gmetp.pl --cfg /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/GeneMark-ETP/etp_config.yaml --workdir /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/GeneMark-ETP --bam /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/GeneMark-ETP/etp_data/ --cores 16 --softmask 1>/central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/errors/GeneMark-ETP.stdout 2>/central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/errors/GeneMark-ETP.stderr


------------------------------

>> **This is from GeneMark-ETP.stderr.**

FASTA index file /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/GeneMark-ETP/data/genome.softmasked.fasta.fai created.
error, file not found: option --f1 complete.gtf
error on open file complete.id: No such file or directory
mv: cannot stat ‘complete_uniq.gtf’: No such file or directory
error on open file /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/GeneMark-ETP/rnaseq/hints/proteins.fa/complete.gtf: No such file or directory
error on create_regions.pl at /central/groups/carnegie_poc/jurban/software/braker2/braker3/deps/genemark-etp/GeneMark-ETP/bin/gmetp.pl line 2162.
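
To see which of the files named in this stderr are actually absent from the GeneMark-ETP working directory, a small check like the one below may help. This is only a sketch: `report_missing` is a hypothetical helper, and the relative paths in the example call are assumptions. The stderr gives a full path only for `complete.gtf`; that `complete.id` and `complete_uniq.gtf` sit in the same directory is a guess.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every path (relative to a base directory)
# that does not exist as a regular file, one per line.
report_missing() {
    local base="$1"
    shift
    local rel
    for rel in "$@"; do
        [ -f "$base/$rel" ] || printf '%s\n' "$rel"
    done
}

# Example call; the directory layout below is assumed, not confirmed:
# report_missing braker3/GeneMark-ETP \
#     rnaseq/hints/proteins.fa/complete.gtf \
#     rnaseq/hints/proteins.fa/complete.id \
#     rnaseq/hints/proteins.fa/complete_uniq.gtf
```

Empty output means all listed files exist. Any printed path points at the step that should have produced that file.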


------------------------------

------------------------------

**Third, here is how I installed it.**

------------------------------

> First, I installed dependencies with Mamba (conda) using a YML file.

mamba env create -f braker3-deps.yml

I will copy/paste the `braker3-deps.yml` at the very bottom.

> Second, I installed GeneMark-ETP via git clone.

git clone https://github.com/gatech-genemark/GeneMark-ETP.git


> Third, I cloned BRAKER and checked out the braker3 branch.

git clone https://github.com/Gaius-Augustus/BRAKER.git
cd BRAKER
git checkout braker3


> Fourth, the run environment is set by:

conda activate braker3-deps2
export PATH=${BRAKER3}:${GENEMARK_ETP_BIN}:${GENEMARK_ETP_TOOLS}:${PROTHINT2}:${PATH}
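
Since BRAKER guesses most tool locations from `$PATH` (see the braker.log above), it can be useful to confirm that the environment resolves every executable before starting a run. The following is a minimal sketch; `missing_tools` is a hypothetical helper, and the tool names in the example call are the ones braker.log reports looking up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every name that "command -v" cannot
# resolve in the current PATH, one per line.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example call, run after activating the environment and exporting PATH:
# missing_tools braker.pl gmetp.pl prothint.py samtools diamond \
#     bamtools tsebra.py cdbfasta gushr.py python3 java
```

Empty output means every listed name was found. This does not check versions or which copy is picked up first when a tool exists in more than one PATH entry.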


---------------------------------------
---------------------------------------
---------------------------------------
**YML File**

name: braker3-deps2
channels:
eumetsat
conda-forge
bioconda
defaults
dependencies:
_libgcc_mutex=0.1=conda_forge
_openmp_mutex=4.5=2_gnu
alsa-lib=1.2.7.2=h166bdaf_0
augustus=3.4.0=pl5262h5a9fe7b_2
bamtools=2.5.1=hd03093a_10
bedtools=2.30.0=h468198e_3
biopython=1.81=py310h1fa729e_0
blast=2.13.0=hf3cf87c_0
boost-cpp=1.74.0=h75c5d50_8
braker2=2.1.6=hdfd78af_5
bzip2=1.0.8=h7f98852_4
c-ares=1.18.1=h7f98852_0
ca-certificates=2022.12.7=ha878542_0
cairo=1.16.0=ha61ee94_1014
cdbtools=0.99=hd03093a_7
curl=7.87.0=h6312ad2_0
diamond=2.1.3=hb97b32f_0
entrez-direct=16.2=he881be0_1
exonerate=2.4.0=h09da616_5
expat=2.5.0=h27087fc_0
font-ttf-dejavu-sans-mono=2.37=hab24e00_0
font-ttf-inconsolata=3.000=h77eed37_0
font-ttf-source-code-pro=2.038=h77eed37_0
font-ttf-ubuntu=0.83=hab24e00_0
fontconfig=2.14.2=h14ed4e7_0
fonts-conda-ecosystem=1=0
fonts-conda-forge=1=0
freetype=2.12.1=hca18f0e_1
gawk=5.1.0=h7f98852_0
gemoma=1.6.4=hdfd78af_1
genomethreader=1.7.1=h87f3376_4
gettext=0.21.1=h27087fc_0
gffread=0.12.7=hd03093a_1
giflib=5.2.1=h36c2ea0_2
glib=2.74.1=h6239696_1
glib-tools=2.74.1=h6239696_1
gmp=6.2.1=h58526e2_0
graphite2=1.3.13=h58526e2_1001
gsl=2.6=he838d99_2
harfbuzz=5.3.0=h418a68e_0
hisat2=2.2.1=h87f3376_4
htslib=1.12=h9093b5e_1
icu=70.1=h27087fc_0
jbig=2.1=h7f98852_2003
jpeg=9e=h0b41bf4_3
keyutils=1.6.1=h166bdaf_0
krb5=1.20.1=hf9c8cef_0
lcms2=2.12=hddcbb42_0
ld_impl_linux-64=2.40=h41732ed_0
lerc=2.2.1=h9c3ff4c_0
libblas=3.9.0=16_linux64_openblas
libcblas=3.9.0=16_linux64_openblas
libcups=2.3.3=h36d4200_3
libcurl=7.87.0=h6312ad2_0
libdeflate=1.7=h7f98852_5
libedit=3.1.20191231=he28a2e2_2
libev=4.33=h516909a_1
libffi=3.4.2=h7f98852_5
libgcc-ng=12.2.0=h65d4601_19
libgfortran-ng=12.2.0=h69a702a_19
libgfortran5=12.2.0=h337968e_19
libglib=2.74.1=h606061b_1
libgomp=12.2.0=h65d4601_19
libhwloc=2.8.0=h32351e8_1
libiconv=1.17=h166bdaf_0
libidn2=2.3.4=h166bdaf_0
liblapack=3.9.0=16_linux64_openblas
libnghttp2=1.51.0=hdcd2b5c_0
libnsl=2.0.0=h7f98852_0
libopenblas=0.3.21=pthreads_h78a6416_3
libpng=1.6.39=h753d276_0
libssh2=1.10.0=haa6b8db_3
libstdcxx-ng=12.2.0=h46fd767_19
libtiff=4.3.0=hf544144_1
libunistring=0.9.10=h7f98852_0
libuuid=2.32.1=h7f98852_1000
libwebp-base=1.2.4=h166bdaf_0
libxcb=1.13=h7f98852_1004
libxml2=2.9.14=h22db469_4
libzlib=1.2.13=h166bdaf_4
lp_solve=5.5.2.5=h14c3975_1001
makehub=1.0.5=1
metis=5.1.0=h58526e2_1006
mmseqs2=13.45111=h95f258a_1
mpfr=4.1.0=h9202a9a_1
mysql-connector-c=6.1.11=h6eb9d5d_1007
ncbi-vdb=3.0.2=h87f3376_0
ncurses=6.2=h58526e2_4
numpy=1.24.2=py310h8deb116_0
openjdk=8.0.332=h166bdaf_0
openssl=1.1.1t=h0b41bf4_0
ossuuid=1.6.2=hf484d3e_1000
pcre=8.45=h9c3ff4c_0
pcre2=10.40=hc3806b6_0
perl=5.26.2=h36c2ea0_1008
perl-apache-test=1.40=pl526_1
perl-app-cpanminus=1.7044=pl526_1
perl-archive-tar=2.32=pl526_0
perl-base=2.23=pl526_1
perl-business-isbn=3.004=pl526_0
perl-business-isbn-data=20140910.003=pl526_0
perl-carp=1.38=pl526_3
perl-class-data-inheritable=0.08=pl526_1
perl-class-load=0.25=pl526_0
perl-class-load-xs=0.10=pl526h6bb024c_2
perl-class-method-modifiers=2.12=pl526_0
perl-clone-choose=0.010=pl526_0
perl-common-sense=3.74=pl526_2
perl-compress-raw-bzip2=2.087=pl526he1b5a44_0
perl-compress-raw-zlib=2.087=pl526hc9558a2_0
perl-constant=1.33=pl526_1
perl-cpan-meta=2.150010=pl526_0
perl-cpan-meta-requirements=2.140=pl526_0
perl-cpan-meta-yaml=0.018=pl526_0
perl-data-dumper=2.173=pl526_0
perl-data-optlist=0.110=pl526_2
perl-dbi=1.642=pl526_0
perl-devel-globaldestruction=0.14=pl526_0
perl-devel-overloadinfo=0.005=pl526_0
perl-devel-stacktrace=2.04=pl526_0
perl-dist-checkconflicts=0.11=pl526_2
perl-encode=2.88=pl526_1
perl-eval-closure=0.14=pl526h6bb024c_4
perl-exception-class=1.44=pl526_0
perl-exporter=5.72=pl526_1
perl-exporter-tiny=1.002001=pl526_0
perl-extutils-cbuilder=0.280230=pl526_1
perl-extutils-makemaker=7.36=pl526_1
perl-extutils-manifest=1.72=pl526_0
perl-extutils-parsexs=3.35=pl526_0
perl-file-homedir=1.004=pl526_2
perl-file-path=2.16=pl526_0
perl-file-spec=3.48_01=pl526_1
perl-file-temp=0.2304=pl526_2
perl-file-which=1.23=pl526_0
perl-getopt-long=2.50=pl526_1
perl-hash-merge=0.300=pl526_0
perl-inline=0.80=pl526_2
perl-io-compress=2.087=pl526he1b5a44_0
perl-io-zlib=1.10=pl526_2
perl-ipc-cmd=1.02=pl526_0
perl-json=4.02=pl526_0
perl-json-pp=4.04=pl526_0
perl-json-xs=2.34=pl526h6bb024c_3
perl-list-moreutils=0.428=pl526_1
perl-list-moreutils-xs=0.428=pl526_0
perl-list-util=1.38=pl526_1
perl-locale-maketext-simple=0.21=pl526_2
perl-logger-simple=2.0=pl526_0
perl-math-utils=1.13=pl526_0
perl-mce=1.837=pl526_0
perl-mime-base64=3.15=pl526_1
perl-module-build=0.4224=pl526_3
perl-module-corelist=5.20190524=pl526_0
perl-module-implementation=0.09=pl526_2
perl-module-load=0.32=pl526_1
perl-module-load-conditional=0.68=pl526_2
perl-module-metadata=1.000036=pl526_0
perl-module-runtime=0.016=pl526_1
perl-module-runtime-conflicts=0.003=pl526_0
perl-moo=2.003004=pl526_0
perl-moose=2.2011=pl526hf484d3e_1
perl-mro-compat=0.13=pl526_0
perl-object-insideout=4.05=pl526_0
perl-package-deprecationmanager=0.17=pl526_0
perl-package-stash=0.38=pl526hf484d3e_1
perl-package-stash-xs=0.28=pl526hf484d3e_1
perl-parallel-forkmanager=2.02=pl526_0
perl-params-check=0.38=pl526_1
perl-params-util=1.07=pl526h6bb024c_4
perl-parent=0.236=pl526_1
perl-pathtools=3.75=pl526h14c3975_1
perl-perl-ostype=1.010=pl526_1
perl-posix=1.38_03=pl526_1
perl-role-tiny=2.000008=pl526_0
perl-scalar-list-utils=1.52=pl526h516909a_0
perl-scalar-util-numeric=0.40=pl526_1
perl-socket=2.027=pl526_1
perl-storable=3.15=pl526h14c3975_0
perl-sub-exporter=0.987=pl526_2
perl-sub-exporter-progressive=0.001013=pl526_0
perl-sub-identify=0.14=pl526h14c3975_0
perl-sub-install=0.928=pl526_2
perl-sub-name=0.21=pl526_1
perl-sub-quote=2.006003=pl526_1
perl-test-harness=3.42=pl526_0
perl-test-pod=1.52=pl526_0
perl-text-abbrev=1.02=pl526_0
perl-text-parsewords=3.30=pl526_0
perl-time-hires=1.9760=pl526h14c3975_1
perl-try-tiny=0.30=pl526_1
perl-types-serialiser=1.0=pl526_2
perl-uri=1.76=pl526_0
perl-version=0.9924=pl526_0
perl-xml-libxml=2.0132=pl526h7ec2d77_1
perl-xml-namespacesupport=1.12=pl526_0
perl-xml-sax=1.02=pl526_0
perl-xml-sax-base=1.09=pl526_0
perl-xsloader=0.24=pl526_0
perl-yaml=1.29=pl526_0
perl-yaml-xs=0.74=pl526h14c3975_0
pip=23.0.1=pyhd8ed1ab_0
pixman=0.40.0=h36c2ea0_0
pthread-stubs=0.4=h36c2ea0_1001
python=3.10.2=h62f1059_0_cpython
python_abi=3.10=3_cp310
readline=8.1=h46c0cb4_0
samtools=1.12=h9aed4be_1
setuptools=67.4.0=pyhd8ed1ab_0
spaln=2.4.7=pl5262h9a82719_0
sqlite=3.37.0=h9cd32fc_0
sra-tools=3.0.3=h87f3376_0
stringtie=2.2.1=h3198e80_0
suitesparse=5.10.1=h9e50725_1
tar=1.34=hb2e2bae_1
tbb=2021.7.0=h924138e_1
tk=8.6.12=h27826a3_0
tzdata=2022g=h191b570_0
ucsc-bedtobigbed=377=ha8a8165_3
ucsc-fatotwobit=377=ha8a8165_5
ucsc-genepredcheck=377=ha8a8165_3
ucsc-genepredtobed=377=ha8a8165_5
ucsc-genepredtobiggenepred=377=ha8a8165_3
ucsc-gtftogenepred=377=ha8a8165_5
ucsc-hggcpercent=377=ha8a8165_3
ucsc-ixixx=377=ha8a8165_3
ucsc-twobitinfo=377=ha8a8165_3
ucsc-wigtobigwig=377=ha8a8165_3
wget=1.20.3=ha56f1ee_1
wheel=0.38.4=pyhd8ed1ab_0
xorg-fixesproto=5.0=h7f98852_1002
xorg-inputproto=2.3.2=h7f98852_1002
xorg-kbproto=1.0.7=h7f98852_1002
xorg-libice=1.0.10=h7f98852_0
xorg-libsm=1.2.3=hd9c2040_1000
xorg-libx11=1.7.2=h7f98852_0
xorg-libxau=1.0.9=h7f98852_0
xorg-libxdmcp=1.1.3=h7f98852_0
xorg-libxext=1.3.4=h0b41bf4_2
xorg-libxfixes=5.0.3=h7f98852_1004
xorg-libxi=1.7.10=h7f98852_0
xorg-libxrender=0.9.10=h7f98852_1003
xorg-libxtst=1.2.3=h7f98852_1002
xorg-recordproto=1.14.2=h7f98852_1002
xorg-renderproto=0.11.1=h7f98852_1002
xorg-xextproto=7.3.0=h0b41bf4_1003
xorg-xproto=7.0.31=h7f98852_1007
xz=5.2.6=h166bdaf_0
zlib=1.2.13=h166bdaf_4
zstd=1.5.2=h3eb15da_6

NOTE: This conda environment was originally created on a separate system this way:
mamba create -n braker3-deps2 -c bioconda braker2 hisat2 stringtie bedtools sra-tools gffread
conda activate braker3-deps2
mamba install -c eumetsat perl-yaml-xs
mamba install -c conda-forge openjdk=8
And YML file obtained by:
conda env export > braker3-deps2.yml

JohnUrban commented 1 year ago

I am currently debugging.

The problem seems to be in gmetp.pl at this line:

PrepareGenomeTraining($proc) if 1;

(I know this is technically not a Braker problem at this point, but I will keep you updated)

JohnUrban commented 1 year ago

And specifically at this line in PrepareGenomeTraining inside the if statement:

    if ( CreateThis("hc_regions.gtf"))
    {
        ## breaks with the following line
        system( "$bin/create_regions.pl --hcc $hcc_genes --hcp $hcp_genes --out hc_regions.gtf --margin $margin" )
            and die "error on create_regions.pl";
    }

BenAawf commented 1 year ago

Hi @JohnUrban, first of all, thank you. I can now use conda instead of the Singularity container.

I have the same issue but using a singularity image from here: https://hub.docker.com/r/teambraker/braker3

And this is the command with Singularity:

singularity exec -B ${PWD}:${PWD} ${BRAKER_SIF} braker.pl --genome=scaffolds_vf.EDTA_RM_masked.fa --prot_seq=Aves_taxid_8782_3044547_prot.fasta --bam=rna_sorted.bam --softmasking --workingdir=run2 \
    --GENEMARK_PATH=${ETP} --PROTHINT_PATH=${ETP}/gmes/ProtHint/bin --threads 72

The same issue as mentioned before:

 WARNING: Detected | in fasta header of file /media/ben/Data2TB/test-p/annotation/BRAKER3/Aves_taxid_8782_3044547_prot.fasta. This may later on cause problems! The pipeline will create a new file without spaces or "|" characters and a genome_header.map file to look up the old and new headers. This message will be suppressed from now on!
#*********
ERROR in file /opt/BRAKER/scripts/braker.pl at line 5484
Failed to execute: /usr/bin/perl /media/ben/Data2TB/test-p/annotation/BRAKER3/GeneMark-ETP/bin/etp_release.pl --cfg /media/ben/Data2TB/test-p/annotation/BRAKER3/run2/GeneMark-ETP/etp_config.yaml --workdir /media/ben/Data2TB/test-p/annotation/BRAKER3/run2/GeneMark-ETP --bam /media/ben/Data2TB/test-p/annotation/BRAKER3/run2/GeneMark-ETP/etp_data/ --cores 72 --softmask 1>/media/ben/Data2TB/test-p/annotation/BRAKER3/run2/errors/GeneMark-ETP.stdout 2>/media/ben/Data2TB/test-p/annotation/BRAKER3/run2/errors/GeneMark-ETP.stderr
Failed to execute: /usr/bin/perl /media/ben/Data2TB/test-p/annotation/BRAKER3/GeneMark-ETP/bin/etp_release.pl --cfg /media/ben/Data2TB/test-p/annotation/BRAKER3/run2/GeneMark-ETP/etp_config.yaml --workdir /media/ben/Data2TB/test-p/annotation/BRAKER3/run2/GeneMark-ETP --bam /media/ben/Data2TB/test-p/annotation/BRAKER3/run2/GeneMark-ETP/etp_data/ --cores 72 --softmask 1>/media/ben/Data2TB/test-p/annotation/BRAKER3/run2/errors/GeneMark-ETP.stdout 2>/media/ben/Data2TB/test-p/annotation/BRAKER3/run2/errors/GeneMark-ETP.stderr
The most common problem is an expired or not present file ~/.gm_key!

It is clearly related to the GeneMark-ETP license, which is in fact not available for use.

KatharinaHoff commented 1 year ago

You can download the license for GeneMark-EP from their web server and place it in ~/.gm_key.

However, Lars is still working on updating the BRAKER code.
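
Whether the key file is in place can be verified with a short check. This is a sketch only: `check_gm_key` is a hypothetical helper that tests for a non-empty file at `~/.gm_key` (or at a path given as its first argument). It does not validate the license content or its expiry date, and it says nothing about whether the key covers GeneMark-ETP.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the key file exists and is non-empty.
# Defaults to ~/.gm_key; an alternative path can be passed as the first argument.
check_gm_key() {
    local key="${1:-$HOME/.gm_key}"
    if [ -s "$key" ]; then
        printf 'found: %s\n' "$key"
    else
        printf 'missing or empty: %s\n' "$key" >&2
        return 1
    fi
}

# Example call:
# check_gm_key || echo "place the downloaded license in ~/.gm_key first"
```

When running inside a container, the home directory seen by the tool may differ from the host's, so the check is most informative when executed in the same environment as braker.pl.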

JohnUrban commented 1 year ago

I have a valid license for GeneMark-ES/ET/EP ver 4.71_lic. It works for both Braker1 and Braker2 runs. It does not work for Braker3 runs. Perhaps that is b/c that license doesn't also work for GeneMark-ETP, which was downloaded separately from: https://github.com/gatech-genemark/GeneMark-ETP .

KatharinaHoff commented 1 year ago

The container has today been updated to contain GeneMark-ETP. You still have to install the license key file in your home directory (license key of GeneMark-ES/ET/EP works) as file ~/.gm_key. Otherwise, it should be very easy to run, now.
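A quick way to rule out the "not present" half of the key problem is to check the file before launching a run. This is a minimal sketch; the function name `check_gm_key` is made up for illustration, and it only tests that the file exists and is non-empty. It cannot tell whether the key has expired.

```shell
# Hypothetical helper: report whether a GeneMark key file is in place.
# Only checks existence and non-zero size; expiry is NOT checked.
check_gm_key() {
    key_file="${1:-$HOME/.gm_key}"
    if [ -s "$key_file" ]; then
        echo "found key: $key_file"
    else
        echo "missing or empty key: $key_file"
        return 1
    fi
}
```

Called without arguments it looks at ~/.gm_key; a different path can be passed as the first argument.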

JohnUrban commented 1 year ago

Is there a way to get GeneMark-ETP if one is not using the container?

KatharinaHoff commented 1 year ago

Look at the Dockerfile… I think that will answer the question.

JohnUrban commented 1 year ago

Uh oh - time to learn Docker.

KatharinaHoff commented 1 year ago

No, a keyword search will do for this. There’s a link.

JohnUrban commented 1 year ago

Got it! Thanks!

wget  http://topaz.gatech.edu/GeneMark/etp.for_braker.tar.gz && \
    tar -xzf etp.for_braker.tar.gz && \
    mv etp.for_braker ETP && \
    chmod a+x /opt/ETP/bin/*py /opt/ETP/bin/*pl /opt/ETP/tools/*

JohnUrban commented 1 year ago

Well, I gave Braker3 a try with the GeneMark-ETP copy found here http://topaz.gatech.edu/GeneMark/etp.for_braker.tar.gz -- but gmetp.pl gives the same error that I opened up this thread with -- essentially complete.gtf and complete_uniq.gtf not made/found:

FASTA index file /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/GeneMark-ETP/data/genome.softmasked.fasta.fai created.
error, file not found: option --f1 complete.gtf
error on open file complete.id: No such file or directory
mv: cannot stat ‘complete_uniq.gtf’: No such file or directory
error on open file /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/GeneMark-ETP/rnaseq/hints/proteins.fa/complete.gtf: No such file or directory
error on create_regions.pl at /central/groups/carnegie_poc/jurban/software/braker2/braker3/deps/genemark-etp/alt/ETP/bin/gmetp.pl line 2162.

I am not using the container, so I know that makes this report extra annoying, and I do apologize for that! I personally couldn't get the whole singularity thing working, it kept complaining about root stuff (I'm on a remote cluster without root/sudo privileges). But if necessary, I will give that route another try in the future -- I'm sure there are local options to learn about.

As for my conda-supported approach detailed in this thread, and especially after getting the same error with the new GeneMark-ETP copy, I am skeptical that this is a ~/.gm_key problem unless I need an entirely new ~/.gm_key for GeneMark-ETP. It works fine for the other GeneMark software with Braker1/Braker2. GeneMark-ETP does run a little bit, but seems to fail to create/find/open those files.

KatharinaHoff commented 1 year ago

The rootless warnings of singularity can be safely ignored.

KatharinaHoff commented 1 year ago

Maybe I misunderstood. You very likely need root privileges to install Singularity. If it is not on your cluster, conda is the next best approach.

Did you update Braker? Did you switch to master branch?

JohnUrban commented 1 year ago

Hey - is Braker3 on the master branch now? I was on git checkout braker3.

As for Singularity, I installed it using conda, so I guess I have a local copy of it. I followed the Singularity instructions on the main page (e.g. singularity build braker3.sif docker://teambraker/braker3:latest). It seemed to install, but when I tried the next part (singularity exec braker3.sif print_braker3_setup.py or singularity exec braker3.sif braker.pl), it threw an error. I erased the sif file after that, but at the moment I am redoing the first step to try to reproduce that error (or get past it).

JohnUrban commented 1 year ago

Ok. For the Singularity errors:

Command:

singularity exec braker3.sif print_braker3_setup.py

Error:

INFO:    Converting SIF file to temporary sandbox...
FATAL:   while extracting braker3.sif: root filesystem extraction failed: extract command failed: ERROR  : Failed to create user namespace: user namespace disabled
: exit status 1

Other Command:

singularity exec braker3.sif braker.pl

Same error:

INFO:    Converting SIF file to temporary sandbox...
FATAL:   while extracting braker3.sif: root filesystem extraction failed: extract command failed: ERROR  : Failed to create user namespace: user namespace disabled
: exit status 1

Thamos commented 1 year ago

The container is not working because you didn't install Singularity as root. I think there are ways to get it working without root, but that depends on the kernel version; see https://docs.sylabs.io/guides/3.5/admin-guide/installation.html#install-nonsetuid for details. The best option would be to ask your cluster admin to install Singularity (if possible).
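The "user namespace disabled" message can be cross-checked against the kernel setting. This is a minimal sketch, assuming a Linux kernel that exposes /proc/sys/user/max_user_namespaces; the function name `userns_status` is made up for illustration. Some distributions have additional switches that this does not look at, so treat the result as a hint only.

```shell
# Hypothetical helper: read the kernel's per-user limit on user namespaces.
# A limit of 0 means unprivileged user namespaces cannot be created.
userns_status() {
    limit_file="${1:-/proc/sys/user/max_user_namespaces}"
    if [ ! -r "$limit_file" ]; then
        echo "unknown"
        return 2
    fi
    limit=$(cat "$limit_file")
    if [ "$limit" -gt 0 ] 2>/dev/null; then
        echo "enabled (max_user_namespaces=$limit)"
    else
        echo "disabled (max_user_namespaces=$limit)"
        return 1
    fi
}
```

If this reports "disabled", changing it is a sysctl setting that needs the cluster admin, which matches the advice above.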

KatharinaHoff commented 1 year ago

I will ask our HPC admin. Possibly fakeroot has to be enabled in Singularity, but I am not sure. I vaguely recall that we discussed that a long time ago.

Edit: Oh, yes, you need to be root to install Singularity.

KatharinaHoff commented 1 year ago

Hey - is Braker3 on the master branch now? I was on git checkout braker3.

Yes, checkout master. We finally merged. And git pull to update to the latest code.

JohnUrban commented 1 year ago

Ok - I should have mentioned this for the braker3 branch already, but I didn't want to bombard you with issues (more so than I have).

Line 2295 in braker.pl that tries to assess the java version causes an error. The approach to getting the java version changed and the fix is simply changing it back to the old way.

New way:

$cmdString = "java -version 2>&1 | grep \"openjdk version\" | awk -F[\"\.] -v OFS=. '{print \$2,\$3}'";

Old way:

$cmdString = "java -version 2>&1 | awk -F[\\\"\\\.] -v OFS=. 'NR==1{print \$2,\$3}'";

Full context:

####################### set_JAVA_PATH #######################################
# * set path to java
# * also checks whether java version 1.8 is present
################################################################################

sub set_JAVA_PATH {
    my @required_files = ('java');
    $JAVA_PATH = set_software_PATH($java_path, "JAVA_PATH",
                    \@required_files, 'exit');

    #$cmdString = "java -version 2>&1 | grep \"openjdk version\" | awk -F[\"\.] -v OFS=. '{print \$2,\$3}'";
    $cmdString = "java -version 2>&1 | awk -F[\\\"\\\.] -v OFS=. 'NR==1{print \$2,\$3}'";
    my @javav = `$cmdString` or die("Failed to execute: $cmdString");
    if(not ($javav[0] =~ m/1\.8/ )){
        $prtStr = "\# " . (localtime) . " ERROR: in file " . __FILE__
            ." at line ". __LINE__ ."\n"
            . "You have installed java version $javav[0]. GUSHR requires version 1.8!\n"
            . "You can switch between java versions on your system with:\n"
            . "sudo update-alternatives --config java\n";
        $logString .= $prtStr;
        print STDERR $logString;
        exit(1);
    }

}
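One plausible reason the grep-based variant fails (the thread does not say which JVM was involved) is that some Java builds print `java version "1.8.0_..."` rather than `openjdk version "..."` on the first line, so the grep matches nothing and the backticks return an empty list. The sketch below shows the vendor-independent parsing on its own; the function name `java_major_minor` is made up for illustration.

```shell
# Hypothetical helper: extract "major.minor" from the first line of
# `java -version` output, regardless of the vendor prefix.
# Fields are split on double quotes and dots, so for
#   openjdk version "1.8.0_292"
# field 2 is "1" and field 3 is "8".
java_major_minor() {
    awk -F'[".]' -v OFS=. 'NR==1 {print $2, $3}'
}

# Usage: java -version 2>&1 | java_major_minor
```

Note that `java -version` writes to stderr, hence the `2>&1` in the usage line.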

JohnUrban commented 1 year ago

Alright.

I ran Braker3 again with everything up to date -- noting that I did have to change the java version line back to the old approach as described above.

I still get the same error I opened up this issue with (from braker3/errors/GeneMark-ETP.stderr).

FASTA index file /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/masterbranch/braker3/GeneMark-ETP/data/genome.softmasked.fasta.fai created.
error, file not found: option --f1 complete.gtf
error on open file complete.id: No such file or directory
mv: cannot stat ‘complete_uniq.gtf’: No such file or directory
error on open file /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/masterbranch/braker3/GeneMark-ETP/rnaseq/hints/proteins.fa/complete.gtf: No such file or directory
error on create_regions.pl at /central/groups/carnegie_poc/jurban/software/braker2/braker3/deps/genemark-etp/alt/ETP/bin/gmetp.pl line 2162.

Here are the contents of the GeneMark-ETP subdirectory inside the Braker3 working directory:

> ls braker3/GeneMark-ETP/

arx  data  etp_config.yaml  etp_data  filter_gmst.log  proteins.fa  prothint_gmst.log  rnaseq

> ls braker3/GeneMark-ETP/*/

braker3/GeneMark-ETP/arx/:
chr.names  genome.fa

braker3/GeneMark-ETP/data/:
genome.fasta  genome.softmasked.fasta  genome.softmasked.fasta.fai  proteins.fa

braker3/GeneMark-ETP/etp_data/:
forward.bam  reverse.bam

braker3/GeneMark-ETP/proteins.fa/:

braker3/GeneMark-ETP/rnaseq/:
gmst  hints  hisat2  stringtie

> ls braker3/GeneMark-ETP/*/*/

braker3/GeneMark-ETP/rnaseq/gmst/:
GeneMark_hmm.mod  genome_gmst_for_HC.gtf  genome_gmst.gtf  gms.log  transcripts_merged.fasta.gff

braker3/GeneMark-ETP/rnaseq/hints/:
bam2hints_forward.gff  bam2hints_merged.gff  bam2hints_reverse.gff  hintsfile_merged.gff  proteins.fa

braker3/GeneMark-ETP/rnaseq/hisat2/:
mapping_forward.bam  mapping_reverse.bam

braker3/GeneMark-ETP/rnaseq/stringtie/:
transcripts_forward.gff  transcripts_merged.fasta  transcripts_merged.gff  transcripts_reverse.gff

> ls braker3/GeneMark-ETP/*/*/*

braker3/GeneMark-ETP/rnaseq/gmst/GeneMark_hmm.mod              braker3/GeneMark-ETP/rnaseq/hints/bam2hints_forward.gff  braker3/GeneMark-ETP/rnaseq/hisat2/mapping_reverse.bam
braker3/GeneMark-ETP/rnaseq/gmst/genome_gmst_for_HC.gtf        braker3/GeneMark-ETP/rnaseq/hints/bam2hints_merged.gff   braker3/GeneMark-ETP/rnaseq/stringtie/transcripts_forward.gff
braker3/GeneMark-ETP/rnaseq/gmst/genome_gmst.gtf               braker3/GeneMark-ETP/rnaseq/hints/bam2hints_reverse.gff  braker3/GeneMark-ETP/rnaseq/stringtie/transcripts_merged.fasta
braker3/GeneMark-ETP/rnaseq/gmst/gms.log                       braker3/GeneMark-ETP/rnaseq/hints/hintsfile_merged.gff   braker3/GeneMark-ETP/rnaseq/stringtie/transcripts_merged.gff
braker3/GeneMark-ETP/rnaseq/gmst/transcripts_merged.fasta.gff  braker3/GeneMark-ETP/rnaseq/hisat2/mapping_forward.bam   braker3/GeneMark-ETP/rnaseq/stringtie/transcripts_reverse.gff

braker3/GeneMark-ETP/rnaseq/hints/proteins.fa:
log  prothint  tmp

...and so on

JohnUrban commented 1 year ago

It cannot find a file located here: braker3/GeneMark-ETP/rnaseq/hints/proteins.fa/complete.gtf

So I decided to look at what is there:

> ls braker3/GeneMark-ETP/rnaseq/hints/proteins.fa/

log  prothint  tmp

I wonder if the log file (braker3/GeneMark-ETP/rnaseq/hints/proteins.fa/log) holds the answer or a hint - contents:

02-Mar-23 11:36:56 - INFO: Starting the GMST filtering and classification.
02-Mar-23 11:36:56 - INFO: Running the following system call: /central/groups/carnegie_poc/jurban/software/braker2/braker3/deps/genemark-etp/alt/ETP/bin/GeneMarkSTFiltering/gms2hints.pl --tseq /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/masterbranch/braker3/GeneMark-ETP/rnaseq/stringtie/transcripts_merged.fasta --ggtf /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/masterbranch/braker3/GeneMark-ETP/rnaseq/stringtie/transcripts_merged.gff                --tgff /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/masterbranch/braker3/GeneMark-ETP/rnaseq/gmst/transcripts_merged.fasta.gff --out /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/masterbranch/braker3/GeneMark-ETP/rnaseq/hints/proteins.fa/tmp/gmsttbny79f0.gtf                  
02-Mar-23 11:36:56 - INFO: Making diamond database from /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/masterbranch/braker3/GeneMark-ETP/data/proteins.fa
02-Mar-23 11:36:56 - INFO: Running the following system call: diamond makedb --in /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/masterbranch/braker3/GeneMark-ETP/data/proteins.fa -d /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/masterbranch/braker3/GeneMark-ETP/rnaseq/hints/proteins.fa/tmp/diamondDBbtbbubj0.dmnd
02-Mar-23 11:37:16 - ERROR: Program exited due to an error in command: diamond makedb --in /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/masterbranch/braker3/GeneMark-ETP/data/proteins.fa -d /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/masterbranch/braker3/GeneMark-ETP/rnaseq/hints/proteins.fa/tmp/diamondDBbtbbubj0.dmnd
02-Mar-23 11:37:16 - ERROR: Check stderr for more details.
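Since the real cause was buried in a nested log rather than in GeneMark-ETP.stderr, it can help to scan the whole working directory for error lines. This is a minimal sketch; the function name `find_error_lines` is made up for illustration, and it assumes the tools write the literal string "ERROR" as in the log above.

```shell
# Hypothetical helper: list every line containing "ERROR" in the text
# files below a directory, prefixed with file name and line number.
# -r recurses, -I skips binary files (e.g. BAM), -n adds line numbers.
find_error_lines() {
    workdir="$1"
    grep -rIn -- 'ERROR' "$workdir"
}

# Usage: find_error_lines braker3/GeneMark-ETP
```

The function returns a non-zero status when nothing matches, which is grep's normal behaviour and not a failure of the scan itself.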

When I run the diamond makedb command to see the error:

> diamond makedb --in /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/masterbranch/braker3/GeneMark-ETP/data/proteins.fa -d /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/masterbranch/braker3/GeneMark-ETP/rnaseq/hints/proteins.fa/tmp/diamondDBbtbbubj0.dmnd

diamond v2.1.3.157 (C) Max Planck Society for the Advancement of Science
Documentation, support and updates available at http://www.diamondsearch.org
Please cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)

#CPU threads: 32
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Database input file: /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/masterbranch/braker3/GeneMark-ETP/data/proteins.fa
Opening the database file...  [0.143s]
Loading sequences...  [3.726s]
Masking sequences...  [2.614s]
Writing sequences...  [0.562s]
Hashing sequences...  [0.239s]
Loading sequences... Error: Error reading input stream at line 7477106: Invalid character (.) in sequence
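To see such lines without waiting for diamond to reach them, the protein file can be scanned directly. This is a minimal sketch; the function name `find_bad_residues` is made up for illustration, and the rule it applies (anything other than a letter or `*` on a sequence line is suspicious) is an assumption, not DIAMOND's actual validation logic.

```shell
# Hypothetical helper: print sequence lines (not headers) that contain a
# character other than a letter or '*'. Output is file:line: content.
# Trailing whitespace or carriage returns are flagged as well.
find_bad_residues() {
    awk '!/^>/ && /[^A-Za-z*]/ {print FILENAME ":" FNR ": " $0}' "$1"
}

# Usage: find_bad_residues proteins.fa | head
```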

JohnUrban commented 1 year ago

And looking more into that, for some reason many of the OrthoDB protein sequences end with a period... e.g.:

>307491_0:000000
MLAYADNIVVMGETKDINSTSKLISSNNFKYLGVNINNKIGMHIEINERITNGNSCYFSIIKFLRS.

I downloaded orthodb proteins like this:

wget --no-check-certificate https://v100.orthodb.org/download/odb10_metazoa_fasta.tar.gz
tar -xzf odb10_metazoa_fasta.tar.gz 
cat metazoa/Rawdata/* > metazoan-proteins-orthoDB.fasta
rm -r metazoa

I will look into removing these periods... What strikes me as funny, though, is that this did not cause an error when running Braker2. Or maybe it caused an error that went silent/undetected?

Thamos commented 1 year ago

I just checked the sequence in orthodb 11 and there it seems to be okay. So maybe if you update to 11 the problem fixes itself.

>307491_0:000000        307491_0
MLAYADNIVVMGETKDINSTSKLISSNNFKYLGVNINNKIGMHIEINERITNGNSCYFSIIKFLRS

JohnUrban commented 1 year ago

I re-downloaded the OrthoDB proteins to confirm it's not just my copy somehow, and indeed 17014 of the 8266016 metazoan proteins end with a period. But this was still v10 -- thanks to @Thamos for telling me about v11. I didn't realize there had been an update.

Nonetheless, I removed the periods at the end of ODBv10 seqs the following way:

 awk '{sub(/\.$/,""); print}'  proteins.fa > proteins.fixed.fa

That definitely solved the diamond makedb problem. And that in turn might solve my whole issue.... waiting to find out still.
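A slightly more careful variant of that awk call leaves header lines alone, since the one-liner above would also strip a period that happens to end a header. This is a minimal sketch; both function names are made up for illustration.

```shell
# Hypothetical helper: count sequence lines that end with a period.
count_trailing_periods() {
    grep -v '^>' "$1" | grep -c '\.$'
}

# Hypothetical helper: remove one trailing period from sequence lines only;
# header lines (starting with '>') are printed unchanged.
strip_trailing_periods() {
    awk '/^>/ {print; next} {sub(/\.$/, ""); print}' "$1"
}

# Usage: count_trailing_periods proteins.fa
#        strip_trailing_periods proteins.fa > proteins.fixed.fa
```

Running the count before and after gives a simple confirmation that the cleaned file has no such lines left.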

I am still scratching my head as to why Braker2 didn't fail b/c of these period-containing sequences though. I'm somewhat guessing that it might have "failed silently" and perhaps I should be skeptical of those results.

Thamos commented 1 year ago

Do you have a (non empty) "prothint.gff" file in your braker2 directories? I think if "diamond makedb" didn't work there shouldn't be one, as prothint uses diamond. E.g. in my case with orthodb plants it's 54MB.
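That check can be scripted across several old working directories. This is a minimal sketch; the function name `report_prothint` is made up for illustration, and it assumes the file is literally named prothint.gff somewhere below the given directory.

```shell
# Hypothetical helper: find every prothint.gff below a directory and say
# whether it is empty. Prints nothing if no such file exists.
report_prothint() {
    find "$1" -type f -name 'prothint.gff' | while IFS= read -r f; do
        if [ -s "$f" ]; then
            echo "non-empty: $f"
        else
            echo "empty: $f"
        fi
    done
}

# Usage: report_prothint path/to/braker2_workdir
```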

tomasbruna commented 1 year ago

Hi @JohnUrban, no need to be worried about that. BRAKER2 calls DIAMOND only via ProtHint which sanitizes the protein input (https://github.com/gatech-genemark/ProtHint/commit/19ef04c93bfa691bd6583017189b6e340a4513df).

In BRAKER3, DIAMOND is also called "directly" on raw protein input. That's definitely something to fix, thanks for pointing out.

tomasbruna commented 1 year ago

I'll open an issue about this in GeneMark-ETP.

JohnUrban commented 1 year ago

@Thamos the previous runs with the "dirty" protein sequences did not have that file. Now that I have a "pre-sanitized" protein file, Diamond is happily working at the moment, and I suspect I will get the "prothint.gff" file when it finishes up.

@tomasbruna glad I could point out a real issue here. I will keep you posted on whether or not this allows Braker3 to finish.

JohnUrban commented 1 year ago

@Thamos is there a link like https://v100.orthodb.org/download/odb10_metazoa_fasta.tar.gz for v11?

Else, I could download the whole v11 db here: https://data.orthodb.org/download/ ...do you have any tips on how to filter the entire DB to get just metazoan proteins?

tomasbruna commented 1 year ago

https://bioinf.uni-greifswald.de/bioinf/partitioned_odb11/ :)

JohnUrban commented 1 year ago

Well, I can confirm that "sanitizing" the protein file allowed Braker3 to get past that step.

The complete.gtf, complete.id, complete_uniq.gtf files are now created/found/openable -- so the main thrust of this issue/thread has been solved.

Nonetheless, a new error arose, and I will continue to diagnose and push through here.

#*********
# WARNING: Number of reliable training genes is low (10). Recommended are at least 600 genes
#*********
ERROR: in file /central/groups/carnegie_poc/jurban/software/braker2/braker3/masterbranch/BRAKER/scripts/./helpMod.pm at line 307
 found neither /central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/bin//cfg/tsebra/braker3.cfg nor /central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/scripts//cfg/tsebra/braker3.cfg nor /central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/scripts//cfg/tsebra/braker3.cfg nor /central/groups/carnegie_poc/jurban/software/braker2/braker3/masterbranch/BRAKER/scripts//cfg/tsebra/braker3.cfg!
Please Check the environment variables AUGUSTUS_CONFIG_PATH and command line options AUGUSTUS_BIN_PATH and AUGUSTUS_SCRIPTS_PATH or install AUGUSTUS, again!

Note that I am running this on a single-Mb "toy sample" to get through all the bugs before launching it on the 500 Mb genome -- hence the warning about a low number of training genes. That said, the same message for Braker1 was "Number of reliable training genes is low (33)." and for Braker2 was "WARNING: Number of reliable training genes is low (38)." -- so Braker3 had 3-4-fold fewer training genes at this stage.

JohnUrban commented 1 year ago

This looks fixable by using a git cloned version of TSEBRA instead of Conda:

> ls ~/software/braker2/associated_software/TSEBRA/config/

braker3.cfg  default.cfg  keep_ab_initio.cfg  pref_braker1.cfg

JohnUrban commented 1 year ago

Or actually.... it seems like it is expected to be in BRAKER/scripts/cfg/tsebra but the tsebra subdir is not present :

> ls BRAKER/scripts/cfg/ 
ep.cfg  ep_utr.cfg  etp.cfg  gth.cfg  gth_utr.cfg  rnaseq.cfg  rnaseq_utr.cfg

JohnUrban commented 1 year ago

I just also noticed that braker3/errors/GeneMark-ETP.stderr is full of new errors:

FASTA index file /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/masterbranch/braker3/GeneMark-ETP/data/genome.softmasked.fasta.fai created.
Use of uninitialized value $ph1 in addition (+) at /central/groups/carnegie_poc/jurban/software/braker2/braker3/deps/genemark-etp/alt/ETP/bin/gmes/parse_set.pl line 205.
Use of uninitialized value $ph0 in addition (+) at /central/groups/carnegie_poc/jurban/software/braker2/braker3/deps/genemark-etp/alt/ETP/bin/gmes/parse_set.pl line 205.
Use of uninitialized value $ph2 in addition (+) at /central/groups/carnegie_poc/jurban/software/braker2/braker3/deps/genemark-etp/alt/ETP/bin/gmes/parse_set.pl line 205.
Use of uninitialized value $ph0 in division (/) at /central/groups/carnegie_poc/jurban/software/braker2/braker3/deps/genemark-etp/alt/ETP/bin/gmes/parse_set.pl line 208.
Illegal division by zero at /central/groups/carnegie_poc/jurban/software/braker2/braker3/deps/genemark-etp/alt/ETP/bin/gmes/parse_set.pl line 208.
cat: GT.mat: No such file or directory
cat: AG.mat: No such file or directory
cat: GT.mat: No such file or directory
cat: AG.mat: No such file or directory
cat: GT.mat: No such file or directory
cat: AG.mat: No such file or directory
cat: GC.mat: No such file or directory
cat: GC.mat: No such file or directory
cat: GC.mat: No such file or directory
Illegal division by zero at /central/groups/carnegie_poc/jurban/software/braker2/braker3/deps/genemark-etp/alt/ETP/bin/train_super.pl line 184.
error, file not found: option --f1 prothint/prothint.gff
grep: prothint/evidence.gff: No such file or directory
grep: prothint/evidence.gff: No such file or directory
Traceback (most recent call last):
  File "/central/groups/carnegie_poc/jurban/software/braker2/braker3/deps/genemark-etp/alt/ETP/bin/printRnaAlternatives.py", line 353, in <module>
    main()
  File "/central/groups/carnegie_poc/jurban/software/braker2/braker3/deps/genemark-etp/alt/ETP/bin/printRnaAlternatives.py", line 289, in main
    candidates = loadIntrons(args.genemark)
  File "/central/groups/carnegie_poc/jurban/software/braker2/braker3/deps/genemark-etp/alt/ETP/bin/printRnaAlternatives.py", line 193, in loadIntrons
    for row in csv.reader(open(inputFile), delimiter='\t'):
FileNotFoundError: [Errno 2] No such file or directory: 'pred_m/genemark.gtf'
error, file not found: option --f1 prothint/prothint.gff
grep: prothint/evidence.gff: No such file or directory
grep: prothint/evidence.gff: No such file or directory
Died at /central/groups/carnegie_poc/jurban/software/braker2/braker3/deps/genemark-etp/alt/ETP/bin/format_back.pl line 14.
Died at /central/groups/carnegie_poc/jurban/software/braker2/braker3/deps/genemark-etp/alt/ETP/bin/format_back.pl line 14.
Use of uninitialized value $ph1 in addition (+) at /central/groups/carnegie_poc/jurban/software/braker2/braker3/deps/genemark-etp/alt/ETP/bin/gmes/parse_set.pl line 205.
Use of uninitialized value $ph0 in addition (+) at /central/groups/carnegie_poc/jurban/software/braker2/braker3/deps/genemark-etp/alt/ETP/bin/gmes/parse_set.pl line 205.
Use of uninitialized value $ph2 in addition (+) at /central/groups/carnegie_poc/jurban/software/braker2/braker3/deps/genemark-etp/alt/ETP/bin/gmes/parse_set.pl line 205.
Use of uninitialized value $ph0 in division (/) at /central/groups/carnegie_poc/jurban/software/braker2/braker3/deps/genemark-etp/alt/ETP/bin/gmes/parse_set.pl line 208.
Illegal division by zero at /central/groups/carnegie_poc/jurban/software/braker2/braker3/deps/genemark-etp/alt/ETP/bin/gmes/parse_set.pl line 208.
cat: GT.mat: No such file or directory
cat: AG.mat: No such file or directory
cat: GT.mat: No such file or directory
cat: AG.mat: No such file or directory
cat: GT.mat: No such file or directory
cat: AG.mat: No such file or directory
cat: GC.mat: No such file or directory
cat: GC.mat: No such file or directory
cat: GC.mat: No such file or directory
Illegal division by zero at /central/groups/carnegie_poc/jurban/software/braker2/braker3/deps/genemark-etp/alt/ETP/bin/train_super.pl line 184.
error, file not found: option --f1 prothint/prothint.gff
grep: prothint/evidence.gff: No such file or directory
grep: prothint/evidence.gff: No such file or directory
Died at /central/groups/carnegie_poc/jurban/software/braker2/braker3/deps/genemark-etp/alt/ETP/bin/format_back.pl line 14.
Died at /central/groups/carnegie_poc/jurban/software/braker2/braker3/deps/genemark-etp/alt/ETP/bin/format_back.pl line 14.

I'm not sure if these are upstream or downstream of helpMod.pm not finding braker3.cfg.

tomasbruna commented 1 year ago

I've seen these errors when the input is too small and GeneMark does not have enough sequence to train itself on. Did you try testing with our small example at https://github.com/Gaius-Augustus/BRAKER/blob/master/example/tests/test3.sh?
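To judge whether an input falls into the "too small" category, the total sequence length of the genome FASTA is a useful first number. This is a minimal sketch; the function name `fasta_total_bp` is made up for illustration, and the thread gives no threshold below which training fails, so none is hard-coded.

```shell
# Hypothetical helper: sum the lengths of all sequence lines in a FASTA
# file, ignoring headers and any whitespace (including carriage returns).
fasta_total_bp() {
    awk '!/^>/ {gsub(/[[:space:]]/, ""); total += length($0)}
         END {print total + 0}' "$1"
}

# Usage: fasta_total_bp genome.fa
```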

JohnUrban commented 1 year ago

Yes - that is a better idea. I will start using the provided tests. My bad.

Thamos commented 1 year ago

I got the same errors when I tried the new GeneMark-ETP version about two weeks ago (on a whole genome), but thought it was probably because BRAKER hadn't been updated yet (back then). I'm running it with a full genome, currently at the ProtHint step, so we'll see if I get these errors again.

JohnUrban commented 1 year ago

Alright - so I tested test3.sh.

For it to finish, I need to put the latest github version of TSEBRA in the PATH like this:

TSEBRA=~/software/braker2/associated_software/TSEBRA/
TSEBRA_BIN=${TSEBRA}/bin
TSEBRA_CFG=${TSEBRA}/config
export PATH=${BRAKER3}:${GENEMARK_ETP_BIN}:${GENEMARK_ETP_TOOLS}:${PROTHINT2}:${TSEBRA}:${TSEBRA_BIN}:${TSEBRA_CFG}:${PATH}

I'm not totally sure if you need all 3 TSEBRA paths or just to the bin...

When using the tsebra installed as part of Braker3 dependencies with Conda (as described above), it fails with this message:

# WARNING: Number of reliable training genes is low (114). Recommended are at least 600 genes
#*********
ERROR: in file /central/groups/carnegie_poc/jurban/software/braker2/braker3/masterbranch/BRAKER/scripts/./helpMod.pm at line 307
 found neither /central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/bin//cfg/tsebra/braker3.cfg nor /central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/scripts//cfg/tsebra/braker3.cfg nor /central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/scripts//cfg/tsebra/braker3.cfg nor /central/groups/carnegie_poc/jurban/software/braker2/braker3/masterbranch/BRAKER/scripts//cfg/tsebra/braker3.cfg!
Please Check the environment variables AUGUSTUS_CONFIG_PATH and command line options AUGUSTUS_BIN_PATH and AUGUSTUS_SCRIPTS_PATH or install AUGUSTUS, again!

I do not see the extensive error messages in test3/errors/GeneMark-ETP.stderr as I did with my own sample though.

LarsGab commented 1 year ago

Hi,

the TSEBRA/bin path is enough for BRAKER. Are you sure you are using the latest version of TSEBRA from GitHub? There should be a configuration file TSEBRA/config/braker3.cfg.

JohnUrban commented 1 year ago

Yes. TSEBRA/bin in the PATH helped solve the braker3.cfg issue. I happened to also include TSEBRA/ and TSEBRA/config in the PATH, but now I have my answer that those latter 2 are not needed.
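
A minimal sketch of the setup that resolved this, assuming a TSEBRA checkout from GitHub: confirm that config/braker3.cfg is present, then add only TSEBRA/bin to the PATH. The helper name check_tsebra_cfg is hypothetical and not part of BRAKER or TSEBRA.

```shell
# Hypothetical helper: succeeds if a TSEBRA checkout contains config/braker3.cfg.
check_tsebra_cfg() {
    tsebra_dir=$1
    if [ -f "${tsebra_dir}/config/braker3.cfg" ]; then
        echo "found ${tsebra_dir}/config/braker3.cfg"
    else
        echo "missing ${tsebra_dir}/config/braker3.cfg" >&2
        return 1
    fi
}

# Example usage (path taken from the earlier comment; adjust to your install):
# TSEBRA=~/software/braker2/associated_software/TSEBRA
# check_tsebra_cfg "${TSEBRA}" && export PATH="${TSEBRA}/bin:${PATH}"
```

If the check fails, the checkout is probably an older TSEBRA version that does not ship braker3.cfg.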

JohnUrban commented 1 year ago

I have Braker3 working on test3 with UTR=on and my "toy example" (without UTR=on). It is now running on the full genome and all RNA-seq and protein data. I suspect it will finish. When it does, I will report back, and we can close this issue.

KatharinaHoff commented 1 year ago

Please do not run BRAKER3 with UTR=on ... it is not a good idea.

In the case of BRAKER3, UTRs could be inferred from the StringTie assembly. That would be the sane thing to do. It's not smart to use the old GUSHR approach here, and it's not supported in the container.

JohnUrban commented 1 year ago

Thanks @KatharinaHoff - I have just seen your UTR=on comment. The following is from a UTR=on run from this past weekend. It ended with some weird results that I do not think were caused by UTR=on, so I will report them below.

The Braker3 run did finish on the 500 Mb genome, but the braker3.gtf file did not look right. For example, there are no gene lines. Here is an example:

primary_contig_1    AUGUSTUS    stop_codon  102906  102908  .   -   0   transcript_id "g4.t1"; gene_id "g4"; supported "False";
primary_contig_1    AUGUSTUS    CDS 102906  103304  1   -   0   transcript_id "g4.t1"; gene_id "g4"; cds_type "single";
primary_contig_1    AUGUSTUS    start_codon 103302  103304  .   -   0   transcript_id "g4.t1"; gene_id "g4"; supported "True";
###
primary_contig_1    AUGUSTUS    stop_codon  160846  160848  .   -   0   transcript_id "g9.t1"; gene_id "g9"; supported "False";
primary_contig_1    AUGUSTUS    CDS 160846  161190  1   -   0   transcript_id "g9.t1"; gene_id "g9"; cds_type "single";
primary_contig_1    AUGUSTUS    start_codon 161188  161190  .   -   0   transcript_id "g9.t1"; gene_id "g9"; supported "True";
###
primary_contig_1    AUGUSTUS    stop_codon  330311  330313  .   -   0   transcript_id "g12.t1"; gene_id "g12"; supported "True";
primary_contig_1    AUGUSTUS    CDS 330311  330982  0.93    -   0   transcript_id "g12.t1"; gene_id "g12"; cds_type "single";
primary_contig_1    AUGUSTUS    start_codon 330980  330982  .   -   0   transcript_id "g12.t1"; gene_id "g12"; supported "False";
###
primary_contig_1    AUGUSTUS    start_codon 343504  343506  .   +   0   transcript_id "g13.t1"; gene_id "g13"; supported "False";
primary_contig_1    AUGUSTUS    CDS 343504  344241  1   +   0   transcript_id "g13.t1"; gene_id "g13"; cds_type "single";
primary_contig_1    AUGUSTUS    stop_codon  344239  344241  .   +   0   transcript_id "g13.t1"; gene_id "g13"; supported "True";
###
primary_contig_1    AUGUSTUS    stop_codon  403823  403825  .   -   0   transcript_id "g21.t1"; gene_id "g21"; supported "True";
primary_contig_1    AUGUSTUS    CDS 403823  404656  1   -   0   transcript_id "g21.t1"; gene_id "g21"; cds_type "single";
primary_contig_1    AUGUSTUS    start_codon 404654  404656  .   -   0   transcript_id "g21.t1"; gene_id "g21"; supported "True";
###
primary_contig_1    AUGUSTUS    stop_codon  414686  414688  .   -   0   transcript_id "g23.t1"; gene_id "g23"; supported "False";
primary_contig_1    AUGUSTUS    CDS 414686  415387  0.63    -   0   transcript_id "g23.t1"; gene_id "g23"; cds_type "single";
primary_contig_1    AUGUSTUS    start_codon 415385  415387  .   -   0   transcript_id "g23.t1"; gene_id "g23"; supported "True";
###
primary_contig_1    AUGUSTUS    stop_codon  457197  457199  .   -   0   transcript_id "g27.t1"; gene_id "g27"; supported "True";
primary_contig_1    AUGUSTUS    CDS 457197  457547  0.82    -   0   transcript_id "g27.t1"; gene_id "g27"; cds_type "single";
primary_contig_1    AUGUSTUS    start_codon 457545  457547  .   -   0   transcript_id "g27.t1"; gene_id "g27"; supported "False";
###
primary_contig_1    AUGUSTUS    stop_codon  476245  476247  .   -   0   transcript_id "g30.t1"; gene_id "g30"; supported "False";
primary_contig_1    AUGUSTUS    CDS 476245  477099  1   -   0   transcript_id "g30.t1"; gene_id "g30"; cds_type "single";

The augustus.hints.gtf file looks right though:

primary_contig_97   AUGUSTUS    gene    1347    1898    0.57    -   .   g1
primary_contig_97   AUGUSTUS    transcript  1347    1898    0.57    -   .   g1.t1
primary_contig_97   AUGUSTUS    stop_codon  1347    1349    .   -   0   transcript_id "g1.t1"; gene_id "g1";
primary_contig_97   AUGUSTUS    CDS 1347    1898    0.57    -   0   transcript_id "g1.t1"; gene_id "g1";
primary_contig_97   AUGUSTUS    exon    1347    1898    .   -   .   transcript_id "g1.t1"; gene_id "g1";
primary_contig_97   AUGUSTUS    start_codon 1896    1898    .   -   0   transcript_id "g1.t1"; gene_id "g1";
primary_contig_97   AUGUSTUS    gene    18226   18546   1   +   .   g2
primary_contig_97   AUGUSTUS    transcript  18226   18546   1   +   .   g2.t1
primary_contig_97   AUGUSTUS    start_codon 18226   18228   .   +   0   transcript_id "g2.t1"; gene_id "g2";
primary_contig_97   AUGUSTUS    CDS 18226   18546   1   +   0   transcript_id "g2.t1"; gene_id "g2";
primary_contig_97   AUGUSTUS    exon    18226   18546   .   +   .   transcript_id "g2.t1"; gene_id "g2";
primary_contig_97   AUGUSTUS    stop_codon  18544   18546   .   +   0   transcript_id "g2.t1"; gene_id "g2";
primary_contig_97   AUGUSTUS    gene    19143   19775   0.94    +   .   g3
primary_contig_97   AUGUSTUS    transcript  19143   19775   0.94    +   .   g3.t1
primary_contig_97   AUGUSTUS    start_codon 19143   19145   .   +   0   transcript_id "g3.t1"; gene_id "g3";
primary_contig_97   AUGUSTUS    CDS 19143   19775   0.94    +   0   transcript_id "g3.t1"; gene_id "g3";
primary_contig_97   AUGUSTUS    exon    19143   19775   .   +   .   transcript_id "g3.t1"; gene_id "g3";
primary_contig_97   AUGUSTUS    stop_codon  19773   19775   .   +   0   transcript_id "g3.t1"; gene_id "g3";
primary_contig_97   AUGUSTUS    gene    19876   20100   0.99    +   .   g4
primary_contig_97   AUGUSTUS    transcript  19876   20100   0.99    +   .   g4.t1
primary_contig_97   AUGUSTUS    start_codon 19876   19878   .   +   0   transcript_id "g4.t1"; gene_id "g4";
primary_contig_97   AUGUSTUS    CDS 19876   20100   0.99    +   0   transcript_id "g4.t1"; gene_id "g4";
primary_contig_97   AUGUSTUS    exon    19876   20100   .   +   .   transcript_id "g4.t1"; gene_id "g4";
primary_contig_97   AUGUSTUS    stop_codon  20098   20100   .   +   0   transcript_id "g4.t1"; gene_id "g4";
primary_contig_97   AUGUSTUS    gene    20221   22170   0.65    +   .   g5
primary_contig_97   AUGUSTUS    transcript  20221   22170   0.65    +   .   g5.t1
primary_contig_97   AUGUSTUS    start_codon 20221   20223   .   +   0   transcript_id "g5.t1"; gene_id "g5";
primary_contig_97   AUGUSTUS    CDS 20221   22170   0.65    +   0   transcript_id "g5.t1"; gene_id "g5";
primary_contig_97   AUGUSTUS    exon    20221   22170   .   +   .   transcript_id "g5.t1"; gene_id "g5";
primary_contig_97   AUGUSTUS    stop_codon  22168   22170   .   +   0   transcript_id "g5.t1"; gene_id "g5";
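
A quick way to compare the two files is to tally the feature types in column 3. This is only a sketch: count_gtf_features is a hypothetical helper name, and it assumes whitespace-separated GTF rows with comment lines starting with "#".

```shell
# Hypothetical helper: print how many rows of each feature type (column 3) a GTF has.
count_gtf_features() {
    awk '!/^#/ && NF >= 8 { n[$3]++ } END { for (f in n) print f, n[f] }' "$1" | sort
}

# Example usage (file names as they appear in the BRAKER working directory):
# count_gtf_features braker3.gtf
# count_gtf_features augustus.hints.gtf
```

If "gene" and "transcript" appear in the tally for augustus.hints.gtf but not for braker3.gtf, that would confirm the difference seen in the excerpts above.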

This seems similar to the issue reported today:

https://github.com/Gaius-Augustus/BRAKER/issues/582

I will follow some of the debug hints therein, and get back.

JohnUrban commented 1 year ago

RE: UTR=on and using the StringTie assembly to infer UTRs.

Are there any tools that use the StringTie assembly to update the GTF with UTR info? Or are we on our own for that for now?
Do you know how a lack of UTRs in transcript sequences affects RNA-seq quantification and differential expression analyses?

JohnUrban commented 1 year ago

@LarsGab thanks for updating the code to remove dots (".") at the end of protein sequences.

I noticed ODB10 and ODB11 also have "*" in tens of thousands of the protein sequences. Does the Braker code also strip those, and if not, could they be causing a problem?

Moreover, ODB10/ODB11 uses an expanded alphabet to represent ambiguous positions. That is to say, in addition to the 20 normal AAs, it also uses some other letters to represent ambiguity: B, Z, X, and J. See the following link for info on those letters: https://www.ddbj.nig.ac.jp/ddbj/code-e.html It also uses "U" for a rare AA (not part of the normal 20): Selenocysteine.

I'd also be curious if these other letters may cause any issues in protein steps.

Overall, I'm trying to diagnose what is causing problems with GeneMark in Braker3 (see #582 as well for example).

Edit: These characters also show up in proteomes downloaded from NCBI.
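
For anyone who wants to check their own protein file, here is a minimal sketch that tallies every character outside the 20 canonical amino acid letters. The helper name count_noncanonical_aa is hypothetical, and the sketch assumes a plain FASTA file with headers starting with ">".

```shell
# Hypothetical helper: tally non-canonical characters in the sequence lines
# of a protein FASTA (anything other than the 20 standard amino acid letters).
count_noncanonical_aa() {
    awk '
        !/^>/ {
            s = toupper($0)
            gsub(/[ACDEFGHIKLMNPQRSTVWY\r]/, "", s)   # drop canonical letters and CRs
            for (i = 1; i <= length(s); i++) n[substr(s, i, 1)]++
        }
        END { for (c in n) print c, n[c] }
    ' "$1" | sort
}

# Example usage (file name is a placeholder):
# count_noncanonical_aa proteins.fasta
```

Each output line is a character followed by its count, so "*", "X", "B", "Z", "J", and "U" show up directly if they are present.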

JohnUrban commented 1 year ago

Update: providing a raw fastq, instead of a STAR produced BAM, solved GeneMark-ETP issues we were having. See #582

JohnUrban commented 1 year ago

Another update:

Cleaning up the protein sequences to have only the 20 canonical amino acid letters did not solve the issue.

JohnUrban commented 1 year ago

Update:

Providing HISAT2 BAMs does not trigger the errors that STAR BAMs do. Braker3 (and GeneMark) finish fine. So the problem seems specific to STAR BAMs, or possibly to BAMs produced by aligners other than HISAT2.

hisat2 -x ../index/toy -U ../../subsamp.fastq.gz --dta -p 16 | samtools sort --threads 16 > subsam.bam

Thamos commented 1 year ago

As far as I remember, HISAT2 by default produces some SAM tags that StringTie needs in order to work. STAR doesn't include those tags by default, but I think it can with some options. Did you use those options, or did you run it more or less with defaults? If you ran it with defaults, my guess is that this is the reason for the problems with the STAR BAMs.

Edit: Found it again, it's described here in section "Input files". https://ccb.jhu.edu/software/stringtie/index.shtml?t=manual
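
To test this guess on an existing BAM, one option is to count spliced alignments that lack the XS strand tag. The sketch below is an assumption-laden check, not something confirmed in this thread as the cause: count_spliced_without_xs is a hypothetical helper that reads SAM text on stdin. As far as I know, the StringTie manual suggests running STAR with --outSAMstrandField intronMotif so that spliced alignments carry this tag.

```shell
# Hypothetical helper: read SAM records on stdin and report how many spliced
# alignments (CIGAR contains N) lack an XS:A strand tag.
count_spliced_without_xs() {
    awk -F '\t' '
        /^@/ { next }                      # skip header lines
        $6 ~ /N/ {                         # spliced alignment
            spliced++
            has_xs = 0
            for (i = 12; i <= NF; i++) if ($i ~ /^XS:A:/) has_xs = 1
            if (!has_xs) missing++
        }
        END { printf "spliced=%d missing_XS=%d\n", spliced, missing }
    '
}

# Example usage (assumes samtools is installed; file name is a placeholder):
# samtools view star_aligned.bam | count_spliced_without_xs
```

A large missing_XS count for the STAR BAM and zero for the HISAT2 BAM would be consistent with the explanation above.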

Gaius-Augustus / BRAKER

Braker3/GeneMark-ETP: file not found: complete.gtf, complete.id, complete_uniq.gtf #577

*********