Gaius-Augustus / BRAKER

BRAKER is a pipeline for fully automated prediction of protein coding gene structures with GeneMark-ES/ET/EP/ETP and AUGUSTUS in novel eukaryotic genomes

Braker3/GeneMark-ETP: file not found: complete.gtf, complete.id, complete_uniq.gtf #577

Open JohnUrban opened 1 year ago

JohnUrban commented 1 year ago

Hello,

Thank you for all the great tools coming from this team.

I gave Braker3 a shot, but am running into an error at the moment. I will report below how I installed Braker3, and how I used it in case it helps reproduce the error.

I would be grateful for any guidance you can provide, and am eager to get Braker3 working at some point in the near future, but fully understand that you are busy. I am mainly reporting this issue in case it helps your development.



First, here was the command used.


braker.pl --genome=${ASM} --UTR=on --stranded=+,- --bam=${FWD},${REV} --prot_seq=${PROTEINS} --workingdir=braker3 --threads=16


Second, here are the errors as reported.


This was reported to stdout/stderr.

# Fri Feb 24 08:59:35 2023: Creating directory /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3.
# Fri Feb 24 08:59:35 2023:Both protein and RNA-Seq libraries in input detected. BRAKER will be executed in ETP mode.
#*********
# Fri Feb 24 08:59:38 2023: Log information is stored in file /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/braker.log
#*********
# WARNING: Detected whitespace in fasta header of file /central/groups/carnegie_poc/jurban/software/braker2/protein/gfas1-and-hexacorallia-and-metazoan-proteins-orthoDB.fasta. This may later on cause problems! The pipeline will create a new file without spaces or "|" characters and a genome_header.map file to look up the old and new headers. This message will be suppressed from now on!
#*********
ERROR in file /home/jurban/software/braker2/braker3/BRAKER/scripts/braker.pl at line 5486
Failed to execute: /central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/bin/perl /central/groups/carnegie_poc/jurban/software/braker2/braker3/deps/genemark-etp/GeneMark-ETP/bin/gmetp.pl --cfg /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/GeneMark-ETP/etp_config.yaml --workdir /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/GeneMark-ETP --bam /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/GeneMark-ETP/etp_data/ --cores 16 --softmask 1>/central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/errors/GeneMark-ETP.stdout 2>/central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/errors/GeneMark-ETP.stderr
Failed to execute: /central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/bin/perl /central/groups/carnegie_poc/jurban/software/braker2/braker3/deps/genemark-etp/GeneMark-ETP/bin/gmetp.pl --cfg /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/GeneMark-ETP/etp_config.yaml --workdir /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/GeneMark-ETP --bam /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/GeneMark-ETP/etp_data/ --cores 16 --softmask 1>/central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/errors/GeneMark-ETP.stdout 2>/central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/errors/GeneMark-ETP.stderr
The most common problem is an expired or not present file ~/.gm_key!
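
Since the message above points at `~/.gm_key`, one quick way to rule that cause out is to confirm the file exists and is not empty. This is only a sketch: `check_gm_key` is a hypothetical helper, not part of BRAKER or GeneMark, and it does not check whether the key has expired.

```shell
# Hypothetical helper: confirm a GeneMark key file exists and is non-empty.
# It does NOT verify expiry; it only rules out a missing or empty file.
check_gm_key() {
    key="${1:-$HOME/.gm_key}"
    if [ -s "$key" ]; then
        echo "present: $key"
    else
        echo "missing or empty: $key"
        return 1
    fi
}

# Check the default location; "|| true" keeps a strict shell from aborting.
check_gm_key || true
```

If the key is present, the failure most likely lies elsewhere, and the per-tool stderr files are the next place to look.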

This is from braker.log


#**********************************************************************************
#                               BRAKER CONFIGURATION                               
#**********************************************************************************
# BRAKER CALL: /home/jurban/software/braker2/braker3/BRAKER/scripts/braker.pl --genome=/central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/data/toy/longest.fa.masked --UTR=on --stranded=+,- --bam=/central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/data/toy/forward.bam,/central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/data/toy/reverse.bam --prot_seq=/central/groups/carnegie_poc/jurban/software/braker2/protein/gfas1-and-hexacorallia-and-metazoan-proteins-orthoDB.fasta --workingdir=braker3 --threads=16
# Fri Feb 24 08:59:35 2023: braker.pl version 3.0.0
# Fri Feb 24 08:59:35 2023: Creating directory /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3.
# Fri Feb 24 08:59:35 2023: Creating directory /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3.
# Fri Feb 24 08:59:35 2023:Both protein and RNA-Seq libraries in input detected. BRAKER will be executed in ETP mode.
#*********
# Fri Feb 24 08:59:35 2023: Configuring of BRAKER for using external tools...
# Fri Feb 24 08:59:35 2023: Trying to set $AUGUSTUS_CONFIG_PATH...
# Fri Feb 24 08:59:35 2023: Found environment variable $AUGUSTUS_CONFIG_PATH.
# Fri Feb 24 08:59:35 2023: Checking /central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/config/ as potential path for $AUGUSTUS_CONFIG_PATH.
# Fri Feb 24 08:59:35 2023: Success! Setting $AUGUSTUS_CONFIG_PATH to /central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/config/!
# Fri Feb 24 08:59:35 2023: Trying to set $AUGUSTUS_BIN_PATH...
# Fri Feb 24 08:59:35 2023: Found environment variable $AUGUSTUS_BIN_PATH.
# Fri Feb 24 08:59:35 2023: Checking /central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/bin/ as potential path for $AUGUSTUS_BIN_PATH.
# Fri Feb 24 08:59:35 2023: Success! Setting $AUGUSTUS_BIN_PATH to /central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/bin/!
# Fri Feb 24 08:59:35 2023: Trying to set $AUGUSTUS_SCRIPTS_PATH...
# Fri Feb 24 08:59:35 2023: Found environment variable $AUGUSTUS_SCRIPTS_PATH.
# Fri Feb 24 08:59:35 2023: Checking /central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/bin/ as potential path for $AUGUSTUS_SCRIPTS_PATH.
# Fri Feb 24 08:59:35 2023: Success! Setting $AUGUSTUS_SCRIPTS_PATH to /central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/bin/!
# Fri Feb 24 08:59:35 2023: Trying to set $PYTHON3_PATH...
# Fri Feb 24 08:59:35 2023: Did not find environment variable $PYTHON3_PATH.
# Fri Feb 24 08:59:35 2023: Trying to guess PYTHON3_PATH from location of python3 executable that is available in your $PATH
# Fri Feb 24 08:59:35 2023: Checking /central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/bin as potential path for $PYTHON3_PATH.
# Fri Feb 24 08:59:35 2023: Success! Setting $PYTHON3_PATH to /central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/bin!
# Fri Feb 24 08:59:35 2023: Trying to set $JAVA_PATH...
# Fri Feb 24 08:59:35 2023: Did not find environment variable $JAVA_PATH.
# Fri Feb 24 08:59:35 2023: Trying to guess JAVA_PATH from location of java executable that is available in your $PATH
# Fri Feb 24 08:59:35 2023: Checking /central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/bin as potential path for $JAVA_PATH.
# Fri Feb 24 08:59:35 2023: Success! Setting $JAVA_PATH to /central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/bin!
# Fri Feb 24 08:59:36 2023: Trying to set $GUSHR_PATH...
# Fri Feb 24 08:59:36 2023: Did not find environment variable $GUSHR_PATH.
# Fri Feb 24 08:59:36 2023: Trying to guess GUSHR_PATH from location of gushr.py executable that is available in your $PATH
# Fri Feb 24 08:59:36 2023: Checking /central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/bin as potential path for $GUSHR_PATH.
# Fri Feb 24 08:59:36 2023: Success! Setting $GUSHR_PATH to /central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/bin!
# Fri Feb 24 08:59:36 2023: Trying to set $GENEMARK_PATH...
# Fri Feb 24 08:59:36 2023: Did not find environment variable $GENEMARK_PATH.
# Fri Feb 24 08:59:36 2023: Trying to guess GENEMARK_PATH from location of gmetp.pl executable that is available in your $PATH
# Fri Feb 24 08:59:36 2023: Checking /central/groups/carnegie_poc/jurban/software/braker2/braker3/deps/genemark-etp/GeneMark-ETP/bin as potential path for $GENEMARK_PATH.
# Fri Feb 24 08:59:36 2023: Success! Setting $GENEMARK_PATH to /central/groups/carnegie_poc/jurban/software/braker2/braker3/deps/genemark-etp/GeneMark-ETP/bin!
# Fri Feb 24 08:59:36 2023: Trying to set $BAMTOOLS_PATH...
# Fri Feb 24 08:59:36 2023: Did not find environment variable $BAMTOOLS_PATH.
# Fri Feb 24 08:59:36 2023: Trying to guess BAMTOOLS_PATH from location of bamtools executable that is available in your $PATH
# Fri Feb 24 08:59:36 2023: Checking /central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/bin as potential path for $BAMTOOLS_PATH.
# Fri Feb 24 08:59:36 2023: Success! Setting $BAMTOOLS_PATH to /central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/bin!
# Fri Feb 24 08:59:36 2023: Trying to set $SAMTOOLS_PATH...
# Fri Feb 24 08:59:36 2023: Did not find environment variable $SAMTOOLS_PATH.
# Fri Feb 24 08:59:36 2023: Trying to guess SAMTOOLS_PATH from location of samtools executable that is available in your $PATH
# Fri Feb 24 08:59:36 2023: Checking /central/groups/carnegie_poc/jurban/software/braker2/braker3/deps/genemark-etp/GeneMark-ETP/tools as potential path for $SAMTOOLS_PATH.
# Fri Feb 24 08:59:36 2023: Success! Setting $SAMTOOLS_PATH to /central/groups/carnegie_poc/jurban/software/braker2/braker3/deps/genemark-etp/GeneMark-ETP/tools!
# Fri Feb 24 08:59:36 2023: Trying to set $DIAMOND_PATH...
# Fri Feb 24 08:59:36 2023: Did not find environment variable $DIAMOND_PATH.
# Fri Feb 24 08:59:36 2023: Trying to guess DIAMOND_PATH from location of diamond executable that is available in your $PATH
# Fri Feb 24 08:59:36 2023: Checking /central/groups/carnegie_poc/jurban/software/braker2/braker3/deps/genemark-etp/GeneMark-ETP/tools as potential path for $DIAMOND_PATH.
# Fri Feb 24 08:59:36 2023: Success! Setting $DIAMOND_PATH to /central/groups/carnegie_poc/jurban/software/braker2/braker3/deps/genemark-etp/GeneMark-ETP/tools!
# Fri Feb 24 08:59:36 2023: Trying to set $PROTHINT_PATH...
# Fri Feb 24 08:59:36 2023: Did not find environment variable $PROTHINT_PATH.
# Fri Feb 24 08:59:36 2023: Trying to guess PROTHINT_PATH from location of prothint.py executable that is available in your $PATH
# Fri Feb 24 08:59:36 2023: Checking /central/groups/carnegie_poc/jurban/software/braker2/deps/prothint/ProtHint-2.6.0/bin as potential path for $PROTHINT_PATH.
# Fri Feb 24 08:59:36 2023: Success! Setting $PROTHINT_PATH to /central/groups/carnegie_poc/jurban/software/braker2/deps/prothint/ProtHint-2.6.0/bin!
# Fri Feb 24 08:59:36 2023: Trying to set $TSEBRA_PATH...
# Fri Feb 24 08:59:36 2023: Did not find environment variable $TSEBRA_PATH.
# Fri Feb 24 08:59:36 2023: Trying to guess TSEBRA_PATH from location of tsebra.py executable that is available in your $PATH
# Fri Feb 24 08:59:36 2023: Checking /central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/bin as potential path for $TSEBRA_PATH.
# Fri Feb 24 08:59:36 2023: Success! Setting $TSEBRA_PATH to /central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/bin!
# Fri Feb 24 08:59:36 2023: Trying to set $CDBTOOLS_PATH...
# Fri Feb 24 08:59:36 2023: Did not find environment variable $CDBTOOLS_PATH.
# Fri Feb 24 08:59:36 2023: Trying to guess CDBTOOLS_PATH from location of cdbfasta executable that is available in your $PATH
# Fri Feb 24 08:59:36 2023: Checking /central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/bin as potential path for $CDBTOOLS_PATH.
# Fri Feb 24 08:59:36 2023: Success! Setting $CDBTOOLS_PATH to /central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/bin!
#*********
# IMPORTANT INFORMATION: no species for identifying the AUGUSTUS  parameter set that will arise from this BRAKER run was set. BRAKER will create an AUGUSTUS parameter set with name Sp_1. This parameter set can be used for future BRAKER/AUGUSTUS prediction runs for the same species. It is usually not necessary to retrain AUGUSTUS with novel extrinsic data if a high quality parameter set already exists.
#*********
#**********************************************************************************
#                               CREATING DIRECTORY STRUCTURE                       
#**********************************************************************************
# Fri Feb 24 08:59:38 2023: creating file that contains citations for this BRAKER run at /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/what-to-cite.txt...
# Fri Feb 24 08:59:38 2023: create working directory /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/GeneMark-ETP.
mkdir /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/GeneMark-ETP
# Fri Feb 24 08:59:38 2023: create working directory /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/species
mkdir /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/species
# Fri Feb 24 08:59:38 2023: create working directory /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/errors
mkdir /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/errors
# Fri Feb 24 08:59:38 2023: changing into working directory /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3
cd /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3
# Fri Feb 24 08:59:38 2023: getting GC content of the genome
/central/groups/carnegie_poc/jurban/software/braker2/braker3/BRAKER/scripts/get_gc_content.py --sequences /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/data/toy/longest.fa.masked --print_sequence_length 1> /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/gc_content.out 2> /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/errors/gc_content.stderr
# Fri Feb 24 08:59:40 2023: Creating parameter template files for AUGUSTUS with new_species.pl
# Fri Feb 24 08:59:40 2023: new_species.pl will create parameter files for species Sp_1 in /central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/config//species/Sp_1
/central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/bin/perl /central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/bin/new_species.pl --species=Sp_1 --AUGUSTUS_CONFIG_PATH=/central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/config/ 1> /dev/null 2>/central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/errors/new_species.stderr
# Fri Feb 24 08:59:40 2023: check_fasta_headers(): Checking fasta headers of file /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/data/toy/longest.fa.masked
# Fri Feb 24 08:59:40 2023: check_fasta_headers(): Checking fasta headers of file /central/groups/carnegie_poc/jurban/software/braker2/protein/gfas1-and-hexacorallia-and-metazoan-proteins-orthoDB.fasta
# Fri Feb 24 08:59:40 2023: Assuming that this is not a DNA fasta file because other characters than A, T, G, C, N, a, t, g, c, n were contained. If this is supposed to be a DNA fasta file, check the content of your file! If this is supposed to be a protein fasta file, please ignore this message!
#*********
# WARNING: Detected whitespace in fasta header of file /central/groups/carnegie_poc/jurban/software/braker2/protein/gfas1-and-hexacorallia-and-metazoan-proteins-orthoDB.fasta. This may later on cause problems! The pipeline will create a new file without spaces or "|" characters and a genome_header.map file to look up the old and new headers. This message will be suppressed from now on!
#*********
# Fri Feb 24 08:59:44 2023: Assuming that this is not a protein fasta file because other characters than AaRrNnDdCcEeQqGgHhIiLlKkMmFfPpSsTtWwYyVvBbZzJjOoUuXx were contained. If this is supposed to be DNA fasta file, please ignore this message.
#**********************************************************************************
#                               PROCESSING HINTS                                   
#**********************************************************************************
#**********************************************************************************
#                              RUNNING GENEMARK-EX                                 
#**********************************************************************************
# Fri Feb 24 09:00:15 2023: Preparing genemark_evidence file hints from manual hints...
# Fri Feb 24 09:00:15 2023: Running GeneMark-ETP
# Fri Feb 24 09:00:15 2023: changing into GeneMark-ETP directory /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/GeneMark-ETP
cd /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/GeneMark-ETP
# Fri Feb 24 09:00:16 2023: sorting RNA-Seq BAM files
samtools sort /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/data/toy/forward.bam -o /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/GeneMark-ETP/etp_data/forward.bam 1> /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/errors/samtools.sort.forward.stdout 2> /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/errors/samtools.sort.forward.stderr
samtools sort /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/data/toy/reverse.bam -o /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/GeneMark-ETP/etp_data/reverse.bam 1> /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/errors/samtools.sort.reverse.stdout 2> /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/errors/samtools.sort.reverse.stderr
# Fri Feb 24 09:00:32 2023: Running gmetp.pl
/central/groups/carnegie_poc/jurban/software/conda/anaconda3/envs/braker3-deps2/bin/perl /central/groups/carnegie_poc/jurban/software/braker2/braker3/deps/genemark-etp/GeneMark-ETP/bin/gmetp.pl --cfg /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/GeneMark-ETP/etp_config.yaml --workdir /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/GeneMark-ETP --bam /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/GeneMark-ETP/etp_data/ --cores 16 --softmask 1>/central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/errors/GeneMark-ETP.stdout 2>/central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/errors/GeneMark-ETP.stderr
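
The log shows each step redirecting its stderr into the `errors` directory of the working directory. A minimal sketch for finding which of those files actually contain output follows; `list_nonempty_stderr` and the `BRAKER_WORKDIR` variable are assumptions made for illustration.

```shell
# Hypothetical helper: list the *.stderr files under a directory that are
# not empty, so the failing step is easier to spot.
list_nonempty_stderr() {
    find "$1" -type f -name '*.stderr' -size +0 | sort
}

# BRAKER_WORKDIR is an assumed variable; it defaults to the --workingdir
# value used in this report.
errors_dir="${BRAKER_WORKDIR:-braker3}/errors"
if [ -d "$errors_dir" ]; then
    list_nonempty_stderr "$errors_dir"
fi
```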

------------------------------

>> **This is from GeneMark-ETP.stderr.**

FASTA index file /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/GeneMark-ETP/data/genome.softmasked.fasta.fai created.
error, file not found: option --f1 complete.gtf
error on open file complete.id: No such file or directory
mv: cannot stat ‘complete_uniq.gtf’: No such file or directory
error on open file /central/groups/carnegie_poc/jurban/data/coral/scratch/toyanno/braker3/braker3/GeneMark-ETP/rnaseq/hints/proteins.fa/complete.gtf: No such file or directory
error on create_regions.pl at /central/groups/carnegie_poc/jurban/software/braker2/braker3/deps/genemark-etp/GeneMark-ETP/bin/gmetp.pl line 2162.
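
To see whether any of the three files named in this stderr were produced at all, one option is to search the GeneMark-ETP working directory for them. This is a sketch under assumptions: `check_etp_outputs` is a hypothetical helper and `ETP_WORKDIR` is an assumed variable pointing at the `GeneMark-ETP` directory inside the BRAKER working directory.

```shell
# Hypothetical helper: for each file named in GeneMark-ETP.stderr, report
# the first match found anywhere under the given directory, or "missing".
check_etp_outputs() {
    workdir="$1"
    for name in complete.gtf complete.id complete_uniq.gtf; do
        hit=$(find "$workdir" -type f -name "$name" 2>/dev/null | head -n 1)
        if [ -n "$hit" ]; then
            echo "found   $name: $hit"
        else
            echo "missing $name"
        fi
    done
}

# ETP_WORKDIR is an assumed variable; nothing runs unless it is set.
if [ -n "${ETP_WORKDIR:-}" ]; then
    check_etp_outputs "$ETP_WORKDIR"
fi
```

If all three are missing, the step that should have written them probably failed earlier than the `create_regions.pl` call that finally aborted.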


------------------------------

------------------------------

**Third, here is how I installed it.**

------------------------------

> First, I installed dependencies with Mamba (conda) using a YML file.

mamba env create -f braker3-deps.yml

I will copy/paste the `braker3-deps.yml` at the very bottom.

> Second, I installed GeneMark-ETP via git clone.

git clone https://github.com/gatech-genemark/GeneMark-ETP.git


> Third, I cloned BRAKER and checked out the braker3 branch.

git clone https://github.com/Gaius-Augustus/BRAKER.git
cd BRAKER
git checkout braker3


> Fourth, the run environment is set by:

conda activate braker3-deps2
export PATH=${BRAKER3}:${GENEMARK_ETP_BIN}:${GENEMARK_ETP_TOOLS}:${PROTHINT2}:${PATH}
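
Because braker.log shows BRAKER guessing most tool locations from whatever is first on `$PATH`, it can help to confirm what each executable resolves to after the environment is set. The sketch below uses a hypothetical `check_tools` helper; the tool names are the ones braker.log reports searching for, plus `braker.pl` itself.

```shell
# Hypothetical helper: print where each named executable resolves on PATH.
# Returns non-zero if any of them cannot be found.
check_tools() {
    status=0
    for tool in "$@"; do
        if resolved=$(command -v "$tool"); then
            echo "ok      $tool -> $resolved"
        else
            echo "missing $tool"
            status=1
        fi
    done
    return "$status"
}

# Executables that braker.log reports locating through $PATH.
check_tools braker.pl gmetp.pl prothint.py tsebra.py gushr.py \
    samtools bamtools diamond cdbfasta python3 java || true
```

In this report, for example, `samtools` and `diamond` resolve to the GeneMark-ETP `tools` directory rather than the conda environment, which this kind of check makes visible.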


---------------------------------------
---------------------------------------
---------------------------------------
**YML File**

name: braker3-deps2
channels:

NOTE: This conda environment was originally created on a separate system this way:

mamba create -n braker3-deps2 -c bioconda braker2 hisat2 stringtie bedtools sra-tools gffread
conda activate braker3-deps2
mamba install -c eumetsat perl-yaml-xs
mamba install -c conda-forge openjdk=8

And YML file obtained by:

conda env export > braker3-deps2.yml

JohnUrban commented 1 year ago

Many thanks @Thamos - I had a feeling certain flags were missing, so I messed around yesterday with --outSAMattributes All for STAR. I haven't had a chance yet to see how that run went. Nonetheless, it looks like @alexlomsadze figured it out over in another issue: https://github.com/Gaius-Augustus/BRAKER/issues/582

STAR should be run with --outSAMstrandField intronMotif for compatibility with StringTie.

I will run my own tests on that later today.
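
For reference, a STAR call carrying that flag might look like the sketch below. Every path and the `star_cmd` helper are placeholders rather than values from this issue, and the other options are common STAR settings chosen for illustration; the function only prints the command so it can be reviewed before running.

```shell
# Hypothetical helper: print a STAR alignment command that includes
# --outSAMstrandField intronMotif. All arguments are placeholder paths.
star_cmd() {
    genome_dir="$1"
    reads1="$2"
    reads2="$3"
    prefix="$4"
    threads="${5:-16}"
    echo STAR \
        --runThreadN "$threads" \
        --genomeDir "$genome_dir" \
        --readFilesIn "$reads1" "$reads2" \
        --outSAMtype BAM SortedByCoordinate \
        --outSAMstrandField intronMotif \
        --outFileNamePrefix "$prefix"
}

# Print the command for review; pipe the output to "sh" to execute it.
star_cmd star_index reads_1.fastq reads_2.fastq star_out/
```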