Gaius-Augustus / BRAKER

BRAKER is a pipeline for fully automated prediction of protein coding gene structures with GeneMark-ES/ET/EP/ETP and AUGUSTUS in novel eukaryotic genomes

Protein-based annotation not working properly #628

Open cerd9235 opened 1 year ago

cerd9235 commented 1 year ago

Hello,

I have been trying to run BRAKER for a while and am running into problems with protein-based annotation. I successfully ran BRAKER with RNA-seq only, but when I run it with protein only or with RNA-seq + protein, BRAKER stops at the ProtHint file creation steps without creating an error file.

Here is the log file:

**

BRAKER CONFIGURATION

**

BRAKER CALL: /opt/BRAKER/scripts/braker.pl --genome=Cuscuta_gronovii.chrm.asm.softmasked.fasta --prot_seq=combined_proteins.dedup.faa --threads 40 --AUGUSTUS_ab_initio --species=cuscuta_gronovii

Mon Apr 24 15:44:42 2023: braker.pl version 3.0.2

Mon Apr 24 15:44:42 2023: Only Protein input detected, BRAKER will be executed in EP mode (BRAKER2).

Mon Apr 24 15:44:42 2023: Configuring of BRAKER for using external tools...

Mon Apr 24 15:44:42 2023: Trying to set $AUGUSTUS_CONFIG_PATH...

Mon Apr 24 15:44:42 2023: Found environment variable $AUGUSTUS_CONFIG_PATH.

Mon Apr 24 15:44:42 2023: Checking /usr/share/augustus/config/ as potential path for $AUGUSTUS_CONFIG_PATH.

Mon Apr 24 15:44:42 2023: Success! Setting $AUGUSTUS_CONFIG_PATH to /usr/share/augustus/config/!

Mon Apr 24 15:44:42 2023: Trying to set $AUGUSTUS_BIN_PATH...

Mon Apr 24 15:44:42 2023: Found environment variable $AUGUSTUS_BIN_PATH.

Mon Apr 24 15:44:42 2023: Checking /usr/bin/ as potential path for $AUGUSTUS_BIN_PATH.

Mon Apr 24 15:44:42 2023: Success! Setting $AUGUSTUS_BIN_PATH to /usr/bin/!

Mon Apr 24 15:44:42 2023: Trying to set $AUGUSTUS_SCRIPTS_PATH...

Mon Apr 24 15:44:42 2023: Found environment variable $AUGUSTUS_SCRIPTS_PATH.

Mon Apr 24 15:44:42 2023: Checking /usr/share/augustus/scripts/ as potential path for $AUGUSTUS_SCRIPTS_PATH.

Mon Apr 24 15:44:42 2023: Success! Setting $AUGUSTUS_SCRIPTS_PATH to /usr/share/augustus/scripts/!

Mon Apr 24 15:44:42 2023: Trying to set $PYTHON3_PATH...

Mon Apr 24 15:44:42 2023: Did not find environment variable $PYTHON3_PATH.

Mon Apr 24 15:44:42 2023: Trying to guess PYTHON3_PATH from location of python3 executable that is available in your $PATH

Mon Apr 24 15:44:42 2023: Checking /opt/conda/bin as potential path for $PYTHON3_PATH.

Mon Apr 24 15:44:42 2023: Success! Setting $PYTHON3_PATH to /opt/conda/bin!

Mon Apr 24 15:44:42 2023: Trying to set $GENEMARK_PATH...

Mon Apr 24 15:44:42 2023: Found environment variable $GENEMARK_PATH.

Mon Apr 24 15:44:42 2023: Checking /opt/ETP/bin as potential path for $GENEMARK_PATH.

*****

WARNING: Couldn't find gmes_petap.pl in /opt/ETP/bin. Will not set $GENEMARK_PATH to /opt/ETP/bin!

*****

Mon Apr 24 15:44:42 2023: Checking /opt/ETP/bin/gmes/ as potential path for $GENEMARK_PATH.

Mon Apr 24 15:44:42 2023: Success! Setting $GENEMARK_PATH to /opt/ETP/bin/gmes/!

Mon Apr 24 15:44:42 2023: Trying to set $DIAMOND_PATH...

Mon Apr 24 15:44:42 2023: Did not find environment variable $DIAMOND_PATH.

Mon Apr 24 15:44:42 2023: Trying to guess DIAMOND_PATH from location of diamond executable that is available in your $PATH

Mon Apr 24 15:44:42 2023: Checking /opt/ETP/tools as potential path for $DIAMOND_PATH.

Mon Apr 24 15:44:42 2023: Success! Setting $DIAMOND_PATH to /opt/ETP/tools!

Mon Apr 24 15:44:42 2023: Trying to set $PROTHINT_PATH...

Mon Apr 24 15:44:42 2023: Did not find environment variable $PROTHINT_PATH.

Mon Apr 24 15:44:42 2023: Trying to guess PROTHINT_PATH from location of prothint.py executable that is available in your $PATH

Mon Apr 24 15:44:42 2023: Checking /opt/ETP/bin/gmes/ProtHint/bin as potential path for $PROTHINT_PATH.

Mon Apr 24 15:44:42 2023: Success! Setting $PROTHINT_PATH to /opt/ETP/bin/gmes/ProtHint/bin!

Mon Apr 24 15:44:42 2023: Trying to set $TSEBRA_PATH...

Mon Apr 24 15:44:42 2023: Did not find environment variable $TSEBRA_PATH.

Mon Apr 24 15:44:42 2023: Trying to guess TSEBRA_PATH from location of tsebra.py executable that is available in your $PATH

Mon Apr 24 15:44:42 2023: Checking /opt/TSEBRA/bin as potential path for $TSEBRA_PATH.

Mon Apr 24 15:44:42 2023: Success! Setting $TSEBRA_PATH to /opt/TSEBRA/bin!

Mon Apr 24 15:44:42 2023: Trying to set $CDBTOOLS_PATH...

Mon Apr 24 15:44:42 2023: Did not find environment variable $CDBTOOLS_PATH.

Mon Apr 24 15:44:42 2023: Trying to guess CDBTOOLS_PATH from location of cdbfasta executable that is available in your $PATH

Mon Apr 24 15:44:42 2023: Checking /opt/cdbfasta as potential path for $CDBTOOLS_PATH.

Mon Apr 24 15:44:42 2023: Success! Setting $CDBTOOLS_PATH to /opt/cdbfasta!

Mon Apr 24 15:44:42 2023: BRAKER will execute GeneMark-EP for training GeneMark and generating a training gene set for AUGUSTUS, using protein information as sole extrinsic evidence source.

**

CREATING DIRECTORY STRUCTURE

**

Mon Apr 24 15:44:42 2023: create working directory /home/jovyan/braker.

mkdir /home/jovyan/braker

Mon Apr 24 15:44:42 2023: creating file that contains citations for this BRAKER run at /home/jovyan/braker/what-to-cite.txt...

Mon Apr 24 15:44:42 2023: create working directory /home/jovyan/braker/GeneMark-EP.

mkdir /home/jovyan/braker/GeneMark-EP

Mon Apr 24 15:44:42 2023: create working directory /home/jovyan/braker/GeneMark-ES.

mkdir /home/jovyan/braker/GeneMark-ES

Mon Apr 24 15:44:42 2023: create working directory /home/jovyan/braker/species

mkdir /home/jovyan/braker/species

Mon Apr 24 15:44:42 2023: create working directory /home/jovyan/braker/errors

mkdir /home/jovyan/braker/errors

Mon Apr 24 15:44:42 2023: changing into working directory /home/jovyan/braker

cd /home/jovyan/braker

Mon Apr 24 15:44:42 2023: getting GC content of the genome

/opt/BRAKER/scripts/get_gc_content.py --sequences /home/jovyan/Cuscuta_gronovii.chrm.asm.softmasked.fasta --print_sequence_length 1> /home/jovyan/braker/gc_content.out 2> /home/jovyan/braker/errors/gc_content.stderr

Mon Apr 24 15:46:44 2023: Creating parameter template files for AUGUSTUS with new_species.pl

Mon Apr 24 15:46:44 2023: new_species.pl will create parameter files for species cuscuta_gronovii in /usr/share/augustus/config//species/cuscuta_gronovii

/usr/bin/perl /usr/share/augustus/scripts/new_species.pl --species=cuscuta_gronovii --AUGUSTUS_CONFIG_PATH=/usr/share/augustus/config/ 1> /dev/null 2>/home/jovyan/braker/errors/new_species.stderr

Mon Apr 24 15:46:44 2023: check_fasta_headers(): Checking fasta headers of file /home/jovyan/Cuscuta_gronovii.chrm.asm.softmasked.fasta

Mon Apr 24 15:47:33 2023: check_fasta_headers(): Checking fasta headers of file /home/jovyan/combined_proteins.dedup.faa

Mon Apr 24 15:47:33 2023: Assuming that this is not a DNA fasta file because other characters than A, T, G, C, N, a, t, g, c, n were contained. If this is supposed to be a DNA fasta file, check the content of your file! If this is supposed to be a protein fasta file, please ignore this message!

**

PROCESSING HINTS

**

Mon Apr 24 15:47:46 2023: Running ProtHint to produce hints from protein sequence file (this may take a couple of hours)...

Mon Apr 24 15:47:46 2023: Running Genemark-ES for ProtHint...

Mon Apr 24 15:47:46 2023: Executing GeneMark-ES

Mon Apr 24 15:47:46 2023: changing into GeneMark-ES directory /home/jovyan/braker/GeneMark-ES

cd /home/jovyan/braker/GeneMark-ES

Mon Apr 24 15:47:46 2023: Executing gmes_petap.pl

/usr/bin/perl /opt/ETP/bin/gmes//gmes_petap.pl --verbose --cores=40 --ES --gc_donor 0.001 --sequence=/home/jovyan/braker/genome.fa --soft_mask auto 1>/home/jovyan/braker/GeneMark-ES.stdout 2>/home/jovyan/braker/errors/GeneMark-ES.stderr

Mon Apr 24 16:56:53 2023: change to working directory /home/jovyan/braker

cd /home/jovyan/braker

Mon Apr 24 16:56:53 2023: Calling prothint.py...

Mon Apr 24 16:56:53 2023: starting prothint.py

/opt/ETP/bin/gmes/ProtHint/bin/prothint.py --threads=40 --geneMarkGtf /home/jovyan/braker/GeneMark-ES/genemark.gtf /home/jovyan/braker/genome.fa /home/jovyan/braker/proteins.fa

Mon Apr 24 16:58:05 2023: Appending hints from /home/jovyan/braker/prothint_augustus.gff to /home/jovyan/braker/hintsfile.gff

Mon Apr 24 16:58:05 2023: Generating hints with ProtHint finished.

Mon Apr 24 16:58:05 2023: Preparing hints for running GeneMark

Mon Apr 24 16:58:05 2023: Filtering intron hints for GeneMark from /home/jovyan/braker/hintsfile.gff...

mv /home/jovyan/braker/genemark_hintsfile.gff.prot.tmp /home/jovyan/braker/genemark_hintsfile.gff

The following files are empty: evidence.gff genemark_hintsfile.gff.prot genemark_hintsfile.gff.rnaseq prothint.gff

The following files only have two entries in them: prothint_augustus.gff nuc.fasta top_chains.gff hintsfile.gff diamond/diamond.out

I should mention that I am using the Docker image for this. I introduced the gmes key directly into the container to run it. I am not sure what could be causing this issue. Could it be something related to the input files?
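The empty and near-empty output files listed above can be checked with a short script. This is only a minimal sketch: the helper names `count_entries` and `summarize` are hypothetical, the file names are copied from the list above, and it assumes the files sit under the working directory shown in the log (`/home/jovyan/braker`), which may not hold for every file. `nuc.fasta` is left out because counting lines is not a meaningful entry count for FASTA.

```python
from pathlib import Path


def count_entries(path):
    """Count non-empty, non-comment lines in a file; return None if the file is missing."""
    p = Path(path)
    if not p.is_file():
        return None
    with p.open() as handle:
        return sum(1 for line in handle if line.strip() and not line.startswith("#"))


def summarize(workdir, names):
    """Map each file name to its entry count inside workdir."""
    return {name: count_entries(Path(workdir) / name) for name in names}


if __name__ == "__main__":
    # File names copied from the list in this comment; their location
    # under the working directory is an assumption, adjust as needed.
    names = [
        "evidence.gff",
        "genemark_hintsfile.gff.prot",
        "genemark_hintsfile.gff.rnaseq",
        "prothint.gff",
        "prothint_augustus.gff",
        "top_chains.gff",
        "hintsfile.gff",
        "diamond/diamond.out",
    ]
    for name, n in summarize("/home/jovyan/braker", names).items():
        print(f"{name}\t{'missing' if n is None else n}")
```

A count of 0 means the file exists but holds no entries, while `missing` means it was not found at the assumed path.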

KatharinaHoff commented 1 year ago

It looks like you did not provide sufficient input data. What kind of data did you use?

cerd9235 commented 1 year ago

I used OrthoDB with 596590 sequences. I should mention that I ran it again with an unmasked genome and it looks like it ran; however, the BUSCO scores are off. For protein only: 10% complete BUSCOs, for RNA only: 0% BUSCOs, and for both prot + RNA: 92% BUSCOs.
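Since the run behaved differently with the unmasked genome, one thing that might be worth looking at is how much of the soft-masked assembly is actually lowercase. The following is a minimal sketch, assuming a plain (uncompressed) FASTA file; the function name `softmask_fraction` is hypothetical and not part of BRAKER. It does not diagnose the problem by itself, it only reports how much sequence is masked.

```python
import sys


def softmask_fraction(fasta_path):
    """Return (lowercase_bases, total_bases, fraction) for a FASTA file."""
    lower = 0
    total = 0
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                continue  # skip header lines
            seq = line.strip()
            total += len(seq)
            lower += sum(1 for base in seq if base.islower())
    fraction = lower / total if total else 0.0
    return lower, total, fraction


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: pass the path to the soft-masked genome as the first argument.
    lower, total, fraction = softmask_fraction(sys.argv[1])
    print(f"{lower} of {total} bases are lowercase ({fraction:.1%})")
```

It could be run on `Cuscuta_gronovii.chrm.asm.softmasked.fasta` from the BRAKER call above, and the resulting fraction reported in the thread.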