Gaius-Augustus / BRAKER

BRAKER is a pipeline for fully automated prediction of protein coding gene structures with GeneMark-ES/ET/EP/ETP and AUGUSTUS in novel eukaryotic genomes

ERROR in file /opt/BRAKER/scripts/braker.pl at line 3818 #756

Open LliliansCalvo opened 7 months ago

LliliansCalvo commented 7 months ago

Hi, I'm running BRAKER on a cluster using both protein data and RNA-seq data. We have RNA-seq data for around 300 individuals, so I am randomly selecting 50 of them to run BRAKER with. I keep getting the error below and don't know how to troubleshoot it, so I hope someone can help.

#!/bin/bash
#SBATCH --job-name=Braker_only_50
#SBATCH --mail-type ALL
#SBATCH --export=NONE
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=10
#SBATCH --mem-per-cpu=5G
#SBATCH -t 2-23:00:00

module load  gcc/10.4.0
module load genemark-et/4.69
module load gcc singularity
export SINGULARITY_BIND="/scratch,/work,/users"

# Set the path to your Singularity image
export SINGULARITY_BIND="/scratch,/work,/users"
SINGULARITY_IMAGE=/work/FAC/FBM/braker3.sif

# Set the path to your genome file
GENOME_FILE=/work/FAC/FBM/mod_GCA_030586385.1_ASM3058638v1_genomic.fa

# Set the path to your RNA-Seq files directory
RNA_SEQ_DIR=/scratch/lcalvogo/Braker/fastq

# Find all forward reads in the RNA-Seq directory
forward_reads=($(find ${RNA_SEQ_DIR} -name "*_1.fastq"))

# Shuffle the forward reads and select 50 random reads
shuf -n 50 -e "${forward_reads[@]}" | while read FORWARD_READ; do
    # Extract the sample ID from the forward read without the _1.fastq suffix
    SAMPLE_ID=$(basename ${FORWARD_READ%%_*})

    # Construct the path to the reverse read without the _1.fastq suffix
    REVERSE_READ=${RNA_SEQ_DIR}/${SAMPLE_ID}_2.fastq

    # Print information for debugging
    echo "Sample ID: ${SAMPLE_ID}"
    echo "Forward Read: ${FORWARD_READ}"
    echo "Reverse Read: ${REVERSE_READ}"

    # Execute the Braker command for each sample
    singularity exec ${SINGULARITY_IMAGE} braker.pl --species=Cfellah --useexisting  --gff3  --genome=${GENOME_FILE} --rnaseq_sets_ids=${SAMPLE_ID} --rnaseq_sets_dirs=${FORWARD_READ},${REVERSE_READ} --prot_seq=/scratch/lcalvogo/Braker/proteins.fasta --softmasking  --threads 8 --gm_max_intergenic 10000 --skipOptimize
done
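
A side note on the sample ID step in the script: the expansion ${FORWARD_READ%%_*} cuts at the first underscore anywhere in the path, so it only yields the right ID as long as no directory name contains an underscore. A minimal sketch of a more robust variant follows; the example path is taken from the log, and READ_DIR is a name introduced here for illustration. The log further down also reports that no local library was found when the two FASTQ file paths were passed to --rnaseq_sets_dirs. The option name suggests it expects the directory that holds the files, but that is an assumption to check against the BRAKER documentation.

```shell
#!/bin/bash
# Example forward-read path (taken from the log; substitute your own file).
FORWARD_READ=/scratch/lcalvogo/Braker/fastq/SRR24635227_1.fastq

# basename with a suffix argument strips the directory and the "_1.fastq"
# ending, no matter how many underscores appear elsewhere in the path.
SAMPLE_ID=$(basename "${FORWARD_READ}" _1.fastq)

# READ_DIR is a hypothetical helper variable: the directory holding the reads.
READ_DIR=$(dirname "${FORWARD_READ}")
REVERSE_READ="${READ_DIR}/${SAMPLE_ID}_2.fastq"

echo "Sample ID: ${SAMPLE_ID}"
echo "Read directory: ${READ_DIR}"
echo "Reverse Read: ${REVERSE_READ}"
```

Quoting the variables also keeps the script working if a path ever contains spaces.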

ERROR

#**********************************************************************************
#                               BRAKER CONFIGURATION                               
#**********************************************************************************
# BRAKER CALL: /opt/BRAKER/scripts/braker.pl --species=Cfellah --useexisting --gff3 --genome=/work/FAC/FBM/mod_GCA_030586385.1_ASM3058638v1_genomic.fa --rnaseq_sets_ids=SRR24635227 --rnaseq_sets_dirs=/scratch/lcalvogo/Braker/fastq/SRR24635227_1.fastq,/scratch/lcalvogo/Braker/fastq/SRR24635227_2.fastq --prot_seq=/scratch/lcalvogo/Braker/proteins.fasta --softmasking --threads 8 --gm_max_intergenic 10000 --skipOptimize
# Thu Feb  8 14:26:26 2024: braker.pl version 3.0.6
# Thu Feb  8 14:26:26 2024:Both protein and RNA-Seq data in input detected. BRAKER will be executed in ETP mode (BRAKER3).
#*********
# Thu Feb  8 14:26:26 2024: Configuring of BRAKER for using external tools...
# Thu Feb  8 14:26:26 2024: Searching for local files of RNA-Seq sets in /scratch/lcalvogo/Braker/fastq/SRR24635227_1.fastq, /scratch/lcalvogo/Braker/fastq/SRR24635227_2.fastq ...
# Thu Feb  8 14:26:26 2024: Couldn't find local RNA-Seq library for SRR24635227, will try to download it from SRA later.
# Thu Feb  8 14:26:26 2024: Trying to set $AUGUSTUS_CONFIG_PATH...
# Thu Feb  8 14:26:26 2024: Found environment variable $AUGUSTUS_CONFIG_PATH.
# Thu Feb  8 14:26:26 2024: Checking /usr/share/augustus/config/ as potential path for $AUGUSTUS_CONFIG_PATH.
# Thu Feb  8 14:26:26 2024: Success! Setting $AUGUSTUS_CONFIG_PATH to /usr/share/augustus/config/!
# Thu Feb  8 14:26:26 2024: WARNING: in file /opt/BRAKER/scripts/braker.pl at line 1894
AUGUSTUS_CONFIG_PATH/species (in this case /usr/share/augustus/config//species) is not writeable. BRAKER will try to copy the AUGUSTUS config directory to a writeable location.
# Thu Feb  8 14:26:26 2024:Both protein and RNA-Seq data in input detected. BRAKER will be executed in ETP mode (BRAKER3).
#*********
# Thu Feb  8 14:26:27 2024: Log information is stored in file /work/FAC/FBM/braker.log
ERROR in file /opt/BRAKER/scripts/braker.pl at line 3818
Could not close output fasta file /work/FAC/FBM/genome.fa!
Warning: unable to close filehandle properly: Disk quota exceeded during global destruction.
Sample ID: SRR24635220
Forward Read: /scratch/lcalvogo/Braker/fastq/SRR24635220_1.fastq
Reverse Read: /scratch/lcalvogo/Braker/fastq/SRR24635220_2.fastq

KatharinaHoff commented 7 months ago

This appears to be a problem with your local file quota (see bottom of the BRAKER log).
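
The "Disk quota exceeded" message comes from the filesystem rather than from BRAKER itself. The job script passes no --workingdir, so BRAKER apparently wrote genome.fa and braker.log into the directory the job started in (/work/FAC/FBM in the log). A minimal sketch for checking the space available there is shown below. WORKDIR is a placeholder introduced here, and the quota tool is not installed on every cluster; some sites provide their own quota command instead.

```shell
#!/bin/bash
# WORKDIR is a placeholder: point it at the directory BRAKER writes to.
# It defaults to the current directory if not set.
WORKDIR="${WORKDIR:-$PWD}"

# Free space on the filesystem that holds the working directory.
df -h "${WORKDIR}"

# Per-user quota summary, only if the quota tool exists on this system.
if command -v quota >/dev/null 2>&1; then
    quota -s || true
else
    echo "quota command not available; ask your cluster admins for the site-specific tool"
fi
```

If the quota on that filesystem is tight, pointing --workingdir (the option used in a later comment of this thread) at a location with more space, such as scratch, is one possible way out.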

LliliansCalvo commented 7 months ago

That is true. Solved now. Thanks a lot!

yzliu01 commented 5 months ago

Hi Katharina,

I tried the Docker container that you recommended for gene annotation, but I still get errors in the terminal when I run singularity exec braker3.sif braker.pl. As you suggested in this post ("see bottom of the BRAKER log"), I checked the log but did not see any errors there. I have no idea what is wrong. Can you help me figure it out? Thanks a lot.

My commands: singularity exec braker3.sif braker.pl --genome="$Andrena_marginata_softmask_simple_header_genome" --hints="$prothint_augustus_gff" --workingdir=$braker_output_dir --threads 4 --PROTHINT_PATH=/home/yzliu/eDNA/faststorage/yzliu/DK_proj/sofwtare/gmetp_linux_64/bin/gmes/ProtHint/bin --GENEMARK_PATH=/home/yzliu/eDNA/faststorage/yzliu/DK_proj/sofwtare/gmetp_linux_64/bin/gmes

Output in terminal

# Thu Apr 18 21:07:06 2024: Log information is stored in file /faststorage/project/eDNA/yzliu/DK_proj/data/bee_proj_data/gene_annotation/braker_results/braker.log
ERROR in file /opt/BRAKER/scripts/braker.pl at line 6005
Failed to create new species with new_species.pl, check write permissions in /opt/Augustus/config//species directory! Command was /usr/bin/perl /opt/Augustus/scripts/new_species.pl --species=Sp_1 --AUGUSTUS_CONFIG_PATH=/opt/Augustus/config/ 1> /dev/null 2>/faststorage/project/eDNA/yzliu/DK_proj/data/bee_proj_data/gene_annotation/braker_results/errors/new_species.stderr

Is this caused by write permissions on the /opt/Augustus/config//species directory? I could not find this directory.
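
One possible reading, hedged: the log shows $AUGUSTUS_CONFIG_PATH resolving to /opt/Augustus/config/, a path inside the container image, which would explain why it cannot be found on the host. Singularity images are read-only by default, so new_species.pl cannot create Sp_1 there. A sketch of a workaround, assuming the home directory is visible inside the container, is to copy the config tree to a writable location and pass it with --AUGUSTUS_CONFIG_PATH. SIF, CONFIG_COPY and EXTRA_ARG are placeholder names introduced here.

```shell
#!/bin/bash
# Placeholders: path to the image and a writable target for the config copy.
SIF="${SIF:-braker3.sif}"
CONFIG_COPY="${CONFIG_COPY:-${HOME}/augustus_config}"

if ! command -v singularity >/dev/null 2>&1 || [ ! -f "${SIF}" ]; then
    echo "singularity or ${SIF} not found; nothing copied"
elif [ -e "${CONFIG_COPY}" ]; then
    echo "${CONFIG_COPY} already exists; reusing it"
else
    # Copy the AUGUSTUS config tree out of the read-only image.
    singularity exec "${SIF}" cp -r /opt/Augustus/config "${CONFIG_COPY}"
    echo "Copied AUGUSTUS config to ${CONFIG_COPY}"
fi

# Argument to append to the existing braker.pl call.
EXTRA_ARG="--AUGUSTUS_CONFIG_PATH=${CONFIG_COPY}"
echo "${EXTRA_ARG}"
```

The printed argument would then be added to the braker.pl command shown above, leaving the other options unchanged.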

braker.log

#**********************************************************************************
#                               BRAKER CONFIGURATION                               
#**********************************************************************************
# BRAKER CALL: /opt/BRAKER/scripts/braker.pl --genome=/home/yzliu/eDNA/faststorage/yzliu/DK_proj/data/ref_genome/Andrena_marginata_GCA_963932335.1-softmasked.simple_header.fa --hints=/home/yzliu/eDNA/faststorage/yzliu/DK_proj/data/bee_proj_data/gene_annotation/prothint_results/prothint_augustus.gff --workingdir=/home/yzliu/eDNA/faststorage/yzliu/DK_proj/data/bee_proj_data/gene_annotation/braker_results --threads 4 --PROTHINT_PATH=/home/yzliu/eDNA/faststorage/yzliu/DK_proj/sofwtare/gmetp_linux_64/bin/gmes/ProtHint/bin --GENEMARK_PATH=/home/yzliu/eDNA/faststorage/yzliu/DK_proj/sofwtare/gmetp_linux_64/bin/gmes
# Thu Apr 18 21:07:05 2024: braker.pl version 3.0.8
# Thu Apr 18 21:07:05 2024: Checking whether hints from RNA-Seq and/or proteins are present in hintsfile
# Thu Apr 18 21:07:05 2024: Only Protein input detected, BRAKER will be executed in EP mode (BRAKER2).
# Thu Apr 18 21:07:05 2024: Configuring of BRAKER for using external tools...
# Thu Apr 18 21:07:05 2024: Trying to set $AUGUSTUS_CONFIG_PATH...
# Thu Apr 18 21:07:05 2024: Found environment variable $AUGUSTUS_CONFIG_PATH.
# Thu Apr 18 21:07:05 2024: Checking /opt/Augustus/config/ as potential path for $AUGUSTUS_CONFIG_PATH.
# Thu Apr 18 21:07:05 2024: Success! Setting $AUGUSTUS_CONFIG_PATH to /opt/Augustus/config/!
# Thu Apr 18 21:07:05 2024: Trying to set $AUGUSTUS_BIN_PATH...
# Thu Apr 18 21:07:05 2024: Found environment variable $AUGUSTUS_BIN_PATH.
# Thu Apr 18 21:07:05 2024: Checking /opt/Augustus/bin/ as potential path for $AUGUSTUS_BIN_PATH.
# Thu Apr 18 21:07:05 2024: Success! Setting $AUGUSTUS_BIN_PATH to /opt/Augustus/bin/!
# Thu Apr 18 21:07:05 2024: Trying to set $AUGUSTUS_SCRIPTS_PATH...
# Thu Apr 18 21:07:05 2024: Found environment variable $AUGUSTUS_SCRIPTS_PATH.
# Thu Apr 18 21:07:05 2024: Checking /opt/Augustus/scripts/ as potential path for $AUGUSTUS_SCRIPTS_PATH.
# Thu Apr 18 21:07:05 2024: Success! Setting $AUGUSTUS_SCRIPTS_PATH to /opt/Augustus/scripts/!
# Thu Apr 18 21:07:05 2024: Trying to set $PYTHON3_PATH...
# Thu Apr 18 21:07:05 2024: Did not find environment variable $PYTHON3_PATH.
# Thu Apr 18 21:07:05 2024: Trying to guess PYTHON3_PATH from location of python3 executable that is available in your $PATH
# Thu Apr 18 21:07:05 2024: Checking /opt/conda/bin as potential path for $PYTHON3_PATH.
# Thu Apr 18 21:07:05 2024: Success! Setting $PYTHON3_PATH to /opt/conda/bin!
# Thu Apr 18 21:07:05 2024: Trying to set $GENEMARK_PATH...
# Thu Apr 18 21:07:05 2024: Found command line argument $GENEMARK_PATH.
# Thu Apr 18 21:07:05 2024: Checking /home/yzliu/eDNA/faststorage/yzliu/DK_proj/sofwtare/gmetp_linux_64/bin/gmes as potential path for $GENEMARK_PATH.
# Thu Apr 18 21:07:05 2024: Success! Setting $GENEMARK_PATH to /home/yzliu/eDNA/faststorage/yzliu/DK_proj/sofwtare/gmetp_linux_64/bin/gmes!
# Thu Apr 18 21:07:05 2024: Trying to set $DIAMOND_PATH...
# Thu Apr 18 21:07:05 2024: Did not find environment variable $DIAMOND_PATH.
# Thu Apr 18 21:07:05 2024: Trying to guess DIAMOND_PATH from location of diamond executable that is available in your $PATH
# Thu Apr 18 21:07:05 2024: Checking /opt/ETP/tools as potential path for $DIAMOND_PATH.
# Thu Apr 18 21:07:05 2024: Success! Setting $DIAMOND_PATH to /opt/ETP/tools!
# Thu Apr 18 21:07:05 2024: Trying to set $TSEBRA_PATH...
# Thu Apr 18 21:07:05 2024: Did not find environment variable $TSEBRA_PATH.
# Thu Apr 18 21:07:05 2024: Trying to guess TSEBRA_PATH from location of tsebra.py executable that is available in your $PATH
# Thu Apr 18 21:07:05 2024: Checking /opt/TSEBRA/bin as potential path for $TSEBRA_PATH.
# Thu Apr 18 21:07:05 2024: Success! Setting $TSEBRA_PATH to /opt/TSEBRA/bin!
# Thu Apr 18 21:07:05 2024: Trying to set $CDBTOOLS_PATH...
# Thu Apr 18 21:07:05 2024: Did not find environment variable $CDBTOOLS_PATH.
# Thu Apr 18 21:07:05 2024: Trying to guess CDBTOOLS_PATH from location of cdbfasta executable that is available in your $PATH
# Thu Apr 18 21:07:05 2024: Checking /opt/cdbfasta as potential path for $CDBTOOLS_PATH.
# Thu Apr 18 21:07:05 2024: Success! Setting $CDBTOOLS_PATH to /opt/cdbfasta!
# Thu Apr 18 21:07:05 2024: Checking if input file /home/yzliu/eDNA/faststorage/yzliu/DK_proj/data/bee_proj_data/gene_annotation/prothint_results/prothint_augustus.gff is in gff format
# Thu Apr 18 21:07:06 2024: BRAKER will execute GeneMark-EP for training GeneMark and generating a training gene set for AUGUSTUS, using protein information as sole extrinsic evidence source.
#*********
# IMPORTANT INFORMATION: no species for identifying the AUGUSTUS  parameter set that will arise from this BRAKER run was set. BRAKER will create an AUGUSTUS parameter set with name Sp_1. This parameter set can be used for future BRAKER/AUGUSTUS prediction runs for the same species. It is usually not necessary to retrain AUGUSTUS with novel extrinsic data if a high quality parameter set already exists.
#*********
#**********************************************************************************
#                               CREATING DIRECTORY STRUCTURE                       
#**********************************************************************************
# Thu Apr 18 21:07:06 2024: creating file that contains citations for this BRAKER run at /faststorage/project/eDNA/yzliu/DK_proj/data/bee_proj_data/gene_annotation/braker_results/what-to-cite.txt...
# Thu Apr 18 21:07:06 2024: changing into working directory /faststorage/project/eDNA/yzliu/DK_proj/data/bee_proj_data/gene_annotation/braker_results
cd /faststorage/project/eDNA/yzliu/DK_proj/data/bee_proj_data/gene_annotation/braker_results
# Thu Apr 18 21:07:06 2024: getting GC content of the genome
/opt/BRAKER/scripts/get_gc_content.py --sequences /home/yzliu/eDNA/faststorage/yzliu/DK_proj/data/ref_genome/Andrena_marginata_GCA_963932335.1-softmasked.simple_header.fa --print_sequence_length 1> /faststorage/project/eDNA/yzliu/DK_proj/data/bee_proj_data/gene_annotation/braker_results/gc_content.out 2> /faststorage/project/eDNA/yzliu/DK_proj/data/bee_proj_data/gene_annotation/braker_results/errors/gc_content.stderr
# Thu Apr 18 21:07:22 2024: Creating parameter template files for AUGUSTUS with new_species.pl
# Thu Apr 18 21:07:22 2024: new_species.pl will create parameter files for species Sp_1 in /opt/Augustus/config//species/Sp_1
/usr/bin/perl /opt/Augustus/scripts/new_species.pl --species=Sp_1 --AUGUSTUS_CONFIG_PATH=/opt/Augustus/config/ 1> /dev/null 2>/faststorage/project/eDNA/yzliu/DK_proj/data/bee_proj_data/gene_annotation/braker_results/errors/new_species.stderr