Closed gchevignon closed 1 year ago
Hi,
could you compare the predictions from Augustus located at Augustus/augustus.hints.gtf
from your runs? If they contain introns, there may be an issue with the hints generated from your protein data. Can you ensure that the hintsfile.gff
is not empty?
Best, Lars
Hi Lars Thanks for your reply ! My Augustus/augustus.hints.gtf do not contain intron and the hintsfile.gff is not empty. Best
Germain
This is very unusual. Can you provide the version of BRAKER you're using and the command line you executed for running BRAKER?
Best, Lars
This Braker3 v3.0.3
Here is the PBS command:
. /etc/profile.d/modules.sh module purge module load singularity/3.4.1
BRAKER_SIF=/appli/bioinfo/braker3/3.0.3/singularity/braker3.sif
GENOME=Genome_polished.fasta
OUTDIR=/operation/14-braker3-6/
BAM=Genome_polished.bam.sort.bam
PROT=refseq_db_Euk.faa
singularity run -B $DATAWORK ${BRAKER_SIF} braker.pl --species=BonamiaOstreae --genome=${GENOME} --bam=${BAM} --threads ${NCPUS} --workingdir=${OUTDIR} --prot_seq=${PROT} --gff3 >& ${OUTDIR}braker3.e${PBS_JOBID}_Bo 2>&1
Thank you for the details. It seems like something went wrong during training of the gene finders. I assume the Genemark-ETP prediciton (GeneMark-ETP/genemark.gtf
) has also no introns, did GeneMark-ETP produce any error message (errors/GeneMark-ETP.stderr)?
My suspicion is that there might have gone wrong during the GeneMark-ETP step. Does your BAM file conform with the required format (here)? Alternatively, would it be possible for you to run the pipeline once more using the original FASTQ files?
Indeed there is an error at the GeneMark-ETP: My BAM should be OK ... I'll re-run with FASTQ Best
FASTA index file /home/datawork-sgmm-nos/DATA_MINION_LGPMM/ONT_Parasites/Bonamia/Purification/operation/14-braker3-2/GeneMark-ETP/data/genome.softmasked.fasta.fai created.
Use of uninitialized value $ph1 in addition (+) at /opt/ETP/bin/gmes/parse_set.pl line 205.
Use of uninitialized value $ph0 in addition (+) at /opt/ETP/bin/gmes/parse_set.pl line 205.
Use of uninitialized value $ph2 in addition (+) at /opt/ETP/bin/gmes/parse_set.pl line 205.
Use of uninitialized value $ph0 in division (/) at /opt/ETP/bin/gmes/parse_set.pl line 208.
Illegal division by zero at /opt/ETP/bin/gmes/parse_set.pl line 208.
cat: GT.mat: No such file or directory
cat: AG.mat: No such file or directory
cat: GT.mat: No such file or directory
cat: AG.mat: No such file or directory
cat: GT.mat: No such file or directory
cat: AG.mat: No such file or directory
cat: GC.mat: No such file or directory
cat: GC.mat: No such file or directory
cat: GC.mat: No such file or directory
Illegal division by zero at /opt/ETP/bin/train_super.pl line 184.
error, file not found: option --f1 prothint/prothint.gff
grep: prothint/evidence.gff: No such file or directory
grep: prothint/evidence.gff: No such file or directory
Traceback (most recent call last):
File "/opt/ETP/bin/printRnaAlternatives.py", line 353, in
In my experience, this usually indicates that the BAM file may not be formatted correctly or there's insufficient RNA-Seq data for GeneMark-ETP. That said, the issue here is not entirely clear. Rerunning with the FASTQ files could help.
OK I'll re-run with fastq ! but is the problem could be that I have too much data in my RNAseq (I have around 40gb IIllumina RNAseq data where my ref genome is only 12mb long ...) ?
So I have rerunned with fastq files and it works. Thanks a lot for your help ! Best
Hello,
There is something I don't understand : why with RNAseq only I got lot of predicted genes with introns whereas when I add protein from other species from the same supergroup or even from eukaryote odb10 there are no more intron in the predicted genes ... I know the specie I work on should possess intron .
You'll find both type of predictions in the screenshot.
Best
Germain
Braker3_example.pdf