Gaius-Augustus / BRAKER

BRAKER is a pipeline for fully automated prediction of protein coding gene structures with GeneMark-ES/ET/EP/ETP and AUGUSTUS in novel eukaryotic genomes

GeneMark-ETP stderr output #744

Open mengyuan09876 opened 8 months ago

mengyuan09876 commented 8 months ago

Hi, I ran

braker.pl --genome ./genome/soft_genome.fa --prot_seq ./protein/homo_pr.fa --bam A50S51_sorted_byname.bam,A51S52_sorted_byname.bam,A52S53_sorted_byname.bam,A53S54_sorted_byname.bam --threads 20 --skipOptimize --gff3

and then I got this error:

FASTA index file /media/stefano/superM/Tundo_RNA/filtered_reads/for_braker/braker/GeneMark-ETP/data/genome.softmasked.fasta.fai created.
Use of uninitialized value $ph1 in addition (+) at /home/stefano/programs/anaconda3/envs/brakerenv/GeneMark-ETP/bin/gmes/parse_set.pl line 205.
Use of uninitialized value $ph0 in addition (+) at /home/stefano/programs/anaconda3/envs/brakerenv/GeneMark-ETP/bin/gmes/parse_set.pl line 205.
Use of uninitialized value $ph2 in addition (+) at /home/stefano/programs/anaconda3/envs/brakerenv/GeneMark-ETP/bin/gmes/parse_set.pl line 205.
Use of uninitialized value $ph0 in division (/) at /home/stefano/programs/anaconda3/envs/brakerenv/GeneMark-ETP/bin/gmes/parse_set.pl line 208.
Illegal division by zero at /home/stefano/programs/anaconda3/envs/brakerenv/GeneMark-ETP/bin/gmes/parse_set.pl line 208.
Illegal division by zero at /home/stefano/programs/anaconda3/envs/brakerenv/GeneMark-ETP/bin/train_super.pl line 184.
error, file/folder not found: /media/stefano/superM/Tundo_RNA/filtered_reads/for_braker/braker/GeneMark-ETP/proteins.fa/model/output.mod

Thanks

atengertrolander commented 8 months ago

Hi, I also got this error message on my most recent run of braker3 (actually exactly the same as above). Interestingly, I have successfully run BRAKER3 on a different genome with no problems.

The genome for which I received this error is a bit smaller (392 vs 500 MB) and has more contigs (308 vs 18). I also have more RNAseq data for the run that did not work (86G versus 20G). The braker3 error report said to check the GeneMark-ETP.stderr output and noted that "The most common problem is that GeneMark-ETP didn't receive enough evidence from the input data", which seems unlikely to me because I have more RNAseq data for this run than for the previous run, which worked. As I mentioned above, the GeneMark-ETP.stderr message is identical to the one mengyuan09876 posted.

But unlike my first run, the gmst and hisat folders in /GeneMark-ETP/rnaseq are still present. According to the log files (filter_gmst.log and prothint_gmst.log) GMST filtering and classification and prothint analysis completed without issues. In the GeneMark-ETP/proteins.fa folder the genemark.gtf files are missing as well as the penalty/ and cds/ folders. The proteins.fa file is identical to my previous run as the species are close relatives.

Any ideas for what might be going wrong would be appreciated! Thanks!

atengertrolander commented 8 months ago

I realized I completely forgot to add the --outSAMstrandField intronMotif option to my STAR run.

From the braker3 GitHub README: "Please note that we generally assume that bam files were generated with HiSat2 because that is the aligner that would also be executed by BRAKER3 with fastq input. If you want for some reason to generate the bam files with STAR, use the option --outSAMstrandField intronMotif of STAR to produce files that are compatible with StringTie in BRAKER3."

Hope this helps other people who forgot the same thing!
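For anyone who wants to check their own BAM files before rerunning, here is a minimal sketch of one way to do it. It assumes the symptom of the missing STAR option is that spliced alignments (CIGAR strings containing N) carry no XS:A strand tag, which is the attribute StringTie expects on spliced reads. The function name spliced_reads_missing_xs is hypothetical and not part of BRAKER, STAR, or StringTie. It works on SAM text lines, not on the binary BAM directly.

```python
def spliced_reads_missing_xs(sam_lines):
    """Return (spliced, missing): the number of spliced alignments seen
    and how many of them lack an XS:A strand tag.

    sam_lines is any iterable of SAM-formatted text lines.
    """
    spliced = 0
    missing = 0
    for line in sam_lines:
        # Skip SAM header lines.
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A SAM alignment record has 11 mandatory columns.
        if len(fields) < 11:
            continue
        cigar = fields[5]
        # 'N' in the CIGAR marks a skipped region, i.e. a spliced alignment.
        if "N" not in cigar:
            continue
        spliced += 1
        # Optional tags start at column 12.
        if not any(tag.startswith("XS:A:") for tag in fields[11:]):
            missing += 1
    return spliced, missing
```

One way to use it is to pipe the output of samtools view for one of the BAM files into a small script that calls the function on sys.stdin and prints the two counts. If most spliced alignments are missing the tag, that would be consistent with the STAR option having been left out, though it does not prove that this is the cause of the GeneMark-ETP error.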

KatharinaHoff commented 7 months ago

I currently assume that @mengyuan09876's problem is still open and hope that @alexlomsadze will address this.