Gaius-Augustus / BRAKER

BRAKER is a pipeline for fully automated prediction of protein coding gene structures with GeneMark-ES/ET/EP/ETP and AUGUSTUS in novel eukaryotic genomes

Issue with filterGenesIn_mRNAname.pl when running BRAKER with singularity #828

Open cvargas88 opened 4 months ago

cvargas88 commented 4 months ago

Hi! Thank you for developing such a useful tool! I am running BRAKER using Singularity; however, the pipeline was interrupted while filtering train.gb for "good" mRNAs. The error at that point was the following:

    Sat May 18 23:29:42 2024: Genbank format file LEPN/braker/train.gb contains 9181 genes.
    # Sat May 18 23:29:42 2024: Filtering train.gb for "good" mRNAs: /usr/bin/perl miniconda3/bin/filterGenesIn_mRNAname.pl LEPN/braker/traingenes.gtf LEPN/braker/train.gb > LEPN/braker/train.f.gb 2>LEPN/braker/errors/filterGenesIn_mRNAname.stderr
    # Sat May 18 23:29:44 2024: Genbank format file LEPN/braker/train.f.gb contains 0 genes.
    # Sat May 18 23:29:44 2024: ERROR: in file /opt/BRAKER/scripts/braker.pl at line 6249
    # Training gene file in genbank format LEPN/braker/train.f.gb does not contain any training genes. Possible known causes:
    # (a) The AUGUSTUS script filterGenesIn_mRNAname.pl is not up-to-date with this version of BRAKER. To solve this issue, either get the latest AUGUSTUS from its master branch with git clone git@github.com:Gaius-Augustus/Augustus.git or download the latest version of filterGenesIn_mRNAname.pl from https://github.com/Gaius-Augustus/Augustus/blob/master/scripts/filterGenesIn_mRNAname.pl and replace the old script in your AUGUSTUS installation folder.
    # (b) No training genes with sufficient extrinsic evidence support or of sufficient length were produced by GeneMark-EX. If you think this is the cause for your problem, consider running BRAKER with different evidence or without any evidence (--esmode) for training.

I have checked and the version seems to be 20.02.2018, so I believe that is not the issue. However, the gb file contains 9181 genes and I have a large amount of RNA-seq data so I don't believe that there are no genes with evidence support. How could I verify if the issue is with this file? Thanks a lot!
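
One rough way to check both input files is to compare the number of records in the GenBank file with the number of transcript IDs that can be read from the GTF file. The sketch below assumes that each training gene is one record starting with a `LOCUS` line and that the GTF attributes are written as `transcript_id "<id>";`. The function names and the sample data are hypothetical and are not part of BRAKER or AUGUSTUS.

```python
import re

def count_genbank_records(text):
    """Count records, assuming each one starts with a LOCUS line."""
    return sum(1 for line in text.splitlines() if line.startswith("LOCUS"))

def transcript_ids(gtf_text):
    """Collect distinct transcript_id values from the GTF attribute column."""
    ids = set()
    for line in gtf_text.splitlines():
        match = re.search(r'transcript_id "([^"]*)"', line)
        if match:
            ids.add(match.group(1))
    return ids

# Made-up sample data standing in for train.gb and traingenes.gtf.
sample_gb = (
    "LOCUS       seq1_1-5000   5000 bp  DNA\n//\n"
    "LOCUS       seq2_1-3000   3000 bp  DNA\n//\n"
)
sample_gtf = (
    'seq1\tGeneMark.hmm\tCDS\t100\t400\t.\t+\t0\tgene_id "g1"; transcript_id "g1.t1";\n'
    'seq2\tGeneMark.hmm\tCDS\t50\t350\t.\t-\t0\tgene_id "g2"; transcript_id "g2.t1";\n'
)

n_records = count_genbank_records(sample_gb)
ids = transcript_ids(sample_gtf)
print(n_records, sorted(ids))
```

If the GenBank file has many records but the filtered file has none, the two inputs are probably fine and the filtering step itself is the more likely suspect.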

KatharinaHoff commented 4 months ago

If you provided BAM files as input, it is possible that you ran an aligner that does not perform spliced alignment. That would lead to this error. But it's a guess; there's too little information to confirm.

cvargas88 commented 4 months ago

Dear Katharina, thank you very much for your prompt reply. I provided several paired-end libraries, so the alignments were performed by the pipeline. I believe the issue is with the filterGenesIn_mRNAname.pl script. The version I was using had the following condition:

    if ( $_ =~ m/transcript_id \"(.*)\"/ ) {

The version on GitHub instead has:

    if ( $_ =~ m/transcript_id \"([^"]*)\"/ ) {

I tried the version from GitHub and it does indeed produce the gb file. I will try swapping it in and relaunching the run. Thanks a lot!
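
The difference between the two patterns can be shown with a small sketch. It is written in Python rather than Perl, but both regex engines treat these two patterns the same way here. It assumes a GTF line in which `transcript_id` is followed by another quoted attribute; the sample line is made up, and the thread does not confirm the attribute order in the actual traingenes.gtf.

```python
import re

# Old pattern: greedy, so the capture runs to the LAST double quote on the line.
GREEDY = re.compile(r'transcript_id "(.*)"')
# Newer pattern: the capture stops at the first closing double quote.
BOUNDED = re.compile(r'transcript_id "([^"]*)"')

# Hypothetical GTF line with an attribute after transcript_id.
line = 'seq1\tGeneMark.hmm\tCDS\t100\t400\t.\t+\t0\ttranscript_id "g1.t1"; gene_id "g1";'

greedy_id = GREEDY.search(line).group(1)
bounded_id = BOUNDED.search(line).group(1)

print(repr(greedy_id))   # 'g1.t1"; gene_id "g1'
print(repr(bounded_id))  # 'g1.t1'
```

Under that assumption, the greedy pattern captures a string that is not a valid transcript name. No record in train.gb would match it, which would be consistent with a filtered file containing 0 genes.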

KatharinaHoff commented 4 months ago

I think you are not using a very recent image. Did you build the container yourself, or did you pull it from Docker Hub? For a while, we installed AUGUSTUS from Debian in the container (for convenience). However, that version comes with outdated scripts. I changed this a while ago so that we now clone AUGUSTUS from GitHub. If you build the image from our Docker repository, this should not happen.