Gaius-Augustus / BRAKER

BRAKER is a pipeline for fully automated prediction of protein coding gene structures with GeneMark-ES/ET/EP/ETP and AUGUSTUS in novel eukaryotic genomes

Merging exons to genes #795

Open RonOLab opened 2 months ago

RonOLab commented 2 months ago

Hi, I ran BRAKER3 to annotate a new reference genome I had assembled. I used the proteome of the closest species I could find as a training set. The BRAKER call was:

braker.pl --gff3 --threads=12 --species=Ver --genome=Ver_masked_genome_v1.fasta --prot_seq=Psa_v1a_prot.fa

I got approximately 180K genes. The same training set was applied to MAKER and I got ~34K, which is a more reasonable number, and the number of exons was 144K. I realized that BRAKER simply did not merge the exons into genes. Do you have any idea where to look in order to get BRAKER to merge the exons into genes?

Thanks, Ron

KatharinaHoff commented 2 months ago

There is no BRAKER solution to this.

RonOLab commented 2 months ago

> There is no BRAKER solution to this.

Still, I believe that 180K is a problematic result. You may agree with my assessment that what I got here are exons and not full genes. In previous runs of BRAKER on other genomes, I got reasonable results. So what went wrong here? Is it the training set? Is the training set perhaps evolutionarily too close or too distant? Is it the absence of the option to run GenomeThreader, which was available in BRAKER2 but removed in BRAKER3? I'm interested in your opinion on what went wrong, even if you don't think that the solution lies within the BRAKER3 pipeline.
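One way to check the "exons rather than genes" assessment before looking for a cause is to count how many predicted transcripts consist of a single coding segment. The following is a minimal sketch, not part of BRAKER: the function name `summarize_gff3` is hypothetical, and it assumes a GFF3 file in which `gene`, `mRNA` (or `transcript`) and `CDS` features are linked through `ID` and `Parent` attributes.

```python
from collections import Counter


def summarize_gff3(lines):
    """Count genes, transcripts, and CDS segments per transcript in GFF3 text.

    Assumption: CDS features point to their transcript via the Parent attribute.
    """
    n_genes = 0
    transcripts = set()
    cds_per_tx = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a regular GFF3 feature line
        ftype = cols[2]
        # Parse "key=value;key=value" attributes into a dict.
        attr = dict(
            kv.strip().split("=", 1) for kv in cols[8].split(";") if "=" in kv
        )
        if ftype == "gene":
            n_genes += 1
        elif ftype in ("mRNA", "transcript"):
            if "ID" in attr:
                transcripts.add(attr["ID"])
        elif ftype == "CDS":
            for parent in attr.get("Parent", "").split(","):
                if parent:
                    cds_per_tx[parent] += 1
    single = sum(1 for tx in transcripts if cds_per_tx[tx] == 1)
    return {
        "genes": n_genes,
        "transcripts": len(transcripts),
        "cds_segments": sum(cds_per_tx.values()),
        "single_cds_transcripts": single,
    }
```

Called as `summarize_gff3(open(path))` on the annotation file, a very high share of `single_cds_transcripts` would be consistent with fragmented gene models, although genuinely single-exon genes also fall into that count, so the number is only a first indication.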

KatharinaHoff commented 2 months ago

I agree that 180K gene models are most likely excessive. But the reasons can vary. There's no one-size-fits-all answer, particularly not without diving deeply into your data.

I can only recommend visualizing gene models in context with the hints in a genome browser. Often, one then spots systematic problems.

One can also reciprocally BLAST the proteins to identify excessive repeats (revealing undermasking).
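As a rough illustration of the BLAST suggestion, the sketch below reads an all-against-all protein BLAST result in the standard 12-column tabular format (`-outfmt 6`) and lists predicted proteins that hit unusually many other predicted proteins, which is one pattern that repeat-derived gene models tend to produce. This is only one possible reading of the advice: the function name `proteins_with_many_hits` and both thresholds are hypothetical and would need tuning for real data.

```python
from collections import defaultdict


def proteins_with_many_hits(blast_rows, min_partners=50, max_evalue=1e-10):
    """Return {query: number of distinct other proteins hit} for suspicious queries.

    Assumes BLAST tabular output with the default 12 columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    partners = defaultdict(set)
    for row in blast_rows:
        if not row.strip() or row.startswith("#"):
            continue  # skip blank and comment lines
        cols = row.rstrip("\n").split("\t")
        if len(cols) < 12:
            continue  # not a default tabular hit line
        query, subject, evalue = cols[0], cols[1], float(cols[10])
        # Ignore self-hits and weak hits.
        if query != subject and evalue <= max_evalue:
            partners[query].add(subject)
    return {q: len(s) for q, s in partners.items() if len(s) >= min_partners}
```

Proteins reported here are candidates to inspect in the genome browser alongside the repeat masking track; a large family of near-identical predictions would point towards undermasking rather than a problem with exon merging.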