Gaius-Augustus / BRAKER

BRAKER is a pipeline for fully automated prediction of protein-coding gene structures with GeneMark-ES/ET/EP/ETP and AUGUSTUS in novel eukaryotic genomes.

augustus.hints.gtf has 25k more genes than braker.gtf #813

Status: Open. smallfishcui opened this issue 2 months ago.

smallfishcui commented 2 months ago

Hi,

I ran BRAKER3 in ETP mode and obtained the final braker.gtf file. Compared with Augustus/augustus.hints.gtf, which contains 68857 predicted genes, the final braker.gtf retains only 39896 genes. The retained genes all seem to be of good quality; however, a large proportion of potential genes were filtered out in the TSEBRA merging step, even though some of them look okay (see the IGV screenshot).
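To reproduce gene counts like the ones compared above, here is a minimal Python sketch. It assumes both files use standard GTF key-value attributes containing a `gene_id "..."` field and counts distinct `gene_id` values; the function names `gene_ids` and `count_genes` are hypothetical, not part of BRAKER or AUGUSTUS. AUGUSTUS-style `gene` and `transcript` lines whose ninth column is a bare ID are skipped, on the assumption that the same gene ID also appears in the `gene_id` attribute of the CDS and other feature lines.

```python
import re
import sys
from typing import Iterable, Set

# Matches the GTF attribute gene_id "..." (assumes standard key-value attributes).
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def gene_ids(lines: Iterable[str]) -> Set[str]:
    """Collect distinct gene_id values, skipping comments, blank and malformed lines."""
    ids: Set[str] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue
        # Lines with a bare ID in column 9 (e.g. AUGUSTUS gene lines) do not match.
        match = GENE_ID_RE.search(fields[8])
        if match:
            ids.add(match.group(1))
    return ids


def count_genes(path: str) -> int:
    """Number of distinct genes in a GTF file on disk."""
    with open(path) as handle:
        return len(gene_ids(handle))


if __name__ == "__main__":
    # Hypothetical usage: python count_genes.py Augustus/augustus.hints.gtf braker.gtf
    for gtf_path in sys.argv[1:]:
        print(gtf_path, count_genes(gtf_path))
```

Counting distinct `gene_id` values rather than `gene` lines avoids depending on whether a given GTF variant emits explicit gene records.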

[Image: IGV screenshot (Screenshot 2024-04-24 at 14 44 38)]

My question: if I want to use augustus.hints.gtf and genemark.gtf instead of the final braker.gtf file as input to EvidenceModeler, is that feasible? Can I still use the AUGUSTUS script gtf2gff.pl for the conversion?

Thanks, Cui

KatharinaHoff commented 2 months ago

We generally advise against using EvidenceModeler with BRAKER output and evidence, because EvidenceModeler relies largely on coverage information, while BRAKER prioritizes junction information. This is described in the TSEBRA publication.

68857 genes sounds a bit excessive, but you never know. Use whichever gene set you consider best according to your own criteria. We have tried to design our workflows so that likely false positives are removed whenever possible, but maybe you are more focused on sensitivity.
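If you want to inspect what was discarded before choosing a gene set, the following Python sketch lists AUGUSTUS transcripts whose coding structure has no exact match in braker.gtf. It is an illustration under stated assumptions, not part of BRAKER or TSEBRA: two transcripts count as identical only if they share sequence, strand and the exact set of CDS intervals (UTRs and IDs are ignored), transcripts are identified via a `transcript_id "..."` attribute, and the helper names `cds_chains` and `missing_from` are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Matches the GTF attribute transcript_id "..." (assumed present on CDS lines).
TX_RE = re.compile(r'transcript_id "([^"]+)"')

# A coding structure: (seqid, strand, sorted CDS intervals).
Chain = Tuple[str, str, Tuple[Tuple[int, int], ...]]


def cds_chains(lines: Iterable[str]) -> Dict[str, Chain]:
    """Map each transcript_id to its CDS chain, built from CDS feature lines only."""
    intervals: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    location: Dict[str, Tuple[str, str]] = {}
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "CDS":
            continue
        match = TX_RE.search(fields[8])
        if not match:
            continue
        tx = match.group(1)
        intervals[tx].append((int(fields[3]), int(fields[4])))
        location[tx] = (fields[0], fields[6])
    return {
        tx: (location[tx][0], location[tx][1], tuple(sorted(ivs)))
        for tx, ivs in intervals.items()
    }


def missing_from(candidate: Iterable[str], reference: Iterable[str]) -> List[str]:
    """Transcript IDs in `candidate` whose exact CDS chain does not occur in `reference`."""
    reference_chains: Set[Chain] = set(cds_chains(reference).values())
    return sorted(
        tx for tx, chain in cds_chains(candidate).items()
        if chain not in reference_chains
    )
```

A possible usage, with the file paths taken from the issue:

```python
with open("Augustus/augustus.hints.gtf") as aug, open("braker.gtf") as brk:
    discarded = missing_from(aug, brk)
print(len(discarded), "AUGUSTUS transcripts without an exact CDS match in braker.gtf")
```

Exact matching is deliberately strict; transcripts that differ by a single CDS boundary are reported as missing, so the resulting list is a starting point for manual review in IGV rather than a verdict on which genes are real.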