Gaius-Augustus / BRAKER

BRAKER is a pipeline for fully automated prediction of protein coding gene structures with GeneMark-ES/ET/EP/ETP and AUGUSTUS in novel eukaryotic genomes

How to evaluate the prediction results? #741

Open sunnycqcn opened 7 months ago

sunnycqcn commented 7 months ago

Hello Developers, I annotated a plant genome and obtained about 48,000 genes with BRAKER and about 70,000 genes with the funannotate pipeline, in both cases using RNA-seq and protein evidence. Another group annotated a different genome of the same species and obtained about 60,000 genes. I used BUSCO on the predicted protein sequences to evaluate completeness. The results are almost the same, at about 96%. I am unsure how to decide which gene set is the best. Thanks, Fuyou

KatharinaHoff commented 6 months ago

BUSCO is only a sensitivity measure for a relatively small number of core genes of a clade. You can try OMArk to get a larger marker gene set; however, the problem remains that it's only a sensitivity measure.
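To illustrate why a sensitivity measure cannot separate gene sets of very different sizes, here is a toy sketch in Python. It is not how BUSCO or OMArk compute their scores (they search for conserved genes with their own methods); the function `marker_sensitivity` and all gene names are hypothetical and only model "what fraction of a marker set was recovered".

```python
def marker_sensitivity(predicted_genes, marker_genes):
    """Fraction of marker genes that appear in the predicted gene set."""
    markers = set(marker_genes)
    found = markers & set(predicted_genes)
    return len(found) / len(markers)


# Hypothetical marker set of 100 conserved genes (toy numbers).
markers = {f"marker_{i}" for i in range(100)}
# Assume both gene sets recover the same 96 markers.
recovered = {f"marker_{i}" for i in range(96)}

# Two hypothetical gene sets that differ only in their non-marker predictions.
small_set = recovered | {f"gene_{i}" for i in range(400)}
large_set = recovered | {f"gene_{i}" for i in range(650)}

# Same sensitivity, different gene counts: the score says nothing about
# whether the extra predictions in the larger set are real genes.
print(len(small_set), marker_sensitivity(small_set, markers))  # 496 0.96
print(len(large_set), marker_sensitivity(large_set, markers))  # 746 0.96
```

Both sets score 0.96 even though one contains 250 more predictions, which is why additional evidence is needed to choose between them.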

You can do all kinds of other things... for example, add functional annotation and compare how many transcripts in each gene set have a reasonable functional annotation. Use a genome browser to view the gene structures in context with the evidence, sample some loci, compare which gene set is more convincing, etc.
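The two suggestions above could be scripted roughly as follows. This is a minimal sketch under stated assumptions: the transcript IDs, the set of IDs with a functional annotation, and the locus coordinates are hypothetical placeholders that you would parse yourself from your annotation files and from the output of whichever functional annotation tool you use. The helper names `annotated_fraction` and `sample_loci` are made up for illustration.

```python
import random


def annotated_fraction(transcript_ids, annotated_ids):
    """Fraction of transcripts in a gene set that have a functional annotation."""
    transcripts = set(transcript_ids)
    if not transcripts:
        return 0.0
    return len(transcripts & set(annotated_ids)) / len(transcripts)


def sample_loci(loci, k, seed=0):
    """Pick up to k loci at random (reproducibly) for manual inspection."""
    rng = random.Random(seed)
    ordered = sorted(loci)  # fixed order so the same seed gives the same sample
    return rng.sample(ordered, min(k, len(ordered)))


# Hypothetical inputs: transcript IDs from one gene set and the IDs that
# received a functional annotation.
transcripts = ["t1", "t2", "t3", "t4"]
with_function = {"t1", "t2", "t4"}
print(annotated_fraction(transcripts, with_function))  # 0.75

# Hypothetical loci as (sequence, start, end) tuples to open in a genome browser.
loci = [("chr1", 1000, 5000), ("chr1", 9000, 12000), ("chr2", 300, 2500)]
for seqid, start, end in sample_loci(loci, k=2):
    print(f"{seqid}:{start}-{end}")
```

Running the same comparison on each gene set gives one number per set to compare, and the sampled coordinates give an unbiased list of regions to inspect by eye against the RNA-seq and protein evidence.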