Gaius-Augustus / BRAKER

BRAKER is a pipeline for fully automated prediction of protein coding gene structures with GeneMark-ES/ET/EP/ETP and AUGUSTUS in novel eukaryotic genomes

Higher BUSCO in annotation than assembly? #822

Closed. YingChen94 closed this issue 1 month ago.

YingChen94 commented 1 month ago

Hello, I used BRAKER3 to annotate my frog genome (4 Gb), and I then ran best_by_compleasm.py because the BRAKER output contained fewer BUSCO genes than the GeneMark and AUGUSTUS outputs. However, this now gives a higher BUSCO completeness for the annotation than for the assembly:

assembly: C:90.4%[S:89.1%,D:1.3%],F:2.6%,M:7.0%,n:5310

BRAKER output (keep longest isoform per gene): C:90.2%[S:88.2%,D:2.0%],F:1.2%,M:8.6%,n:5310

best_by_compleasm.py output (keep longest isoform per gene): C:94.4%[S:92.2%,D:2.2%],F:1.5%,M:4.1%,n:5310
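Both annotation scores above were computed after keeping only the longest isoform per gene. A minimal sketch of that filtering step on a protein FASTA follows. It assumes transcript IDs of the form `<gene>.t<N>` (for example `g12.t2`, AUGUSTUS-style naming). The function names are hypothetical, and this is not code from BRAKER or best_by_compleasm.py, so check that your IDs follow this pattern before relying on it.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Read FASTA records into {record_id: sequence}; the ID is the first word of the header."""
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)
            current_id = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


def gene_of(transcript_id: str) -> str:
    """ASSUMED naming scheme '<gene>.t<N>' (e.g. 'g12.t2' -> 'g12'); other IDs map to themselves."""
    match = re.fullmatch(r"(.+)\.t\d+", transcript_id)
    return match.group(1) if match else transcript_id


def longest_isoform_per_gene(records: Dict[str, str]) -> Dict[str, str]:
    """Keep one protein per gene: the longest one (ties go to the first transcript seen)."""
    best: Dict[str, Tuple[str, str]] = {}
    for tx_id, seq in records.items():
        gene = gene_of(tx_id)
        if gene not in best or len(seq) > len(best[gene][1]):
            best[gene] = (tx_id, seq)
    return {tx_id: seq for tx_id, seq in best.values()}


if __name__ == "__main__":
    demo = [">g1.t1", "MKV", ">g1.t2", "MKVLAA", ">g2.t1", "MSS"]
    kept = longest_isoform_per_gene(parse_fasta(demo))
    print(sorted(kept))  # ['g1.t2', 'g2.t1']
```

The filtered records can then be written back to FASTA and passed to BUSCO or compleasm in protein mode.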

I am not sure how to interpret this. Does this mean that the BUSCOs added by best_by_compleasm.py are not actually in my genome? Or is this an artefact of the BUSCO assessment on the assembly? Any insights are appreciated!
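One way to read the summary strings above: C is the percentage of the n = 5310 markers found complete, split into single-copy (S) and duplicated (D); F is fragmented and M is missing. So C = S + D and C + F + M = 100, up to rounding. The sketch below, with hypothetical helper names, parses such strings, checks those identities and converts the percentages back to approximate marker counts.

```python
import re
from dataclasses import dataclass

# Matches a BUSCO one-line summary, e.g. "C:90.4%[S:89.1%,D:1.3%],F:2.6%,M:7.0%,n:5310"
SUMMARY_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


@dataclass
class BuscoSummary:
    complete: float    # C = single-copy + duplicated
    single: float      # S
    duplicated: float  # D
    fragmented: float  # F
    missing: float     # M
    n: int             # number of markers in the lineage dataset

    def count(self, percent: float) -> int:
        """Approximate number of markers behind a percentage rounded to 0.1%."""
        return round(percent / 100 * self.n)


def parse_summary(text: str) -> BuscoSummary:
    match = SUMMARY_RE.search(text)
    if match is None:
        raise ValueError(f"not a BUSCO summary string: {text!r}")
    return BuscoSummary(
        complete=float(match["C"]),
        single=float(match["S"]),
        duplicated=float(match["D"]),
        fragmented=float(match["F"]),
        missing=float(match["M"]),
        n=int(match["n"]),
    )


def is_consistent(s: BuscoSummary, tol: float = 0.2) -> bool:
    """C should equal S + D, and C + F + M should be 100%, up to rounding."""
    return (abs(s.complete - (s.single + s.duplicated)) <= tol
            and abs(s.complete + s.fragmented + s.missing - 100.0) <= tol)


if __name__ == "__main__":
    assembly = parse_summary("C:90.4%[S:89.1%,D:1.3%],F:2.6%,M:7.0%,n:5310")
    annotation = parse_summary("C:94.4%[S:92.2%,D:2.2%],F:1.5%,M:4.1%,n:5310")
    for name, s in (("assembly", assembly), ("best_by_compleasm", annotation)):
        print(name, s.count(s.complete), "complete, consistent:", is_consistent(s))
    gain = annotation.count(annotation.complete) - assembly.count(assembly.complete)
    print("extra complete markers in annotation:", gain)
```

With the numbers from this issue, the assembly corresponds to roughly 4800 complete markers and the best_by_compleasm.py annotation to roughly 5013, i.e. about 213 more. Because the percentages are rounded to 0.1%, each count is only accurate to within about 3 markers.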

KatharinaHoff commented 1 month ago

BUSCO (the tool) has three strategies for finding BUSCOs (the marker genes) in a genome: miniprot, metaeuk, or AUGUSTUS-PPX. Each of them has its own sensitivity. How many BUSCOs you detect in the genome therefore depends both on which of these tools you chose to run and on its sensitivity.

I would not worry if an annotation has more BUSCOs than you find at the genome level. It's a good sign: the annotation process picked up a lot of BUSCOs.

YingChen94 commented 1 month ago

That makes sense! Thank you Katharina!