Gaius-Augustus / BRAKER

BRAKER is a pipeline for fully automated prediction of protein coding gene structures with GeneMark-ES/ET/EP/ETP and AUGUSTUS in novel eukaryotic genomes

Suggestions of input datasets I used for braker2/3 #885

Open hungweichen0327 opened 1 week ago

hungweichen0327 commented 1 week ago

Dear community,

I ran both BRAKER2 and BRAKER3 to predict genes in a tree genome (genome size ~900 Mb).

The input data:

The gene numbers and BUSCO results are shown below:

[image: gene numbers and BUSCO results]

Based on the results, the number of genes in the braker.gtf from BRAKER3 (25943) is much lower than in the one from BRAKER2 (48396). I understand that BRAKER3 only keeps transcripts that are very well supported by the RNA-Seq and protein evidence, so it is expected that BRAKER3 reports fewer genes than BRAKER2.
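For anyone who wants to compare counts the same way, here is a minimal Python sketch of one way to count genes in a braker.gtf. It is not necessarily how the numbers above were obtained. It assumes that most lines carry a `gene_id "..."` attribute in column 9, and that `gene` feature lines may carry only the bare gene ID there. The function name `count_genes` and the sample lines are made up for illustration and are not part of BRAKER.

```python
import re
from typing import Iterable

# Assumption: gene IDs appear in column 9 as gene_id "<id>";
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def count_genes(gtf_lines: Iterable[str]) -> int:
    """Count distinct gene IDs in an iterable of GTF lines."""
    gene_ids = set()
    for line in gtf_lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # A GTF record has 9 tab-separated columns; skip anything shorter.
        if len(fields) < 9:
            continue
        feature, attributes = fields[2], fields[8]
        match = GENE_ID_RE.search(attributes)
        if match:
            gene_ids.add(match.group(1))
        elif feature == "gene":
            # Assumption: a gene line without gene_id "..." holds the bare ID.
            gene_ids.add(attributes.strip())
    return len(gene_ids)


# Made-up sample lines, only to show the two attribute styles handled above.
sample = [
    "chr1\tAUGUSTUS\tgene\t100\t900\t.\t+\t.\tg1\n",
    'chr1\tAUGUSTUS\tCDS\t100\t400\t.\t+\t0\ttranscript_id "g1.t1"; gene_id "g1";\n',
    'chr1\tGeneMark.hmm3\tCDS\t2000\t2300\t.\t-\t0\tgene_id "g2"; transcript_id "g2.t1";\n',
]
print(count_genes(sample))
```

On a real file this could be called as `count_genes(open("braker.gtf"))`. Counting distinct IDs rather than `gene` lines avoids undercounting if some genes have no explicit `gene` line.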

However, phylogenetically closely related species have about 40000-50000 genes.

I suspect that the lower gene number from BRAKER3 is related to the RNA-Seq data. We only have RNA-Seq from leaves. People commonly use RNA-Seq from leaves, roots, flowers, and seeds, but it is difficult for us to obtain RNA-Seq from tissues other than leaves. Do you have any suggestions on how to improve the result?

Thank you.