BRAKER is a pipeline for fully automated prediction of protein coding gene structures with GeneMark-ES/ET/EP/ETP and AUGUSTUS in novel eukaryotic genomes
Suggestions for improving BRAKER2/BRAKER3 results with the input datasets I used #885
I ran BRAKER2 and BRAKER3 separately to predict genes in a tree genome (genome size ~900 Mb).
The input data:
- RNA-seq: RNA extracted from the leaves of the target species.
- Protein database: OrthoDB v11, Viridiplantae partition.
The gene counts and BUSCO results are shown below:
The braker.gtf from BRAKER3 contains far fewer genes (25,943) than the one from BRAKER2 (48,396). As I understand it, BRAKER3 keeps only transcripts with strong support from the RNA-seq and protein evidence, so a lower gene count than BRAKER2 is expected.
However, phylogenetically closely related species have about 40,000 to 50,000 genes.
I suspect the lower gene count from BRAKER3 is related to the RNA-seq data: we only have RNA-seq from leaves. Many annotation projects use RNA-seq from leaves, roots, flowers, and seeds, but it is difficult for us to obtain RNA-seq from any tissue other than leaves. Do you have any suggestions for improving the result?
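For reference, gene counts like the ones compared above can be reproduced by counting distinct `gene_id` values in `braker.gtf`. The following is a minimal Python sketch, not part of BRAKER. It assumes the ninth (attribute) column of exon/CDS-type lines carries `transcript_id "g1.t1"; gene_id "g1";` as in AUGUSTUS-style GTF, and that bare `gene`/`transcript` lines without these attributes can be skipped. The function name `count_genes_and_transcripts` and the sample lines are hypothetical.

```python
import re

# Hypothetical helper: pulls quoted attribute values out of GTF column 9.
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')
TRANSCRIPT_ID_RE = re.compile(r'transcript_id "([^"]+)"')


def count_genes_and_transcripts(gtf_lines):
    """Return (distinct gene_id count, distinct transcript_id count).

    Assumes tab-separated GTF; lines that are blank, comments, or whose
    attribute column lacks gene_id/transcript_id are ignored.
    """
    genes, transcripts = set(), set()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        attributes = fields[8]
        gene = GENE_ID_RE.search(attributes)
        transcript = TRANSCRIPT_ID_RE.search(attributes)
        if gene:
            genes.add(gene.group(1))
        if transcript:
            transcripts.add(transcript.group(1))
    return len(genes), len(transcripts)


# Made-up sample: gene g1 has two transcripts, gene g2 has one.
SAMPLE = [
    "chr1\tAUGUSTUS\tgene\t100\t900\t.\t+\t.\tg1\n",
    "chr1\tAUGUSTUS\ttranscript\t100\t900\t.\t+\t.\tg1.t1\n",
    'chr1\tAUGUSTUS\tCDS\t100\t300\t.\t+\t0\ttranscript_id "g1.t1"; gene_id "g1";\n',
    'chr1\tAUGUSTUS\tCDS\t500\t900\t.\t+\t2\ttranscript_id "g1.t2"; gene_id "g1";\n',
    'chr2\tAUGUSTUS\tCDS\t50\t400\t.\t-\t0\ttranscript_id "g2.t1"; gene_id "g2";\n',
]

if __name__ == "__main__":
    print(count_genes_and_transcripts(SAMPLE))  # (2, 3)
```

To run it on a real annotation, pass an open file handle, e.g. `with open("braker.gtf") as fh: print(count_genes_and_transcripts(fh))`. Counting genes and transcripts separately helps keep "number of genes" distinct from "number of transcripts" when alternative isoforms are reported.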
Thank you.