Gaius-Augustus / TSEBRA

TSEBRA: Transcript Selector for BRAKER

benchmarking TSEBRA with BUSCO: my results are not good #6

Closed amvarani closed 3 years ago

amvarani commented 3 years ago

Hi there, I would like to describe my experience using TSEBRA with a plant genome, using BUSCO as a benchmark. I have a repeat-masked genome and BRAKER1 and BRAKER2 annotation results. My results are:

BRAKER1: C:97.7%[S:87.2%,D:10.5%],F:1.4%,M:0.9%,n:1614
BRAKER2: C:97.7%[S:72.6%,D:25.1%],F:0.9%,M:1.4%,n:1614

TSEBRA: C:78.4%[S:75.1%,D:3.3%],F:10.8%,M:10.8%,n:1614

The same annotated genome deposited at Phytozome: C:99.7%[S:66.9%,D:32.8%],F:0.1%,M:0.2%,n:1614

Why is TSEBRA messing up the BRAKER1 and BRAKER2 annotations? Maybe I need to tune the configuration file? Any help?

Thanks a lot
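For readers who want to compare the BUSCO summaries above side by side, here is a minimal Python sketch. It assumes the short-summary string format exactly as quoted in this post; the names `parse_busco`, `results`, and `parsed` are made up for illustration and are not part of BUSCO or TSEBRA.

```python
import re

# Pattern for a BUSCO short-summary string such as
# C:97.7%[S:87.2%,D:10.5%],F:1.4%,M:0.9%,n:1614
BUSCO_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco(summary):
    """Return the percentages (floats) and n (int) from one summary string."""
    match = BUSCO_RE.search(summary)
    if match is None:
        raise ValueError(f"not a BUSCO short summary: {summary!r}")
    fields = {key: float(match.group(key)) for key in ("C", "S", "D", "F", "M")}
    fields["n"] = int(match.group("n"))
    return fields


# Summary strings copied from the post above.
results = {
    "BRAKER1": "C:97.7%[S:87.2%,D:10.5%],F:1.4%,M:0.9%,n:1614",
    "BRAKER2": "C:97.7%[S:72.6%,D:25.1%],F:0.9%,M:1.4%,n:1614",
    "TSEBRA": "C:78.4%[S:75.1%,D:3.3%],F:10.8%,M:10.8%,n:1614",
    "Phytozome": "C:99.7%[S:66.9%,D:32.8%],F:0.1%,M:0.2%,n:1614",
}

parsed = {name: parse_busco(text) for name, text in results.items()}

# Report each annotation's completeness relative to BRAKER1.
baseline = parsed["BRAKER1"]["C"]
for name, fields in parsed.items():
    delta = fields["C"] - baseline
    print(f"{name:10s} C={fields['C']:5.1f}%  F={fields['F']:4.1f}%  "
          f"M={fields['M']:4.1f}%  delta C vs BRAKER1={delta:+.1f}")
```

With the numbers from this post, the TSEBRA protein set sits roughly 19 percentage points below BRAKER1 in completeness, while the fragmented and missing categories both rise to 10.8%.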

LarsGab commented 3 years ago

Hi,

I'm sorry that TSEBRA didn't work properly for your annotation. The problem could be that the default configuration filters out too many transcripts. I added a more inclusive configuration to the repository at TSEBRA/config/pref_braker1.cfg, which you can use instead of default.cfg. I hope this improves your results.

Best, Lars

amvarani commented 3 years ago

Hi Lars, thanks a lot for your reply. However, after changing the config file to "pref_braker1.cfg", the BUSCO results are still not good:

C:79.6%[S:76.1%,D:3.5%],F:9.4%,M:11.0%,n:1614

LarsGab commented 3 years ago

It seems to me that there are quite a few transcripts in your BRAKER results that are not supported by RNA-seq or protein evidence. TSEBRA removes all of these transcripts. I added another configuration file (keep_ab_initio.cfg) to the repository that keeps these transcripts.

amvarani commented 3 years ago

Hi there! Well, still not good:

C:79.0%[S:75.1%,D:3.9%],F:10.8%,M:10.2%,n:1614

Can I send you my files so you can take a look, if possible?

LarsGab commented 3 years ago

Hi, yes, please send me the files so I can take a look at the issue. My email is lars.gabriel@uni-greifswald.de.

Best, Lars

amvarani commented 3 years ago

Hi there, finally, with the kind help of @LarsGab, I have found the problem! I was using the EvidenceModeler scripts "augustus_GTF_to_EVM_GFF3.pl" and "gff3_file_to_proteins.pl" to convert the TSEBRA GTF file to GFF3 and then to protein FASTA format, respectively. I noticed that the conversion made by these scripts does not work properly when BRAKER is run with the option "--alternatives-from-evidence=true". As a solution, the best strategy is to use the Augustus scripts "gtf2gff.pl" and "gtf2aa.pl", respectively. Using these scripts, I finally got reasonable BUSCO scores:

C:98.4%[S:93.6%,D:4.8%],F:0.6%,M:1.0%,n:1614
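One way to catch this kind of conversion problem early is to check that every transcript in the GTF also shows up in the protein FASTA before running BUSCO. The Python sketch below is only an illustration: it assumes that the FASTA headers reuse the GTF `transcript_id` values, the helper names are hypothetical, and the toy records are invented rather than taken from the files discussed in this thread.

```python
import re


def gtf_transcript_ids(gtf_text):
    """Collect distinct transcript_id values from the CDS lines of GTF text."""
    ids = set()
    for line in gtf_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        match = re.search(r'transcript_id "([^"]+)"', cols[8])
        if match:
            ids.add(match.group(1))
    return ids


def fasta_ids(fasta_text):
    """Collect the first word of every FASTA header line."""
    ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                ids.add(words[0])
    return ids


def cds_line(transcript_id, gene_id):
    """Build one toy GTF CDS line (invented coordinates, illustration only)."""
    attributes = f'transcript_id "{transcript_id}"; gene_id "{gene_id}";'
    return "\t".join(["chr1", "AUGUSTUS", "CDS", "1", "90", ".", "+", "0", attributes])


# Toy input: gene g1 has two alternative transcripts, gene g2 has one.
toy_gtf = "\n".join([
    cds_line("g1.t1", "g1"),
    cds_line("g1.t2", "g1"),
    cds_line("g2.t1", "g2"),
])

# Toy protein FASTA in which the alternative transcript g1.t2 was lost.
toy_fasta = ">g1.t1\nMKT*\n>g2.t1\nMAA*\n"

missing = gtf_transcript_ids(toy_gtf) - fasta_ids(toy_fasta)
print(f"{len(missing)} transcript(s) in the GTF have no protein: {sorted(missing)}")
```

On real data, the two text arguments would be the contents of the TSEBRA GTF and of the converted protein file. A non-empty `missing` set suggests the converter dropped or renamed transcripts, but it does not say why.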

smallfishcui commented 2 years ago

@amvarani Thank you for sharing! This is important to know, because I also use the two Perl scripts that you used before to convert the files for the BUSCO assessment. I will try the approach you suggested.