Gaius-Augustus / TSEBRA

TSEBRA: Transcript Selector for BRAKER

How to improve the BUSCO score of the resulted predictions? #19

Open bioinformaticspcj opened 2 years ago

bioinformaticspcj commented 2 years ago

Dear Lars,

Thanks for your valuable advice. I have tried the latest versions of TSEBRA and BRAKER. The BUSCO scores have improved a lot, but I still have some questions. The results are as follows:

braker1: C:96.1%[S:80.3%,D:15.8%],F:1.8%,M:2.1%,n:2586

braker2 (ProtHint): C:62.6%[S:52.0%,D:10.6%],F:16.0%,M:21.4%,n:2586

braker2 (GenomeThreader): C:54.9%[S:47.3%,D:7.6%],F:24.4%,M:20.7%,n:2586

TSEBRA: C:93.4%[S:89.5%,D:3.9%],F:2.9%,M:3.7%,n:2586
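For comparing runs like these side by side, the BUSCO one-line summaries can be parsed into numbers. The following is only a minimal sketch: the function name `parse_busco` and the `runs` dictionary are made up for illustration, and the regular expression assumes the exact `C:..%[S:..%,D:..%],F:..%,M:..%,n:..` layout quoted above.

```python
import re

# Pattern for the one-line BUSCO summary as quoted in this thread.
BUSCO_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco(summary):
    """Return the percentages (floats) and n (int) from a BUSCO summary string."""
    match = BUSCO_RE.search(summary)
    if match is None:
        raise ValueError(f"not a BUSCO summary: {summary!r}")
    result = {key: float(val) for key, val in match.groupdict().items() if key != "n"}
    result["n"] = int(match.group("n"))
    return result


# Summaries copied from the comment above; the labels are illustrative.
runs = {
    "braker1": "C:96.1%[S:80.3%,D:15.8%],F:1.8%,M:2.1%,n:2586",
    "braker2 (ProtHint)": "C:62.6%[S:52.0%,D:10.6%],F:16.0%,M:21.4%,n:2586",
    "braker2 (GenomeThreader)": "C:54.9%[S:47.3%,D:7.6%],F:24.4%,M:20.7%,n:2586",
    "TSEBRA": "C:93.4%[S:89.5%,D:3.9%],F:2.9%,M:3.7%,n:2586",
}

for name, summary in runs.items():
    r = parse_busco(summary)
    print(f"{name:26s} complete={r['C']:5.1f} duplicated={r['D']:5.1f} missing={r['M']:5.1f}")
```

Laid out this way, it is easier to see that the TSEBRA result trades a small drop in completeness for a much lower duplication rate than braker1.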

Question 1: The BUSCO score of braker2 is still very low, even though I tried different aligners. Following the examples, I ran braker2 with protein sequences from just three closely related species. I do not know why the score is so low. Could you give me some advice on how to improve it?

Question 2: TSEBRA produced 47,953 predicted genes, which I think is many more than expected. Do you think I should alter the config file to remove some genes? If so, I am afraid the BUSCO score will decrease. Could you give me some suggestions?

Thanks again.

Best, Bob

LarsGab commented 2 years ago

Hi Bob,

BRAKER2 is intended to be used with a large protein database. In your case, 3 species is probably not enough. I would suggest that you download a large database of related species from OrthoDB (e.g. the phylum of your species) and add your 3 closely related species to them. With this database, I would run BRAKER2 again (with ProtHint) and combine the result with your BRAKER1 run. I would discard the GenomeThreader run altogether.

If the result still has too many genes, I would try to increase the 'intron_support' parameter in the TSEBRA config file (e.g. to 0.8, 0.9, or 1.0).
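To try several 'intron_support' values without editing the file by hand each time, the config can be rewritten programmatically. This is a rough sketch, not part of TSEBRA: the helper `set_cfg_param` is hypothetical, it assumes the config consists of whitespace-separated `name value` lines with `#` comments, and the values in `example_cfg` are illustrative rather than the shipped defaults.

```python
def set_cfg_param(cfg_text, key, value):
    """Return cfg_text with the line for `key` replaced by 'key value'.

    Assumes one whitespace-separated 'name value' pair per line and
    '#' for comment lines (an assumption about the file layout).
    """
    out = []
    found = False
    for line in cfg_text.splitlines():
        fields = line.split()
        is_comment = line.lstrip().startswith("#")
        if fields and not is_comment and fields[0] == key:
            out.append(f"{key} {value}")
            found = True
        else:
            out.append(line)  # keep comments and other parameters untouched
    if not found:
        raise KeyError(f"parameter {key!r} not found in config")
    return "\n".join(out) + "\n"


# Illustrative excerpt only; check the real config file for actual values.
example_cfg = "# hypothetical config excerpt\nintron_support 0.75\nstasto_support 1\n"

for threshold in (0.8, 0.9, 1.0):
    print(set_cfg_param(example_cfg, "intron_support", threshold))
```

Each variant could be written to its own file and passed to a separate TSEBRA run, so that gene counts and BUSCO scores can be compared across thresholds.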

I hope this helps. Best, Lars

bioinformaticspcj commented 2 years ago

Hi Lars,

Many thanks for your timely reply. I have rerun braker2 as you suggested. I downloaded more than 4,500,000 Vertebrata protein sequences from OrthoDB and added the sequences of my 3 closely related species to them. In total, 4,832,878 protein sequences were used to run braker2. However, the BUSCO score improved only slightly: C:64.2%[S:58.4%,D:5.8%],F:14.5%,M:21.3%,n:2586

The command I used is as follows:

braker.pl --species=Pcar --genome=Pcar.genome.fa.maskered --prot_seq=all.orthodb.pep.1.fa --softmasking --cores=48 --nocleanup --gff3 --workingdir=braker2_out --epmode --useexisting

Could you give me some more advice to improve it?

Thanks a lot, Best,  Bob

LarsGab commented 2 years ago

Hi Bob,

It looks like you didn't train BRAKER again with the new database. You have to give it a new species name for '--species' and remove the '--useexisting' option.

Best, Lars

bioinformaticspcj commented 2 years ago

Hi Lars,

Thanks for your timely reply. As you suggested, I retrained BRAKER with a new species name for '--species' and without the '--useexisting' option. However, the result is still not good:

C:64.7%[S:58.6%,D:6.1%],F:13.5%,M:21.8%,n:2586

Could you give me any other advice for improving it? Maybe I should change some of the default parameters?

Best, Bob

LarsGab commented 2 years ago

Hi Bob,

If BRAKER2 performs this poorly, you can try to use pref_braker1.cfg instead of the default configuration for TSEBRA. I created this cfg file for a project where I had a similar situation. However, I haven't tested it on different species, so analyzing the result and visually inspecting it is all the more important here.

Best, Lars

bioinformaticspcj commented 2 years ago

Hi Lars,

Many thanks for your advice. I have tried to use the pref_braker1.cfg file and achieved a reasonable result as follows:

C:96.1%[S:90.1%,D:6.0%],F:1.9%,M:2.0%,n:2586

However, even after I increased the 'intron_support' parameter to 1.0, there are still too many predicted genes (43,256). Could you give me more ideas on how to reduce the gene count?
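To see where the extra genes come from, it can help to break the count down a little. The sketch below counts distinct genes and transcripts in a GTF file and reports how many transcripts have only a single CDS feature. It assumes the usual 9-column GTF layout with `gene_id "..."` and `transcript_id "..."` attributes on the CDS lines; `summarize_gtf` and the `demo` records are made up for illustration.

```python
import re
from collections import defaultdict

# Matches gene_id "..." and transcript_id "..." in the GTF attribute column.
ATTR_RE = re.compile(r'(gene_id|transcript_id) "([^"]+)"')


def summarize_gtf(lines):
    """Count genes, transcripts with CDS, and transcripts with exactly one CDS."""
    genes = set()
    cds_per_transcript = defaultdict(int)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete GTF record
        attrs = dict(ATTR_RE.findall(cols[8]))
        if "gene_id" in attrs:
            genes.add(attrs["gene_id"])
        if cols[2] == "CDS" and "transcript_id" in attrs:
            cds_per_transcript[attrs["transcript_id"]] += 1
    single = sum(1 for n in cds_per_transcript.values() if n == 1)
    return {
        "genes": len(genes),
        "transcripts": len(cds_per_transcript),
        "single_cds_transcripts": single,
    }


# Made-up records: one two-CDS transcript and one single-CDS transcript.
demo = [
    'chr1\tAUGUSTUS\tCDS\t100\t200\t.\t+\t0\ttranscript_id "g1.t1"; gene_id "g1";',
    'chr1\tAUGUSTUS\tCDS\t300\t400\t.\t+\t1\ttranscript_id "g1.t1"; gene_id "g1";',
    'chr1\tAUGUSTUS\tCDS\t900\t990\t.\t-\t0\ttranscript_id "g2.t1"; gene_id "g2";',
]
print(summarize_gtf(demo))
```

For a real file, the lines could be passed in with `summarize_gtf(open(path))`, where `path` points to the TSEBRA output.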

Thanks again.

Best, Bob