Gaius-Augustus / TSEBRA

TSEBRA: Transcript Selector for BRAKER

New default.cfg increased duplicated BUSCO #29

Closed. Huiting120 closed this issue 10 months ago

Huiting120 commented 1 year ago

Hi,

I tested the new default.cfg and the old version (from the publication) on the same dataset and saw a pretty big difference in the %single and %duplicated BUSCOs. The current default.cfg was used with the current software, and the old default.cfg was run with an older version of the software (I forgot when we dockerized it, but probably last August).

Input_file | Dataset | Complete (%) | Single (%) | Duplicated (%) | Fragmented (%) | Missing (%)
TSEBRA.old.default.faa | eudicots_odb10 | 98.1 | 57.5 | 40.6 | 0.4 | 1.5
TSEBRA.new.default.faa | eudicots_odb10 | 98.1 | 47.7 | 50.4 | 0.3 | 1.6

The results from old.default are closer to the assembly BUSCO of the target genome, as well as to the annotation BUSCO of other genomes from the same species.
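To make the size of the shift explicit, here is a minimal Python sketch that recomputes the differences from the two rows above. The numbers are copied from the table; the dictionary layout and variable names are only illustrative.

```python
# BUSCO percentages copied from the table above (eudicots_odb10).
rows = {
    "old": {"complete": 98.1, "single": 57.5, "duplicated": 40.6,
            "fragmented": 0.4, "missing": 1.5},
    "new": {"complete": 98.1, "single": 47.7, "duplicated": 50.4,
            "fragmented": 0.3, "missing": 1.6},
}

# Sanity check: Complete should equal Single + Duplicated (within rounding).
for name, r in rows.items():
    assert abs(r["single"] + r["duplicated"] - r["complete"]) < 0.05, name

# Difference new - old, in percentage points, rounded to one decimal.
shift = {k: round(rows["new"][k] - rows["old"][k], 1) for k in rows["old"]}

for category, delta in shift.items():
    print(f"{category:>10}: {delta:+.1f}")
```

Complete stays the same, so the change is essentially a move of about 9.8 percentage points from Single to Duplicated.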

LarsGab commented 1 year ago

Hi,

Thanks for sharing your results. We made another update to TSEBRA; would you mind rerunning it with the new update? In our benchmarks, we observed an improvement across various species compared to the old TSEBRA version. We increased the mandatory coverage of extrinsic evidence in the new configuration. One explanation for your results could be that the evidence coverage in your dataset is not very high. In this case, I would recommend decreasing the value for 'intron_support' in default.cfg.
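As a rough illustration of that change, the following Python sketch lowers 'intron_support' in a copy of the configuration text. It assumes the file uses one whitespace-separated "key value" pair per line with '#' comment lines; the excerpt and the values 1.0 and 0.75 are hypothetical placeholders, so please check them against your actual default.cfg and pick the threshold that suits your data.

```python
def set_cfg_value(cfg_text: str, key: str, value: str) -> str:
    """Return cfg_text with the value of `key` replaced.

    Assumes one whitespace-separated 'key value' pair per line and
    '#' comment lines (an assumption about the cfg layout).
    Raises KeyError if the key is not present.
    """
    out = []
    found = False
    for line in cfg_text.splitlines():
        parts = line.split()
        is_comment = line.lstrip().startswith("#")
        if parts and not is_comment and parts[0] == key:
            out.append(f"{key} {value}")
            found = True
        else:
            out.append(line)  # keep comments and other keys untouched
    if not found:
        raise KeyError(key)
    return "\n".join(out) + "\n"


# Hypothetical excerpt; values are placeholders, not the shipped defaults.
example_cfg = "# example excerpt\nintron_support 1.0\nstasto_support 2\n"

updated_cfg = set_cfg_value(example_cfg, "intron_support", "0.75")
print(updated_cfg)
```

Writing the result to a separate file (rather than overwriting default.cfg) keeps the original configuration available for comparison runs.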

Best, Lars