Gaius-Augustus / TSEBRA

TSEBRA: Transcript Selector for BRAKER

Does get_longest_isoform.py keep ab initio predictions? #52

Closed: tinameiring closed this issue 3 weeks ago

tinameiring commented 1 month ago

I want to combine two gene sets and retain the ab initio predictions, but also keep only the longest coding region for each cluster of overlapping transcripts.

Does get_longest_isoform.py retain the ab initio predictions?

Or do I need to combine the gene sets with tsebra.py (with the keep_ab_initio.cfg config file) and then run get_longest_isoform.py?

LarsGab commented 3 weeks ago

This issue has been answered via email.

The 'get_longest_isoform.py' script selects, for each gene locus, the transcript alternative with the longest coding sequence, and ab initio predictions are included in that selection. It does not evaluate alternative transcripts based on extrinsic evidence. If your goal is to retain only one alternative per gene locus, using 'get_longest_isoform.py' to combine the gene sets is a reasonable approach.
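
For illustration only, the selection rule described above can be sketched in Python. This is not the code from the TSEBRA script: the dictionary keys (`gene_id`, `tx_id`, `cds`, `ab_initio`) and the function names are hypothetical, coordinates are assumed to be 1-based and inclusive as in GTF, and keeping the first transcript on a tie is an assumption. The point it shows is that the choice depends only on coding length, so a transcript without evidence support can win.

```python
def cds_length(cds_segments):
    """Sum the lengths of 1-based, inclusive (start, end) CDS segments."""
    return sum(end - start + 1 for start, end in cds_segments)


def longest_isoform_per_gene(transcripts):
    """Return {gene_id: transcript} keeping the longest-CDS transcript per gene.

    Evidence support is deliberately ignored, so ab initio predictions
    compete on equal terms. On a tie the first transcript seen is kept
    (an assumption made for this sketch).
    """
    best = {}
    for tx in transcripts:
        length = cds_length(tx["cds"])
        current = best.get(tx["gene_id"])
        if current is None or length > current[0]:
            best[tx["gene_id"]] = (length, tx)
    return {gene_id: tx for gene_id, (_, tx) in best.items()}


# Hypothetical toy input: two alternatives for g1, one ab initio gene g2.
transcripts = [
    {"gene_id": "g1", "tx_id": "g1.t1", "cds": [(100, 199), (300, 399)], "ab_initio": False},
    {"gene_id": "g1", "tx_id": "g1.t2", "cds": [(100, 399)], "ab_initio": True},
    {"gene_id": "g2", "tx_id": "g2.t1", "cds": [(1000, 1089)], "ab_initio": True},
]

selected = longest_isoform_per_gene(transcripts)
for gene_id, tx in selected.items():
    print(gene_id, tx["tx_id"], cds_length(tx["cds"]), tx["ab_initio"])
```

In this toy input the ab initio alternative g1.t2 is kept for g1 because its coding sequence is longer, and g2 is kept because it is the only transcript at its locus.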