Closed: tinameiring closed this issue 3 weeks ago
This issue has been answered via email.
The 'get_longest_isoform.py' script selects, for each gene locus, the alternative transcript with the longest coding sequence. Ab initio predictions are included in this selection. The script does not evaluate alternative transcripts against extrinsic evidence. If your goal is to retain only one alternative per gene locus, using 'get_longest_isoform.py' is a reasonable way to combine the gene sets.
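To illustrate the selection rule described above, here is a minimal Python sketch. It is not the actual code of the script; the function name `longest_isoforms`, the input layout (gene ID, transcript ID, list of CDS intervals), and the tie-breaking rule (first transcript seen wins) are all assumptions made for this example. The point is that only CDS length is compared, so a transcript's origin (ab initio or evidence-based) plays no role.

```python
def longest_isoforms(transcripts):
    """Pick one transcript per gene: the one with the longest total CDS.

    `transcripts` is an iterable of (gene_id, transcript_id, cds_intervals),
    where cds_intervals is a list of (start, end) pairs, 1-based and
    inclusive as in GTF. This layout is hypothetical, chosen for the sketch.
    """
    best = {}  # gene_id -> (transcript_id, cds_length)
    for gene_id, tx_id, cds_intervals in transcripts:
        # Total coding length is the sum of all CDS segment lengths.
        cds_length = sum(end - start + 1 for start, end in cds_intervals)
        # Keep the longest; on a tie the first transcript seen is kept
        # (an assumption, the real script may break ties differently).
        if gene_id not in best or cds_length > best[gene_id][1]:
            best[gene_id] = (tx_id, cds_length)
    return {gene_id: tx_id for gene_id, (tx_id, _) in best.items()}


# Hypothetical example: g1 has two alternatives, g2 has one.
example = [
    ("g1", "g1.t1", [(100, 199), (300, 399)]),  # CDS length 200
    ("g1", "g1.t2", [(100, 249)]),              # CDS length 150
    ("g2", "g2.t1", [(500, 560)]),              # CDS length 61
]
selected = longest_isoforms(example)
```

Here `selected` maps each gene to a single transcript, and no evidence or source information is consulted at any step.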
I want to combine two gene sets and retain the ab initio predictions, but also keep only the longest coding region for each cluster of overlapping transcripts.
Does get_longest_isoform.py retain the ab initio predictions?
Or do I need to combine the gene sets with tsebra.py (with the keep_ab_initio.cfg config file) and then run get_longest_isoform.py?