GenomeRIK / tama

Transcriptome Annotation by Modular Algorithms (for long read RNA sequencing data)

Help understanding tama_remove_fragment_models.py discard result #138

idarolti commented 2 months ago

Hi Richard,

I have a test BED file with the following two transcripts, which differ only in the size of the last exon (PB.6480.7 is 702 bp longer than PB.6480.5):

Chr1 113145948 113213177 PB.6480;PB.6480.7 40 + 113145948 113145948 255,0,0 7 368,1014,161,107,260,167,1066 0,53232,56316,59800,63135,65470,66163
Chr1 113145948 113212475 PB.6480;PB.6480.5 40 + 113145948 113145948 255,0,0 7 368,1014,161,107,260,167,364 0,53232,56316,59800,63135,65470,66163

When I run tama_remove_fragment_models.py with default parameters, transcript PB.6480.5 is discarded. Could you please explain why? If I understand correctly, tama_remove_fragment_models.py should remove fragment models that differ from the longer model at both the 5' and 3' ends by up to a certain length threshold. By default, the exon ends/splice junction threshold is 10 bp and the trans ends wobble threshold is 500 bp. The two transcripts in my BED file differ at only one end, and that difference (702 bp) exceeds the threshold, so shouldn't both be kept? I have also tried lower thresholds, but the result is the same.
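To make the numbers concrete, here is a small Python sketch of the comparison I have in mind. It is not TAMA code and does not claim to reproduce what tama_remove_fragment_models.py does internally. It only parses the two BED12 records above and reports how far apart the 5' ends, 3' ends, and internal splice junctions are. The function names and the two threshold constants are hypothetical; the threshold values are the defaults quoted above.

```python
# Illustrative sketch only: NOT TAMA's implementation.
# Function names and constant names below are hypothetical.

JUNCTION_THRESHOLD = 10   # exon ends/splice junction threshold quoted above (bp)
ENDS_WOBBLE = 500         # trans ends wobble threshold quoted above (bp)

BED_LINES = [
    "Chr1 113145948 113213177 PB.6480;PB.6480.7 40 + 113145948 113145948 "
    "255,0,0 7 368,1014,161,107,260,167,1066 0,53232,56316,59800,63135,65470,66163",
    "Chr1 113145948 113212475 PB.6480;PB.6480.5 40 + 113145948 113145948 "
    "255,0,0 7 368,1014,161,107,260,167,364 0,53232,56316,59800,63135,65470,66163",
]


def parse_bed12(line):
    """Parse one whitespace-separated BED12 record into a small dict."""
    f = line.split()
    start = int(f[1])
    sizes = [int(x) for x in f[10].rstrip(",").split(",")]
    offsets = [int(x) for x in f[11].rstrip(",").split(",")]
    # Absolute (start, end) coordinates of every exon block.
    exons = [(start + o, start + o + s) for o, s in zip(offsets, sizes)]
    return {"id": f[3], "start": start, "end": int(f[2]),
            "strand": f[5], "exons": exons}


def internal_junctions(transcript):
    """Splice junction coordinates: every exon end except the last,
    and every exon start except the first."""
    exons = transcript["exons"]
    return [end for _, end in exons[:-1]] + [start for start, _ in exons[1:]]


def end_differences(a, b):
    """Absolute differences at the 5' end, 3' end, and internal junctions."""
    start_diff = abs(a["start"] - b["start"])
    end_diff = abs(a["end"] - b["end"])
    # On the minus strand the genomic end is the 5' end.
    if a["strand"] == "-":
        five, three = end_diff, start_diff
    else:
        five, three = start_diff, end_diff
    ja, jb = internal_junctions(a), internal_junctions(b)
    max_junction = (max(abs(x - y) for x, y in zip(ja, jb))
                    if len(ja) == len(jb) and ja else None)
    return {"five_prime_diff": five, "three_prime_diff": three,
            "max_junction_diff": max_junction}


longer, shorter = (parse_bed12(line) for line in BED_LINES)
diffs = end_differences(longer, shorter)

print(diffs)
print("junctions within threshold:", diffs["max_junction_diff"] <= JUNCTION_THRESHOLD)
print("5' end within wobble:", diffs["five_prime_diff"] <= ENDS_WOBBLE)
print("3' end within wobble:", diffs["three_prime_diff"] <= ENDS_WOBBLE)
```

For these two records the sketch reports a 5' difference of 0 bp, identical internal junctions, and a 3' difference of 702 bp, which is above the 500 bp wobble value. That is why I expected both models to be kept.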

Many thanks!
Iulia Darolti