GenomeRIK / tama

Transcriptome Annotation by Modular Algorithms (for long read RNA sequencing data)
GNU General Public License v3.0

Help understanding the no_cap option #109

Closed by Caffeinated-Code 10 months ago

Caffeinated-Code commented 1 year ago

Hi, I recently ran TAMA collapse on an internal nanopore sample and used the "no_cap" option (ECC collapse) to mimic Isoseq's TOFU collapse (--do-not-collapse-extra-5exons). The Isoseq collapse allows only one transcript model for each transcript, unlike TAMA, where a single transcript can be assigned to multiple transcript models.

TAMA Documentation: "If you used nocap mode for collapsing there may be multiple lines for a single read. This happens when a 5' degraded read can match to multiple 5' longer transcript models."
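To see how often a read ends up under more than one transcript model, the read-to-model lines can be grouped by read ID. The following is only a minimal sketch: it assumes a tab-separated BED-style file whose fourth column has the form `transcript_id;read_id`, and the function name and example records are made up for illustration. Check the layout against your own TAMA output before relying on it.

```python
from collections import defaultdict


def reads_with_multiple_models(bed_lines):
    """Return {read_id: [transcript_ids]} for reads listed under more than one model.

    Assumes column 4 of each tab-separated line looks like "<transcript_id>;<read_id>".
    """
    models_by_read = defaultdict(set)
    for line in bed_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # not a usable BED record
        # Split on the first ";" only, so read IDs containing ";" stay intact.
        transcript_id, _, read_id = fields[3].partition(";")
        if not read_id:
            continue  # column 4 does not match the assumed layout
        models_by_read[read_id].add(transcript_id)
    return {
        read_id: sorted(models)
        for read_id, models in models_by_read.items()
        if len(models) > 1
    }


# Made-up records: readB is 5' shorter and matches two longer models.
example = [
    "chr1\t100\t900\tG1.1;readA\t40\t+",
    "chr1\t250\t900\tG1.1;readB\t40\t+",
    "chr1\t250\t900\tG1.2;readB\t40\t+",
]
multi = reads_with_multiple_models(example)
```

For a real run, the same function can be given an open file handle instead of the example list, since it only iterates over lines.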

Questions:

  1. Wouldn't this make it more of a TSSC type of collapse rather than an ECC?

  2. Is the longest transcript model to be picked if ECC is desired?

Command used:

python2.7 /home/ec2-user/environment/tama/tama_collapse.py \
-s /data/sample.sorted.sam \
-f /home/ec2-user/environment/annotations/minimap2/hg38as.fa \
-p /data/tama_collapse/sample \
-x no_cap

Can you please help me understand this option better? Any help will be appreciated.

GenomeRIK commented 10 months ago

Hello,

Massive apologies for responding to this so late. It has been a hectic few months for me. I hope my response is still of some help at this time point.

Questions:

Wouldn't this make it more of a TSSC type of collapse rather than an ECC?

I am not sure I understand this question but this talk I did might help: https://www.youtube.com/watch?v=c9fh0mlly68&t=672s

Is the longest transcript model to be picked if ECC is desired?

Yes, the 5' longest transcript will be chosen as the final model, but the splice junctions will depend on read coverage and other factors.
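As a toy illustration of what "5' longest" means on each strand (this is not TAMA's code, and it ignores the splice junction logic mentioned above), one could pick among candidate models like this. The tuple layout and function name are hypothetical.

```python
def five_prime_longest(models):
    """Pick the model extending furthest 5' among candidates on one strand.

    Each model is a hypothetical (transcript_id, start, end, strand) tuple
    with genomic coordinates where start < end.
    """
    strand = models[0][3]
    if any(m[3] != strand for m in models):
        raise ValueError("all candidate models must be on the same strand")
    if strand == "+":
        # On the plus strand the 5' end is the smallest start coordinate.
        return min(models, key=lambda m: m[1])
    # On the minus strand the 5' end is the largest end coordinate.
    return max(models, key=lambda m: m[2])


plus_candidates = [("G1.1", 100, 900, "+"), ("G1.2", 250, 900, "+")]
minus_candidates = [("G2.1", 100, 900, "-"), ("G2.2", 100, 1200, "-")]

best_plus = five_prime_longest(plus_candidates)
best_minus = five_prime_longest(minus_candidates)
```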

Hope this helps, and again sorry for the delay.

Richard