Help understanding the no_cap option

GenomeRIK / tama

Transcriptome Annotation by Modular Algorithms (for long read RNA sequencing data)

GNU General Public License v3.0


Hi, I recently ran TAMA collapse on an internal nanopore sample and used the "no_cap" option (ECC collapse) to mimic Isoseq's TOFU collapse (--do-not-collapse-extra-5exons). The Isoseq collapse assigns each read to only one transcript model, unlike TAMA, where a single read can be assigned to multiple transcript models.

TAMA Documentation: "If you used nocap mode for collapsing there may be multiple lines for a single read. This happens when a 5' degraded read can match to multiple 5' longer transcript models."

Questions:

1. Wouldn't this make it more of a TSSC-type collapse than an ECC?
2. If ECC behavior is desired, should the longest transcript model be picked for each read that is assigned to several models?
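
To make the second question concrete, here is a minimal sketch of what I mean by "picking the longest transcript model" for each read. It is only an illustration under my own assumptions: the function name, the (read ID, model ID) pairs, and the model-length lookup are hypothetical inputs that would have to be parsed from the TAMA output separately. It does not use any TAMA API, and it is not a claim about how TAMA itself resolves these cases.

```python
def pick_longest_model(assignments, model_lengths):
    """Keep one transcript model per read: the longest one it was assigned to.

    assignments   -- iterable of (read_id, model_id) pairs (hypothetical input;
                     a read may appear several times, once per matching model)
    model_lengths -- dict mapping model_id to its length (hypothetical input)
    """
    best = {}  # read_id -> model_id of the longest model seen so far
    for read_id, model_id in assignments:
        current = best.get(read_id)
        # Replace only on a strictly longer model, so ties keep the first one seen.
        if current is None or model_lengths[model_id] > model_lengths[current]:
            best[read_id] = model_id
    return best


# Made-up example: read_a is 5' degraded and matches two models.
assignments = [
    ("read_a", "G1.1"),
    ("read_a", "G1.2"),
    ("read_b", "G1.2"),
]
model_lengths = {"G1.1": 4000, "G1.2": 3500}

print(pick_longest_model(assignments, model_lengths))
# {'read_a': 'G1.1', 'read_b': 'G1.2'}
```

Whether "longest" should mean genomic span or summed exon length is left to the caller through model_lengths; I am not sure which one would be appropriate here.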

Command used:

python2.7 /home/ec2-user/environment/tama/tama_collapse.py \
-s /data/sample.sorted.sam \
-f /home/ec2-user/environment/annotations/minimap2/hg38as.fa \
-p /data/tama_collapse/sample \
-x no_cap

Can you please help me understand this option better? Any help will be appreciated.


Help understanding the no_cap option #109