GenomeRIK / tama

Transcriptome Annotation by Modular Algorithms (for long read RNA sequencing data)
GNU General Public License v3.0

Remove single read models #112

Closed by mainciburu 10 months ago

mainciburu commented 10 months ago

Hi! Thanks for building and maintaining TAMA; it has been very useful for my work. I'm using tama_remove_single_read_models_levels to filter out every transcript supported by only 1 read, and I came across something confusing. When I use the option -l transcript (instead of -l gene), the option -k switches to keep_multi by default. In my filtered set of transcripts, I found transcripts supported by 1 read, all of them with >1 exon. When I specify both -l transcript and -k remove_multi, these transcripts disappear. From what I read in the docs, I expected the latter to be the default behaviour. So I don't know whether this is a bug or something to clarify in the documentation. Thank you!

GenomeRIK commented 10 months ago

Hello,

The default for "-l transcript" is to keep multi-exon transcripts because single reads that represent multi-exon transcripts are theoretically more likely to be real than single-exon transcripts supported by a single read. It is more of my own RNA philosophy, but I am super excited that you noticed this!
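To make the rule above concrete, here is a minimal Python sketch of the filtering decision as it is described in this thread. It is not TAMA's actual code: the Transcript class, the keep_transcript function, and the example IDs are hypothetical names made up for illustration. Only the mode names keep_multi and remove_multi come from the discussion.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    # Hypothetical container for illustration; not a TAMA data structure.
    transcript_id: str
    num_exons: int
    num_reads: int


def keep_transcript(t: Transcript, multi_exon_mode: str = "keep_multi") -> bool:
    """Decide whether a transcript model survives single-read filtering.

    Assumed behaviour, based only on the description in this thread:
    - models with more than one supporting read are always kept;
    - with "keep_multi", single-read models are kept only if multi-exon;
    - with "remove_multi", every single-read model is removed.
    """
    if t.num_reads > 1:
        return True
    if multi_exon_mode == "keep_multi":
        return t.num_exons > 1
    if multi_exon_mode == "remove_multi":
        return False
    raise ValueError(f"unknown mode: {multi_exon_mode}")


# Made-up example models (IDs and counts are invented).
models = [
    Transcript("G1.1", num_exons=3, num_reads=1),  # single read, multi-exon
    Transcript("G1.2", num_exons=1, num_reads=1),  # single read, single exon
    Transcript("G2.1", num_exons=4, num_reads=5),  # multiple reads
]

kept_default = [t.transcript_id for t in models if keep_transcript(t)]
kept_strict = [t.transcript_id for t in models
               if keep_transcript(t, "remove_multi")]

print(kept_default)  # ['G1.1', 'G2.1']
print(kept_strict)   # ['G2.1']
```

Under these assumptions, the single-read multi-exon model (G1.1) survives in keep_multi mode and is dropped only in remove_multi mode, which matches the behaviour reported above.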

I explain a bit more about this in my talks, which you can find on YouTube.

This one in particular might be useful to you: https://www.youtube.com/watch?v=c9fh0mlly68&t=1681s

Thank you, Richard

mainciburu commented 10 months ago

Thanks for the explanation!