ablab / IsoQuant

Transcript discovery and quantification with long RNA reads (Nanopores and PacBio)
https://ablab.github.io/IsoQuant/

The issue of matching between the expanded GTF and transcript types #194

Open biochristmas opened 3 months ago

biochristmas commented 3 months ago

Hi, I aligned the transcript to the reference genome, and based on the coverage of reads, I found two alternative splicing events. However, the GTF file expanded by IsoQuant shows only one transcript when viewed through IGV. Thank you! [image: igv]

andrewprzh commented 3 months ago

Dear @biochristmas

These two isoforms are quite hard to distinguish, since the second one is a substring of the first one. IsoQuant takes into account that some reads can be truncated, and thus considers reads from a shorter isoform simply as truncated versions of the longer isoform. If the difference were at the 3' end and the two isoforms had distinct polyA sites, there would be a higher chance of detecting both of them.
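To illustrate why such a pair is ambiguous, here is a minimal Python sketch of a "substring" check between two isoforms. It is not IsoQuant's actual code: the function names `intron_chain` and `looks_like_truncation`, the exon representation, and the coordinates are all hypothetical and chosen only for illustration. The idea is that if every intron of the shorter isoform appears, in order and contiguously, in the longer one, and its terminal exons do not extend past the matching exons, then reads from the shorter isoform cannot be told apart from truncated reads of the longer one.

```python
from typing import List, Tuple

# Hypothetical representation: (start, end), 1-based inclusive, sorted by start.
Exon = Tuple[int, int]


def intron_chain(exons: List[Exon]) -> List[Tuple[int, int]]:
    """Return the introns between consecutive exons as (start, end) pairs."""
    return [(exons[i][1] + 1, exons[i + 1][0] - 1) for i in range(len(exons) - 1)]


def looks_like_truncation(short: List[Exon], long: List[Exon]) -> bool:
    """True if `short` is consistent with being a truncated copy of `long`."""
    s_introns, l_introns = intron_chain(short), intron_chain(long)
    if not s_introns:
        # Mono-exonic case: the single exon must fit inside one exon of `long`.
        return any(ls <= short[0][0] and short[0][1] <= le for ls, le in long)
    n = len(s_introns)
    for offset in range(len(l_introns) - n + 1):
        if l_introns[offset:offset + n] == s_introns:
            # Terminal exons of `short` must not extend past the matching exons.
            first_long, last_long = long[offset], long[offset + n]
            return short[0][0] >= first_long[0] and short[-1][1] <= last_long[1]
    return False


# Made-up coordinates: the second isoform starts inside exon 2 of the first.
long_iso = [(100, 200), (300, 400), (500, 600), (700, 800)]
short_iso = [(350, 400), (500, 600), (700, 750)]
# A variant with a different splice site is distinguishable.
alt_iso = [(100, 200), (300, 420), (500, 600)]

print(looks_like_truncation(short_iso, long_iso))
print(looks_like_truncation(alt_iso, long_iso))
```

In this sketch the first pair is flagged as a possible truncation, while the isoform with a shifted splice site is not, which is why only the latter kind of difference can be resolved from splice junctions alone.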

We are working on improving the algorithms for correctly detecting 5' and 3' ends, but this case seems quite non-trivial. Using such reads in other cases may lead to a high number of false positives.

You may try using the --fl_data option, but I don't think it will make a difference in this case.

Best,
Andrey

biochristmas commented 3 months ago

Thank you for your reply. I also tried the '--fl_data' parameter today, and the number of transcripts in the GTF file is the same as when not using '--fl_data'. There is indeed no difference.