ablab / IsoQuant

Transcript discovery and quantification with long RNA reads (Nanopores and PacBio)

A Novel transcript without high-confidence reads support #207

Open yuyun-zhang opened 3 weeks ago

yuyun-zhang commented 3 weeks ago

Hi, thank you for developing this amazing tool.

I used 9 long-read RNA-seq samples together to generate a GTF file, and I am very interested in one novel transcript (transcript32428.chr19.nic). When I load the BAM files and the IsoQuant GTF file (annov29_new.transcript_models.gtf) in IGV, I cannot find any read that supports this novel transcript. The start sites, end sites, and splice sites of the reads do not match those of the novel transcript, either exactly or even approximately.

I checked the file annov29_new.transcript_model_reads.tsv.gz and found that 4 reads are recorded as supporting transcript32428.chr19.nic. However, these 4 reads actually differ from transcript32428.chr19.nic.
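For anyone who wants to repeat this check, here is a minimal sketch of pulling the supporting reads out of the gzipped TSV. It assumes the file has two tab-separated columns (read ID, then transcript ID) and that header lines start with `#`; that layout is an assumption, so please verify it against your own file. The function name `reads_for_transcript` is made up for this example.

```python
import gzip


def reads_for_transcript(tsv_path, transcript_id):
    """Return the read IDs listed for transcript_id in a gzipped TSV.

    Assumed layout (check your file): column 1 = read ID,
    column 2 = transcript ID, header/comment lines start with '#'.
    """
    read_ids = []
    with gzip.open(tsv_path, "rt") as handle:
        for line in handle:
            # Skip header/comment lines and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 2 and fields[1] == transcript_id:
                read_ids.append(fields[0])
    return read_ids
```

Called as `reads_for_transcript("annov29_new.transcript_model_reads.tsv.gz", "transcript32428.chr19.nic")`, this should return the read IDs to look up in IGV.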

Here is a snapshot of the reads and the novel transcript transcript32428.chr19.nic. I marked these 4 reads and the novel transcript. It seems that transcript32428.chr19.nic is a fusion of these 4 reads.

[Image: Snipaste_2024-06-27_15-49-17]

Could you please help me understand this result? Is there a reason why IsoQuant outputs a transcript model in a case like this?

Thank you!

Best, Yuyun

andrewprzh commented 3 weeks ago

Dear @yuyun-zhang

Thanks for the report. Do you use the reference annotation as well? If so, IsoQuant may take TSS/TES positions from the reference instead of from the reads, which is a known disadvantage that I am working on.

If you run IsoQuant in reference-free mode, it may happen that the TSS/TES positions are taken from other reads while the intron chain is taken from read4, which is kind of weird too.
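To make the distinction between intron chain and TSS/TES concrete, here is a small sketch that compares a read against a transcript model. It is only an illustration of the comparison and is not how IsoQuant itself assigns reads. The function names and all coordinates are invented, and exons are assumed to be 1-based inclusive `(start, end)` pairs as in GTF.

```python
def intron_chain(exons):
    """Return the introns implied by a list of (start, end) exons.

    Coordinates are assumed to be 1-based and inclusive, as in GTF.
    """
    exons = sorted(exons)
    return [
        (exons[i][1] + 1, exons[i + 1][0] - 1)
        for i in range(len(exons) - 1)
    ]


def compare_to_model(read_exons, model_exons):
    """Report whether the intron chains match and how far the ends differ."""
    read_exons = sorted(read_exons)
    model_exons = sorted(model_exons)
    return {
        "same_intron_chain": intron_chain(read_exons) == intron_chain(model_exons),
        # Positive start_offset: the read starts downstream of the model start.
        "start_offset": read_exons[0][0] - model_exons[0][0],
        # Negative end_offset: the read ends upstream of the model end.
        "end_offset": read_exons[-1][1] - model_exons[-1][1],
    }


# Toy coordinates, invented for illustration only.
model = [(100, 200), (300, 400), (500, 600)]
read = [(150, 200), (300, 400), (500, 550)]
print(compare_to_model(read, model))
```

A read can share the full intron chain with a model and still have clearly different ends, which would correspond to the situation where the splice junctions come from one read and the TSS/TES come from somewhere else.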

To figure this out in detail, I'd probably need a subset of your BAM file from this particular region (if possible).
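One possible way to produce such a subset is with `samtools view`, sketched below as a helper that only builds the command. It assumes samtools is installed and the input BAM is coordinate-sorted and indexed. The helper name, the file names, and the region `chr19:1000-2000` are placeholders, not the real coordinates of this transcript.

```python
import shlex


def region_subset_command(bam_path, region, out_path):
    """Build a samtools command that writes reads overlapping region to out_path.

    -b asks for BAM output and -o names the output file. The input BAM
    must be indexed for the region query to work.
    """
    return ["samtools", "view", "-b", "-o", out_path, bam_path, region]


# Placeholder file names and region; replace them with your own values.
cmd = region_subset_command("s1.bam", "chr19:1000-2000", "s1.region.bam")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Running the same command once per sample gives one small BAM per input file, which is usually easy to attach to an issue.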

Best, Andrey

yuyun-zhang commented 3 weeks ago

Hi Andrey,

Thanks for your reply! Yes, I used the reference annotation. Here are the 9 BAM files that I used to run IsoQuant, restricted to the reads from the region shown in IGV. The IGV snapshot above shows the reads of the s1 sample. BAM_files.zip

Thank you very much!

Best, Yuyun