a-slide / NanoCount

EM-based transcript abundance from nanopore reads mapped to a transcriptome with minimap2
https://a-slide.github.io/NanoCount/
MIT License

Identical tpm values for different transcripts #29

Closed chloebonenfant closed 9 months ago

chloebonenfant commented 11 months ago

Hi,

I am currently using NanoCount to quantify transcript expression from reads that were previously mapped with minimap2. My analysis uses the TSV file generated by NanoCount.

I noticed that a considerable number of transcripts have identical values across all three columns (raw, est_count, and TPM). This seems unusual to me, as I would expect some variation in expression levels between different transcripts.

I'm wondering if you could shed some light on why I am encountering identical values for these transcripts in the output file. Is this behavior expected in certain scenarios, or could there be a potential issue with my data or analysis?

Any insights or suggestions you can provide to help me understand and resolve this matter would be greatly appreciated.

Thank you for your time and assistance.

[Screenshot, 2023-08-01 at 16:18:45]

josiegleeson commented 10 months ago

Hello,

I think what is going on here is that some reads are equally likely to belong to any of the transcripts with identical counts. Because we can't tell those transcripts apart, the counts get split evenly across them.

If all of the transcripts with the identical counts are from the same gene, and look quite similar, then this is probably why. They are probably very long transcripts where only a small end fragment got sequenced.

These transcripts also have low expression (<20 reads), which makes me more confident in the explanation above. Let me know if this helps or if it doesn't seem like this is the case, and I'll take a closer look.
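To illustrate the even split, here is a minimal toy EM sketch in Python. It is not NanoCount's actual code: the function `em_counts`, the read IDs, and the transcript names are all made up for illustration, and it assumes each read is simply "compatible" or "not compatible" with a transcript, with no alignment scores.

```python
from collections import defaultdict


def em_counts(read_hits, n_iter=50):
    """Toy EM. read_hits maps a read ID to the list of transcripts it is compatible with."""
    transcripts = sorted({t for hits in read_hits.values() for t in hits})
    n_reads = len(read_hits)
    # Start from a uniform abundance over every transcript that has at least one hit.
    abundance = {t: 1.0 / len(transcripts) for t in transcripts}
    counts = {}
    for _ in range(n_iter):
        counts = defaultdict(float)
        # E-step: split each read across its compatible transcripts,
        # in proportion to the current abundance estimates.
        for hits in read_hits.values():
            total = sum(abundance[t] for t in hits)
            for t in hits:
                counts[t] += abundance[t] / total
        # M-step: re-estimate abundances from the fractional counts.
        abundance = {t: counts[t] / n_reads for t in transcripts}
    return dict(counts)


# Hypothetical data: six reads that fit three isoforms equally well,
# plus four reads that map only to an unrelated transcript.
reads = {f"read_{i}": ["tx_A", "tx_B", "tx_C"] for i in range(6)}
reads.update({f"read_{i}": ["tx_D"] for i in range(6, 10)})

counts = em_counts(reads)
print(counts)
```

In this toy case nothing distinguishes tx_A, tx_B and tx_C, so each ends up with the same estimated count (about 2 reads each), while tx_D keeps its 4 uniquely assigned reads. Identical est_count and TPM values across similar isoforms would look like this.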

Thanks, Josie

josiegleeson commented 9 months ago

Please count the unique read IDs in the original and filtered BAM files as below:

samtools view aligned_reads.bam | awk '{print $1}' | sort | uniq | wc -l