a-slide / NanoCount

EM based transcript abundance from nanopore reads mapped to a transcriptome with minimap2
https://a-slide.github.io/NanoCount/
MIT License

High % of reads lost due to discarded negative strand and invalid 3' end. #30

Closed NStrowbridge closed 12 months ago

NStrowbridge commented 1 year ago

Josie,

I raised a similar issue before. Sorry I haven't got back to you; I was busy with other work and forgot to check this. I now have more samples and have been running NanoCount again. I am still getting a large number of discarded alignments, both negative strand alignments and alignments with an invalid 3' end. For example:

## Initialise Nanocount ##
  Parse Bam file and filter low quality alignments
  Summary of alignments parsed in input bam file
    Discarded negative strand alignments: 1,461,447
    Discarded alignment with invalid 3 prime end: 1,197,522
    Valid alignments: 402,524
    Discarded unmapped alignments: 56,911
    Discarded supplementary alignments: 7,345
  Summary of reads filtered
    Reads with valid best alignment: 202,817
    Valid secondary alignments: 125,693
    Invalid secondary alignments: 69,228
    Reads with low query fraction aligned: 3,466
  Write selected alignments to BAM file
  Summary of alignments written to bam
    Alignments skipped: 2,797,239
    Alignments to select: 328,510
    Alignments written: 328,510
  Generate initial read/transcript compatibility index

I've checked the read numbers before and after NanoCount, and only about 1/3 of the input reads are kept; the rest are discarded. Do you have any idea why this might be happening?
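To put the log figures in proportion, here is a minimal Python sketch that turns the alignment counts into percentages. The numbers are copied from the log in this comment. The variable names are illustrative, and treating the five categories as a complete partition of the parsed alignments is an assumption, although their sum does match "Alignments skipped" plus "Alignments to select".

```python
# Alignment counts copied from the NanoCount log quoted in this thread.
alignment_counts = {
    "Discarded negative strand alignments": 1_461_447,
    "Discarded alignment with invalid 3 prime end": 1_197_522,
    "Valid alignments": 402_524,
    "Discarded unmapped alignments": 56_911,
    "Discarded supplementary alignments": 7_345,
}

# Assumption: these five categories cover every alignment record parsed.
total_alignments = sum(alignment_counts.values())

for label, n in alignment_counts.items():
    print(f"{label}: {n:,} ({n / total_alignments:.1%})")
print(f"Total alignment records: {total_alignments:,}")

# Cross-check against the "Summary of alignments written to bam" section.
alignments_skipped = 2_797_239
alignments_selected = 328_510
assert alignments_skipped + alignments_selected == total_alignments
```

These are fractions of alignment records, not of reads. A read with several secondary alignments contributes several records, so the share of reads lost can differ from the share of alignments discarded.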

josiegleeson commented 1 year ago

Hi again,

Hmm this does seem odd. Typically direct RNA sequencing produces reads on the positive strand, so I am not sure why so many would be mapping to the negative strand. Are you sure this is direct RNA and not cDNA of some kind?
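One way to check the strand question independently of NanoCount is to tally the strand of primary alignments from the SAM FLAG field. The sketch below is only an illustration: the function name `strand_tally` and the demo records are hypothetical, and it assumes tab-separated SAM text lines such as those printed by `samtools view`.

```python
# SAM FLAG bits, as defined in the SAM specification.
UNMAPPED = 0x4
REVERSE = 0x10
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def strand_tally(sam_lines):
    """Count primary mapped alignments on the forward and reverse strand."""
    tally = {"forward": 0, "reverse": 0}
    for line in sam_lines:
        # Skip header lines and blank lines.
        if line.startswith("@") or not line.strip():
            continue
        flag = int(line.split("\t")[1])
        # Keep only primary alignments of mapped reads.
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue
        tally["reverse" if flag & REVERSE else "forward"] += 1
    return tally


# Hypothetical records, truncated to the first four SAM columns.
demo_records = [
    "read1\t0\ttx1\t10",     # primary, forward strand
    "read1\t256\ttx2\t15",   # secondary alignment, ignored
    "read2\t16\ttx1\t42",    # primary, reverse strand
    "read3\t0\ttx3\t7",      # primary, forward strand
    "read4\t4\t*\t0",        # unmapped, ignored
]
print(strand_tally(demo_records))
```

A large reverse-strand share among primary alignments to a transcriptome would be consistent with a cDNA-style library rather than direct RNA, but this tally alone does not prove it.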

And are you definitely checking the number of reads before and after, not just number of alignments? These will be quite different and it is expected that the number of alignments will be much smaller after.

Would you be comfortable sharing one of the BAM files with me if this doesn't solve the issue?

Thank you

josiegleeson commented 12 months ago

Please count the unique read IDs in the original and filtered BAM files as below:

    samtools view aligned_reads.bam | awk '{print $1}' | sort | uniq | wc -l
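For comparison, the same count can be sketched in Python, reporting alignment records and unique read IDs side by side so the two are not confused. The function name `count_records_and_reads` and the demo records are hypothetical. The input is assumed to be tab-separated SAM text lines, for example from `samtools view`.

```python
def count_records_and_reads(sam_lines):
    """Return (alignment records, unique read IDs) for SAM text lines."""
    n_records = 0
    read_ids = set()
    for line in sam_lines:
        # Skip header lines and blank lines.
        if line.startswith("@") or not line.strip():
            continue
        n_records += 1
        # The read ID (QNAME) is the first tab-separated column.
        read_ids.add(line.split("\t", 1)[0])
    return n_records, len(read_ids)


# Hypothetical records: read1 has a primary and a secondary alignment.
demo_sam = [
    "read1\t0\ttx1\t10",
    "read1\t256\ttx2\t15",
    "read2\t16\ttx1\t42",
]
n_records, n_reads = count_records_and_reads(demo_sam)
print(f"alignment records: {n_records}, unique read IDs: {n_reads}")
```

Applied to both the original and the filtered BAM file (for instance by passing `sys.stdin` when piping from `samtools view`), the second number is the one to compare when asking how many reads were lost.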