a-slide / NanoCount

EM based transcript abundance from nanopore reads mapped to a transcriptome with minimap2
https://a-slide.github.io/NanoCount/
MIT License

High % of sequences lost in discarded negative strand alignments #23

Closed NStrowbridge closed 1 year ago

NStrowbridge commented 1 year ago

To whom it may concern,

I recently sequenced amphibian skin for direct RNA analysis. I am currently working with two samples, but intend to run more depending on our preliminary results. I passed my sorted and indexed BAM files to NanoCount using the following basic command.

NanoCount -i path/to/input -b path/to/output/bam --extra_tx_info -o /path/to/output/counts

Unfortunately, I am losing >60% of my sequences to discarded negative strand alignments. I am not sure what this means for my samples. I did do the optional RT step in the direct RNA prep, if this is relevant.

I am also losing ~25% to unmapped alignments, possibly because of the RNA CS control.

If you could give me some advice that would be great!

Cheers,

Nic

NStrowbridge commented 1 year ago

Oh also, I should mention that I aligned to a transcriptome, if this is relevant.

josiegleeson commented 1 year ago

Hi Nic,

I am wondering whether the 60% of alignments being lost are poor secondary alignments. Could you work out the number of actual reads (rather than alignments) in the BAM files before and after NanoCount?

Otherwise, you can disable the negative strand filtering with the --keep_neg_strand flag and investigate further what is going on.
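One way to make the before/after read count concrete is to tally distinct read names separately from alignment records. The sketch below is only an illustration: it works on SAM text (for instance the output of `samtools view`), relies on the standard SAM flag bits, and uses a hypothetical function name and made-up example records. The order in which categories are checked is an assumption and may not match NanoCount's own filtering.

```python
from collections import Counter

# Standard SAM flag bits (defined in the SAM specification)
UNMAPPED = 0x4
REVERSE = 0x10
SUPPLEMENTARY = 0x800


def summarise_sam(lines):
    """Count alignment records per category and distinct read names.

    `lines` is any iterable of SAM text lines. The category order
    (unmapped, supplementary, strand) is an assumption for illustration.
    """
    counts = Counter()
    read_names = set()
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        read_names.add(name)
        counts["alignments"] += 1
        if flag & UNMAPPED:
            counts["unmapped"] += 1
        elif flag & SUPPLEMENTARY:
            counts["supplementary"] += 1
        elif flag & REVERSE:
            counts["negative_strand"] += 1
        else:
            counts["positive_strand"] += 1
    counts["reads"] = len(read_names)
    return counts


# Made-up records: read1 has a primary and a secondary alignment,
# read2 maps to the negative strand, read3 is unmapped.
example = [
    "@SQ\tSN:tx1\tLN:1000",
    "read1\t0\ttx1\t1\t60\t100M\t*\t0\t0\t*\t*",
    "read1\t256\ttx2\t1\t0\t100M\t*\t0\t0\t*\t*",
    "read2\t16\ttx1\t1\t60\t100M\t*\t0\t0\t*\t*",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
summary = summarise_sam(example)
for key, value in summary.items():
    print(f"{key}: {value:,}")
```

Running the same function on the input BAM and on the BAM written with -b (each converted to SAM text first) would show whether reads, and not just secondary alignments, are being lost.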

Let me know how you go, Josie.

NStrowbridge commented 11 months ago

Josie,

Sorry I haven't got back to you; I was busy with other work and forgot to check this. I now have more samples and have been running NanoCount again. I am still getting a large number of discarded alignments, both negative strand alignments and alignments with an invalid 3' end. For example:

## Initialise Nanocount ##
  Parse Bam file and filter low quality alignments
  Summary of alignments parsed in input bam file
    Discarded negative strand alignments: 1,461,447
    Discarded alignment with invalid 3 prime end: 1,197,522
    Valid alignments: 402,524
    Discarded unmapped alignments: 56,911
    Discarded supplementary alignments: 7,345
  Summary of reads filtered
    Reads with valid best alignment: 202,817
    Valid secondary alignments: 125,693
    Invalid secondary alignments: 69,228
    Reads with low query fraction aligned: 3,466
  Write selected alignments to BAM file
  Summary of alignments written to bam
    Alignments skipped: 2,797,239
    Alignments to select: 328,510
    Alignments written: 328,510
  Generate initial read/transcript compatibility index

I've checked the read numbers before and after NanoCount, and only ~1/3 of the input reads are being used; the rest are discarded. Do you have any idea why this might be happening?
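To put the summary above in proportion, the alignment categories can be tallied as shares of all alignments parsed. This is a minimal sketch using only the numbers reported in the log. It assumes the five categories are mutually exclusive, which is plausible because they sum to the same total as "Alignments skipped" plus "Alignments written". The variable names are illustrative.

```python
# Counts copied from the "Summary of alignments parsed" section of the log
alignments = {
    "Discarded negative strand alignments": 1_461_447,
    "Discarded alignment with invalid 3 prime end": 1_197_522,
    "Valid alignments": 402_524,
    "Discarded unmapped alignments": 56_911,
    "Discarded supplementary alignments": 7_345,
}

# Counts copied from the "Summary of alignments written to bam" section
skipped = 2_797_239
written = 328_510

# Total alignments parsed, assuming the categories do not overlap
total = sum(alignments.values())

# Percentage of all parsed alignments that falls in each category
shares = {label: 100 * n / total for label, n in alignments.items()}

print(f"Total alignments parsed: {total:,}")
for label, pct in shares.items():
    print(f"{label}: {pct:.1f}%")
print(f"Skipped + written matches total: {skipped + written == total}")
```

By this tally, roughly 47% of alignments are discarded for negative strand and roughly 38% for an invalid 3' end. These are fractions of alignments, not of reads, so they are not directly comparable with the ~1/3 read figure.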