High % of sequences lost in discarded negative strand alignments

NStrowbridge commented 1 year ago

To whom it may concern,

I recently sequenced amphibian skin for direct RNA analysis. I am currently working with two samples, but I intend to run more depending on our preliminary results. I passed my sorted and indexed BAM files to NanoCount using the following basic command.

NanoCount -i path/to/input -b path/to/output/bam --extra_tx_info -o /path/to/output/counts

Unfortunately, I am losing >60% of my sequences to discarded negative strand alignments. I am not quite sure what this means for my samples. I did do the optional RT step in the direct RNA prep, if this is relevant.

Also, I am losing ~25% to unmapped alignments, which is possibly due to control RNA CS.

If you could give me some advice that would be great!

Cheers,

Nic

Oh, also, I should mention that I aligned to the transcriptome, if this is relevant.

josiegleeson commented 1 year ago

Hi Nic,

I am wondering whether the 60% of alignments being lost are poor secondary alignments. Could you work out the number of actual reads (rather than alignments) in the BAM files before and after NanoCount?

Otherwise, you can disable the negative strand filtering with the --keep_neg_strand flag and investigate further what is going on.

Let me know how you go, Josie.
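One way to make the reads-versus-alignments check concrete is to tally alignment records by type and strand and count unique read names. The following is a minimal sketch, not part of NanoCount: the function name `tally_alignments` is hypothetical, and it assumes the input is SAM-formatted text, for example the output of `samtools view -h` on the BAM file. The flag bits are the standard ones from the SAM specification.

```python
from collections import Counter

# Standard SAM flag bits (defined in the SAM specification).
UNMAPPED = 0x4
REVERSE = 0x10
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def tally_alignments(sam_lines):
    """Count SAM records by type and strand, plus the number of unique reads.

    `sam_lines` is any iterable of SAM text lines (hypothetical helper,
    written for illustration only).
    """
    counts = Counter()
    read_names = set()
    for line in sam_lines:
        # Skip blank lines and header lines, which start with "@".
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        read_names.add(name)
        if flag & UNMAPPED:
            counts["unmapped"] += 1
            continue
        if flag & SUPPLEMENTARY:
            kind = "supplementary"
        elif flag & SECONDARY:
            kind = "secondary"
        else:
            kind = "primary"
        strand = "neg" if flag & REVERSE else "pos"
        counts[f"{kind}_{strand}"] += 1
    counts["unique_reads"] = len(read_names)
    return counts
```

If most of the negative strand records turn out to be secondary rather than primary, few actual reads are being lost to that filter. Passing `sys.stdin` as `sam_lines` would let the function read a piped `samtools view -h` stream directly.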

NStrowbridge commented 11 months ago

Josie,

Sorry I haven't got back to you; I was busy with other work and forgot to check this. I now have more samples and have been running NanoCount again. I am still getting quite a few discarded alignments, both negative strand alignments and alignments with an invalid 3' end. For example:

## Initialise Nanocount ##
  Parse Bam file and filter low quality alignments
  Summary of alignments parsed in input bam file
    Discarded negative strand alignments: 1,461,447
    Discarded alignment with invalid 3 prime end: 1,197,522
    Valid alignments: 402,524
    Discarded unmapped alignments: 56,911
    Discarded supplementary alignments: 7,345
  Summary of reads filtered
    Reads with valid best alignment: 202,817
    Valid secondary alignments: 125,693
    Invalid secondary alignments: 69,228
    Reads with low query fraction aligned: 3,466
  Write selected alignments to BAM file
  Summary of alignments written to bam
    Alignments skipped: 2,797,239
    Alignments to select: 328,510
    Alignments written: 328,510
  Generate initial read/transcript compatibility index

I've checked the read numbers before and after NanoCount, and only ~1/3 of the input reads are retained after discarding. Do you have any idea why this might be happening?
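To put the log figures in proportion, the alignment counts can be converted to percentages of all parsed alignments. This is only an arithmetic sketch over the numbers reported in the log above; the dictionary keys and the `percentages` helper are illustrative names, not NanoCount output fields. Note that these are alignment counts, not read counts, so a single read with several secondary alignments contributes more than once.

```python
# Alignment counts copied from the NanoCount log in this comment.
alignments = {
    "negative strand": 1_461_447,
    "invalid 3 prime end": 1_197_522,
    "valid": 402_524,
    "unmapped": 56_911,
    "supplementary": 7_345,
}

# Totals reported in the "alignments written to bam" section of the log.
skipped, written = 2_797_239, 328_510


def percentages(counts):
    """Return the total and each category as a percentage of that total."""
    total = sum(counts.values())
    return total, {name: 100 * n / total for name, n in counts.items()}


total, pct = percentages(alignments)

# Sanity check: the parsed categories add up to skipped + written.
assert total == skipped + written

for name, value in pct.items():
    print(f"{name:>20}: {value:5.1f}%")
```

The consistency check passes, so the five categories account for every alignment in the input. The read-level question (how many distinct reads survive) still has to be answered from the BAM files themselves, since the log reports 202,817 reads with a valid best alignment but not the number of input reads.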

a-slide / NanoCount

High % of sequences lost in discarded negative strand alignments #23