a-slide / NanoCount

EM based transcript abundance from nanopore reads mapped to a transcriptome with minimap2
https://a-slide.github.io/NanoCount/
MIT License
53 stars 5 forks source link

Using with custom filtered alignments #8

Closed josiegleeson closed 4 years ago

josiegleeson commented 4 years ago

Hi, I have more of a question than an issue, I hope that's ok. I've mapped my direct RNA reads with minimap2 to the transcriptome and included -N 100 to retain more secondary alignments. Then I've written a script to filter these files based on 3' ends, alignment score, aligned fraction of reads and accuracy to give me filtered bam files. I now want to quantify these, but I don't want the primary alignments to be regarded as 'better' than the secondaries. I've tried using salmon also on these filtered bams, but it seems to only quantify the primary alignments and disregard all secondary ones. Do you know how NanoCount will work with this? Or if there's a way I can modify it to count my secondary alignments? Thanks for your help!

a-slide commented 4 years ago

Hi @josiegleeson, The current implementation of NanoCount requires the primary alignment to be provided in the source Bam file. However, since it could be a useful feature in some situation like yours, I added the following functionality in the dev branch of the repo :

  -p PRIMARY_SCORE, --primary_score PRIMARY_SCORE
                        Method to pick the best alignment for each read. By
                        default it uses the primary read defined by the
                        aligner but it can be changed to use either the best
                        alignment score ("align_score") or the best alignment
                        length ("align_len"). (default: None) [str]   

The dev version 0.2.2 is being deployed on the pip and conda at the moment. I think -p align_score should do the trick for you. Could you give it a try and report any issues ?

To install the best option is to use pip from github directly

pip install git+https://github.com/a-slide/NanoCount.git@dev
josiegleeson commented 4 years ago

Thanks I'll give it a proper go shortly. So I can still use this on my filtered bams? Or is this for the original bams only?

josiegleeson commented 4 years ago

Update: I've got it installed. It runs on my bam file from minimap2 (-ax map-ont -N 100). But if I try and run it on my bam file filtered with samtools to retain only primary and secondary alignments it no longer works.

Initialise Nanocount Parse Bam file and filter low quality hits ERROR: No valid secondary hits found in bam file. Were the reads aligned with minimap -p 0 option ? Generate initial read/transcript compatibility index

I'm also noticing in your documentation about output, the est_count is abundance primary alignments. Would it be possible to change this somehow in my case so it is abundance any alignment?

Thanks again.

a-slide commented 4 years ago

Hi @josiegleeson,

Did you run Nanocompore with -p align_score ? Would you mind rerunning it with --verbose and copy/paste the log here ? It shouldn't really matter if it's written abundance * primary alignments as primary alignments is just a proxy for number of reads. It you use -p align_score the primary alignment becomes the alignment with the best alignment score for each read.

Something to keep in mind, is that after your filtering steps you need to keep multiple alignments per initial read, as this is required by the EM algorithm to model the uncertainty of mapping.

josiegleeson commented 4 years ago

Hi Adrien, It seems to all be running fine now! The -p align_score is working, on my samtools filtered files also. But to keep the EM algorithm working well and considering unmapped and supplementaries are filtered out in NanoCount I'll run it on the alignments straight from minimap2. Thanks again.