a-slide / NanoCount

EM based transcript abundance from nanopore reads mapped to a transcriptome with minimap2
https://a-slide.github.io/NanoCount/
MIT License

Question about normalisation #4

Closed: tleonardi closed this issue 4 years ago

tleonardi commented 4 years ago

I don't understand why, at the line below, the estimated counts are multiplied by the total number of reads instead of divided by it. Might this be related to #3?

https://github.com/a-slide/NanoCount/blob/d79d5a95be99c01ed9fbead38303200737b6d39b/NanoCount/NanoCount.py#L112

tleonardi commented 4 years ago

Looking at this more closely, it seems that "raw" is already normalised by the total number of reads (the 'raw' column sums to 1), so multiplying by len(self.read_dict) should give back the (estimated) number of reads. Correct?
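The reasoning in this comment can be checked with a small sketch. It assumes, as the comment observes, that "raw" holds per-transcript abundance fractions summing to 1. A plain dict stands in for NanoCount's actual output table, and the function name `estimated_counts` and the transcript IDs are hypothetical, chosen only for illustration.

```python
import math


def estimated_counts(raw_abundance, n_reads):
    """Scale per-transcript abundance fractions back to estimated read counts.

    Hypothetical helper: assumes raw_abundance maps transcript IDs to
    fractions that sum to 1, i.e. values already divided by n_reads.
    """
    total = sum(raw_abundance.values())
    if not math.isclose(total, 1.0, rel_tol=1e-9):
        raise ValueError(f"abundances should sum to 1, got {total}")
    # Multiplying (not dividing) undoes the earlier division by n_reads.
    return {tx: frac * n_reads for tx, frac in raw_abundance.items()}


# Hypothetical example: 1000 reads split across three transcripts.
n_reads = 1000  # plays the role of len(self.read_dict)
raw = {"tx1": 0.5, "tx2": 0.25, "tx3": 0.25}
counts = estimated_counts(raw, n_reads)
print(counts)  # {'tx1': 500.0, 'tx2': 250.0, 'tx3': 250.0}
```

Under this assumption, dividing the fractions by the number of reads a second time would only shrink them further instead of recovering counts, which is why multiplication is the expected direction.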

a-slide commented 4 years ago

So I guess we are good then.