Weeks-UNC / shapemapper2

Public repository for ShapeMapper 2 releases

Histogram PDFs do not accurately represent mutation rate statistics if a target sequence longer than the amplicon is provided #52

Open jwaldern opened 2 months ago

jwaldern commented 2 months ago

When the target FASTA is longer (e.g., 1 kb) than the specified amplicon (250 nt), the histogram PDF appears to calculate the median mutation rate incorrectly: the rate is deflated in both the treated and untreated samples. When the target FASTA is exactly the length of the amplicon (with nothing else changed), the median mutation rate is higher and more in line with expectations.

The underlying mutation counts, mapped reads, and profile.txt appear to be correct.

Psirving commented 2 months ago

@lucaskearns This seems to affect only the histograms. My guess is that the median is being calculated over the entire "_mutation_rate" columns. I think it would be more useful to use only the mutation rates at positions where "HQ_profile" is not NaN.
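To illustrate the suggested filter, here is a minimal sketch on toy data. It is not ShapeMapper's actual plotting code: the function `median_rate`, the variable names, and the numbers are all hypothetical, and it assumes that positions outside the amplicon carry a rate of zero and a NaN "HQ_profile" value.

```python
import math
from statistics import median

def median_rate(rates, hq_profile=None):
    """Median of the non-NaN rates.

    If hq_profile is given, only positions where the HQ_profile value
    is not NaN contribute to the median (hypothetical helper).
    """
    if hq_profile is None:
        kept = [r for r in rates if not math.isnan(r)]
    else:
        kept = [
            r for r, hq in zip(rates, hq_profile)
            if not math.isnan(r) and not math.isnan(hq)
        ]
    return median(kept) if kept else float("nan")

nan = float("nan")

# Toy 10-nt target; the amplicon covers only the 4 middle positions.
# Assumption: off-amplicon positions have rate 0 and HQ_profile NaN.
modified_rate = [0.0, 0.0, 0.0, 0.02, 0.03, 0.04, 0.05, 0.0, 0.0, 0.0]
hq_profile = [nan, nan, nan, 0.5, 0.8, 1.1, 0.2, nan, nan, nan]

all_positions = median_rate(modified_rate)       # median over every position
hq_only = median_rate(modified_rate, hq_profile)  # median over HQ positions only

print(all_positions, hq_only)
```

In this toy case the median over all positions is pulled down to zero by the uncovered flanks, while the median over the HQ positions reflects only the amplicon, which matches the deflation described in the report.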

p.s. There are two small bug fixes sitting in a PR. They work with v2.1.5. I am hoping that you can incorporate and test these fixes in your new version.