Closed rohanmaddamsetti closed 1 year ago
Hi Rohan,
Sure, that is better terminology and might save someone else from re-figuring this out. I changed dispersion
to relative variance
throughout the code. That will be incorporated in the next release (>v0.37.1).
Hi Jeff,
The use of "dispersion" and its definition as Var(coverage)/Mean(coverage) in the summary output file is correct-- but the R documentation in running "?qnbinom" and the wikipedia article in the negative binomial article (https://en.wikipedia.org/wiki/Negative_binomial_distribution#Alternative_parameterizations) both use "dispersion" to define a very different quantity: size (dispersion) = Mean(X)^2/(Var(X) - Mean(X)).
I was wondering if you could consider using a different term like "relative variance" in the breseq output to avoid this ambiguity: https://en.wikipedia.org/wiki/Index_of_dispersion
I was having trouble reproducing the negative binomial coverage distribution in R, because it's tempting to incorrectly write:
rnbinom(n=genome.length, size=breseq.dispersion, mu=breseq.mean)
because R documentation says that "size" is the "dispersion parameter" in one of the many(!) parameterizations of the negative binomial. Wikipedia also describes this usage. But the proper parameterization in R requires the following transformation:
rnbinom(n=genome.length, size=(breseq.mean^2)/(breseq.mean*breseq.dispersion-breseq.mean), mu=breseq.mean)
I figured this out after a lot of head-scratching, but I imagine the use of "dispersion" to indicate different quantities related to the negative binomial distribution, depending on author/source, may trip others up as well. (I'm hand-rolling some CNV analyses for my data, which is why I'm digging into this stuff).
Thanks, Rohan