Use of term "dispersion" in coverage summary reporting is confusing

Hi Jeff,

Defining "dispersion" as Var(coverage)/Mean(coverage) in the summary output file is correct, but the R documentation (see "?qnbinom") and the Wikipedia article on the negative binomial distribution (https://en.wikipedia.org/wiki/Negative_binomial_distribution#Alternative_parameterizations) both use "dispersion" for a very different quantity: size (dispersion) = Mean(X)^2/(Var(X) - Mean(X)).

Could you consider using a different term, like "relative variance", in the breseq output to avoid this ambiguity? See https://en.wikipedia.org/wiki/Index_of_dispersion

I was having trouble reproducing the negative binomial coverage distribution in R, because it's tempting to incorrectly write:

rnbinom(n=genome.length, size=breseq.dispersion, mu=breseq.mean)

because the R documentation says that "size" is the "dispersion parameter" in one of the many(!) parameterizations of the negative binomial. Wikipedia also describes this usage. But the proper parameterization in R requires converting the breseq value to "size" first:

rnbinom(n=genome.length, size=(breseq.mean^2)/(breseq.mean*breseq.dispersion-breseq.mean), mu=breseq.mean)
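
As a sanity check on that conversion, here is a minimal sketch. The three input values are made up for illustration and do not come from any breseq run. It relies on the variance R documents for the "mu" parameterization, Var = mu + mu^2/size, so with Var = dispersion * mean the size expression above simplifies to mean/(dispersion - 1), which only makes sense when the breseq dispersion is greater than 1.

```r
# Hypothetical values standing in for numbers read from the breseq summary
breseq.mean <- 50          # mean coverage
breseq.dispersion <- 4     # Var(coverage) / Mean(coverage)
genome.length <- 1e6       # number of positions to simulate

# The conversion is only defined for overdispersed coverage (ratio > 1)
stopifnot(breseq.dispersion > 1)

# size = mean^2 / (var - mean), which simplifies to mean / (dispersion - 1)
nb.size <- breseq.mean / (breseq.dispersion - 1)

set.seed(1)  # fixed seed so the simulation is repeatable
sim <- rnbinom(n = genome.length, size = nb.size, mu = breseq.mean)

# Both printed values should land close to the inputs above
print(mean(sim))
print(var(sim) / mean(sim))
```

If the second printed value comes out near breseq.dispersion, the simulated coverage matches the Var/Mean ratio that breseq reports.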

I figured this out after a lot of head-scratching, but I imagine that using "dispersion" for different quantities related to the negative binomial distribution, depending on the author or source, may trip others up as well. (I'm hand-rolling some CNV analyses for my data, which is why I'm digging into this stuff.)

Thanks, Rohan

barricklab / breseq

Use of term "dispersion" in coverage summary reporting is confusing #325