Closed (Flupp closed this issue 2 weeks ago)
Thanks for the report! Indeed, I'm not sure why those functions have an interface in terms of `Rational` but then convert to `Double` internally... seems like a recipe for trouble.
Another option could be to change those functions' types to use `Double` instead of `Rational`, which would at least be more honest about the kind of precision available. But that would be a breaking change, and it wouldn't completely solve the issue.
I have two further ideas:
First, one could simply clamp the drawn number to the `Rational` bounds. Of course, this slightly skews the distribution, but for the pure rounding error problem this is probably irrelevant in practice; it’s basically yet another rounding error. :D For the second problem, where the drawn number might be out of the `Double` bounds, the skew might be larger; however, there the distribution is probably questionable in the first place anyway.
Second, instead of using tricky bounds for `getRandomR`, one could normalize the `Rational`s by dividing each of them by the sum. Considering the mentioned floating point caveats (and after a quick look at the source code), I think `getRandomR` should behave benignly when simply using `(0, 1)` as bounds. (Unfortunately, AFAICS, there is no equivalent of `uniformDouble01M` in `MonadRandom`.)
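A minimal sketch of the clamping idea, written as a pure function so that it does not depend on any particular random monad. The names `clampDraw` and `pickClamped` are hypothetical and not part of `MonadRandom`; the drawn `Double` is passed in explicitly and is assumed to be finite.

```haskell
import Data.List (find)

-- | Hypothetical helper: force a drawn value into the interval [0, total].
clampDraw :: Rational -> Double -> Rational
clampDraw total d = max 0 (min total (toRational d))

-- | Pick the first element whose cumulative weight reaches the clamped draw.
-- Returns Nothing for an empty list or a non-positive total weight.
pickClamped :: [(a, Rational)] -> Double -> Maybe a
pickClamped xs d
  | total <= 0 = Nothing
  | otherwise  = fst <$> find ((>= p) . snd) cums
  where
    -- Running totals of the weights, computed exactly on Rational.
    cums  = scanl1 (\(_, acc) (y, w) -> (y, acc + w)) xs
    total = sum (map snd xs)
    p     = clampDraw total d
```

Because the last cumulative weight equals `total` exactly and the clamped draw never exceeds `total`, the lookup always finds an element whenever `total` is positive.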
Regarding switching to `Double` in the first place: Apart from it being an interface change, which is inherently unfavorable, I agree that changing the interface type to `Double` would probably be cleanest. Note that it is then important that the summing for `s` and the summing for `cums` happen in the same order; otherwise, due to intermediate rounding errors, one might again end up with different values for `s` and the last element of `cums`. However, I guess this is probably the case, although the order of summing by `sum` is not defined in its documentation.
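To illustrate why the summation order matters for `Double`, here is a small sketch. The names are hypothetical; `totalLeft` uses an explicit left fold as a stand-in for `sum`, since (as noted above) the order used by `sum` is not documented.

```haskell
-- Floating-point addition is not associative, so grouping changes the result.
leftToRight, rightToLeft :: Double
leftToRight = (0.1 + 0.2) + 0.3
rightToLeft = 0.1 + (0.2 + 0.3)

-- Running totals, accumulated left to right (analogous to cums).
runningTotals :: [Double] -> [Double]
runningTotals = scanl1 (+)

-- Grand total, also accumulated left to right (analogous to s).
totalLeft :: [Double] -> Double
totalLeft = foldl (+) 0

-- Grand total accumulated right to left, for comparison.
totalRight :: [Double] -> Double
totalRight = foldr (+) 0
```

With the weights `[0.1, 0.2, 0.3]`, the last running total agrees with `totalLeft` but not with `totalRight`, which is exactly the mismatch between `s` and the last element of `cums` that the comment warns about.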
… yet another idea:
Do not draw a `Double` in the first place. Internally, `getRandomR` uses `uniformDouble01M`, which in turn uses `uniformWord64`, which (AFAICS) is also what is used when drawing a `Word64` using `getRandom`. Hence, one could immediately draw this `Word64` and use this to create the `Rational`:

    let s = sum (map snd xs)
    w <- getRandom
    let p = s * toRational (w :: Word64) / toRational (maxBound :: Word64)
This would also guarantee staying within the bounds without the quirks related to `getRandomR` for `Double`.
Of course, still, the interface suggests more precision (in fact, arbitrary precision) while internally only 64 bits are used. Hence, a value with a weight smaller than `1 / (maxBound :: Word64)` might never be drawn by `fromListMay`. But this is virtually impossible anyway.
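The scaling step can be isolated as a pure function to check the claim about staying within the bounds. This is only a sketch; `scaleWord64` is a hypothetical name, and the `Word64` is assumed to come from whatever uniform source is used.

```haskell
import Data.Word (Word64)

-- | Map a uniformly drawn Word64 onto the interval [0, total] using exact
-- rational arithmetic, so no floating-point rounding can leave the interval.
scaleWord64 :: Rational -> Word64 -> Rational
scaleWord64 total w =
  total * toRational w / toRational (maxBound :: Word64)
```

The smallest draw (`0`) maps to `0`, the largest (`maxBound`) maps to exactly `total`, and consecutive draws differ by `total / maxBound`, which is the 64-bit granularity mentioned above.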
@Flupp what do you think of https://github.com/byorgey/MonadRandom/commit/daa98f636bb677aa2550ab6321f10ab69c7a6066 ?
Looks good to me.
Note: I do not know about your versioning scheme, but given some internal PRNG state, this might change the result of `fromListMay'` and functions based on it. The successor PRNG state probably does not change, since we draw the same `Word64` as the previous solution did internally, but I am not 100% sure.
Hmm, that's a good point. I will try to do a few spot checks to see whether it drastically changes the behavior of `fromListMay'` or not. In theory it should not change much; assuming that is correct, in this case I think it's worth just doing a minor version bump (avoiding all the ecosystem churn attendant on a major bump) even though technically a major version bump would be required if the output of the function has changed for some inputs.
Took me a minute to track this down, but it turns out that random `Double`s are chosen by calculating `u / maxBound` where `u` is a uniform `Word64`, then subtracting from 1! So at first, the values generated by `fromListMay` and friends were completely different from the old ones (even though the ending generator state was still the same). But once I figured that out it was an easy fix; the new `fromListMay` now generates exactly the same values and updated generator state as the previous versions, for all the cases I tried. In theory, the new implementation of `fromListMay` and friends could still generate a different output than before in certain cases, but the probability of hitting such a case seems vanishingly small.
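As a rough illustration of the “subtract from 1” detail described above, the two mappings can be compared on exact rationals. The names `unitPlain` and `unitFlipped` are hypothetical, and this follows only the description in this comment; it is not the actual code of the `random` package.

```haskell
import Data.Word (Word64)

-- | Plain mapping of a Word64 onto [0, 1].
unitPlain :: Word64 -> Rational
unitPlain u = toRational u / toRational (maxBound :: Word64)

-- | The same mapping, then subtracted from 1, as described above.
unitFlipped :: Word64 -> Rational
unitFlipped u = 1 - unitPlain u
```

The same `Word64` lands at mirrored positions under the two mappings (for example, `0` maps to `0` under `unitPlain` and to `1` under `unitFlipped`), which would explain why the chosen elements differed at first even though the generator state did not.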
Yeah, sorry, I forgot to mention the “subtract from 1” thing. I already stumbled upon this myself, and this actually has its own issues (but only when using floating point; see haskell/random#166).
Anyways, I am not sure if doing this change in a minor version is a good idea. There might be people relying on generating some fixed data from a fixed seed for the PRNG. Problems might be especially subtle or might remain unnoticed for a long time if a different behavior only occurs in some very rare cases.
In any case, I suggest a release note about this.
I learned that there is some discussion about the versioning.
Let me propose another option: Doing two different changes:
I think the first could easily be achieved by simply detecting the error case and then returning the last element (which was just skipped because of rounding issues):

    p <- liftM toRational $ getRandomR (0, s)
    return . Just . fst $ case dropWhile ((< p) . snd) cums of
      x : _ -> x
      [] -> last xs
(Code untested; it’s just a sketch.)
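A pure variant of the same fallback can be checked without any random monad. This is a sketch under assumptions: `pickWithFallback` is a hypothetical name, and the drawn value `p` is passed in already converted to `Rational`.

```haskell
-- | Select the first element whose cumulative weight is at least p.
-- If rounding pushed p past every cumulative weight, fall back to the
-- last element instead of failing on an empty list.
pickWithFallback :: [(a, Rational)] -> Rational -> Maybe a
pickWithFallback [] _ = Nothing
pickWithFallback xs p =
  Just . fst $ case dropWhile ((< p) . snd) cums of
    x : _ -> x
    []    -> last cums
  where
    -- Running totals of the weights, paired with their elements.
    cums = scanl1 (\(_, acc) (y, w) -> (y, acc + w)) xs
```

For the weight from the original report, `(2^55 - 1) / 2^55`, and a draw of exactly `1`, this returns the single element rather than applying `head` to an empty list.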
For the second one, you then do not have to consider backwards compatibility. You don’t even need to do the “subtracting from 1” (see https://github.com/byorgey/MonadRandom/issues/53#issuecomment-2294862625); omitting it might even help users notice that something actually changed, so they cannot accidentally assume that the distribution stayed the same, because with the “subtracting from 1” it changes only ever so slightly. Also, you can delay this change to collect some more breaking changes before bumping a major version.
Released `MonadRandom-0.6.1` to Hackage with this fix.
`fromListMay` takes `Rational`s as input but uses `Double` for drawing a random number. This may theoretically result in errors. For example, consider drawing weighted from `[('a', w)]` where `w = let i = 55 in (2^i - 1) / 2^i`. The (single) weight `w` is nearly one, and when converted to `Double` it is rounded up to exactly one. The `Double` value is used as an upper bound for drawing a random number. So, theoretically, exactly one could be drawn. For choosing from the list, the drawn number is converted back to `Rational`. The code ends up comparing `w < 1`, which is `True`. Because of that, `head` will be applied to an empty list.
Note: The problem also occurs with less crazy weights, e.g., `1 / 5`: `let x = 1 / 5 in x < toRational (fromRational x :: Double)` is `True`.
You can easily show that the error might actually occur by replacing the random draw by returning the upper bound:
Even worse: Considering the “Floating point number caveats” section in the `random` package, even values out of the bounds might be drawn.
All in all, the above problem might cause a program using `fromListMay` or its friends to crash literally randomly.
Unfortunately I have no clean idea to resolve the problem. The only idea I have is to redraw in case of a value out of bounds being drawn. Then, however, I fear there are edge cases that might result in endless loops.
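The rounding behavior described in this report can be checked in isolation with a few pure definitions. This is a sketch for illustration only: the names `upperBound` and `survivors` are hypothetical, and the draw is simply assumed to return the upper bound rather than being produced by a generator.

```haskell
-- The weight from the report: just below 1 as an exact Rational.
w :: Rational
w = (2 ^ (55 :: Int) - 1) / 2 ^ (55 :: Int)

-- Converting the weight to Double rounds it up to exactly 1.
upperBound :: Double
upperBound = fromRational w

-- Assume the draw returned the upper bound: every cumulative weight is
-- then strictly smaller than the draw, so nothing survives dropWhile.
survivors :: [(Char, Rational)]
survivors = dropWhile ((< toRational upperBound) . snd) [('a', w)]

-- The same effect for a less extreme weight such as 1 / 5.
fifthRoundsUp :: Bool
fifthRoundsUp = let x = 1 / 5 :: Rational
                in x < toRational (fromRational x :: Double)
```

Here `survivors` is empty, so taking its `head` would fail, and `fifthRoundsUp` evaluates to `True`, matching the note about `1 / 5`.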