forcedotcom / distributions

Low-level primitives for collapsed Gibbs sampling in python and C++
BSD 3-Clause "New" or "Revised" License
33 stars 25 forks source link

sample_beta ill-conditioned for small values of alpha, beta #88

Closed stephentu closed 9 years ago

stephentu commented 9 years ago

sample_beta will return NaNs when alpha, beta are small (e.g. 1/100).

A more robust implementation might look like the one in numpy (https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/distributions.c#L183), which uses Johnk's algorithm (see e.g. page 418 of http://www.eirene.de/Devroye.pdf)

stephentu commented 9 years ago

This was discovered by @dpfau

dpfau commented 9 years ago

A simple hack that's working for me is to add the line:

if (x==0 && y==0) return sample_bernoulli(rng, alpha / (alpha + beta)) ? 1.0 : 0.0;

right before the return statement. Johnk's algorithm would suffer from the same problem as the unmodified sampler here - if alpha = 0.001 and U = 0.5, U^(1/alpha) would underflow. You can see for yourself by trying out this gist: https://gist.github.com/dpfau/5ea001f82171ff8d9e99

dpfau commented 9 years ago

Incidentally, numpy also produces NaNs quite often when alpha = beta = 0.001!

fritzo commented 9 years ago

fixed by https://github.com/forcedotcom/distributions/pull/90