amcphail / hmatrix-gsl-stats

GSL Statistics functions for Haskell hmatrix
BSD 3-Clause "New" or "Revised" License
6 stars 3 forks source link

no way of advancing RNG state #5

Closed vixr closed 9 years ago

vixr commented 9 years ago

There doesn't seem to be a way of advancing the state of RNGs. Without that, I don't see any way of sampling more than a single random number from a distribution for a given seed.

amcphail commented 9 years ago

These functions are designed in the following way: (i) you have a random number generator state that you have initialised with a random seed, (ii) to call say random_1p Gaussian <random number> 0.5 where the parameter (standard deviation) = 0.5 and <random_number> is a random number you generate with the handy random number generated in (i).

import System.Random.MWC
import Numeric.GSL.Distribution.Continuous 

getNSamples :: Int                   -- number N
     , Double                              -- parameter (standard deviation)
    -> IO [Double]  
getNSamples n sigma = replicate n $ getOneSample sigma

getOneSample :: Double         -- parameter (standard deviation) 
   -> IO Double
getOneSample sigma = do
   r <- withSystemRandom . asGenST $ \gen -> uniform gen
   return $ random_1p Gaussian r sigma
vixr commented 9 years ago

I'm not a RNG expert, but this doesn't seem quite right. My understanding RNGs are tested and validated for their random-like properties by advancing the state of a stream starting from a single seeds. Although ideally if there's not a bias in the first sample across the distribution of seeds this would work okay, but I wouldn't bet on it. Randomizing the seed with another RNG and resetting the seed with each sample is not how these samplers were tested or meant to be used and could result in artifacts.

vixr commented 9 years ago

thanks vivian!

For reference this package manages rng state with gsl - https://hackage.haskell.org/package/gsl-random Unfortunately it's incomplete in it's distribution bindings and it also isn't integrated with hmatrix.

amcphail commented 9 years ago

Hi vixr,

I'll add methods to get vectors back from the distributions as there is some overhead in setting up the RNG each function call.

vixr commented 9 years ago

That would be useful. However there are many cases such as gibbs samplin and other markov chain monte carlo based inference as well as monte carlo simulations that require repeated sampling where each sample comes form a different distribution which is dependent from prior samples. These use cases won't be covered by the ability to retrieve multiple samples from the same distribution. Ideally there would be a way of advancing the rng state with minimal overhead in those applications.

amcphail commented 9 years ago

https://github.com/amcphail/hmatrix-gsl-stats/commit/74721b0e69ed2eca1c473ab3e81fe99d54fd5b30

amcphail commented 9 years ago

To fix #5 add a RNGState-passing version of functions. Use an opaque RNGState. This will allow sampling from multiple distributions for each RNG setup bracket.

vixr commented 9 years ago

I'm getting odd behavior with the _v version of Bernoulli :

λ> random_1p Bernoulli 52344 0.9 1 λ> random_1p_v Bernoulli 523544 0.9 0 fromList [] λ> random_1p_v Bernoulli 523544 0.9 1 fromList [4294967297] λ> random_1p_v Bernoulli 523544 0.9 2 fromList [4294967297,4403557824] λ> random_1p_v Bernoulli 523544 0.9 3 fromList [4294967297,4294967297,4430595208]

Binomial looks okay though

λ> random_2p_v Binomial 523544 0.3 10 10 fromList [4,2,1,2,4,5,5,4,1,2]

amcphail commented 9 years ago

0.3.0.3: removed inline modifier from statistics-aux.c fixed CInt -> CUInt in discrete_1p_v FFI call

amcphail commented 9 years ago

Fix for #5 https://github.com/amcphail/hmatrix-gsl-stats/commit/155ba0c2b529b505668cb37216dd286035d5f6f3

0.4: added RNG type to pass between calls to RNG