MaxHalford / vose

Cython implementation of Vose's Alias method
MIT License

[Feature Request] Make seeding instance specific #2

Closed Theomat closed 3 months ago

Theomat commented 2 years ago

Hello,

The following simple test fails:

import numpy as np
import vose

probs = np.array([0.5, 0.5])
# Two samplers built with the same seed should produce the same sequence.
a = vose.Sampler(probs, seed=0)
b = vose.Sampler(probs, seed=0)
for _ in range(10000):
    assert a.sample() == b.sample()

This happens because all instances of vose.Sampler share the same random number generator. This is counter-intuitive and greatly limits the usefulness of seeding. Could you consider changing the behaviour so that the above test succeeds?
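To illustrate the difference, here is a minimal pure-Python sketch. It does not use vose at all: `SharedRngSampler` and `OwnRngSampler` are hypothetical stand-ins built on the standard `random` module, assuming the current behaviour amounts to one generator shared by every instance.

```python
import random


class SharedRngSampler:
    """Hypothetical stand-in: every instance draws from one shared generator."""

    _rng = random.Random()  # class-level, shared by all instances

    def __init__(self, probs, seed=None):
        self.probs = list(probs)
        if seed is not None:
            # Seeding here reseeds the generator for every existing instance.
            SharedRngSampler._rng.seed(seed)

    def sample(self):
        indices = range(len(self.probs))
        return SharedRngSampler._rng.choices(indices, weights=self.probs)[0]


class OwnRngSampler:
    """Hypothetical stand-in: each instance owns its generator."""

    def __init__(self, probs, seed=None):
        self.probs = list(probs)
        self._rng = random.Random(seed)  # private to this instance

    def sample(self):
        indices = range(len(self.probs))
        return self._rng.choices(indices, weights=self.probs)[0]


def draws_match(cls, n=1000):
    """Return True if two same-seed samplers of type `cls` agree on n draws."""
    a = cls([0.5, 0.5], seed=0)
    b = cls([0.5, 0.5], seed=0)
    return all(a.sample() == b.sample() for _ in range(n))
```

With the shared generator, `a` and `b` take turns consuming one stream, so their draws diverge; with per-instance generators, each replays its own seeded stream.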

MaxHalford commented 2 years ago

Hello there.

Mmm I'd be open to doing that, but I don't see any easy way to do that.

May I ask why you're looking to seed multiple samplers? Just curious.

Theomat commented 2 years ago

Well, since Cython supports C++, it is possible to do something like this: declaration, instantiation, C++ reference, for example, which seems relatively simple.

Typically, we want to generate high-level objects that require a sequence of randomly generated integers. For example, we sample x1 ~ S1, and the next sampled element has a conditional probability based on the last element sampled, i.e. P(X=x2 | x1), which means at least two samplers with different probability vectors. Another use case is a high-level object made of small objects: consider generating (int, bool). We create an integer lexicon from which the first sampler samples with some probabilities, and we have another sampler that just samples a boolean, again with some other probabilities. In both cases, we want to generate objects reproducibly, which means that two high-level samplers with the same seed should generate exactly the same high-level objects in the same order. Moreover, those high-level samplers may be instantiated at different times, which means one created later could reseed another that is already in use, whereas creating it should have no effect on the other. Finally, our different samplers may not share the same seed; we have no guarantee of that.
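A rough sketch of the (int, bool) use case, again in plain Python rather than vose: `PairSampler` is a hypothetical name, and the probabilities and seed are made up for illustration. The point is only that a sampler created later, even with the same seed, must not disturb one that is already in use.

```python
import random


class PairSampler:
    """Hypothetical high-level sampler producing (int, bool) pairs."""

    def __init__(self, int_probs, p_true_given_int, seed):
        # One private generator per high-level sampler; nothing is shared.
        self._rng = random.Random(seed)
        self.int_probs = list(int_probs)
        self.p_true_given_int = list(p_true_given_int)

    def sample(self):
        # First draw x1 from the integer lexicon.
        indices = range(len(self.int_probs))
        x1 = self._rng.choices(indices, weights=self.int_probs)[0]
        # Then draw the boolean with probability P(X2 = True | x1).
        x2 = self._rng.random() < self.p_true_given_int[x1]
        return x1, x2


first = PairSampler([0.2, 0.8], [0.9, 0.1], seed=42)
head = [first.sample() for _ in range(5)]

# A second sampler is created later with the same seed.
second = PairSampler([0.2, 0.8], [0.9, 0.1], seed=42)

# `first` keeps going as if nothing happened.
tail = [first.sample() for _ in range(5)]
replay = [second.sample() for _ in range(10)]
```

Here `head + tail` equals `replay`: creating `second` in the middle of the run does not reseed `first`, which is the behaviour we would like from vose.Sampler.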

MaxHalford commented 2 years ago

I was playing devil's advocate. Your explanation makes a lot of sense. I'm aligned!

You seem like you know what you're doing. Do you feel comfortable implementing the necessary changes in a pull request?