CDCgov / ixa

Apache License 2.0
8 stars 0 forks source link

Added random module #18

Closed k88hudson-cfa closed 2 months ago

k88hudson-cfa commented 2 months ago

This PR adds a module to support creating and using random generators. It's implemented as a trait on context, with a DataPlugin to store a base random seed and a hashmap of rng instances. The design supports any rng implementation that implements rand::SeedableRandom.

The basic idea is you initialize the utility with a base random seed when setting up a context:

context.set_base_random_seed(12345678);

In a module that needs an independent random number generator, you use a macro to define an independent type, and then call context.get_rng::<YourType>() to retrieve it. For example:

define_rng!(FooRng)

let mut rng = context.get_rng::<FooRng>();
let value = rng.next_u64();

Note that an rng needs a mutable reference because it has to store internal state about previous calls in order to be reproducible.

One difference from the eosim implementation here is that I destroy existing rngs if base_seed is changed rather than using a flag on the holder.

k88hudson-cfa commented 2 months ago

So I gave explicit initialization a try (implementation here if you're curious: k88hudson_rng_explcit_init), which ends up looking something like this:

In your init function of the module, you call create_rng:

init() {
  // ...
  context.create_rng::<FooRng>();
}

And then later you call get_rng as normal:

some_other_method() {
  let mut foo_rng = context.get_rng::<FooRng>();

  foo_rng.next_u64();
  // ...
  foo_rng.next_u64();
}

The thing I don't like about this is that if you want to reset the base random seed, you have to deal with reinitialization (like maybe iterating through existing ones and replacing them?), which seems kind of not great

k88hudson-cfa commented 2 months ago

I spent a bunch of time but I couldn't figure out how to get this to work with interior mutability for the RngHolders which also having the whole HashMap in a RefCell, so I think I'm just gonna leave it as is for now

ekr-cfa commented 2 months ago

I spent a bunch of time but I couldn't figure out how to get this to work with interior mutability for the RngHolders which also having the whole HashMap in a RefCell, so I think I'm just gonna leave it as is for now

Should this say "without also"?

k88hudson-cfa commented 2 months ago

To summarize some discussion: one thing we should consider here is that we will mostly be using rngs in combination with some kind of distribution/sample function. These may be created in place by a module (like the Exp example below), and sometimes they might need to be stored as a global variable / in a data container (because they are pretty large / used in multiple places or whatever).

The design in this PR returns a RefMut you have to deal with that when you pass it to a sample function, which is maybe not ideal:

let mut foo_rng = context.get_rng::<FooRng>();
let dist = Exp::new(1 / bar).unwrap();
let value = dist.sample(&mut *foo_rng);

@jasonasher did some prototyping of an alternative design where instead of the context extension providing a get_rng method, it implements a sample that retrieves the rng internally:

let dist = context.get_data_container::<SamplerData>().unwrap();
let value = context.sample::<FooRng, usize>(|rng| dist.sample(rng));
// Aternatively, pass in the whole distribution
let value2 = context.sample_distr::<FooRng, usize>(dist);

I don't this hate honestly, if you're pretty much always using rngs this way

jasonasher commented 2 months ago

In my experience the most common way we use random sampling is either with distributions like this or by grabbing random f64/f32s to make future decisions, such as setting individual propensities to engage in various behaviors. So having a general sample method and then sample_distr and sample_xyz for various primitive xyz would cover all of the common use cases. I'd be interested to know if @confunguido agrees.

confunguido commented 2 months ago

In my experience the most common way we use random sampling is either with distributions like this or by grabbing random f64/f32s to make future decisions, such as setting individual propensities to engage in various behaviors. So having a general sample method and then sample_distr and sample_xyz for various primitive xyz would cover all of the common use cases. I'd be interested to know if @confunguido agrees.

@jasonasher. I think I agree. Sampling from distribution, making decisions, and grabbing a random individual from a group. As a side note, I think it would be nice to have the ability to specify and sample from custom probability distributions.

k88hudson-cfa commented 2 months ago

Alright, I think this is ready: updated this to expose sample and sample_distr – also added something to the macro to prevent name collisions.

jasonasher commented 2 months ago

@jasonasher. I think I agree. Sampling from distribution, making decisions, and grabbing a random individual from a group. As a side note, I think it would be nice to have the ability to specify and sample from custom probability distributions.

@confunguido These would be supported right out of the box with sample_distr: rand_distr. And, by implementing the Distribution\<T> trait we can support custom distributions.

k88hudson-cfa commented 2 months ago

So @jasonasher came up with a solution in https://github.com/CDCgov/ixa/tree/jma_rand_runtime_collision for runtime collision checking – the thing I don't like about it is that it's easy to forget to update the field that caches name collisions given the current architecture of data containers, so I think i'd still leave this as is (unless we really don't want to require paste)

jasonasher commented 2 months ago

I don't think requiring paste is much of an issue - I was more reacting to handling people potentially not using the macro. My instinct is we don't have much of an update problem here for the name field caching, as it's handled in one place and not obviously likely to change much in the future, but I could be missing something and defer to your and @ekr-cfa's judgment here.

ekr-cfa commented 2 months ago

I would prefer a compile time check even if it's imperfect. I think if people color this far outside the lines, then they can suffer.