InstituteforDiseaseModeling / laser

Light Agent Spatial modeling for ERadication
MIT License
2 stars 5 forks source link

Random seed #31

Open krosenfeld-IDM opened 1 month ago

krosenfeld-IDM commented 1 month ago

From 9a1ba81 it would be helpful to set a random seed to ensure replicability / reproducibility

clorton commented 1 month ago

Fixed in FUSION with 7234c55.

clorton commented 1 month ago

Verified that I get same results on a different machine with different CPU architecture (x64 vs. Apple M1) as long as I use nb.set_num_threads() to make sure the simulation is using the same number of threads.

krosenfeld-IDM commented 1 month ago

@clorton - thanks for taking a look at this and checking that your solution works on a different CPU architecture. I was curious whether just using a seed function failed? You are using np.random.seed which I do think would require passing the seed along. Is there a reason to not use random.random? I looked briefly for differences but didn't find anything except that random.random seems to be thread safe

If we use random.random the calls are significantly simpler.

"""
https://numba.readthedocs.io/en/stable/reference/pysupported.html#random
"""
import numba as nb
import numpy as np
import random
from pathlib import Path

ROOT_DIR = Path(__file__).resolve().parent
print(ROOT_DIR)

@nb.njit(parallel=True)
def seed(a):
    for i in nb.prange(nb.get_num_threads()):
        random.seed(a+i)

@nb.njit(
    (nb.float64[:],),
    parallel=True,
    nogil=True,
    cache=True,
)
def fill_me(arr):
    for i in nb.prange(len(arr)):
        arr[i] = random.normalvariate(mu=0, sigma=1)
    return

if __name__ == "__main__":
    seed(0)
    arr = np.zeros(100_000, dtype=np.float64)
    fill_me(arr)

    if not Path(ROOT_DIR / "random_numbers.npy").exists():
        np.save(ROOT_DIR / "random_numbers.npy", arr)
    else:
        ref_arr = np.load(ROOT_DIR / "random_numbers.npy")
        close = np.isclose(arr, ref_arr)
        print(f"Close: {close.sum()}/{len(close)}")
        print(ref_arr[:10])
        print("----")
        print(arr[:10])
        print("====")
        print(ref_arr[-10:])
        print("----")
        print(arr[-10:])
        assert np.allclose(arr, ref_arr)
krosenfeld-IDM commented 1 month ago

it's a bit hard to see what the solution was in the diff for the .ipyn but it looks like it was the same as for psyod where we call np.seed at the start of each function?

https://github.com/InstituteforDiseaseModeling/laser/blob/7234c55ddcd8a6cc67029f90c91029fd137bfe86/src/idmlaser/kmcurve.py#L115-L136

2 thoughts:

  1. Wouldn't this mean that the random number stream starts over each time you call the function?
  2. The random number streams are the same across all the other functions.

If so I'm not sure what the implications are if the random numbers are correlated/structured in this way.

Thanks - appreciate your taking a look at this!

krosenfeld-IDM commented 1 month ago

Hmm - I guess the issue becomes if you want to be using the numpy.random distributions then you won't be using the random.random generator.

krosenfeld-IDM commented 1 month ago

sorry re-opening because I still haven't figures out what happens with the 2 questions above

krosenfeld-IDM commented 1 month ago

It looks like maybe the prng still needs to be incorporated into the transmission loop?

https://github.com/InstituteforDiseaseModeling/laser/blob/2ce4764876daa91d69b2212c02d1bec3b5b8ec7e/nnmm/measles.py#L753