cta-observatory / ctapipe

Low-level data processing pipeline software for CTAO or similar arrays of Imaging Atmospheric Cherenkov Telescopes
https://ctapipe.readthedocs.org
BSD 3-Clause "New" or "Revised" License
63 stars 267 forks source link

Use correct random number gen, and make seeds reproducible. #2392

Open kosack opened 1 year ago

kosack commented 1 year ago

Describe the bug

Anywhere where random numbers are used should both use Numpy's new Generator API and also we should probably allow a configurable seed for reproducibility (and write the used seed to the provenance output as well).

We could define some core "services" of Tools/Components that have a shared config, with a common random number generator being one of them.

Supporting information

> ruff --select NPY check . --exclude=tests | grep random

ctapipe/fitting.py:79:14: NPY002 Replace legacy `np.random.choice` call with `np.random.Generator`
ctapipe/fitting.py:149:5: NPY002 Replace legacy `np.random.seed` call with `np.random.Generator`
ctapipe/image/modifications.py:73:9: NPY002 Replace legacy `np.random.seed` call with `np.random.Generator`
ctapipe/image/modifications.py:80:20: NPY002 Replace legacy `np.random.poisson` call with `np.random.Generator`
ctapipe/image/modifications.py:94:28: NPY002 Replace legacy `np.random.multinomial` call with `np.random.Generator`

Additional context Add any other context about the problem here.

kosack commented 1 year ago

I.e. there should be one random number generator with a configurable seed that can be used by all components.

maxnoe commented 1 year ago

Unfortunately, it's not that easy.

The occurences you found are in numba jitted code, which just supports one global rng via those "legacy" functions.

kosack commented 1 year ago

It does say this in the manual:

Numba supports numpy.random.Generator() objects. As of version 0.56, users can pass individual NumPy Generator objects into Numba functions and use their methods inside the functions. The same algorithms are used as NumPy for random number generation hence maintaining parity between the random number generated using NumPy and Numba under identical arguments (also the same documentation notes as NumPy Generator methods apply). The current Numba support for Generator is not thread-safe, hence we do not recommend using Generator methods in methods with parallel execution logic.

So as long as we don't use parallel=True, it should be able to use a centrally configured Generator. Haven't tested it though. I don't see a random.choice method in Generator though.

maxnoe commented 1 year ago

Ah, great, that's pretty new.