GlacioHack / geoutils

Analysis of georeferenced rasters and vectors
https://geoutils.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Consistently use NumPy's default random number generator to avoid RAM usage issues from legacy call #536

Closed: rhugonnet closed this issue 6 months ago

rhugonnet commented 6 months ago

Turns out using the legacy NumPy random generator

x = np.random.choice()

or its equivalent for consistency in the random output:

rnd = np.random.RandomState(np.random.MT19937(np.random.SeedSequence(random_state)))
rnd.choice()

and without replacement (replace=False) leaks RAM: it creates an in-memory array the size of the population being sampled from (not the size of the requested sample), and that array is not cleared right after the function call. So, for instance, asking for 100 random points with a value between 0 and 1 billion, without replacement, will create an array of size 1 billion in memory, which stays around for a while. See https://github.com/numpy/numpy/issues/14169 and https://github.com/GlacioHack/xdem/discussions/501#discussioncomment-9149146. And we are currently doing this everywhere :scream:
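A rough way to see the difference is to compare peak traced memory for the two generators. This is only a sketch: the `peak_bytes` helper, the seed, and the population size are made up for illustration (the population is kept far below 1 billion so it runs quickly), and it measures only the peak allocation during the call, not how long the memory lingers afterwards.

```python
import tracemalloc

import numpy as np


def peak_bytes(func):
    """Return the peak traced memory (in bytes) while running func()."""
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


POPULATION = 2_000_000  # illustrative; the issue talks about 1 billion
SAMPLE = 100

legacy = np.random.RandomState(42)  # legacy generator
modern = np.random.default_rng(42)  # new Generator API

legacy_peak = peak_bytes(
    lambda: legacy.choice(POPULATION, size=SAMPLE, replace=False)
)
modern_peak = peak_bytes(
    lambda: modern.choice(POPULATION, size=SAMPLE, replace=False)
)

print(f"legacy peak:      {legacy_peak / 1e6:.1f} MB")
print(f"default_rng peak: {modern_peak / 1e6:.1f} MB")
```

On a typical install the legacy peak should scale with `POPULATION`, while the `default_rng` peak should stay close to the size of the requested sample.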

We need to replace these calls everywhere in GeoUtils and xDEM with:

rnd = np.random.default_rng(seed=random_state)
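For illustration, a minimal sketch of what a migrated call could look like. The seed value and the population and sample sizes are placeholders, not taken from the GeoUtils or xDEM code. Note that for the same seed, `default_rng` will not reproduce the numbers the legacy `RandomState` produced, so any expected values hard-coded in tests would likely need updating.

```python
import numpy as np

random_state = 42  # placeholder seed, for illustration only

# New-style generator, seeded for reproducibility
rnd = np.random.default_rng(seed=random_state)

# 100 distinct integers in [0, 1 billion), drawn without replacement
sample = rnd.choice(1_000_000_000, size=100, replace=False)

# Re-creating the generator with the same seed gives the same draw
rnd_again = np.random.default_rng(seed=random_state)
sample_again = rnd_again.choice(1_000_000_000, size=100, replace=False)

print(sample[:5])
print(np.array_equal(sample, sample_again))
```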