EpistasisLab / tpot2

A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
https://epistasislab.github.io/tpot2/
GNU Lesser General Public License v3.0

Replace remaining calls to legacy `np.random.seed()` #112

Open chimaerase opened 10 months ago

chimaerase commented 10 months ago

Thanks for all of your work on TPOT! I and my colleagues use it very often for a variety of synthetic biology research projects.

I'd like to ask that TPOT 2 accept and use only np.random.Generator objects as the alternative to int RNG seeds, and avoid using the global / legacy np.random generator, even internally, if possible. A quick search through the TPOT2 code at the time of writing (11/17/23) shows this is largely already the case, except for 3 remaining instances of the string "random.seed", which refer to the legacy np.random.seed().
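To illustrate the kind of interface being requested, here is a minimal sketch of a function that accepts either an int seed or an existing np.random.Generator and never touches the global np.random state. The names `resolve_rng` and `pick_mutation_index` are hypothetical and not part of TPOT2; the sketch relies only on `np.random.default_rng`, which returns a Generator passed to it unchanged.

```python
import numpy as np


def resolve_rng(random_state=None):
    """Hypothetical helper: normalize an int seed, None, or Generator.

    np.random.default_rng returns an existing Generator as-is, so callers
    can share one Generator across several components if they want to.
    """
    return np.random.default_rng(random_state)


def pick_mutation_index(n_candidates, random_state=None):
    """Hypothetical stand-in for an internal routine that needs randomness.

    All draws come from the local Generator, never from np.random.* globals.
    """
    rng = resolve_rng(random_state)
    return int(rng.integers(0, n_candidates))


# Same int seed, same result, regardless of what other code does to np.random.
first = pick_mutation_index(10, random_state=42)
second = pick_mutation_index(10, random_state=42)
```

With this pattern, calling the legacy `np.random.seed()` elsewhere in the program has no effect on the values drawn here.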

Our code sometimes executes multiple TPOTRegressors in parallel, and TPOT 1's dependence on the global np.random generator has caused problems with repeatability. For example, unpredictable OS-level thread scheduling can change the order in which threads call the shared np.random.randint() or similar functions, so each regressor receives different values from one run to the next. There are workarounds, e.g. using subprocesses instead of threads, but IMO TPOT should be maximally flexible and ideally not require workarounds.
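The threading scenario above can be sketched as follows. This is not TPOT code: `simulated_search` is a hypothetical stand-in for one optimizer run, and the approach shown (spawning independent child streams from one `np.random.SeedSequence`) is one common way to give each thread its own Generator so that scheduling order cannot affect the results.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def simulated_search(rng, n_draws=1000):
    """Hypothetical stand-in for one optimizer run that consumes random numbers."""
    return int(rng.integers(0, 1_000_000, size=n_draws).sum())


def run_parallel(master_seed, n_workers=4):
    """Run several searches in threads, each with its own independent Generator."""
    # Spawn one child seed per worker so the streams do not overlap.
    child_seeds = np.random.SeedSequence(master_seed).spawn(n_workers)
    rngs = [np.random.default_rng(seed) for seed in child_seeds]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map returns results in input order, whatever order threads finish in.
        return list(pool.map(simulated_search, rngs))


results_a = run_parallel(master_seed=123)
results_b = run_parallel(master_seed=123)
```

Because no thread reads from shared generator state, the two runs produce identical results regardless of how the OS schedules the threads.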

perib commented 5 hours ago

The next version addresses this with PR #156. np.random.seed has been removed, and everything should now rely only on np.random.Generator.