aai-institute / pyDVL

pyDVL is a library of stable implementations of algorithms for data valuation and influence function computation
https://pydvl.org
GNU Lesser General Public License v3.0

Explicit randomization for subprocesses #392

Closed (kosmitive closed this 1 year ago)

kosmitive commented 1 year ago

Because sub-processes inherit the state of the random generator when they are forked from their parent process, different sub-processes may end up drawing the same sequence of random numbers. Whether this is a problem depends on the backend. In general it would be better to make the seeding explicit, e.g.

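import os, random, time
import numpy as np

def set_time_pid_seed():
    # Mix PID and nanosecond clock; np.random.seed needs a value below 2**32.
    t = ((os.getpid() << 16) ^ time.time_ns()) % 2**32
    np.random.seed(t)
    random.seed(t)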

or, even better, pass different seeds from the parent process to the forked child processes. The latter would also give us reproducibility; see also https://joblib.readthedocs.io/en/latest/auto_examples/parallel_random_state.html
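
A minimal sketch of the second option, assuming NumPy's `SeedSequence` is used to derive the child seeds. The function names `child_seeds`, `draw_samples` and `run_workers` are hypothetical and not part of pyDVL, and `ProcessPoolExecutor` only stands in for whichever backend is actually used:

```python
from concurrent.futures import ProcessPoolExecutor

import numpy as np


def child_seeds(parent_seed, n_workers):
    """Derive one statistically independent SeedSequence per worker."""
    return np.random.SeedSequence(parent_seed).spawn(n_workers)


def draw_samples(seed_seq, n_samples=3):
    """Hypothetical worker: builds its own generator from the seed it was given,
    so it never relies on global random state inherited from the parent."""
    rng = np.random.default_rng(seed_seq)
    return rng.random(n_samples).tolist()


def run_workers(parent_seed, n_workers=4):
    """Hand each child process its own seed derived from a single parent seed."""
    seeds = child_seeds(parent_seed, n_workers)
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(draw_samples, seeds))
```

With this layout, two calls to `run_workers` with the same `parent_seed` should return identical results, while the individual workers within one call draw different numbers. On platforms that spawn rather than fork, the call needs to sit behind an `if __name__ == "__main__":` guard.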

AnesBenmerzoug commented 1 year ago

This should be related to #242