JuliaAI / MLJTuning.jl

Hyperparameter optimization algorithms for use in the MLJ machine learning framework
MIT License

Non-thread safe use of resampling machines #126

Closed ablaom closed 2 years ago

ablaom commented 3 years ago

Strictly speaking, as currently implemented, calling fit! on a resampling machine mach mutates the value of mach.model.resampling whenever the resampling strategy has an RNG as a field (all of them do, I believe). It may make sense to change this behaviour by moving the interface point for RNGs in resampling (in MLJBase). In the meantime, however, I suggest we insert a deep copy on the right-hand side of https://github.com/alan-turing-institute/MLJTuning.jl/blob/db5ab433c0570641eade702db0dca55379643dbe/src/tuned_models.jl#L447
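The mutation problem, and why a deep copy on the right-hand side fixes it, can be sketched outside of Julia. Below is a minimal, hypothetical Python stand-in (the `Holdout` class and `make_split` function are illustrative inventions, not MLJ APIs): consuming a strategy's RNG advances its internal state, so a caller who shares the strategy object sees its own RNG silently mutated, while deep-copying first leaves the caller's object untouched.

```python
import copy
import random

class Holdout:
    """Hypothetical stand-in for a resampling strategy carrying an RNG field."""
    def __init__(self, rng):
        self.rng = rng

def make_split(resampling, n):
    # Drawing from the RNG mutates its internal state.
    idx = list(range(n))
    resampling.rng.shuffle(idx)
    return idx

strategy = Holdout(random.Random(4))
state_before = strategy.rng.getstate()

# With a deep copy, the caller's RNG state is untouched:
make_split(copy.deepcopy(strategy), 6)
assert strategy.rng.getstate() == state_before

# Without the copy, fitting mutates the caller's strategy object:
make_split(strategy, 6)
assert strategy.rng.getstate() != state_before
```

The deep copy also makes each split reproducible, since every copy replays the RNG from the same seeded state.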

cc @OkonSamuel

OkonSamuel commented 3 years ago

@ablaom Yes, you're right, that's a bug. But a deep copy alone won't fix it, because each thread would then see the same random pattern. If GLOBAL_RNG were used instead, it would become thread-safe.

ablaom commented 3 years ago

Yes, good observation. However, that behaviour is actually desirable here. For example, in a Grid search with Holdout(rng=4), I want to use the same holdout set for each hyperparameter value I am evaluating; otherwise I'd be comparing apples with oranges. This fix won't completely arrange that (unless the number of threads equals the total number of model evaluations) and won't fix a similar issue when acceleration=CPU(), but that's essentially a bug in Resampler from MLJBase. I'm still thinking about the best option for handling this and have not posted an issue there yet.
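The "same random pattern is desirable" point can be illustrated with the same kind of hypothetical Python stand-in (again, `Holdout` and `holdout_indices` are illustrative, not MLJ APIs): deep-copying a seeded strategy before each evaluation replays the RNG from its saved state, so every hyperparameter value is scored on the identical holdout set, whereas sharing one strategy advances the RNG and gives each model a different split.

```python
import copy
import random

class Holdout:
    """Hypothetical stand-in for Holdout(rng=4): a strategy with a seeded RNG."""
    def __init__(self, seed):
        self.rng = random.Random(seed)

def holdout_indices(strategy, n, frac=0.7):
    # Shuffle indices with the strategy's RNG, then split train/test.
    idx = list(range(n))
    strategy.rng.shuffle(idx)
    cut = int(frac * n)
    return idx[:cut], idx[cut:]

strategy = Holdout(seed=4)

# Deep-copying before each evaluation replays the same RNG state:
# every hyperparameter value sees the identical holdout set.
splits = [holdout_indices(copy.deepcopy(strategy), 10) for _ in range(3)]
assert splits[0] == splits[1] == splits[2]

# Sharing the strategy mutates its RNG between evaluations, so each
# model is scored on a different split (apples vs. oranges).
shared = [holdout_indices(strategy, 10) for _ in range(3)]
assert shared[0] != shared[1]
```

This is also why the deep-copy fix only "completely arranges" identical splits when each evaluation starts from the same copied state, not when several evaluations share one mutating RNG.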

ablaom commented 3 years ago

Update: The Resampler bug is fixed: https://github.com/alan-turing-institute/MLJBase.jl/pull/559