Some preprocessing classes like `WhitenRecording`, `NormalizeByQuantileRecording`, and `ZScoreRecording` internally use `get_random_data_chunks()`.
This makes the end-user experience easier, but it is rather bad for:
- reproducibility
- parallel processing
For parallel processing in particular it is really bad because:
- every worker draws different random chunks, so the estimated noise or covariance matrix is not the same across workers. As a result, `n_jobs > 1` gives different results on every run.
- the startup of every worker can be very long when `n_jobs` is high, because every worker is fighting for CPU resources, for instance all of them inverting a covariance matrix at the same time.
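To make the first point concrete, here is a minimal sketch (the function name and signature are illustrative, not the actual SpikeInterface API) of why unseeded chunk sampling breaks determinism across workers, and why a shared seed fixes it:

```python
import numpy as np

# Hypothetical stand-in for get_random_data_chunks(): draw random chunks
# from the traces and concatenate them for statistics estimation.
def get_random_chunks_sketch(traces, num_chunks=5, chunk_size=100, seed=None):
    rng = np.random.default_rng(seed)
    starts = rng.integers(0, traces.shape[0] - chunk_size, size=num_chunks)
    return np.concatenate([traces[s:s + chunk_size] for s in starts])

traces = np.random.default_rng(0).normal(size=(10_000, 4))

# Two "workers" without a shared seed: different chunks, hence
# different covariance estimates.
cov_a = np.cov(get_random_chunks_sketch(traces).T)
cov_b = np.cov(get_random_chunks_sketch(traces).T)
print(np.allclose(cov_a, cov_b))  # almost surely False

# With a shared seed both workers get identical estimates.
cov_c = np.cov(get_random_chunks_sketch(traces, seed=42).T)
cov_d = np.cov(get_random_chunks_sketch(traces, seed=42).T)
print(np.allclose(cov_c, cov_d))  # True
```

Seeding alone only fixes reproducibility, though; it does not avoid every worker redundantly recomputing the same expensive statistics at startup.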
We already have a way to store `_kwargs`, but we should have another one (a separate dict) that would allow restoring the class very quickly in the same state, without any randomness in between.
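The idea could be sketched like this (class and attribute names are hypothetical, not the actual SpikeInterface implementation): compute the expensive, random-dependent state once, keep it in a plain dict next to `_kwargs`, and rebuild the object from that dict without touching any random chunks:

```python
import numpy as np

# Hypothetical sketch: a whitening-style preprocessor whose expensive
# state (the whitening matrix) is computed once and stored in a dict,
# so workers can be restored deterministically and cheaply.
class WhitenSketch:
    def __init__(self, whitening_matrix=None, random_chunks=None):
        if whitening_matrix is None:
            # Expensive + random path: estimate covariance from chunks
            # and derive a whitening matrix from its Cholesky factor.
            cov = np.cov(random_chunks.T)
            whitening_matrix = np.linalg.inv(np.linalg.cholesky(cov))
        self.whitening_matrix = whitening_matrix
        # Deterministic state that fully restores the object.
        self._state = {"whitening_matrix": whitening_matrix}

    @classmethod
    def from_state(cls, state):
        # No random chunks, no covariance inversion: instant restore.
        return cls(whitening_matrix=state["whitening_matrix"])

rng = np.random.default_rng(0)
chunks = rng.normal(size=(1000, 4))
w = WhitenSketch(random_chunks=chunks)           # computed once, in the main process
w2 = WhitenSketch.from_state(w._state)           # what each worker would do
print(np.array_equal(w.whitening_matrix, w2.whitening_matrix))  # True
```

With this pattern, only the main process pays the cost of chunk sampling and matrix inversion; workers just deserialize the state dict, which also removes the startup contention described above.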
How close are we to this? The issue is two years old and I doubt we have really looked at it. It is an ongoing effort, but I think we've made huge progress, no?