Open baraline opened 2 months ago
I remember we looked at this in some detail when debugging some ROCKET features. Was there some interaction with its use of prange? I can't remember!
import numpy as np
from numba import prange


def _static_transform_uni(X, parameters, indices):
    """Transform a 2D collection of univariate time series.

    Implemented separately from the multivariate version for numba
    efficiency reasons. See issue #1778.
    """
    n_cases, n_timepoints = X.shape
    (
        _,
        _,
        dilations,
        n_features_per_dilation,
        biases,
    ) = parameters
    n_kernels = len(indices)
    n_dilations = len(dilations)
    f = n_kernels * np.sum(n_features_per_dilation)
    features = np.zeros((n_cases, f), dtype=np.float32)
    for i in prange(n_cases):
        # ... (loop body elided in the original snippet)
I think adding this parameter to so many APIs would clutter the parameter lists. What are your thoughts on using joblib.parallel_config?
With that, aeon code would not specify any parallel backend in its calls to joblib.Parallel (unless it requires a specific backend), and the user could choose the backend by wrapping calls in:

with joblib.parallel_config(backend="loky"):
    # aeon code
The same is actually possible for n_jobs etc.
True, forgot about this option! That, or we set a global variable such as AEON_JOBLIB_BACKEND and document its usage.
A custom env var AEON_JOBLIB_BACKEND IMO only makes sense if it is also used for other parallel machinery besides joblib. Otherwise, we can use the existing (documented and potentially already known) joblib facilities.
Sounds like a good idea, but it would have to be documented. Not sure about removing n_jobs (not that it was really suggested), but it would be good to tidy up the other bits.
I'm OK with using the parallel_config option and letting users wrap estimators if they want to change the default. But if we want to default to threads, we run into the following, as parallel_config doesn't override an explicitly specified backend:

with parallel_config(backend='loky'):
    p = Parallel(backend='threading')
    print(p._backend)

You obtain <joblib._parallel_backends.ThreadingBackend object at 0x000001D91BB731F0>.
This means that if we want the parallel_config option to work, we would need to use the default Parallel(), which uses the loky backend. That is exactly the one we want to move away from, for multiple reasons (see https://scikit-learn.fondation-inria.fr/joblib-sprint/ for some). Or did I miss something?
Ah, damn... yes, Parallel() uses the loky backend by default. Then I don't see how parallel_config would help us here.
So, for options we have:

With an equal level of documentation, I think the first option would let us be more flexible if, for whatever reason, we need to use processes for some estimators and threads for others. It should also be less prone to causing issues with existing methods.

But true, this adds a bit of parameter bloat to the API... We could introduce a dictionary grouping all joblib params and use it as kwargs when calling Parallel?

Other ideas are welcome!
I'm also in favor of 1. How many joblib parameters do we have? If there are just 2, I would not use kwargs for that.
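The dictionary-of-joblib-parameters idea above could look something like this (a sketch; the dict name and contents are assumptions for illustration):

```python
# Hypothetical: group all joblib-related settings in one dict and splat it
# into every Parallel call, instead of adding one API parameter per setting.
from joblib import Parallel, delayed

joblib_params = {"backend": "threading", "n_jobs": 2}

out = Parallel(**joblib_params)(delayed(len)(s) for s in ["a", "bb", "ccc"])
```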
Describe the issue
Some of the estimators that use joblib for parallelization use a process-based backend, while others use a thread-based backend. Ideally, we want this to be a parameter tunable by the user.
Suggest a potential alternative/fix
Including a joblib_backend parameter which would default to threading (from discussions with Matthew), and using this parameter to set the joblib backend parameter during Parallel calls, would fix the issue.
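A minimal sketch of that suggestion, using a hypothetical estimator class and the joblib_backend parameter name from this discussion:

```python
# Hypothetical estimator exposing the proposed `joblib_backend` parameter
# and forwarding it to its internal Parallel call.
from joblib import Parallel, delayed


class _DemoEstimator:
    def __init__(self, n_jobs=1, joblib_backend="threading"):
        self.n_jobs = n_jobs
        self.joblib_backend = joblib_backend

    def transform(self, X):
        # The user-chosen backend is passed through to joblib.
        return Parallel(n_jobs=self.n_jobs, backend=self.joblib_backend)(
            delayed(sum)(row) for row in X
        )


out = _DemoEstimator(n_jobs=2).transform([[1, 2], [3, 4]])
```

A user wanting processes instead would construct the estimator with joblib_backend="loky".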