[Open] juliuskittler opened this issue 4 years ago
Thanks for the detailed bug report. It makes it much easier for us to pin down the issue.
Possibly, the controls u, which are provided to the fit function for cross validation, are not passed on to these other functions (predict and/or score) during cross validation?
You're exactly right. I will have to investigate potential solutions.
@juliuskittler @briandesilva Did you find a solution? I'm facing the same issue.
@slimeth I don't think there is any solution yet for Grid Search with CV in scikit-learn. I switched to using Bayesian Optimization for hyperparameter tuning: https://optuna.org
I have been slow about implementing a solution, partly because it would require nontrivial changes to the API. I'll prioritize getting out a fix, but it may live in a separate branch for the time being.
I'd like to second the suggestion to use Bayesian optimization. It's a much more efficient approach than grid or random search. Thanks, @juliuskittler, for suggesting it!
@juliuskittler Optuna works like a charm! Thank you for this suggestion!
@juliuskittler @slimeth: Is it possible to share a code snippet showing the use of Optuna for SINDy hyperparameter tuning with control variables? I have the same issue.
I believe this relates to `SINDy()` not implementing the full Estimator interface. It is not quite clear in the documentation, but `score()` cannot accept any arguments besides `x` and `y`. The code for `GridSearchCV.fit` bundles up extra `fit_params` and sends them to `_fit_and_score()`, but the latter only forwards these params to `fit()` and not `score()`.
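This behavior can be reproduced without pysindy at all. Below is a hypothetical toy estimator (`NeedsControl` is invented for illustration) that, like `SINDy`, requires a control argument `u` in both `fit` and `score`; `GridSearchCV` slices `u` and forwards it to `fit()`, but `score()` never receives it:

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.model_selection import GridSearchCV

class NeedsControl(BaseEstimator):
    """Toy estimator that, like SINDy, needs a control input `u` in fit and score."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y=None, u=None):
        if u is None:
            raise TypeError("u is required")
        self.coef_ = self.alpha  # dummy "fit"
        return self

    def score(self, X, y=None, u=None):
        if u is None:
            raise TypeError("Model was fit using control variables, so u is required")
        return 1.0

X = np.zeros((10, 2))
u = np.zeros((10, 1))

# error_score="raise" surfaces the scoring exception instead of recording NaN.
search = GridSearchCV(NeedsControl(), {"alpha": [0.1, 1.0]}, cv=2,
                      error_score="raise")
try:
    search.fit(X, u=u)  # u is sliced and forwarded to fit(), but never to score()
except TypeError as e:
    print(e)  # -> Model was fit using control variables, so u is required
```

Fitting succeeds on every split; the failure happens only at scoring time, which matches where the traceback points in the original report.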
The design consideration here is that meta-estimators (e.g. `Pipeline`, `GridSearchCV`) do not have an effective way to pass anything other than `X` and `y` to their constituent estimators. This problem was discussed most extensively in the context of models that need to know `sample_weight` in both fitting and scoring within a grid search (issue).
The solution is probably metadata routing, which was added to scikit-learn after this issue was created. It allows passing information other than `X` and `y` to the steps of meta-estimators. I don't fully understand how to implement it yet, but this touches a lot of other issues that deal with pysindy's extra data-dependent parameters (e.g. constraints, controls, `t`). See SLEP006 for more info.
When you provide a control variable `u` to fit SINDy with cross-validation from scikit-learn, there is an error: `TypeError: Model was fit using control variables, so u is required`. It seems like this error occurs in the predict and/or score functions: https://pysindy.readthedocs.io/en/latest/_modules/pysindy/pysindy.html

Possibly, the controls `u`, which are provided to the fit function for cross validation, are not passed on to these other functions (predict and/or score) during cross validation?
Reproducing code example:
This code example is taken from here: https://pysindy.readthedocs.io/en/latest/examples/4_scikit_learn_compatibility.html#cross-validation
However, I have added one constant control variable `u` to show that the code fails when I pass `u` as an argument: `search.fit(x_train, u=u_train)`. I have highlighted the relevant rows with the comment "RELEVANT ROW".

Error message:
PySINDy/Python version information:
'1.2.0'