EducationalTestingService / skll

SciKit-Learn Laboratory (SKLL) makes it easy to run machine learning experiments.
http://skll.readthedocs.org

Make new regressors available #256

Closed: desilinguist closed this issue 6 years ago

desilinguist commented 9 years ago

It would be nice to expose the following regressors in SKLL, since they can be quite useful in the real world:

- `linear_model.BayesianRidge`: Bayesian ridge regression
- `linear_model.ElasticNet`: Linear regression with combined L1 and L2 priors as regularizer
- `linear_model.ElasticNetCV`: Elastic Net model with iterative fitting along a regularization path
- `linear_model.Lars`: Least Angle Regression model (a.k.a. LAR)
- `linear_model.LarsCV`: Cross-validated Least Angle Regression model
- `linear_model.LassoCV`: Lasso linear model with iterative fitting along a regularization path
- `linear_model.LassoLars`: Lasso model fit with Least Angle Regression (a.k.a. Lars)
- `linear_model.LassoLarsCV`: Cross-validated Lasso, using the LARS algorithm
- `linear_model.LassoLarsIC`: Lasso model fit with Lars using BIC or AIC for model selection
- `linear_model.LogisticRegressionCV`: Logistic Regression CV (a.k.a. logit, MaxEnt) classifier
- `linear_model.RidgeCV`: Ridge regression with built-in cross-validation
- `linear_model.lars_path`: Compute Least Angle Regression or Lasso path using the LARS algorithm
- `linear_model.lasso_path`: Compute Lasso path with coordinate descent
- `linear_model.lasso_stability_path`: Stability path based on randomized Lasso estimates

Perhaps we can future-proof this in a way that makes it easy to add new models as they are released in subsequent versions of scikit-learn?
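
For concreteness, here is a rough sketch of what using one of these regressors through the existing `Learner` API could look like once it is exposed. The learner name `'BayesianRidge'` is not a valid SKLL learner yet, and the toy data is made up; treat this as an assumption-laden sketch, not working code today:

    # Sketch: using a newly exposed regressor via SKLL's Python API
    # (assumes 'BayesianRidge' has been added as a learner name).
    from skll import Learner
    from skll.data import FeatureSet

    # Tiny toy regression data: features are dicts, labels are floats.
    ids = ['ex{}'.format(i) for i in range(4)]
    features = [{'f1': 1.0, 'f2': 0.5},
                {'f1': 2.0, 'f2': 1.5},
                {'f1': 3.0, 'f2': 2.5},
                {'f1': 4.0, 'f2': 3.5}]
    labels = [1.1, 2.0, 3.2, 4.1]
    fs = FeatureSet('toy_train', ids, labels=labels, features=features)

    # Train without a grid search so no default parameter grid is required.
    learner = Learner('BayesianRidge')
    learner.train(fs, grid_search=False)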

dan-blanchard commented 9 years ago

I would say it already is very easy to add new learners. You just need to:

  1. Import them in learner.py
  2. Add the default parameter grid to the `_DEFAULT_PARAM_GRIDS` dict. One of our main selling points is that we "put some thought" into what these should be, so this can't really be automated much (a hypothetical entry is sketched after this list).
  3. Add a rescaled version of the appropriate class. This is the only part that I think we could really make simpler. We could just replace all of these lines with:

    # Convert items to list to prevent exception about modifying while iterating
    for name, class_ in list(globals().items()):
        if (isinstance(class_, type) and class_ is not RegressorMixin and
                issubclass(class_, RegressorMixin)):
            rescaled_name = 'Rescaled{}'.format(name)
            globals()[rescaled_name] = rescaled(class_)
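
For step 2, a hypothetical `_DEFAULT_PARAM_GRIDS` entry for one of the requested regressors might look like the sketch below. It assumes the dict is keyed by the estimator class, as the existing entries appear to be, and the hyperparameter values are illustrative guesses rather than the vetted defaults SKLL would actually ship:

    # Illustrative only: a possible default grid for BayesianRidge.
    # alpha_1/alpha_2 are BayesianRidge's Gamma-prior hyperparameters;
    # the candidate values below are guesses, not carefully chosen defaults.
    from sklearn.linear_model import BayesianRidge

    _DEFAULT_PARAM_GRIDS = {
        # ... existing learners ...
        BayesianRidge: [{'alpha_1': [1e-6, 1e-4, 1e-2],
                         'alpha_2': [1e-6, 1e-4, 1e-2]}],
    }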
desilinguist commented 8 years ago

What's the status on this, guys?

desilinguist commented 6 years ago

Addressed by #377.