JuliaStats / MultivariateStats.jl

A Julia package for multivariate statistics and data analysis (e.g. dimension reduction)

Provide StatsAPI interface for regression #171

Open wildart opened 2 years ago

wildart commented 2 years ago

Currently, the regression algorithms are implemented as stand-alone functions, while the other methods use the StatsAPI interface, i.e. fit/predict.

We should have types properly derived from StatsAPI.RegressionModel, with the corresponding interface implemented for the various regression algorithms.

kescobo commented 2 years ago

Responding to the question in #109 - @wildart I'd be happy to take a stab at it, if there's a well-defined API / clear instructions for implementation. I'm afraid I'm not that familiar with most of the methods in this package or with StatsAPI, but if there's a regular structure, I can probably figure it out.

wildart commented 2 years ago

Basically, every algorithm in this package has a fit method for building a model (see StatisticalModel) and a predict method for predicting the response of a model (see RegressionModel). These two methods are the bare minimum required for a regression implementation. The rest of the interface can be approached later.

So, a type derived from RegressionModel needs to be defined to hold the model parameters, i.e. the regression coefficients. The fit method would call the existing implementation, e.g. ridge, and construct an object of the model type. The predict method should produce predictions from the model parameters. You can look at other algorithms' implementations for guidance, e.g. PCA.
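To illustrate the gap described above, here is a small sketch contrasting the two calling styles. It assumes a recent MultivariateStats.jl release in which PCA supports `fit`/`predict` and `ridge` is a stand-alone function returning a coefficient vector (with the bias term last); the data are made up purely for illustration.

```julia
using MultivariateStats

# Toy data (made up): 3 features, 6 observations stored as columns.
Xcols = Float64[1 2 3 4 5 6;
                2 1 4 3 6 5;
                0 1 0 1 0 1]

# Model-object style already used by PCA: fit builds a model, predict applies it.
M = fit(PCA, Xcols; maxoutdim=2)
Y = predict(M, Xcols)            # one column of principal components per observation

# Stand-alone style currently used for regression: no model object,
# just a coefficient vector. ridge expects observations in rows by default.
Xrows = permutedims(Xcols)       # 6 observations × 3 features
y = Xrows * [1.0, 2.0, 3.0] .+ 0.5
a = ridge(Xrows, y, 0.1)         # 3 slopes followed by the bias term
```

The goal of the issue is for the regression functions to get the same fit/predict treatment as the first half of this snippet.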

kescobo commented 2 years ago

That makes sense. As I said, I'm happy to take a stab, though realistically it's unlikely to be in the next week or two - I'm teaching this semester and need to get a lot more prep done. If there's not a rush on it, I can definitely tackle it by ~mid February.

wildart commented 2 years ago

Any help is appreciated at any time.

kescobo commented 2 years ago

Looking at this a bit more closely today, I do not think I'm the right person for this job, sorry! I feel like if I had a strong handle on the package interface OR the statistical methods, I could use one thing to reason about the other. But being a novice on both, even using your hints above, I'm not sure how to get started :-(

wildart commented 2 years ago

For a minimal implementation, you would need to

  1. Define a type for a regression model, derived from StatsAPI.RegressionModel, e.g. OLS, that will hold the coefficients of the OLS regression model.
  2. Define a fit function that accepts three parameters: the OLS type, the independent variables x, and the dependent variable y. This function will call llsq, which calculates the regression model parameters, and return an OLS object.
  3. Define a predict function; see its description here: https://github.com/JuliaStats/StatsAPI.jl/blob/00ce15f034e7ffdf16ec988766246755fcab47c4/src/regressionmodel.jl#L74-L81

The data parameters should be of AbstractMatrix or AbstractVector types. You may want to include a generic placeholder for keyword arguments to pass parameters through to the llsq call.

The rest of the methods for RegressionModel are optional at this point. Feel free to implement any of them. See any method implementation in this repo: MDS, PCA, etc.
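The three steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the package's implementation: the `OLS` type and its fields `a` and `b` are hypothetical names, and the sketch assumes `llsq(X, y; bias=true)` returns the coefficients with the bias term last and treats rows of `X` as observations (its default layout).

```julia
import StatsAPI
using MultivariateStats: llsq

# Step 1: hypothetical model type holding the fitted coefficients.
struct OLS{T<:Real} <: StatsAPI.RegressionModel
    a::Vector{T}   # slope coefficients, one per predictor
    b::T           # intercept (zero when fitted with bias=false)
end

# Step 2: fit calls the existing llsq and wraps the result in an OLS object.
# Extra keyword arguments are forwarded to llsq unchanged.
function StatsAPI.fit(::Type{OLS}, X::AbstractMatrix{T}, y::AbstractVector{T};
                      bias::Bool=true, kwargs...) where {T<:Real}
    θ = llsq(X, y; bias=bias, kwargs...)
    return bias ? OLS(θ[1:end-1], θ[end]) : OLS(θ, zero(T))
end

# Step 3: predict the response for new data.
# Assumes rows of X are observations, matching the default llsq layout.
StatsAPI.predict(m::OLS, X::AbstractMatrix) = X * m.a .+ m.b

# Usage on made-up, noise-free data: 5 observations, 2 predictors.
X = [1.0 2.0; 2.0 1.0; 3.0 4.0; 4.0 3.0; 5.0 7.0]
y = X * [2.0, -1.0] .+ 0.5
m = StatsAPI.fit(OLS, X, y)
ŷ = StatsAPI.predict(m, X)
```

Note that forwarding a non-default layout (e.g. `dims=2`) to `llsq` would require `predict` to handle that layout as well; the sketch leaves that out for brevity.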