mkuennek opened this issue 7 years ago
Hi @mkuennek,
Do you mean Polynomial regression with multiple independent variables, i.e. multiple inputs and one single output?
If yes, this should be simple to do. You can either transform your inputs into polynomial features yourself, by considering all their possible combinations of a particular degree, or you may be able to repurpose the static Transform method of the Polynomial kernel class to transform your inputs for you. After you transform your inputs into this new polynomial space, you should be able to apply the usual OrdinaryLeastSquares to obtain a MultipleLinearRegression that would be equivalent to a possible MultiplePolynomialRegression.
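For instance, with two inputs x1 and x2 and degree 2, the expanded feature vector would contain terms such as x1, x2, x1^2, x1*x2 and x2^2 (the exact set and ordering of terms produced by Polynomial.Transform may differ, so it is worth inspecting the transformed array); the regression then remains linear in these expanded features.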
An example would be:
// Let's say your current inputs and outputs are in the variables x and y below:
double[][] x = ...
double[] y = ...
// First, transform your inputs to polynomial space
double[][] z = Polynomial.Transform(x, degree: 2, constant: 0);
// Now, create the usual OLS learning algorithm
var ols = new OrdinaryLeastSquares()
{
UseIntercept = true
};
// Use the algorithm to learn a multiple regression
MultipleLinearRegression regression = ols.Learn(z, y);
// Check the quality of the regression:
double[] prediction = regression.Transform(z);
double error = new SquareLoss(expected: y).Loss(actual: prediction);
However, you might want to compare it against some other implementation or textbook example just to make sure the results indeed match. It is possible that the use of constants in the code above (i.e. in both Polynomial and OrdinaryLeastSquares) will have to be adjusted so that the constant term is not applied twice.
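If it helps, one quick way to cross-check is to build the polynomial features by hand and fit them with the same OLS learner; the predictions should then agree with the ones obtained from Polynomial.Transform (the set and ordering of generated terms might differ, so compare predictions rather than individual coefficients). A rough sketch, assuming only two input columns for brevity and a hypothetical helper named ExpandDegree2:
// Manual degree-2 expansion of two inputs: (x1, x2) -> (x1, x2, x1^2, x1*x2, x2^2).
// No constant column is added here, since UseIntercept = true already provides the intercept.
static double[][] ExpandDegree2(double[][] inputs)
{
    var expanded = new double[inputs.Length][];
    for (int i = 0; i < inputs.Length; i++)
    {
        double x1 = inputs[i][0], x2 = inputs[i][1];
        expanded[i] = new[] { x1, x2, x1 * x1, x1 * x2, x2 * x2 };
    }
    return expanded;
}
// Fit the same kind of OLS on the hand-made features (using the same x and y as above):
double[][] zManual = ExpandDegree2(x);
var olsCheck = new OrdinaryLeastSquares() { UseIntercept = true };
MultipleLinearRegression check = olsCheck.Learn(zManual, y);
double[] checkPrediction = check.Transform(zManual);
If both models give (nearly) identical predictions on a few rows, the constant term is most likely being applied only once.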
Hope it helps, Cesar
Hi @mkuennek,
Would you be able to give a bit more detail on what problem you are trying to solve? I am not completely clear from the title/body whether you are trying to solve a multiple linear regression problem or a polynomial regression or possibly something else.
If it's one of the first two, I might be able to post a code snippet to help you out. If it's something else, I might need to defer to someone more knowledgeable than me!
Thanks, Alex
@cesarsouza too fast... ;)
Hi @cesarsouza and @AlexJCross,
Sorry for not being more specific. But as you already correctly guessed, I have a supervised learning problem with multiple independent input variables and one output variable, and I want to try out polynomial regression. So, similar to multivariate linear regression, but with a polynomial.
The approach proposed by @cesarsouza looks very interesting. I will try that. Thanks for the help!
Greetings, Michael
I tried the proposed approach, and while it looks nice in theory, it does not perform well in practice. The reason is that by transforming the inputs into the polynomial space, the number of independent variables grows exponentially with the degree of the polynomial. This gets out of hand fast, so that already for a polynomial of degree 4 in my case, the regression algorithm took very long to predict values (I stopped after some minutes).
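(To put a rough number on this: if all monomials up to degree d are generated for n inputs, there are about C(n + d, d) of them, i.e. the number of columns grows roughly like n^d, so a degree-4 expansion multiplies the dimensionality very quickly.)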
Hi @mkuennek,
Thanks for the feedback! This is actually the way a polynomial regression would normally be computed (i.e. in sklearn you would use PolynomialFeatures to transform the data first and then a LinearRegression to fit them).
However, there are some other things we could try:
By the way, there is also a part that maybe I didn't get right from your last post: did you mean that the code became too slow to predict values, or that the model became too slow to train? While it is understandable that the model would take longer to train given the combinatorial explosion in the number of inputs, the evaluation time should certainly not take that long.
How many rows (samples) and columns (dimensions) do you have in your problem?
Regards, Cesar
Hi @cesarsouza ,
Thanks for the clarification! I might check the approach from your second bullet point. In any case, this was just for trying out PolynomialRegression; in the end I will probably use neural networks, as they have provided really good results in my case.
Sorry, my formulation was not very specific. The code became slow during the learning part, not when predicting. My data contained 10 columns and about 200 rows. So the approach is probably not suitable for my problem and I should use neural networks instead.
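(If all terms up to degree 4 are generated for 10 inputs, that is about C(10 + 4, 4) = 1001 polynomial columns against only roughly 200 rows, so the least-squares system ends up with more unknowns than samples, in addition to being slow to solve.)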
Thanks for the help, again!
Greetings, Michael
Issue description
Hi,
Are there any plans to add multiple polynomial regression to the framework? I am currently investigating which regression model to use for my thesis, and multiple polynomial regression is missing. I would implement it myself, but my knowledge of statistics and machine learning is not the best.
Greetings
P.S.: Fixed title