Correction: 3 independent variables and 1 dependent variable, not 4 independent variables.
Hey @mb52089, thanks for the report. Can you give more details about what you mean by an "incorrect" prediction? Is it different from linear regression in another language, or is the error high?
Thanks Andrew. We're predicting the % utilization of a resource on the day of service based on the % utilization x days in advance, the duration of the resource in minutes, and the day of the week. The predicted value should be between 0 and 1. In the particular test example we're using, the predicted value should be around 66%. We get that value when we use the LightGBM algorithm, but with linear regression we get -1.4, which doesn't make sense given the context and the training data. However, if I remove the "day of week" categorical variable and re-run the prediction using linear regression, I get a prediction in range. I wasn't sure whether the gem handles categorical variables differently in linear regression than in LightGBM. The data set has around 150 rows.
And this is all done in Ruby/Rails.
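On the question of whether categorical variables are handled differently: tree-based models such as LightGBM can split on a category directly, whereas a linear regression needs each category turned into numeric columns. The sketch below shows the common dummy (one-hot) encoding with one reference level dropped, applied to the first three sample rows from the original report. It is an illustration under assumptions only. `one_hot` is a hypothetical helper, and eps may encode categories differently internally.

```ruby
# Hypothetical helper (not part of eps): expand one categorical key into 0/1
# indicator columns, dropping the first level (alphabetically) as the reference.
def one_hot(rows, key)
  levels = rows.map { |r| r[key] }.uniq.sort
  kept = levels.drop(1) # the reference level is represented by all-zero dummies
  rows.map do |r|
    numeric = r.reject { |k, _| k == key }
    dummies = kept.to_h { |lvl| [:"#{key}_#{lvl}", r[key] == lvl ? 1.0 : 0.0] }
    numeric.merge(dummies)
  end
end

# Feature columns from the sample rows in the report (target column omitted).
rows = [
  { day_in_advance_util: 0.714285714285714, block_minutes: 420.0, week_day: "Fri" },
  { day_in_advance_util: 0.214285714285714, block_minutes: 420.0, week_day: "Mon" },
  { day_in_advance_util: 0.238095238095238, block_minutes: 420.0, week_day: "Mon" }
]

one_hot(rows, :week_day).each { |r| p r }
```

If all seven weekdays occur in the data, this scheme adds six columns (six more coefficients) to a fit of roughly 150 rows. Keeping all seven dummies alongside an intercept would make the columns exactly collinear, because the seven dummies always sum to the intercept column (the "dummy variable trap").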
If it's not too sensitive, can you paste the model summary and PMML here, or send them to me by email (the address is on my GitHub profile)?
puts model.summary
puts model.to_pmml
I just ran the model summary for the error condition:
Math::DomainError: Numerical argument is out of domain - "sqrt"
from /Users/michaelburke/.rvm/gems/ruby-2.6.5@copient_health_rails6/bundler/gems/eps-509da754d6e9/lib/eps/linear_regression.rb:186:in `sqrt'
The model summary after I remove the categorical variable week_day:
Validation RMSE: 0.14

                     coef      p
_intercept           0.42  0.094
day_in_advance_util  0.54  0.000
block_minutes       -0.00  0.932

adjusted r2: 0.330
Just sent it to your Chartkick email. I didn't know you were the author of Chartkick. It's great too!
To close the loop: the issue was likely related to multicollinearity, which can produce an unstable solution (the link provides a good explanation). One way to counteract this is to use GSL, which uses a different algorithm to produce a more stable solution.
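Here is a rough illustration of why collinear columns make a fit unstable. A common way to solve ordinary least squares goes through the normal equations with X^T X. When one column is exactly or nearly a multiple of another, X^T X is exactly or nearly singular. In the four sample rows from the original report, block_minutes is 420.0 every time, so it is a constant multiple of the intercept column.

The sketch below builds X^T X by hand for those two columns. The 420.001 row is a hypothetical perturbation. The thread does not show whether the full data set of roughly 150 rows has this pattern. This is a generic demonstration, not a diagnosis of eps's internals.

```ruby
# Build X^T X for a design matrix given as an array of rows (no gems needed).
def xtx(x)
  n_cols = x.first.size
  Array.new(n_cols) do |i|
    Array.new(n_cols) { |j| x.sum { |row| row[i] * row[j] } }
  end
end

# Determinant of a 2x2 matrix.
def det2(m)
  m[0][0] * m[1][1] - m[0][1] * m[1][0]
end

# Columns: intercept (1.0) and block_minutes from the four sample rows.
exact = [[1.0, 420.0], [1.0, 420.0], [1.0, 420.0], [1.0, 420.0]]
# Hypothetical: nudge one block_minutes value so the columns are only nearly collinear.
near = [[1.0, 420.0], [1.0, 420.0], [1.0, 420.0], [1.0, 420.001]]

puts det2(xtx(exact)) # => 0.0 (singular, so (X^T X)^-1 does not exist)
puts det2(xtx(near))  # tiny positive number left over after cancelling ~2.8e6-sized terms
```

Inverting a nearly singular X^T X amplifies rounding error. A solver that avoids forming and inverting X^T X directly is generally less sensitive to this, which is consistent with the suggestion to use GSL.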
Going to reopen this until the model.summary error is fixed. @mb52089, can you paste the output of:
model.send(:diagonal)
for a model where you're seeing Math::DomainError: Numerical argument is out of domain - "sqrt"?
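For context on why a diagonal can trigger a square-root error: in ordinary least squares, the coefficient standard errors (which feed the p values in the summary) are conventionally computed from the diagonal of (X^T X)^{-1}. The thread does not say explicitly what the private `diagonal` method returns. Assuming it is that diagonal, the relevant formulas are:

```latex
\hat{\beta} = (X^\top X)^{-1} X^\top y, \qquad
\hat{\sigma}^2 = \frac{\lVert y - X\hat{\beta} \rVert^2}{n - p}, \qquad
\operatorname{se}(\hat{\beta}_j) = \sqrt{\hat{\sigma}^2 \, \big[(X^\top X)^{-1}\big]_{jj}}
```

Here n is the number of rows and p the number of columns of X, including the intercept. When X^T X is invertible, its inverse is positive definite, so every diagonal entry should be positive. A negative entry can only come from numerical error, and taking its square root raises Math::DomainError in Ruby.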
When I try to run model.send(:diagonal), I get the following error:
NoMethodError: undefined method `diagonal' for
from /Users/michaelburke/.rvm/gems/ruby-2.6.5@copient_health_rails6/bundler/gems/eps-509da754d6e9/lib/eps/model.rb:62:in `method_missing'
(Michael Burke, replying by email to Andrew Kane's comment of Wed, Dec 4, 2019 at 10:59 PM on https://github.com/ankane/eps/issues/12.)
My bad, it should be:
model.instance_variable_get("@estimator").send(:diagonal)
Here you go:
=> [0.0005296860721842933, 0.0066308112665816495, 1.3595352803866229e-09, 0.0012121905646438054, 0.0312576935042156, 0.014730636756303176]
(Michael Burke, replying by email to Andrew Kane's comment of Thu, Dec 5, 2019 at 6:47 AM.)
Thanks. This is from the model that errors on the summary? I'm unable to reproduce with those numbers.
Now that I have installed GSL, I can't seem to reproduce the error when I do the linear regression. Do you want me to uninstall GSL and see if I can reproduce?
Yeah, GSL changes the code path, so you'll want to recreate the initial conditions.
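This is why uninstalling GSL matters for reproducing the bug. A library with an optional native dependency typically checks at load time whether that dependency is available, and picks a different solver if it is. The sketch below shows that common Ruby pattern. It is hypothetical, not eps's actual source, and the solver names are placeholders.

```ruby
# Returns true if the gsl gem can be loaded in this environment.
def gsl_available?
  require "gsl"
  true
rescue LoadError
  false
end

# Hypothetical dispatch: these symbols are placeholders, not eps method names.
def pick_solver
  gsl_available? ? :gsl_solver : :pure_ruby_solver
end

puts "Solver in this environment: #{pick_solver}"
```

The choice depends on what is installed, so a bug in one path can disappear when the other path is taken. That is consistent with the error only reappearing after removing the gsl gem and re-bundling.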
Here you go. After removing the gsl gem and re-bundling, I ran @model.instance_variable_get("@estimator").send(:diagonal) from a model that generated the following error when running @model.summary: Math::DomainError: Numerical argument is out of domain - "sqrt". Here's the output:
[-666372359695044.8, 1.0761875986711336, -3777621086.706599, 0.19673979554666882, -339985897803588.5, 2.390797741078714]
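A quick way to see why the summary failed is to compare the two `diagonal` outputs posted in this thread. The first contains only positive values. The one captured after removing GSL has negative entries, and Ruby's Math.sqrt raises Math::DomainError for any negative Float.

The `standard_errors` guard below is a hypothetical sketch of replacing that low-level error with an explicit message. It is not the fix that shipped in eps. `UnstableSolutionError` is a made-up class, and `sigma2` is a placeholder argument.

```ruby
# Diagonal values posted in this thread (copied verbatim).
first_reply = [0.0005296860721842933, 0.0066308112665816495, 1.3595352803866229e-09,
               0.0012121905646438054, 0.0312576935042156, 0.014730636756303176]
after_removing_gsl = [-666372359695044.8, 1.0761875986711336, -3777621086.706599,
                      0.19673979554666882, -339985897803588.5, 2.390797741078714]

# Hypothetical error class for an unstable solution.
class UnstableSolutionError < StandardError; end

# Indices whose value would make Math.sqrt raise.
def negative_indices(diag)
  diag.each_index.select { |i| diag[i].negative? }
end

# Hypothetical guarded standard-error step: fail with a clear message
# instead of letting Math.sqrt raise Math::DomainError.
def standard_errors(diag, sigma2)
  bad = negative_indices(diag)
  unless bad.empty?
    raise UnstableSolutionError, "Unstable solution: negative variance at index #{bad.join(', ')}"
  end
  diag.map { |d| Math.sqrt(sigma2 * d) }
end

p negative_indices(first_reply)        # => []
p negative_indices(after_removing_gsl) # => [0, 2, 4]

# Reproduces the original failure mode on the first negative entry.
begin
  Math.sqrt(after_removing_gsl[0])
rescue Math::DomainError => e
  puts "#{e.class}: #{e.message}"
end
```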
Thanks @mb52089, fixed the error message for unstable solutions. Pushing out a new release in a few with all the fixes we discussed. Thanks for the help!
No problem at all. Thanks for all the great gems!
We have a categorical variable for day_of_week as one of 4 independent variables in our model. The LightGBM algorithm works correctly but when I force the model to use the linear regression algorithm, the resultant prediction is incorrect. If I subsequently remove the categorical variable, the linear regression algorithm gives an accurate prediction. Here's an example of what our data set looks like:
{:day_of_service_util=>0.80952380952381, :day_in_advance_util=>0.714285714285714, :block_minutes=>420.0, :week_day=>"Fri"},
{:day_of_service_util=>0.69047619047619, :day_in_advance_util=>0.214285714285714, :block_minutes=>420.0, :week_day=>"Mon"},
{:day_of_service_util=>0.80952380952381, :day_in_advance_util=>0.238095238095238, :block_minutes=>420.0, :week_day=>"Mon"},
{:day_of_service_util=>0.80952380952381, :day_in_advance_util=>0.238095238095238, :block_minutes=>420.0, :week_day=>"Mon"}
day_of_service_util is the target (dependent) variable.
Thanks for this great gem!