ankane / eps

Machine learning for Ruby
MIT License

runtime error: Unknown Label with lightgbm algorithm #13

Closed mb52089 closed 4 years ago

mb52089 commented 4 years ago

I'm receiving a runtime error, "Unknown label", when I use the lightgbm algorithm in certain circumstances, but not when I use the linear regression algorithm on the exact same data set. Here's the full error:

RuntimeError: Unknown label: Tue from /Users/michaelburke/.rvm/gems/ruby-2.6.5@copient_health_rails6/bundler/gems/eps-509da754d6e9/lib/eps/label_encoder.rb:28:in `block in transform'

The label named in the error message varies from model to model. Some of the lightgbm models build without error, but others fail every time, depending on which filter of the dataset I use to build the model.

ankane commented 4 years ago

Hey @mb52089, that error message isn't great, so here's an explanation of what's going on:

Internally, Eps splits your data into a training and validation set to give you a better idea of performance. With LightGBM, categorical features are encoded to integers before being passed to the library. The mapping is generated from the training set and then used on the validation set.

This error occurs when the validation set contains values that aren't present in the training set (for instance, if the training set only had Monday and Tuesday but the validation set also had Wednesday). In this case, there's no value to map it to, hence the error.
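To make the failure mode concrete, here is a minimal sketch of a label encoder that builds its mapping from the training values only. `SketchLabelEncoder` is a hypothetical class written for illustration; it is not the code in Eps's `label_encoder.rb`, it only assumes the same general approach described above.

```ruby
# Hypothetical label encoder, for illustration only; not Eps's actual code.
class SketchLabelEncoder
  # Build the value-to-integer mapping from the training values.
  def fit(values)
    @labels = {}
    # Each distinct value gets the next integer, in order of first appearance.
    values.each { |v| @labels[v.to_s] ||= @labels.size }
    self
  end

  # Encode values with the existing mapping; fail on anything never seen in fit.
  def transform(values)
    values.map do |v|
      @labels.fetch(v.to_s) { raise "Unknown label: #{v}" }
    end
  end
end

train = ["Monday", "Tuesday", "Monday"]
validation = ["Tuesday", "Wednesday"]

encoder = SketchLabelEncoder.new.fit(train)
encoded_train = encoder.transform(train)

begin
  encoder.transform(validation)
rescue RuntimeError => e
  # "Wednesday" was never in the training values, so there is nothing to map it to.
  message = e.message
end
```

Because the mapping knows only "Monday" and "Tuesday", the validation value "Wednesday" raises the same kind of error as the one reported in this issue.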

I'm hoping to automatically handle this in the future, but the best options now are either:

  1. Disable the validation set (Eps::Model.new(split: false))
  2. Pass your own validation set with no unseen values
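For the second option, one possible way to build a validation set with no unseen values is to filter your own hold-out rows before handing them to Eps. The helper below is a hypothetical sketch in plain Ruby (it is not part of Eps), and it assumes the data is an array of hashes with the categorical columns named explicitly.

```ruby
# Hypothetical helper, not part of Eps: split validation rows into those whose
# categorical values all appear in the training rows, and those that do not.
def partition_by_seen_values(train_rows, validation_rows, columns)
  # For each categorical column, collect the distinct values seen in training.
  seen = columns.to_h { |c| [c, train_rows.map { |r| r[c] }.uniq] }

  # First array: rows safe to validate on. Second array: rows with unseen values.
  validation_rows.partition do |row|
    columns.all? { |c| seen[c].include?(row[c]) }
  end
end

train_rows = [
  { day: "Mon", visits: 10 },
  { day: "Tue", visits: 12 }
]
validation_rows = [
  { day: "Tue", visits: 11 },
  { day: "Wed", visits: 9 }
]

usable, unseen = partition_by_seen_values(train_rows, validation_rows, [:day])
```

The `usable` rows could then be supplied as your own validation set through whatever option your version of Eps provides for that. Note that dropping the `unseen` rows makes the validation set less representative of the full data, so it is worth checking how many rows were removed.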

Linear regression uses a different method of mapping categorical features which doesn't have this limitation.

ankane commented 4 years ago

It looks like LightGBM can handle unseen values in the validation set, so I just pushed a fix.