cog-imperial / OMLT

Represent trained machine learning models as Pyomo optimization formulations

OMLT doesn't work with a multiclass classification problem and lightgbm #107

Closed DanieleMorotti closed 1 year ago

DanieleMorotti commented 1 year ago

It seems that OMLT doesn't work when using lightgbm with a multiclass classification problem (4 classes). I initialized the lightgbm instance as:

lgb_params = {
    "learning_rate": 0.05, "num_iterations": 200, "early_stopping_round": 50,
    "max_bin": 30, "num_leaves": 30, "lambda_l1": 0.3, "random_state": 42,
    "force_row_wise": True, "objective": "multiclass",
    "metric": ["multi_error", "multi_logloss"], "num_class": 4,
}

lgb_model = lgb.LGBMClassifier(**lgb_params)
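For completeness, I got the ONNX model roughly like this (a sketch of the notebook's conversion step; the exact call in the notebook may differ slightly, and x_test_np is the test array from the traceback below):

from onnxmltools import convert_lightgbm
from onnxmltools.convert.common.data_types import FloatTensorType

# After fitting lgb_model, convert it to ONNX; the input signature
# must match the number of features in the data.
initial_type = [("input", FloatTensorType([None, x_test_np.shape[1]]))]
onnx_model = convert_lightgbm(lgb_model, initial_types=initial_type)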

Both the conversion and the tree construction follow the code in docs/notebooks/bo_with_trees.ipynb, but building the tree raises the following error:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[32], line 13
     10 for i in range(x_test_np.shape[1]):
     11     input_bounds[i] = (float(lb[i]), float(ub[i])) 
---> 13 add_tree_model(opt_model, onnx_model, input_bounds)

Cell In[29], line 8, in add_tree_model(opt_model, onnx_model, input_bounds)
      5 def add_tree_model(opt_model, onnx_model, input_bounds):
      6     # init omlt block and gbt model based on the onnx format
      7     opt_model.gbt = OmltBlock()
----> 8     gbt_model = GradientBoostedTreeModel(onnx_model, 
      9                                          scaled_input_bounds=input_bounds)
     11     # omlt uses a big-m formulation to encode the tree models
     12     formulation = GBTBigMFormulation(gbt_model)

File ...\lib\site-packages\omlt\gbt\model.py:22, in GradientBoostedTreeModel.__init__(self, onnx_model, scaling_object, scaled_input_bounds)
     20 self.__model = onnx_model
     21 self.__n_inputs = _model_num_inputs(onnx_model)
---> 22 self.__n_outputs = _model_num_outputs(onnx_model)
     23 self.__scaling_object = scaling_object
     24 self.__scaled_input_bounds = scaled_input_bounds

File ...\lib\site-packages\omlt\gbt\model.py:66, in _model_num_outputs(model)
     64 """Returns the number of output variables"""
     65 graph = model.graph
---> 66 assert len(graph.output) == 1
     67 return _tensor_size(graph.output[0])

AssertionError:

The error seems to be related to the number of classes. I solved the problem by using a simple PyTorch neural network instead, and OMLT works well in that case, but I was wondering whether I did something wrong with the lightgbm model.
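For reference, the neural-network route that works for me looks roughly like this (a minimal sketch assuming omlt.io.load_onnx_neural_network and FullSpaceNNFormulation from omlt 1.1; the architecture and file name are placeholders):

import onnx
import torch
import pyomo.environ as pyo
from omlt import OmltBlock
from omlt.io import load_onnx_neural_network
from omlt.neuralnet import FullSpaceNNFormulation

# A small ReLU network with 4 outputs (one per class); the layer sizes
# are placeholders, not the network I actually trained.
n_features = x_test_np.shape[1]
net = torch.nn.Sequential(
    torch.nn.Linear(n_features, 16),
    torch.nn.ReLU(),
    torch.nn.Linear(16, 4),
)

# Export to ONNX and read it back.
dummy_input = torch.randn(1, n_features)
torch.onnx.export(net, dummy_input, "net.onnx")
onnx_net = onnx.load("net.onnx")

# Build the OMLT block; input_bounds is the same dict used for the tree model.
opt_model = pyo.ConcreteModel()
opt_model.nn = OmltBlock()
network_definition = load_onnx_neural_network(onnx_net, input_bounds=input_bounds)
opt_model.nn.build_formulation(FullSpaceNNFormulation(network_definition))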

The version of lightgbm is 3.3.5 and the omlt version is 1.1.

Thanks for your attention.

tsaycal commented 1 year ago

Thanks for the question. OMLT unfortunately does not currently support multi-output GBT models, so your model is correctly failing the assertion that the ONNX graph has exactly one output. We do provide support for multi-output neural network models, as you noted. Please see some previous discussion on potential ways to extend this here: https://github.com/cog-imperial/OMLT/discussions/90
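In the meantime, you can confirm what the assertion is checking by inspecting the exported graph's outputs directly (a minimal sketch; the file name is a placeholder, and classifier converters typically emit both a label and a probabilities output, which is already more than one):

import onnx

onnx_model = onnx.load("lgbm_multiclass.onnx")  # placeholder file name

# GradientBoostedTreeModel asserts len(graph.output) == 1, so any graph
# with more than one output tensor raises the AssertionError above.
for out in onnx_model.graph.output:
    dims = [d.dim_value for d in out.type.tensor_type.shape.dim]
    print(out.name, dims)
print("number of outputs:", len(onnx_model.graph.output))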