Open tigerinus opened 2 years ago
Thanks for pointing this out - it's a good idea and we'll add it soon!
In the meantime, you can set the max_rules parameter in the FIGSRegressor to some reasonable number (e.g. 12) to make training much faster!
After setting a max_rules param, the fitting process ended up with a KeyError:
KeyError Traceback (most recent call last)
<ipython-input-4-da8bf48375c8> in <module>
31 print(f'{clf_name} training time: {t2-t1} seconds')
32
---> 33 y_predicted = clf.predict(X_validate)
34 score_1 = metrics.mean_squared_error(y_validate, y_predicted)
35 #score_2 = metrics.mean_squared_log_error(y_validate, y_predicted)
~/usr/lib64/python3.8/site-packages/imodels/tree/figs.py in predict(self, X)
270 preds = np.zeros(X.shape[0])
271 for tree in self.trees_:
--> 272 preds += self.predict_tree(tree, X)
273 if self.prediction_task == 'regression':
274 return preds
~/usr/lib64/python3.8/site-packages/imodels/tree/figs.py in predict_tree(self, root, X)
306 preds = np.zeros(X.shape[0])
307 for i in range(X.shape[0]):
--> 308 preds[i] = predict_tree_single_point(root, X[i])
309 return preds
310
~/usr/lib64/python3.8/site-packages/pandas/core/frame.py in __getitem__(self, key)
3456 if self.columns.nlevels > 1:
3457 return self._getitem_multilevel(key)
-> 3458 indexer = self.columns.get_loc(key)
3459 if is_integer(indexer):
3460 indexer = [indexer]
~/usr/lib64/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3361 return self._engine.get_loc(casted_key)
3362 except KeyError as err:
-> 3363 raise KeyError(key) from err
3364
3365 if is_scalar(key) and isna(key) and not self.hasnans:
KeyError: 0
Since this is the only param, I am not sure whether it's something I missed or a flaw in the code. Let me know if I need to file a bug separately.
Ah, thank you for pointing this out - indeed we will fix it on our end. The issue is that currently the FIGSRegressor predict function expects a numpy array, not a pandas DataFrame. We will change the function to handle both types (if you want a quick workaround, you can just use clf.predict(X_validate.values) for now).
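For anyone hitting the same KeyError: 0, here is a minimal sketch of why it happens and why the .values workaround helps. It uses only pandas and numpy, not imodels; the tiny X_validate frame and its column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the validation features (column names are hypothetical).
X_validate = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})

# Integer indexing on a DataFrame looks up a *column label*, not a row.
# With string column names there is no column called 0, hence KeyError: 0,
# which matches the X[i] lookup shown in the traceback.
try:
    X_validate[0]
    raised_key_error = False
except KeyError:
    raised_key_error = True

# .values returns the underlying numpy array, where X[i] is the i-th row.
X_array = X_validate.values
first_row = X_array[0]

print(raised_key_error)
print(first_row)
```

So passing X_validate.values means the row-by-row loop in predict indexes rows as it expects.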
I have a training dataset of around 1.5 million records. I was trying to fit a FIGSRegressor to it, and it has been running for more than 2 hours without any indication of its progress.
It'd be great to have a verbose: int param in the constructor to report what's happening within the fitting process, based on the level (an int) passed to it. E.g.
Thanks.
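To make the request concrete, here is a rough sketch of how a verbose: int level could gate progress messages during fitting. ToyIterativeRegressor, its _log helper and the messages_ attribute are all hypothetical and are not part of imodels; the loop body is a placeholder rather than a real split search.

```python
import time


class ToyIterativeRegressor:
    """Hypothetical stand-in that 'adds' one rule per step (NOT imodels code)."""

    def __init__(self, max_rules=12, verbose=0):
        self.max_rules = max_rules
        self.verbose = verbose

    def _log(self, level, message):
        # Emit the message only if the requested verbosity is high enough.
        if self.verbose >= level:
            print(message)
            self.messages_.append(message)

    def fit(self, X, y):
        self.messages_ = []
        start = time.perf_counter()
        for rule in range(1, self.max_rules + 1):
            # Placeholder for the expensive split search of a real model.
            elapsed = time.perf_counter() - start
            self._log(1, f"rule {rule}/{self.max_rules} added ({elapsed:.2f}s elapsed)")
            self._log(2, f"  scanned {len(X)} rows for rule {rule}")
        # Toy 'model': just remember the mean of the targets.
        self.mean_ = sum(y) / len(y)
        return self


model = ToyIterativeRegressor(max_rules=3, verbose=1).fit([[0.0], [1.0]], [1.0, 3.0])
```

With verbose=0 nothing is printed, verbose=1 prints one line per rule, and verbose=2 adds per-rule detail, so a long fit at least shows that it is still making progress.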