Open annaveronika opened 5 years ago
Hey Anna, it would be great to add it. Not sure if this solution works for CatBoost, but someone else found a way to add more operators that are not in the default config.
If you only want numerical features, it can be done the same way as XGBoost, which is already included. If you want categorical features, you need to do a little more: pass a parameter with the categorical feature indices either when creating the estimator or to the fit function.
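As a rough illustration of the two options described above, here is a minimal sketch. The helper names `categorical_indices` and `fit_catboost` are hypothetical and only for this example; `cat_features` is passed either to the `CatBoostClassifier` constructor or to `fit`, and catboost is imported lazily so the index helper can be used without it.

```python
def categorical_indices(column_names, categorical_names):
    """Return the positional indices of the categorical columns."""
    wanted = set(categorical_names)
    return [i for i, name in enumerate(column_names) if name in wanted]


def fit_catboost(X, y, cat_idx, at_construction=True):
    """Fit a CatBoostClassifier, passing cat_features in one of two ways.

    Hypothetical helper for illustration; requires catboost to be installed.
    """
    from catboost import CatBoostClassifier  # lazy import

    if at_construction:
        # Option 1: categorical indices given when the estimator is created
        model = CatBoostClassifier(cat_features=cat_idx, logging_level="Silent")
        model.fit(X, y)
    else:
        # Option 2: categorical indices given to the fit function
        model = CatBoostClassifier(logging_level="Silent")
        model.fit(X, y, cat_features=cat_idx)
    return model
```

For example, `categorical_indices(["age", "city", "income", "plan"], ["city", "plan"])` gives the indices of `city` and `plan`, which can then be handed to `fit_catboost`.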
Is there an update on this? If not, I would probably look into it over the next few months.
@jhmenke we don't have updates on this so far. Please let us know your findings. Thanks.
Hi guys, I've tried the proposed way of running TPOT over CatBoost only, as shown in #407, and modified it accordingly for a classification problem. However, TPOT still goes through other models instead of only CatBoost. Not sure if anyone else has had the same issue.
Can you post your classifier dict and a code sample? For me the method did work, but the issue right now is that there is no feasible way of passing the cat_columns to CatBoost.
It is now possible to pass cat_features together with other training parameters. So there should be no problem with them.
Then this should suffice. It can be added to the default regressor dict (and analogously for the classifier):
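import sys
from tpot.config import regressor_config_dict  # needed when used outside TPOT's own config module

# Names or indices of the categorical columns, e.g.
# list(features.select_dtypes(include=["category"]).columns)
cat_features = [...]

if 'catboost' in sys.modules:  # only if catboost has already been imported
    from sklearn.base import RegressorMixin
    from catboost import CatBoostRegressor
    # Add RegressorMixin as a base class so TPOT treats CatBoostRegressor as a regressor
    CatBoostRegressor.__bases__ += (RegressorMixin,)
    regressor_config_dict['catboost.CatBoostRegressor'] = {
        'logging_level': ['Silent'],
        # one candidate value: the whole list of categorical features
        'cat_features': [cat_features],
    }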
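For the classifier case mentioned above, a sketch along the same lines might look like the following. It is untested and assumes the classic TPOT API (`TPOTClassifier` with `config_dict`, `generations`, `population_size`). The function names and the hyperparameter grid are illustrative assumptions, not values from this thread, and the `ClassifierMixin` patch simply mirrors the `RegressorMixin` trick from the regressor snippet.

```python
def catboost_only_config():
    """Config dict containing CatBoostClassifier as the only operator.

    The hyperparameter values are illustrative, not tuned recommendations.
    """
    return {
        "catboost.CatBoostClassifier": {
            "logging_level": ["Silent"],
            "iterations": [100, 300],
            "learning_rate": [0.03, 0.1],
            "depth": [4, 6, 8],
        }
    }


def run_catboost_only_search(X, y):
    """Run TPOT restricted to CatBoost (requires tpot, catboost, scikit-learn)."""
    from catboost import CatBoostClassifier
    from sklearn.base import ClassifierMixin
    from tpot import TPOTClassifier

    # Same trick as in the regressor snippet: make the class look like
    # a scikit-learn classifier if it does not already.
    if not issubclass(CatBoostClassifier, ClassifierMixin):
        CatBoostClassifier.__bases__ += (ClassifierMixin,)

    tpot = TPOTClassifier(
        config_dict=catboost_only_config(),  # restrict the search space
        generations=5,
        population_size=20,
        verbosity=2,
        random_state=42,
    )
    tpot.fit(X, y)
    return tpot
```

Passing the dict through `config_dict` is what limits the search to the listed operators; if the default configuration is still in effect, TPOT will keep trying other models.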
So is there a plan to add CatBoost? It now supports text features along with categorical ones.
Do the CatBoost classes, e.g. CatBoostRegressor, now derive from the sklearn RegressorMixin?
If so, CatBoost could simply be added to the default configs in TPOT.
No, but I think all the needed methods should be in place.
Then I think a reasonable solution would be to make an example notebook with CatBoost, but not add it to the default configuration, since cat_features and the import need to be coded manually.
Actually, you don't have to pass cat_features if you don't have any; you can use the library without categorical features.
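To illustrate that point, here is a small sketch of a helper that adds a CatBoost entry to an existing TPOT-style config dict and only includes `cat_features` when there are any. The helper name `with_catboost_regressor` and the example base config are hypothetical.

```python
import copy


def with_catboost_regressor(base_config, cat_features=None):
    """Return a copy of base_config with a CatBoostRegressor entry added.

    cat_features is left out entirely when no categorical features are given,
    so purely numerical data needs no extra setup.
    """
    config = copy.deepcopy(base_config)  # leave the caller's dict untouched
    params = {"logging_level": ["Silent"]}
    if cat_features:
        # TPOT config values are lists of candidates; here there is a
        # single candidate, namely the full list of categorical features.
        params["cat_features"] = [list(cat_features)]
    config["catboost.CatBoostRegressor"] = params
    return config
```

Called as `with_catboost_regressor(base)` it produces a numeric-only entry; called with `cat_features=[0, 3]` it adds those indices as the single candidate value.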
I would likewise be interested in seeing CatBoost added to the default configuration; even without categorical variables, I've found it has comparable performance to XGBoost after tuning, and it frequently outperforms XGBoost when both are run with default settings. Especially for pipelines with shorter runtimes, it could add real value.
https://github.com/catboost/catboost/issues/696#issuecomment-627258634 - catboost performs well in other auto ml packages
CatBoost is a gradient boosting library that gives state-of-the-art results on datasets with categorical features and also on many datasets without categorical features, so it makes sense to add it here. https://catboost.ai https://github.com/catboost/catboost