catboost / benchmarks

Comparison tools
Apache License 2.0

'data' is numpy array of floating point numerical type, it means no categorical features, but 'cat_features' parameter specifies nonzero number of categorical features #13

Open iyliamjd opened 4 years ago

iyliamjd commented 4 years ago

Hi everyone, it's me again. I ran this code and got the error below at this line: pool = Pool(data, label, cat_features=cat_cols)

The error: _catboost.CatBoostError: 'data' is numpy array of floating point numerical type, it means no categorical features, but 'cat_features' parameter specifies nonzero number of categorical features

Does anyone know what is happening? I didn't change any of the code, but I got this error, maybe because of my train and test files. I don't know what structure the train and test files should have.

annaveronika commented 4 years ago

If I'm not mistaken, you should be filing issues here: https://github.com/catboost/catboost/issues I would recommend creating new ones there, because we check that place all the time.

About this issue: the problem you're facing is that you are passing floating point numbers to categorical columns, which is not allowed. Here's an explanation of why it is forbidden: https://catboost.ai/docs/concepts/faq.html#why-float-and-nan-values-are-forbidden-for-cat-features

We are planning to allow it for Python though; it's one of the open problems for new contributors: https://github.com/catboost/catboost/blob/master/open_problems/open_problems.md

So what you need to do is convert those columns to integers or to strings.
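
A minimal sketch of that conversion in plain Python, in case it helps. The helper names convert_cat_columns and build_pool are hypothetical and not part of CatBoost. The sketch assumes the data is a list of rows (a NumPy float array can be turned into one with data.tolist()) and that cat_cols holds column indices, as in the Pool call from the question. Whole-number floats such as 3.0 are turned into "3" so that the category labels stay clean.

```python
def convert_cat_columns(rows, cat_cols):
    """Return a copy of `rows` with the columns listed in `cat_cols` as strings.

    Hypothetical helper, not a CatBoost function. Whole-number floats
    (e.g. 3.0) become "3"; any other value is passed through str().
    """
    cat_cols = set(cat_cols)
    converted = []
    for row in rows:
        new_row = []
        for i, value in enumerate(row):
            if i in cat_cols:
                # Drop the trailing ".0" from whole-number floats.
                if isinstance(value, float) and value.is_integer():
                    value = int(value)
                new_row.append(str(value))
            else:
                # Numerical features are left untouched.
                new_row.append(value)
        converted.append(new_row)
    return converted


def build_pool(rows, label, cat_cols):
    """Build a catboost.Pool after converting the categorical columns.

    Assumes catboost is installed; the import is kept local so the
    conversion helper above can be used without it.
    """
    from catboost import Pool

    data = convert_cat_columns(rows, cat_cols)
    return Pool(data, label, cat_features=list(cat_cols))


# Example: columns 0 and 2 are categorical, column 1 is numerical.
rows = [[1.0, 0.5, 3.0], [2.0, 1.5, 1.0]]
print(convert_cat_columns(rows, [0, 2]))
```

With the data converted this way, build_pool(rows, label, cat_cols) should no longer hit the floating point check, because the categorical columns now hold strings rather than floats.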