apple / coremltools

Core ML tools contain supporting tools for Core ML model conversion, editing, and validation.
https://coremltools.readme.io
BSD 3-Clause "New" or "Revised" License

Can't convert OneHotEncoder with non-categorical features #232

Open calebmadrigal opened 6 years ago

calebmadrigal commented 6 years ago

I can convert a scikit-learn OneHotEncoder in which every feature is categorical. For example:

from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder()
enc.fit([[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]]) 

However, I cannot convert a OneHotEncoder that has non-categorical features. For example, this conversion fails:

# Train simple OneHotEncoder with one categorical and one non-categorical feature
from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder(sparse=False, categorical_features=[False, True])
enc.fit([[0, 0], [0, 1], [0, 2], [0, 3], [0, 4], [2, 4]])

# Verify that it works
print('Transform of [10, 0]: {}'.format(enc.transform([[10, 0]])))

# Try to convert to CoreML
import coremltools
author_ohe_coreml = coremltools.converters.sklearn.convert(enc)

Here is the output:

Transform of [10, 0]: [[ 1.  0.  0.  0.  0. 10.]]
Traceback (most recent call last):
  File "coreml_onehot_bad.py", line 11, in <module>
    author_ohe_coreml = coremltools.converters.sklearn.convert(enc)
  File "/Users/caleb.madrigal/.pyenv/versions/3.6.0/lib/python3.6/site-packages/coremltools/converters/sklearn/_converter.py", line 146, in convert
    sk_obj, input_features, output_feature_names, class_labels = None)
  File "/Users/caleb.madrigal/.pyenv/versions/3.6.0/lib/python3.6/site-packages/coremltools/converters/sklearn/_converter_internal.py", line 214, in _convert_sklearn_model
    features = _fm.process_or_validate_features(input_features, num_dimensions)
  File "/Users/caleb.madrigal/.pyenv/versions/3.6.0/lib/python3.6/site-packages/coremltools/models/_feature_management.py", line 215, in process_or_validate_features
    raise_type_error("If a single feature name is given, then "
  File "/Users/caleb.madrigal/.pyenv/versions/3.6.0/lib/python3.6/site-packages/coremltools/models/_feature_management.py", line 192, in raise_type_error
    % (additional_msg, str(original_features)))
TypeError: Error processing feature list: If a single feature name is given, then num_dimensions must be provided.
features = input

TobyRoseman commented 3 years ago

This is still an issue with coremltools 4.1, macOS 11.3 Beta, and scikit-learn 0.19.2.

Also, the following OneHotEncoder, with only categorical features, produces the same error:

enc = OneHotEncoder(sparse=False, categorical_features=[True, True])
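
For anyone who wants to check a converted model against the expected result, here is a minimal plain-Python sketch of the encoding that the failing example above produces: one-hot columns for the categorical feature first, then the non-categorical feature passed through unchanged. The names one_hot_mixed, categorical_mask, and fit_rows are hypothetical and are not part of scikit-learn or coremltools. The sketch assumes categories are the sorted distinct values seen during fitting, and it encodes an unseen category as all zeros, whereas scikit-learn raises an error by default.

```python
def one_hot_mixed(rows, categorical_mask, fit_rows):
    """One-hot encode the masked columns and append the other columns unchanged.

    Hypothetical helper for illustration only; not a scikit-learn or coremltools API.
    """
    # Learn the sorted distinct values of each categorical column from the training rows.
    categories = {
        col: sorted({row[col] for row in fit_rows})
        for col, is_cat in enumerate(categorical_mask)
        if is_cat
    }
    encoded = []
    for row in rows:
        out = []
        # One-hot blocks come first, matching the column order in the reported output.
        for col, values in categories.items():
            out.extend(1.0 if row[col] == v else 0.0 for v in values)
        # Non-categorical columns are passed through after the one-hot blocks.
        out.extend(
            float(row[col])
            for col, is_cat in enumerate(categorical_mask)
            if not is_cat
        )
        encoded.append(out)
    return encoded


# Same training data and mask as the failing example in the report.
train = [[0, 0], [0, 1], [0, 2], [0, 3], [0, 4], [2, 4]]
result = one_hot_mixed([[10, 0]], [False, True], train)
print(result)  # [[1.0, 0.0, 0.0, 0.0, 0.0, 10.0]]
```

The printed row agrees with the enc.transform([[10, 0]]) output shown in the original report, so it can serve as a reference when testing any workaround.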