apple / turicreate

Turi Create simplifies the development of custom machine learning models.
BSD 3-Clause "New" or "Revised" License

Unable to export random_forest_classifier to CoreML #2203

Open shayneobrien opened 5 years ago

shayneobrien commented 5 years ago

I am attempting to train and export a tc.random_forest_classifier using the following code:

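import turicreate as tc

# training_data: SFrame with a binary 'target' column and a 'featurized'
# column of word-count dicts built with tc.text_analytics.count_words
model = tc.random_forest_classifier.create(training_data,
                                           target='target',
                                           verbose=False,
                                           features=['featurized'],
                                           max_iterations=100,
                                           class_weights=None,
                                           random_seed=3,
                                           max_depth=6)

model.export_coreml('rf.coreml')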

The model trains and evaluates successfully, but I am unable to export it to CoreML. I receive the following error when trying to do so:

---------------------------------------------------------------------------
ToolkitError                              Traceback (most recent call last)
<ipython-input-17-1272491701fc> in <module>
----> 1 model.export_coreml('../test.corml')

/usr/local/lib/python3.6/site-packages/turicreate/toolkits/classifier/random_forest_classifier.py in export_coreml(self, filename)
    439                    }
    440                 }
--> 441         self._export_coreml_impl(filename, context)
    442 
    443 def create(dataset, target,

/usr/local/lib/python3.6/site-packages/turicreate/toolkits/_tree_model_mixin.py in _export_coreml_impl(self, filename, context)
    316 
    317     def _export_coreml_impl(self, filename, context):
--> 318         tc.extensions._xgboost_export_as_model_asset(self.__proxy__, filename, context)
    319 

/usr/local/lib/python3.6/site-packages/turicreate/extensions.py in <lambda>(*args, **kwargs)
    168 
    169 def _make_injected_function(fn, arguments):
--> 170     return lambda *args, **kwargs: _run_toolkit_function(fn, arguments, args, kwargs)
    171 
    172 def _class_instance_from_name(class_name, *arg, **kwarg):

/usr/local/lib/python3.6/site-packages/turicreate/extensions.py in _run_toolkit_function(fnname, arguments, args, kwargs)
    157     if not ret[0]:
    158         if len(ret[1]) > 0:
--> 159             raise _ToolkitError(ret[1])
    160         else:
    161             raise _ToolkitError("Toolkit failed with unknown error")

ToolkitError: Errors encountered during processing tree model:

  In TreeID=0, true child of NodeID=0 is already the child of node NodeID=0;
  In TreeID=0, true child of NodeID=1 is already the child of node NodeID=1;
  In TreeID=0, true child of NodeID=2 is already the child of node NodeID=2;
  In TreeID=0, true child of NodeID=5 is already the child of node NodeID=5;
  In TreeID=0, true child of NodeID=6 is already the child of node NodeID=6;
  In TreeID=0, true child of NodeID=9 is already the child of node NodeID=9;
  In TreeID=0, true child of NodeID=11 is already the child of node NodeID=11;
  In TreeID=0, true child of NodeID=12 is already the child of node NodeID=12;
  In TreeID=0, true child of NodeID=13 is already the child of node NodeID=13;
  In TreeID=0, true child of NodeID=15 is already the child of node NodeID=15;
  In TreeID=0, true child of NodeID=18 is already the child of node NodeID=18;
  In TreeID=0, true child of NodeID=19 is already the child of node NodeID=19;
  In TreeID=0, true child of NodeID=20 is already the child of node NodeID=20;
  In TreeID=0, true child of NodeID=22 is already the child of node NodeID=22;
  In TreeID=0, true child of NodeID=23 is already the child of node NodeID=23;
  In TreeID=0, true child of NodeID=24 is already the child of node NodeID=24;
  In TreeID=0, true child of NodeID=25 is already the child of node NodeID=25;
  In TreeID=0, true child of NodeID=26 is already the child of node NodeID=26;
  In TreeID=0, true child of NodeID=27 is already the child of node NodeID=27;
  In TreeID=0, true child of NodeID=28 is already the child of node NodeID=28;
  In TreeID=0, true child of NodeID=30 is already the child of node NodeID=30;
  In TreeID=0, true child of NodeID=34 is already the child of node NodeID=34;
  In TreeID=0, true child of NodeID=36 is already the child of node NodeID=36;
  In TreeID=0, true child of NodeID=40 is already the child of node NodeID=40;
  In TreeID=0, true child of NodeID=41 is already the child of node NodeID=41;
  In TreeID=0, true child of NodeID=42 is already the child of node NodeID=42;
  In TreeID=0, true child of NodeID=44 is already the child of node NodeID=44;
  In TreeID=1, true child of NodeID=3 is already the child of node NodeID=3;
  In TreeID=1, true child of NodeID=4 is already the child of node NodeID=4;
  In TreeID=1, true child of NodeID=8 is already the child of node NodeID=8;
  In TreeID=1, true child of NodeID=10 is already the child of node NodeID=10;
  In TreeID=1, true child of NodeID=14 is already the child of node NodeID=14;
  In TreeID=2, true child of NodeID=1 is already the child of node NodeID=1;
  In TreeID=2, true child of NodeID=3 is already the child of node NodeID=3;
  In TreeID=2, true child of NodeID=4 is already the child of node NodeID=4;
  In TreeID=2, true child of NodeID=5 is already the child of node NodeID=5;
  In TreeID=2, true child of NodeID=8 is already the child of node NodeID=8;
  In TreeID=2, true child of NodeID=9 is already the child of node NodeID=9;
  In TreeID=2, true child of NodeID=12 is already the child of node NodeID=12;
  In TreeID=2, true child of NodeID=14 is already the child of node NodeID=14;
  In TreeID=2, true child of NodeID=15 is already the child of node NodeID=15;
  In TreeID=2, true child of NodeID=16 is already the child of node NodeID=16;
  In TreeID=2, true child of NodeID=18 is already the child of node NodeID=18;
  In TreeID=2, true child of NodeID=22 is already the child of node NodeID=22;
  In TreeID=2, true child of NodeID=24 is already the child of node NodeID=24;
  In TreeID=2, true child of NodeID=25 is already the child of node NodeID=25;
  In TreeID=2, true child of NodeID=30 is already the child of node NodeID=30;
  In TreeID=3, true child of NodeID=0 is already the child of node NodeID=0;
  In TreeID=3, true child of NodeID=1 is already the child of node NodeID=1;
  In TreeID=3, true child of NodeID=2 is already the child of node NodeID=2;
  FATAL: maximum number of errors reached; aborting processing.
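Read literally, each error line says that the true branch of node N points to a node that was already registered as a child of that same node N, which suggests the split's true and false branches reference the same child. The sketch below is a hypothetical reconstruction of such a check in plain Python. It assumes the exporter records exactly one parent per child node. The `check_tree` name and the `(true_child, false_child)` node layout are illustrative only and are not taken from the Core ML or Turi Create source.

```python
# Hypothetical reconstruction of the consistency check implied by the
# error text; NOT the actual Core ML / Turi Create implementation.
from typing import Dict, List, Optional, Tuple

# node_id -> (true_child, false_child); leaves map to None
Tree = Dict[int, Optional[Tuple[int, int]]]


def check_tree(tree_id: int, nodes: Tree) -> List[str]:
    """Report any child node claimed by more than one parent slot."""
    parent_of: Dict[int, int] = {}
    errors: List[str] = []
    for node_id in sorted(nodes):
        children = nodes[node_id]
        if children is None:  # leaf: nothing to link
            continue
        true_child, false_child = children
        # Register the false branch first, then the true branch, so a node
        # whose two branches share one child is flagged on the true branch.
        for branch, child in (("false", false_child), ("true", true_child)):
            if child in parent_of:
                errors.append(
                    f"In TreeID={tree_id}, {branch} child of NodeID={node_id} "
                    f"is already the child of node NodeID={parent_of[child]};"
                )
            else:
                parent_of[child] = node_id
    return errors


# A well-formed tree: node 0 splits into leaves 1 and 2.
good: Tree = {0: (1, 2), 1: None, 2: None}
# A degenerate split: both branches of node 0 point at node 1.
bad: Tree = {0: (1, 1), 1: None}

print(check_tree(0, good))  # []
print(check_tree(0, bad))
```

The degenerate tree produces a message of the same form as the lines in the traceback, which is consistent with (but does not prove) the duplicated-child reading.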

The input data has a binary target (0, 1), and the feature column consists of strings tokenized into dicts using tc.text_analytics.count_words. I am on Turi Create version 5.6 and Python 3.6. Thank you, and please advise.
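For anyone building a shareable stand-in dataset, the feature shape described here (one column whose values are word-count dicts) can be approximated without Turi Create. This is a minimal sketch that assumes lowercasing plus splitting on non-alphanumeric characters is close enough. `count_words_approx` is a hypothetical helper and does not replicate the exact tokenization rules of `tc.text_analytics.count_words`.

```python
# Rough, stdlib-only approximation of a word-count featurizer; the exact
# delimiters used by tc.text_analytics.count_words are NOT reproduced here.
import re
from collections import Counter
from typing import Dict, List


def count_words_approx(text: str) -> Dict[str, int]:
    """Lowercase the text, keep runs of letters/digits/apostrophes, count them."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return dict(Counter(tokens))


rows: List[str] = ["The cat sat.", "the cat, the hat"]
featurized = [count_words_approx(r) for r in rows]
print(featurized)
# [{'the': 1, 'cat': 1, 'sat': 1}, {'the': 2, 'cat': 1, 'hat': 1}]
```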

syoutsey commented 5 years ago

Hi @shayneobrien, are you able to share the dataset that reproduces this error? We'd like to repro it in-house. Thanks!

shayneobrien commented 5 years ago

Unfortunately, I am unable to share the dataset :/ Is there any other info I can provide to help debug? I ran into similar issues with other tree-based models as well, while non-tree models (e.g. logistic regression) export to CoreML without error.

syoutsey commented 5 years ago

@shayneobrien I'm unable to reproduce this with any of the datasets I have access to. Is there a subset of the data you can share, or perhaps a different dataset that also reproduces the error?

hoytak commented 5 years ago

This easily reproduces it:

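import turicreate as tc

# 500 random rows with a single dictionary-valued feature column (code "d")
X = tc.util.generate_random_classification_sframe(500, "d", 4)
m = tc.random_forest_classifier.create(X, "target")
m.export_coreml("test.mlmodel")  # fails with the same ToolkitError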
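Both the reporter's data (a `featurized` column built with count_words) and this repro (the "d" column code) train the trees on a dictionary-valued feature. If that is the common trigger, which is an inference from this thread rather than a confirmed diagnosis, one untested thing to try is flattening the dicts into fixed numeric columns before training. Turi Create's `SFrame.unpack` can expand a dict column into one column per key. The stdlib-only sketch below shows only the flattening itself on plain Python lists. `densify` is a hypothetical helper, and whether this avoids the export error has not been verified.

```python
# Flatten dict-valued features into dense numeric rows (stdlib only).
from typing import Dict, List, Tuple


def densify(rows: List[Dict[str, float]]) -> Tuple[List[str], List[List[float]]]:
    """Return (column_names, dense_rows), using 0.0 for keys a row lacks."""
    # Deterministic column order: every key seen in any row, sorted.
    columns = sorted({key for row in rows for key in row})
    dense = [[float(row.get(col, 0.0)) for col in columns] for row in rows]
    return columns, dense


cols, matrix = densify([{"cat": 1, "the": 1}, {"hat": 1, "the": 2}])
print(cols)    # ['cat', 'hat', 'the']
print(matrix)  # [[1.0, 0.0, 1.0], [0.0, 1.0, 2.0]]
```

Filling missing keys with 0 matches word-count semantics (an absent word has count 0). For dicts where a missing key means "unknown" rather than "zero", a missing-value marker would be the safer choice.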