combust / mleap

MLeap: Deploy ML Pipelines to Production
https://combust.github.io/mleap-docs/
Apache License 2.0

NoSuchElementException: key not found: org.apache.spark.ml.feature.OneHotEncoder #220 #478

Closed ruimaximo closed 5 years ago

ruimaximo commented 5 years ago

Hi,

I want to bundle a PySpark ML pipeline with MLeap. I was able to do it fine until I added pyspark.ml.feature.OneHotEncoderEstimator to my pipeline.

On a cluster running Python 3 and Databricks Runtime 4.3 (Scala 2.11, Spark 2.3.1), I hit the error described in #220.

As suggested in #220 I tried to import and use the mleap OneHotEncoder. However, I cannot import anything from org.apache.spark.ml.mleap.feature

%scala
import org.apache.spark.ml.mleap.feature

notebook:1: error: object feature is not a member of package org.apache.spark.ml.mleap
import org.apache.spark.ml.mleap.feature

I got the same error with Databricks 3.5 LTS (Scala 2.11, Spark 2.2.1).

Extra Info:

I am using Databricks Community edition. I have installed MLeap-Spark:

I am new to Spark and to GitHub. I might be missing something really obvious. Please go easy on me :)

ancasarb commented 5 years ago

Hey @ruimaximo, we've removed MLeap's own one-hot encoder, so you can use

import org.apache.spark.ml.feature.OneHotEncoderEstimator

instead.

To use the PySpark integration, you'll also need to attach the mleap PyPI package to your cluster. Then you can do something like this:

import mleap.pyspark
from mleap.pyspark.spark_support import SimpleSparkSerializer 

from pyspark.ml.feature import OneHotEncoderEstimator
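A fuller sketch of that flow, assuming a DataFrame with a string column named `category` (a hypothetical column name) and both the mleap PyPI package and the mleap-spark library attached to the cluster. The helpers `bundle_uri` and `export_pipeline` are illustrative names, not part of MLeap; `serializeToBundle` and the `jar:file:` path form are the ones that appear elsewhere in this thread.

```python
import os


def bundle_uri(local_zip_path):
    """Build the 'jar:file:' URI form used for zip bundles in this thread."""
    if not local_zip_path.endswith(".zip"):
        raise ValueError("expected a path ending in .zip")
    return "jar:file:" + os.path.abspath(local_zip_path)


def export_pipeline(df, local_zip_path, category_col="category"):
    """Fit a StringIndexer + OneHotEncoderEstimator pipeline and bundle it.

    Imports are local so this module loads without Spark installed.
    Assumes Spark 2.3/2.4, where OneHotEncoderEstimator exists.
    """
    # Importing spark_support makes serializeToBundle available on transformers.
    import mleap.pyspark  # noqa: F401
    from mleap.pyspark.spark_support import SimpleSparkSerializer  # noqa: F401
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import OneHotEncoderEstimator, StringIndexer

    idx_col = category_col + "_idx"  # hypothetical output column names
    vec_col = category_col + "_vec"
    indexer = StringIndexer(inputCol=category_col, outputCol=idx_col)
    encoder = OneHotEncoderEstimator(inputCols=[idx_col], outputCols=[vec_col])
    model = Pipeline(stages=[indexer, encoder]).fit(df)

    # The serializer is given a transformed dataset alongside the bundle path.
    model.serializeToBundle(bundle_uri(local_zip_path), model.transform(df))
    return model
```

On a cluster this would be called as `export_pipeline(df, "/tmp/model.zip")`; the `.zip` suffix check is only a convenience of this sketch.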

Please let me know if you have any further questions, and if not, if it's okay to close this issue.

mohaissa commented 5 years ago

Hi @ancasarb, I am using a Databricks cluster with runtime version 5.2 ML Beta (includes Apache Spark 2.4.0, Scala 2.11) and, from Maven, ml.combust.mleap:mleap-spark_2.11:0.12.0. I get the same error: java.util.NoSuchElementException: key not found: com.databricks.sparkdl.DeepImageFeaturizer

Py4JJavaError                             Traceback (most recent call last)

in ()
      3 from pyspark.ml.feature import OneHotEncoderEstimator
      4
----> 5 p_model.serializeToBundle("jar:file:/tmp/mleap_python_model_export/ModelImage-json.zip", tested_df)

/databricks/python/lib/python3.6/site-packages/mleap/pyspark/spark_support.py in serializeToBundle(self, path, dataset)
     23 def serializeToBundle(self, path, dataset=None):
     24     serializer = SimpleSparkSerializer()
---> 25     serializer.serializeToBundle(self, path, dataset=dataset)
     26
     27

/databricks/python/lib/python3.6/site-packages/mleap/pyspark/spark_support.py in serializeToBundle(self, transformer, path, dataset)
     40
     41     def serializeToBundle(self, transformer, path, dataset):
---> 42         self._java_obj.serializeToBundle(transformer._to_java(), path, dataset._jdf)
     43
     44     def deserializeFromBundle(self, path):

![image](https://user-images.githubusercontent.com/29986627/55635829-a152aa80-578f-11e9-8246-ae763b8463b3.png)

Could you help with that, please? Thanks.

ancasarb commented 5 years ago

Hey, we don't have support for DeepImageFeaturizer. I could help guide you through what's required to add support for it, if you'd like.
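The `key not found` message appears to come from a lookup keyed by each stage's class name, which fails when no serializer is registered for that class. A minimal pure-Python sketch of checking stages up front under that assumption; the `SUPPORTED` set is a tiny hypothetical sample, not MLeap's real list of supported transformers, and `unsupported_stages` is an illustrative helper, not an MLeap API.

```python
# Hypothetical sample of supported class names; the real registry lives in MLeap.
SUPPORTED = {
    "org.apache.spark.ml.feature.StringIndexerModel",
    "org.apache.spark.ml.feature.OneHotEncoderModel",
}


def unsupported_stages(stage_class_names, supported=SUPPORTED):
    """Return the class names with no entry in the supported set, in order."""
    return [name for name in stage_class_names if name not in supported]


# Class names of the stages in a (hypothetical) fitted pipeline.
stages = [
    "org.apache.spark.ml.feature.StringIndexerModel",
    "com.databricks.sparkdl.DeepImageFeaturizer",
]
missing = unsupported_stages(stages)
if missing:
    print("No serializer registered for:", ", ".join(missing))
```

A check like this reports every unsupported stage at once, rather than failing on the first one during serialization.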

mohaissa commented 5 years ago

Yes please, I would like your help. Thanks.

ancasarb commented 5 years ago

Closing this, will open a new issue for support of DeepImageFeaturizer.