caesarjuly opened this issue 6 years ago
@caesarjuly Try adding "ml.combust.mleap" %% "mleap-spark-extension" % "0.9.5" as a dependency and use the Imputer from there instead:
import org.apache.spark.ml.mleap.feature.Imputer
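A minimal build.sbt sketch of the suggestion above, reusing the versions quoted in this thread (Spark 2.2.0, MLeap 0.9.5). The Scala version is an assumption: Spark 2.2.x is published for Scala 2.11, so `%%` will resolve the `_2.11` builds of the MLeap artifacts as well.

```scala
// build.sbt (sketch): versions taken from this thread, adjust to your setup.
scalaVersion := "2.11.12" // assumption: Spark 2.2.x targets Scala 2.11

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-mllib" % "2.2.0",
  "ml.combust.mleap" %% "mleap-spark" % "0.9.5",
  // Per this thread, provides org.apache.spark.ml.mleap.feature.Imputer
  // (single input column only).
  "ml.combust.mleap" %% "mleap-spark-extension" % "0.9.5"
)
```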
I believe the out-of-the-box Spark transformer can work on multiple columns, which MLeap doesn't support at the moment. The transformer from mleap-spark-extension works the same way as Spark's, with the additional restriction that it operates on a single column.
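To illustrate why the single-column restriction mostly affects pipeline size rather than results, here is a plain-Scala model of mean imputation, with no Spark or MLeap involved. It assumes NaN marks a missing value (Spark's default missingValue) and that each column's surrogate is computed from that column alone, so one single-column imputer per column gives the same values as one multi-column imputer under this model. The names ImputeSketch, meanSurrogate, imputeColumn and imputeAll are hypothetical and not part of either library.

```scala
// Simplified model of mean imputation (hypothetical helpers, not the MLeap or Spark API).
object ImputeSketch {
  // Assumption: a value is missing if it is NaN (Spark's default missingValue).
  def isMissing(v: Double): Boolean = v.isNaN

  // "Fit" step for one column: the surrogate is the mean of the non-missing values.
  def meanSurrogate(values: Seq[Double]): Double = {
    val present = values.filterNot(isMissing)
    require(present.nonEmpty, "column has no non-missing values")
    present.sum / present.size
  }

  // "Transform" step for one column: replace each missing entry with the surrogate.
  def imputeColumn(values: Seq[Double]): Seq[Double] = {
    val surrogate = meanSurrogate(values)
    values.map(v => if (isMissing(v)) surrogate else v)
  }

  // Several columns, each handled by its own single-column "imputer".
  // Every surrogate depends only on its own column, so in this model the
  // result matches imputing all columns in one multi-column step.
  def imputeAll(table: Map[String, Seq[Double]]): Map[String, Seq[Double]] =
    table.map { case (name, values) => name -> imputeColumn(values) }
}
```

In a real pipeline, the counterpart of imputeAll would be one mleap-spark-extension Imputer stage per column, which is why a single multi-column Spark Imputer keeps the saved pipeline shorter.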
@ancasarb Thank you very much, that should be the key. I'll try it later. I have another question: what's the relationship between mleap-spark and mleap-spark-extension, and which one should I use?
By the way, after reading and using this project, I'd really like to contribute to it. Is there any way to get involved? I feel there are many features waiting to be added.
Any developments on this issue?
@gabtibe Please see the answer above about using the Imputer from mleap-spark-extension. Let me know if you have any questions!
@ancasarb Thanks for your response. I did use the Imputer from mleap-spark-extension, but I was wondering whether there's any plan to support the standard Spark Imputer, as I noticed it speeds up computation and reduces the number of pipeline stages to be saved.
+1 for support of the Spark ImputerModel; it makes exporting existing pipelines much easier.
I have the same issue with PySpark 2.3.0. From what I can see, mleap-spark-extension is not available in Python.
I am very confused about MLeap's support for the Spark Imputer.
Why doesn't the documentation mention that the Spark Imputer is only supported through the class from mleap-spark-extension? The Supported transformers table lists Imputer as supported, without any qualification.
Do I understand correctly that it's not possible to instantiate the Imputer when creating a pipeline in PySpark?
EDIT: I have also asked a Stack Overflow question about this: https://stackoverflow.com/questions/71209926/mleap-support-spark-ml-imputer
According to the docs, Imputer is supported, but I get this error when trying to save the bundle file. Here are my dependency versions: Spark 2.2.0, "ml.combust.mleap" %% "mleap-runtime" % "0.9.5", and "ml.combust.mleap" %% "mleap-spark" % "0.9.5". I don't know what to do. Can you help me, please?