Open inardini opened 3 years ago
The error means you are using StringIndexer in its multi-column form, i.e., you set the inputCols parameter (and maybe the outputCols parameter) instead of inputCol/outputCol. Multi-column StringIndexer is a feature added in Spark 3. MLeap does support Spark 3, but doesn't yet support 100% of its capabilities (we try to throw exceptions like this one when support isn't available yet).
As a workaround, you can replace your multi-column StringIndexer with multiple single-column StringIndexers. E.g., supposing you have code like this right now:
indexer = StringIndexer(inputCols=["foo", "bar", "baz"], outputCols=["a", "b", "c"])
pipe = Pipeline(stages=[...,indexer,...])
Then change it to:
indexer1 = StringIndexer(inputCol="foo", outputCol="a")
indexer2 = StringIndexer(inputCol="bar", outputCol="b")
indexer3 = StringIndexer(inputCol="baz", outputCol="c")
pipe = Pipeline(stages=[...,indexer1, indexer2, indexer3, ...])
This will be functionally equivalent and is supported in MLeap.
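If you have many columns, writing each indexer by hand gets tedious. Here is a minimal sketch of generating the single-column indexers from the same two lists you were passing to inputCols/outputCols. The helper names single_column_params and build_single_column_indexers are made up for illustration and are not part of Spark or MLeap. The pyspark import is kept inside the second function because creating a StringIndexer needs pyspark installed and an active Spark session.

```python
from typing import Dict, List, Sequence


def single_column_params(
    input_cols: Sequence[str], output_cols: Sequence[str]
) -> List[Dict[str, str]]:
    """Pair each input column with its output column, one dict per indexer.

    Hypothetical helper, not a Spark or MLeap API.
    """
    if len(input_cols) != len(output_cols):
        raise ValueError("input_cols and output_cols must have the same length")
    return [
        {"inputCol": in_col, "outputCol": out_col}
        for in_col, out_col in zip(input_cols, output_cols)
    ]


def build_single_column_indexers(
    input_cols: Sequence[str], output_cols: Sequence[str]
) -> list:
    """Create one single-column StringIndexer per (input, output) pair.

    Hypothetical helper. Requires pyspark and an active Spark session,
    so the import is done lazily here rather than at module level.
    """
    from pyspark.ml.feature import StringIndexer

    return [
        StringIndexer(**params)
        for params in single_column_params(input_cols, output_cols)
    ]


# The pairing step alone does not need Spark:
params = single_column_params(["foo", "bar", "baz"], ["a", "b", "c"])
```

With Spark running, you would then call indexers = build_single_column_indexers(["foo", "bar", "baz"], ["a", "b", "c"]) and splice the result into your stage list in place of the original multi-column indexer, e.g. Pipeline(stages=stages_before + indexers + stages_after), where stages_before and stages_after stand for whatever other stages your pipeline already has.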
To whom it may concern,
I'm trying to deploy a PySpark pipeline using the MLeap bundle with the
combustml/mleap-spring-boot:0.19.0-SNAPSHOT
Docker image, and I get this error:
Any insights on how I can fix it?
The bundle has the following structure
and it was trained using
ml.combust.mleap:mleap-runtime_2.12:0.18.1
and ml.combust.mleap:mleap-spark_2.12:0.18.1
with Spark version 3.1.2.
Thanks