@avinash-indix would it be possible to get the CSV file you were using, so that I can run the example you've referenced?
I've tried a simple out-of-the-box pipeline and was able to run it without issues. The dataset I used is here: https://github.com/combust/mleap/blob/master/mleap-databricks-runtime-testkit/src/main/resources/datasources/lending_club_sample.avro.
```scala
import org.apache.spark.ml.{Pipeline, Transformer}
import org.apache.spark.ml.classification.{LogisticRegression, OneVsRest}
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}
import org.apache.spark.sql.DataFrame
import com.databricks.spark.avro._

val dataset = spark.sqlContext.read.avro("/FileStore/tables/lending_club_sample.avro")

val pipeline = new Pipeline().setStages(Array(
  new StringIndexer()
    .setInputCol("fico_score_group_fnl")
    .setOutputCol("fico_index"),
  new VectorAssembler()
    .setInputCols(Array("fico_index", "dti"))
    .setOutputCol("features"),
  new OneVsRest()
    .setClassifier(new LogisticRegression())
    .setLabelCol("fico_index")
    .setFeaturesCol("features")
    .setPredictionCol("prediction")
)).fit(dataset)
```
```scala
import ml.combust.bundle.BundleFile
import ml.combust.bundle.serializer.SerializationFormat
import ml.combust.mleap.spark.SparkSupport._
import org.apache.spark.ml.bundle.SparkBundleContext
import resource._

val sbc = SparkBundleContext().withDataset(pipeline.transform(dataset))
for (bf <- managed(BundleFile("jar:file:/tmp/simple-spark-pipeline1.zip"))) {
  pipeline.writeBundle.format(SerializationFormat.Protobuf).save(bf)(sbc).get
}
```
```scala
import ml.combust.bundle.BundleFile
import ml.combust.mleap.runtime.MleapSupport._
import resource._

val bundle = (for (bundleFile <- managed(BundleFile("jar:file:/tmp/simple-spark-pipeline1.zip"))) yield {
  bundleFile.loadMleapBundle().get
}).opt.get

val mleapPipeline = bundle.root
println(mleapPipeline)
```
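For completeness, here's a rough sketch of scoring a single row with the loaded MLeap pipeline outside of Spark; the sample schema and values below are illustrative assumptions, not part of the original example:

```scala
import ml.combust.mleap.core.types._
import ml.combust.mleap.runtime.frame.{DefaultLeapFrame, Row}

// One-row leap frame matching the pipeline's raw input columns;
// the fico bucket string and dti value are made-up examples.
val schema = StructType(
  StructField("fico_score_group_fnl", ScalarType.String),
  StructField("dti", ScalarType.Double)
).get

val frame = DefaultLeapFrame(schema, Seq(Row("700 - 800", 0.25)))

// transform returns a Try, so .get will surface any scoring failure.
val scored = mleapPipeline.transform(frame).get
scored.dataset.foreach(println)
```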
I can actually replicate your issue when using the MLeap extension version of the transformer:
`import org.apache.spark.ml.mleap.classification.OneVsRest`
Currently looking into what the issue is.
I've raised https://github.com/combust/mleap/pull/492 to fix this issue. In the meantime, you can also try to use the default (as in my example above):
`import org.apache.spark.ml.classification.OneVsRest`
if you don't require the additional probability column and don't use any custom transformers from mleap-spark-extensions.
Thanks a lot Anca. I am using a custom transformer to convert the int prediction column to double (MulticlassClassificationEvaluator throws an error otherwise). In my code the i2dTransformer does only this. I could not find a built-in transformer that does this, so I had to write a custom one. If you know of any, please do let me know.
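For reference, a minimal sketch of what such an int-to-double transformer might look like on the Spark side; the class name is hypothetical, and MLeap would still need a custom op registered for it before it could be serialized into a bundle:

```scala
import org.apache.spark.ml.UnaryTransformer
import org.apache.spark.ml.util.{DefaultParamsWritable, Identifiable}
import org.apache.spark.sql.types.{DataType, DoubleType}

// Hypothetical i2d-style transformer: casts an integer prediction
// column to double so MulticlassClassificationEvaluator accepts it.
class IntToDoubleTransformer(override val uid: String)
    extends UnaryTransformer[Int, Double, IntToDoubleTransformer]
    with DefaultParamsWritable {

  def this() = this(Identifiable.randomUID("intToDouble"))

  override protected def createTransformFunc: Int => Double = _.toDouble

  override protected def outputDataType: DataType = DoubleType
}
```

It would be wired in as `new IntToDoubleTransformer().setInputCol("prediction").setOutputCol("prediction_d")`.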
See if you can use `import org.apache.spark.ml.classification.OneVsRest` in your pipeline for now.
Closing, as #492 was merged and will be released with the next MLeap version.
The line throwing the exception is https://github.com/combust/mleap/blob/30c1ce2c7e5ca81492514e1eadd79a8ec1b10a7a/mleap-runtime/src/main/scala/ml/combust/mleap/bundle/ops/classification/OneVsRestOp.scala#L38, hit while trying to load the model with:

```scala
val mleapModel = (for (bf <- managed(BundleFile(new File(modelDir)))) yield {
  bf.loadMleapBundle()
}).tried.flatMap(identity).get.root
```
My versions are MLeap 0.13.0 and Spark ML 2.3.1.
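For anyone reproducing this, here's a sketch of the corresponding sbt dependencies; the exact artifact set depends on your project, and the Scala version is assumed to be 2.11:

```scala
// build.sbt (sketch; mleap-spark provides the Spark-side bundle
// serialization, mleap-runtime loads the bundle outside Spark)
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-mllib"   % "2.3.1" % Provided,
  "ml.combust.mleap" %% "mleap-spark"   % "0.13.0",
  "ml.combust.mleap" %% "mleap-runtime" % "0.13.0"
)
```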