Open psc0606 opened 4 years ago
@hollinwilkins
@psc0606 I tried passing a null value to a VectorAssembler in plain Spark, without MLeap, and that isn't supported either, so this looks like expected behavior.
Here's the small example I tried:
import org.apache.spark.ml.parity.SparkParityBase
import org.apache.spark.ml.Transformer
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.sql.types.{DoubleType, StructType}
import scala.util.Random
def randomRow(): Row = Row(Random.nextDouble(), null)
val rows = spark.sparkContext.parallelize(Seq.tabulate(1) { _ => randomRow() })
val schema = new StructType()
  .add("real", DoubleType, nullable = false)
  .add("another_real", DoubleType, nullable = true)
val dataset: DataFrame = spark.sqlContext.createDataFrame(rows, schema)
val sparkTransformer: Transformer = new VectorAssembler().
  setInputCols(Array("real", "another_real")).
  setOutputCol("features")
display(sparkTransformer.transform(dataset))
To get this to work, you'd need to use an Imputer or some similar transformer to impute the null values first.
Do you have an example where Spark works and MLeap doesn't that I could take a look at?
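The imputation suggestion above could look roughly like the following sketch. It assumes Spark 2.2 or later (where `org.apache.spark.ml.feature.Imputer` is available) and a local `SparkSession`. The object name, the `another_real_imputed` column, and the sample rows are made up for illustration. The sample has more than one row because the mean strategy needs at least one non-null value to compute a replacement.

```scala
import org.apache.spark.ml.feature.{Imputer, VectorAssembler}
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{DoubleType, StructType}

// Hypothetical example object; names and sample data are illustrative only.
object ImputeThenAssemble {
  def run(spark: SparkSession): DataFrame = {
    val schema = new StructType()
      .add("real", DoubleType, nullable = false)
      .add("another_real", DoubleType, nullable = true)

    // One row has a null in the nullable column.
    val rows = spark.sparkContext.parallelize(Seq(
      Row(0.1, 1.0),
      Row(0.2, null),
      Row(0.3, 3.0)
    ))
    val dataset = spark.createDataFrame(rows, schema)

    // Replace nulls in "another_real" with the mean of the non-null values.
    val imputer = new Imputer()
      .setInputCols(Array("another_real"))
      .setOutputCols(Array("another_real_imputed"))
      .setStrategy("mean")
    val imputed = imputer.fit(dataset).transform(dataset)

    // Assemble only columns that no longer contain nulls.
    new VectorAssembler()
      .setInputCols(Array("real", "another_real_imputed"))
      .setOutputCol("features")
      .transform(imputed)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("impute-then-assemble")
      .getOrCreate()
    run(spark).show(false)
    spark.stop()
  }
}
```

Because the assembler only ever sees the imputed column, the null never reaches it.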
@ancasarb Creating the VectorAssembler as shown below handles null values in Spark:
new VectorAssembler()
  .setHandleInvalid("keep")
  .setInputCols(Array("real", "another_real"))
  .setOutputCol("features")
But the corresponding MLeap transformer still fails on null values with scala.MatchError: null.
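For reference, here is a self-contained sketch of the Spark-side behavior described above. It assumes Spark 2.4 or later, where `VectorAssembler.setHandleInvalid` is available, and a local `SparkSession`. The object name and the sample row are illustrative. With `"keep"`, Spark puts NaN in the output vector where the null was.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{DoubleType, StructType}

// Hypothetical example object; names and sample data are illustrative only.
object KeepInvalidExample {
  def run(spark: SparkSession): DataFrame = {
    val schema = new StructType()
      .add("real", DoubleType, nullable = false)
      .add("another_real", DoubleType, nullable = true)

    // A single row whose nullable column is null.
    val rows = spark.sparkContext.parallelize(Seq(Row(0.5, null)))
    val dataset = spark.createDataFrame(rows, schema)

    // "keep" tells the assembler to emit NaN for null inputs instead of failing.
    new VectorAssembler()
      .setHandleInvalid("keep")
      .setInputCols(Array("real", "another_real"))
      .setOutputCol("features")
      .transform(dataset)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("keep-invalid")
      .getOrCreate()
    run(spark).show(false)
    spark.stop()
  }
}
```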
I get this error when I use an MLeap bundle model, because one of the input values is missing and MLeap cannot handle it. Can anyone help? My MLeap version: 0.13.0
scala.MatchError: null
  at ml.combust.mleap.core.feature.VectorAssemblerModel$$anonfun$apply$3.apply(VectorAssemblerModel.scala:37)
  at ml.combust.mleap.core.feature.VectorAssemblerModel$$anonfun$apply$3.apply(VectorAssemblerModel.scala:37)
  at scala.collection.immutable.Stream.foreach(Stream.scala:594)
  at ml.combust.mleap.core.feature.VectorAssemblerModel.apply(VectorAssemblerModel.scala:37)
  at ml.combust.mleap.runtime.transformer.feature.VectorAssembler$$anonfun$1.apply(VectorAssembler.scala:18)
  at ml.combust.mleap.runtime.transformer.feature.VectorAssembler$$anonfun$1.apply(VectorAssembler.scala:18)
  at ml.combust.mleap.runtime.frame.Row$class.udfValue(Row.scala:241)
  at ml.combust.mleap.runtime.frame.ArrayRow.udfValue(ArrayRow.scala:17)
  at ml.combust.mleap.runtime.frame.Row$class.withValue(Row.scala:221)
  at ml.combust.mleap.runtime.frame.ArrayRow.withValue(ArrayRow.scala:17)
  at ml.combust.mleap.runtime.frame.DefaultLeapFrame$$anonfun$withColumn$1$$anonfun$apply$2$$anonfun$2.apply(DefaultLeapFrame.scala:54)
  at ml.combust.mleap.runtime.frame.DefaultLeapFrame$$anonfun$withColumn$1$$anonfun$apply$2$$anonfun$2.apply(DefaultLeapFrame.scala:54)
  at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:418)
  at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:418)
  at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1233)
  at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1223)
  at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:418)
  at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:418)
  at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1233)
  at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1223)
  at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:418)
  at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:418)
  at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1233)
  at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1223)
  at scala.collection.immutable.StreamIterator$$anonfun$next$1.apply(Stream.scala:1120)
  at scala.collection.immutable.StreamIterator$$anonfun$next$1.apply(Stream.scala:1120)
  at scala.collection.immutable.StreamIterator$LazyCell.v$lzycompute(Stream.scala:1109)
  at scala.collection.immutable.StreamIterator$LazyCell.v(Stream.scala:1109)
  at scala.collection.immutable.StreamIterator.hasNext(Stream.scala:1114)
  at scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30)
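A `scala.MatchError: null` like the one at the top of this trace typically means a pattern match received a null and had no case for it. The plain-Scala sketch below reproduces that failure mode with a hypothetical stand-in function. It is not the actual code at VectorAssemblerModel.scala:37, which is not shown in this thread. The second function shows one possible null-tolerant variant that maps null to NaN, in the spirit of Spark's `handleInvalid = "keep"`.

```scala
// Hypothetical stand-in for illustration; not MLeap's implementation.
object MatchErrorOnNull {
  // Type patterns such as `case d: Double` never match null,
  // so a null element falls through and throws scala.MatchError: null.
  def assemble(values: Seq[Any]): Seq[Double] =
    values.flatMap {
      case d: Double          => Seq(d)
      case arr: Array[Double] => arr.toSeq
    }

  // Null-tolerant variant: an explicit `case null` maps missing values to NaN.
  def assembleKeepingNulls(values: Seq[Any]): Seq[Double] =
    values.flatMap {
      case null               => Seq(Double.NaN)
      case d: Double          => Seq(d)
      case arr: Array[Double] => arr.toSeq
    }

  def main(args: Array[String]): Unit = {
    println(scala.util.Try(assemble(Seq(0.5, null))))
    println(assembleKeepingNulls(Seq(0.5, null)))
  }
}
```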