combust / mleap

MLeap: Deploy ML Pipelines to Production
https://combust.github.io/mleap-docs/
Apache License 2.0
MLeap runtime logistic regression thresholds #307

Closed wanchaol closed 6 years ago

wanchaol commented 6 years ago

Hi, I tested this locally by training a logistic regression model, serializing it to disk, then deserializing it and making a prediction. In the runtime, the logistic regression model's "thresholds" variable is empty. This is probably because the thresholds are not being set inside the ProbabilisticClassificationModel from the impl.

case class LogisticRegressionModel(impl: AbstractLogisticRegressionModel) extends ProbabilisticClassificationModel {
  override val numClasses: Int = impl.numClasses
  override val numFeatures: Int = impl.numFeatures
  val isMultinomial: Boolean = impl.numClasses > 2

  def multinomialModel: ProbabilisticLogisticsRegressionModel = impl.asInstanceOf[ProbabilisticLogisticsRegressionModel]
  def binaryModel: BinaryLogisticRegressionModel = impl.asInstanceOf[BinaryLogisticRegressionModel]

  override def predict(features: Vector): Double = impl.predict(features)

  override def predictRaw(features: Vector): Vector = impl.predictRaw(features)

  override def rawToProbabilityInPlace(raw: Vector): Vector = impl.rawToProbabilityInPlace(raw)
}

ancasarb commented 6 years ago

Hi @wanchaol,

I am trying to reproduce the error, and I was wondering what you used to serialize the model: Spark or MLeap? Are the "thresholds" serialized to the bundle but not set when the model is loaded at scoring time?

https://github.com/combust/mleap/blob/master/mleap-runtime/src/main/scala/ml/combust/mleap/bundle/ops/classification/LogisticRegressionOp.scala#L48

I can see the "thresholds" being set in the LogisticRegressionOp, but perhaps there's an issue and they don't get set properly.

wanchaol commented 6 years ago

Hi @ancasarb, I trained the model using Spark and serialized it to a bundle. The bundle contains the thresholds, but when I deserialize it and run a prediction, the thresholds are not actually applied to the prediction.
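
To make the symptom concrete, here is a minimal sketch of how per-class thresholds can change a prediction. It follows the rule in Spark's documentation for `thresholds` (the class with the largest p/t is predicted, where p is the class probability and t is its threshold). `ThresholdDemo` and `predictClass` are hypothetical names used only for illustration; they are not MLeap or Spark APIs, and this is not the runtime's actual code.

```scala
// Simplified stand-in for a probabilistic classifier's decision step.
// ThresholdDemo and predictClass are hypothetical names, not MLeap or Spark APIs.
object ThresholdDemo {
  // Index of the largest value in the array (0 for an empty array).
  private def argmax(values: Array[Double]): Int = {
    var best = 0
    for (i <- 1 until values.length) if (values(i) > values(best)) best = i
    best
  }

  // Without thresholds: pick the most probable class.
  // With thresholds: pick the class with the largest probability / threshold ratio.
  def predictClass(probability: Array[Double], thresholds: Option[Array[Double]]): Int =
    thresholds match {
      case None => argmax(probability)
      case Some(t) =>
        require(t.length == probability.length, "one threshold per class is expected")
        val scaled = probability.zip(t).map { case (p, th) =>
          // Assumption: a zero threshold is treated as an infinite ratio.
          if (th == 0.0) Double.PositiveInfinity else p / th
        }
        argmax(scaled)
    }
}
```

With probabilities `Array(0.6, 0.4)`, calling `predictClass` with `None` picks class 0, while thresholds `Some(Array(0.8, 0.2))` give ratios 0.75 and 2.0 and pick class 1. If the thresholds come back empty after loading, the model would silently fall back to the first behavior.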

hollinwilkins commented 6 years ago

@wanchaol This is probably related to a discussion we have been having on Gitter. There are probably a few Spark deserializers that are not getting all of the information out of the bundle back into Spark.

ancasarb commented 6 years ago

@hollinwilkins @wanchaol PR https://github.com/combust/mleap/pull/342 contains the fixes for the deserialization issues.

ancasarb commented 6 years ago

@wanchaol @hollinwilkins Closing this issue, as version 0.9.5, available on Maven, contains the fix for it.