combust / mleap

MLeap: Deploy ML Pipelines to Production
https://combust.github.io/mleap-docs/
Apache License 2.0

org.apache.spark.sql.mleap.TypeConverters can not convert 2D tensor to Matrix #854

Open austinzh opened 1 year ago

austinzh commented 1 year ago

The current implementation always converts a Tensor to a Vector. The bug is hidden in tt.dimensions.size: tt.dimensions is an Option[Seq[Int]], so calling size on it returns 1 for Some and 0 for None, regardless of how many dimensions the tensor actually has. As a result, in the following code a TensorType is always converted to VectorUDT:

  def mleapTensorToSpark(tt: types.TensorType): DataType = {
    assert(TypeConverters.VECTOR_BASIC_TYPES.contains(tt.base),
      s"cannot convert tensor with base ${tt.base} to vector")
    assert(tt.dimensions.isDefined, "cannot convert tensor with undefined dimensions")

    if(tt.dimensions.isEmpty) {
      mleapBasicTypeToSparkType(tt.base)
    } else if(tt.dimensions.size == 1) {
      new VectorUDT
    } else if(tt.dimensions.size == 2) {
      new MatrixUDT
    } else {
      throw new IllegalArgumentException("cannot convert tensor for non-scalar, vector or matrix tensor")
    }
  }
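
The pitfall can be reproduced with plain Scala collections, independent of MLeap and Spark. The snippet below is only a minimal sketch; the object and value names are made up for illustration. It also suggests that, once the assert on isDefined has passed, the isEmpty branch cannot be taken either, because isEmpty is evaluated on the Option rather than on the Seq inside it.

```scala
// Illustration only: these names are hypothetical, not part of MLeap.
object OptionSizeDemo {
  // Dimensions of a 3x4 matrix tensor, wrapped the way TensorType stores them.
  val matrixDims: Option[Seq[Int]] = Some(Seq(3, 4))
  val undefinedDims: Option[Seq[Int]] = None

  // size on the Option counts the Option's own elements (0 or 1),
  // not the elements of the wrapped Seq.
  val sizeOfSome: Int = matrixDims.size      // 1
  val sizeOfNone: Int = undefinedDims.size   // 0

  // The tensor rank is the size of the Seq inside the Option.
  val rank: Option[Int] = matrixDims.map(_.size)   // Some(2)

  // isEmpty on a defined Option is false even when the wrapped Seq is empty.
  val definedButEmpty: Boolean = Some(Seq.empty[Int]).isEmpty   // false
}
```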

The same bug exists in the mleapToSparkValue function as well.
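
One possible direction for a fix is to unwrap the Option first and branch on the size of the Seq it contains. The sketch below is not the MLeap implementation: it uses stand-in types (SketchTensorType, SketchScalar, SketchVectorUDT, SketchMatrixUDT, all hypothetical) so that it compiles on its own, and it assumes that an empty dimension list is meant to denote a scalar, as the first branch of the original code appears to intend.

```scala
// Sketch with stand-in types; none of these names exist in MLeap or Spark.
object TensorRankSketch {
  sealed trait SketchDataType
  case object SketchScalar extends SketchDataType     // stands in for the basic Spark type
  case object SketchVectorUDT extends SketchDataType  // stands in for VectorUDT
  case object SketchMatrixUDT extends SketchDataType  // stands in for MatrixUDT

  // Stand-in for types.TensorType, keeping only the field relevant here.
  final case class SketchTensorType(dimensions: Option[Seq[Int]])

  def tensorToSparkSketch(tt: SketchTensorType): SketchDataType = {
    // Unwrap the Option once, failing early when dimensions are undefined.
    val dims = tt.dimensions.getOrElse(
      throw new IllegalArgumentException("cannot convert tensor with undefined dimensions"))

    // Branch on the number of dimensions, i.e. the size of the inner Seq.
    dims.size match {
      case 0 => SketchScalar
      case 1 => SketchVectorUDT
      case 2 => SketchMatrixUDT
      case _ => throw new IllegalArgumentException(
        "cannot convert tensor for non-scalar, vector or matrix tensor")
    }
  }
}
```

With this shape, a tensor with dimensions Some(Seq(3, 4)) maps to the matrix stand-in instead of the vector one. The same unwrapping would presumably be needed wherever mleapToSparkValue inspects the dimensions.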