combust / mleap

MLeap: Deploy ML Pipelines to Production
https://combust.github.io/mleap-docs/
Apache License 2.0

Mleap Schema for vector of features #510

Closed aditya619 closed 4 years ago

aditya619 commented 5 years ago

I have a Spark random forest (RF) model that I'm trying to serve using MLeap. The input DataFrame is built in Spark as follows:

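// Assumes Vectors (e.g. org.apache.spark.ml.linalg.Vectors) and spark.implicits._ are in scope.
inputFeatures.rdd.map { x =>
  (x(0).asInstanceOf[String],
   x(1).asInstanceOf[String],
   x(2).asInstanceOf[String],
   // Parse the comma-separated string into a dense feature vector.
   Vectors.dense(x(3).asInstanceOf[String].split(",").map(_.toDouble)))
}.toDF("entityOneId", "entityTwoId", "label", "features")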

I'm trying to use the following leap frame schema for the input features above.

{
  "schema": {
    "fields": [
      {
        "name": "entityOneId",
        "type": "string"
      },
      {
        "name": "entityTwoId",
        "type": "string"
      },
      {
        "name": "label",
        "type": "string"
      },
      {
        "name": "features",
        "type": {
          "type": "array",
          "base": "double"
        }
      }
    ]
  },
  "rows": [
    [
      "some-id-one",
      "some-id-two",
      "1",
      [
        1.0,
        1.0,
        -3.0,
        0.0,
        0.0,
        1.0,
        -1.0,
        -1.0,
        -1.0,
        -1.0,
        -1.0,
        -1.0,
        1.0,
        1.0,
        0.0,
        0.0,
        0.0,
        1.0,
        3.0
      ]
    ]
  ]
}

However, sending the above input through the MLeap transform endpoint fails. The exception seems to come from the JSON mapping.

2019-04-05 19:00:12.441  WARN 1 --- [nio-8080-exec-2] m.c.m.springboot.GlobalExceptionHandler  : Returned error due to

scalapb.json4s.JsonFormatException: Unexpected value (JObject(List((schema,JObject(List((fields,JArray(List(JObject(List((name,JString(entityOneId)), (type,JString(string)))), JObject(List((name,JString(entityTwoId)), (type,JString(string)))), JObject(List((name,JString(label)), (type,JString(string)))), JObject(List((name,JString(features)), (type,JObject(List((type,JString(array)), (base,JString(double))))))))))))), (rows,JArray(List(JArray(List(JString(http://dmid.amazon.com/270420605), JString(http://evi.com/p/track_version_of_big_butter_and_egg_man_7), JString(1), JArray(List(JDouble(1.0), JDouble(1.0), JDouble(-3.0), JDouble(0.0), JDouble(0.0), JDouble(1.0), JDouble(-1.0), JDouble(-1.0), JDouble(-1.0), JDouble(-1.0), JDouble(-1.0), JDouble(-1.0), JDouble(1.0), JDouble(1.0), JDouble(0.0), JDouble(0.0), JDouble(0.0), JDouble(1.0), JDouble(3.0))))))))))) for field frame of TransformFrameRequest

Can anyone confirm if the above input is correct?

ancasarb commented 5 years ago

@aditya619 could you please try changing the type of features to a tensor and see if that works better?

{
  "name": "features",
  "type": {
    "type": "tensor",
    "base": "double",
    "dimensions": [
      19
    ]
  }
}
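
The 19 in "dimensions" matches the number of values in the features row from the question. As a rough sketch of how that number relates to the comma-separated feature string used to build the Spark vector, the helper below derives the dimension and renders the field entry in the shape shown above. `TensorField`, `dimensionOf`, and `fieldJson` are hypothetical names for illustration only, not part of MLeap, and the sketch uses only the Scala standard library.

```scala
// Hypothetical helper (not part of MLeap): derives the tensor field
// definition for a comma-separated feature string.
object TensorField {
  // Number of features in a comma-separated string such as "1.0,1.0,-3.0".
  def dimensionOf(csv: String): Int = csv.split(",").length

  // Renders the field entry in the shape suggested in this thread.
  def fieldJson(name: String, csv: String): String = {
    val dim = dimensionOf(csv)
    s"""{"name": "$name", "type": {"type": "tensor", "base": "double", "dimensions": [$dim]}}"""
  }
}
```

For example, `TensorField.fieldJson("features", "1.0,1.0,-3.0")` would produce a field entry with `"dimensions": [3]`. Whether the transform endpoint then accepts the request is not confirmed in this thread.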

ancasarb commented 4 years ago

Closing this, as I believe the answer above solves the issue. Please re-open if that turns out not to be the case.