combust / mleap

MLeap: Deploy ML Pipelines to Production
https://combust.github.io/mleap-docs/
Apache License 2.0

Key Not Found exception when I try to use my custom Transformer #725

Open bmm-2020 opened 3 years ago

bmm-2020 commented 3 years ago

Hi Team, I am trying to use MLeap to bundle my Spark ML pipeline, which contains both built-in Spark ML transformers and custom transformers. Serializing the bundle throws the exception below. I am not sure whether MLeap doesn't support custom transformers or whether it's some other issue. I am using Spark 3.0.1 and mleap-spark 0.16.0. Can you please advise? Really appreciate your help!

    Exception in thread "main" java.util.NoSuchElementException: key not found: my_custom_transfomer
        at scala.collection.MapLike.default(MapLike.scala:235)
        at scala.collection.MapLike.default$(MapLike.scala:234)
        at scala.collection.AbstractMap.default(Map.scala:65)
        at scala.collection.MapLike.apply(MapLike.scala:144)
        at scala.collection.MapLike.apply$(MapLike.scala:143)
        at scala.collection.AbstractMap.apply(Map.scala:65)
        at ml.combust.bundle.BundleRegistry.opForObj(BundleRegistry.scala:102)
        at ml.combust.bundle.serializer.GraphSerializer.$anonfun$writeNode$1(GraphSerializer.scala:31)
        at scala.util.Try$.apply(Try.scala:213)
        at ml.combust.bundle.serializer.GraphSerializer.writeNode(GraphSerializer.scala:30)
        at ml.combust.bundle.serializer.GraphSerializer.$anonfun$write$2(GraphSerializer.scala:21)
        at scala.collection.IndexedSeqOptimized.foldLeft(IndexedSeqOptimized.scala:60)
        at scala.collection.IndexedSeqOptimized.foldLeft$(IndexedSeqOptimized.scala:68)
        at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:38)
        at ml.combust.bundle.serializer.GraphSerializer.write(GraphSerializer.scala:21)
        at org.apache.spark.ml.bundle.ops.PipelineOp$$anon$1.store(PipelineOp.scala:21)
        at org.apache.spark.ml.bundle.ops.PipelineOp$$anon$1.store(PipelineOp.scala:14)
        at ml.combust.bundle.serializer.ModelSerializer.$anonfun$write$1(ModelSerializer.scala:87)
        at scala.util.Try$.apply(Try.scala:213)
        at ml.combust.bundle.serializer.ModelSerializer.write(ModelSerializer.scala:83)
        at ml.combust.bundle.serializer.NodeSerializer.$anonfun$write$1(NodeSerializer.scala:85)
        at scala.util.Try$.apply(Try.scala:213)
        at ml.combust.bundle.serializer.NodeSerializer.write(NodeSerializer.scala:81)
        at ml.combust.bundle.serializer.BundleSerializer.$anonfun$write$1(BundleSerializer.scala:34)
        at scala.util.Try$.apply(Try.scala:213)
        at ml.combust.bundle.serializer.BundleSerializer.write(BundleSerializer.scala:29)
        at ml.combust.bundle.BundleWriter.save(BundleWriter.scala:34)

bmm-2020 commented 3 years ago

I found the guide on how to create a custom transformer and followed those steps, but I still get the same error. I am a bit unclear on the part about registering the custom transformer. As instructed at the end of the guide, I created a reference.conf in my local project, but I'm not sure it is taking effect.
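
The trace shows `BundleRegistry.opForObj` failing inside `Map.apply`, which means the registry holding the serialization ops has no entry for the transformer being written. The stdlib-only Scala sketch below is a hypothetical toy model of that lookup (`ToyOp`, `ToyRegistry` and the op names are made up for illustration and are not MLeap's API): an unregistered name reproduces the same `key not found` message, and registering the op makes the lookup succeed. In real MLeap, registration goes through the ops lists in `reference.conf`; assuming the standard Typesafe Config loading, that file has to be on the runtime classpath (for example under `src/main/resources`) and should append with `+=` rather than overwrite the list. The custom transformer guide appears to register the Spark-side op (used when saving from Spark) separately from the MLeap Runtime op (used when loading), so it is worth checking which registry key your version expects.

```scala
import scala.util.Try

// Hypothetical stand-in for a serializer op; NOT MLeap's OpNode API.
final case class ToyOp(opName: String, className: String)

// Hypothetical toy model of a name-keyed op registry.
final class ToyRegistry(ops: Map[String, ToyOp]) {
  // Same failing pattern as the trace: Map.apply on a missing key
  // throws java.util.NoSuchElementException("key not found: <key>").
  def opForName(name: String): ToyOp = ops(name)

  // Safer variant: report what IS registered instead of throwing.
  def lookup(name: String): Either[String, ToyOp] =
    ops.get(name).toRight(
      s"no op registered for '$name'; registered: ${ops.keys.toList.sorted.mkString(", ")}")

  // Returns a new registry with the op added, analogous to appending to an ops list.
  def register(op: ToyOp): ToyRegistry = new ToyRegistry(ops + (op.opName -> op))
}

object ToyRegistryDemo {
  // Only a "built-in" op is known at first.
  val builtIns: ToyRegistry =
    new ToyRegistry(Map("string_indexer" -> ToyOp("string_indexer", "StringIndexerOp")))

  // Before registration: fails with the same message shape as the reported error.
  val before: Try[ToyOp] = Try(builtIns.opForName("my_custom_transformer"))

  // After registration: the lookup succeeds.
  val after: Try[ToyOp] = Try(
    builtIns
      .register(ToyOp("my_custom_transformer", "MyCustomTransformerOp"))
      .opForName("my_custom_transformer"))
}
```

Using `get` with a descriptive error, as `lookup` does, makes it obvious which names were actually registered when a `reference.conf` entry is silently not picked up.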

Also, I found that when I override inputSchema or outputSchema, only scalar types can be passed into StructType, and I'm not sure how to handle vector columns. (I have a few custom transformers, and some of them operate on vectors.)
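
On vector columns: MLeap Runtime's type package (`ml.combust.mleap.core.types`) defines `TensorType` alongside `ScalarType`, and a vector column is normally declared as a one-dimensional tensor rather than a scalar, e.g. something like `TensorType.Double(size)` (verify the exact factory method against the 0.16.0 sources). The sketch below does not use MLeap at all; it is a hypothetical stdlib-only mirror (`ToyScalar`, `ToyTensor`, `ToyField` and `ToySchemas` are made-up names) showing the idea that a schema field can carry a tensor type with a fixed length, and that row values can be checked against it.

```scala
// Hypothetical, stdlib-only sketch: these types only mirror the idea behind
// MLeap's ScalarType / TensorType and are NOT MLeap classes.
sealed trait ToyDataType
final case class ToyScalar(base: String) extends ToyDataType
// A tensor has a base type and optional fixed dimensions; a vector is a 1-D tensor.
final case class ToyTensor(base: String, dims: Option[Seq[Int]]) extends ToyDataType

final case class ToyField(name: String, dataType: ToyDataType)

object ToySchemas {
  // Scalar column: the case that already works in the post.
  val scalarInput: ToyField = ToyField("input", ToyScalar("double"))

  // Vector column of known length, declared as a 1-D double tensor.
  def vectorInput(size: Int): ToyField =
    ToyField("features", ToyTensor("double", Some(Seq(size))))

  // Check whether a row value fits the declared column type.
  def conforms(value: Any, dataType: ToyDataType): Boolean = (value, dataType) match {
    case (_: Double, ToyScalar("double"))                      => true
    case (_: String, ToyScalar("string"))                      => true
    case (v: Array[Double], ToyTensor("double", Some(Seq(n)))) => v.length == n
    case (_: Array[Double], ToyTensor("double", None))         => true
    case _                                                     => false
  }
}
```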

Any help/advice greatly appreciated! Thanks.

bmm-2020 commented 3 years ago

I downloaded the MLeap library, added my custom transformer code to it, and compiled and packaged it. My code now references these locally built MLeap libraries, and I am able to generate the MLeap bundle. My next roadblocks are: 1) support for vector columns (as mentioned above), and 2) once I deploy this bundled model on AWS SageMaker as an endpoint, how do I supply the modified MLeap packages along with it?

Any advice greatly appreciated, thanks!