aws / sagemaker-sparkml-serving-container

This code is used to build & run a Docker container for performing predictions against a Spark ML Pipeline.
Apache License 2.0
50 stars 25 forks source link

Update pom.xml - Fix MLP dependency #42

Closed dilta closed 10 months ago

dilta commented 11 months ago

Fix MLP inference by adding the dependency required by Mleap

Issue #, if available: AWS support case https://support.console.aws.amazon.com/support/home?region=us-east-1#/case/?displayId=13330833201&language=en

Description of changes: Fix MLP (Multi-layer Perceptron) inference error Caused by: java.lang.ClassNotFoundException: com.github.fommil.netlib.NativeSystemBLAS at java.net.URLClassLoader.findClass(URLClassLoader.java:387) ~[na:1.8.0_342] at java.lang.ClassLoader.loadClass(ClassLoader.java:418) ~[na:1.8.0_342] at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) ~[na:1.8.0_342] at java.lang.ClassLoader.loadClass(ClassLoader.java:351) ~[na:1.8.0_342] at java.lang.Class.forName0(Native Method) ~[na:1.8.0_342] at java.lang.Class.forName(Class.java:264) ~[na:1.8.0_342] at com.github.fommil.netlib.BLAS.load(BLAS.java:76) ~[sparkml-serving-3.3.jar:3.3] at com.github.fommil.netlib.BLAS.(BLAS.java:58) ~[sparkml-serving-3.3.jar:3.3] ... 86 common frames omitted

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

xyang16 commented 11 months ago
  1. License: https://github.com/fommil/netlib-java/blob/master/LICENSE.txt
  2. I notice there's no test, please run through the pipeline and make sure it's fine.
dilta commented 10 months ago

Hi, I am not sure I understand the meaning of running through the pipeline. Did you guys have CI/CD that tests the pipeline itself during the process?

I don't know what kind of tests your team usually perform for this repo. Can someone from your team familiar with this help run through in this regard?

We already tested for our use case of the MLP model, and it works pretty well with reasonable QPS and model latency.

This is merely a missing dependency originally referenced by the library used in your code. I just helped find the exact version it is using.

If you try to look up the dependency of the MLeap library you are using, you will find the same thing.

Not sure why your existing tests didn't spot this issue. You probably want to consider adding Multilayer Perceptron into your test cases.

Thanks!

dilta commented 10 months ago

This is the inference recommender job we tested with this change.

https://us-east-1.console.aws.amazon.com/sagemaker/home?region=us-east-1#/inference-recommender/details/mlp-gil-0618-inference-recommender-1691105941913

Without this change, this serving container will not be able to run. (The error was shown above.)

This model is trained using SparkML without any additional/custom functionality.

claytonparnell commented 10 months ago

@dilta we really appreciate your contribution and apologize for the delay. As part of our build and testing process, we need to make sure no critical security vulnerabilities are present. Since this image was not updated for some time, there were several vulnerabilities which needed addressing, and this required migrating JDK version and has taken some extra effort. I'll go ahead and merge this, and we'll release it to ECR once vulnerabilities are accounted for.

dilta commented 10 months ago

Thank you very much for the detailed explanation, @claytonparnell ! Really appreciate your help in this regard!