combust / mleap

MLeap: Deploy ML Pipelines to Production
https://combust.github.io/mleap-docs/
Apache License 2.0
1.5k stars 310 forks source link

Mleap Runtime serialization using wrong file name for DecisionTree models #843

Closed AllanaEvans closed 1 year ago

AllanaEvans commented 1 year ago

This PR introduced a bug in the Mleap Runtime serialization for DecisionTree-type models (DecisionTreeClassifier and DecisionTreeRegression). That PR intended to update how the trees in the DecisionTree are serialized, including switching the file name from "nodes" to "tree". However, two instances were missed, one in the runtime DecisionTreeClassifierOp and one in the runtime DecisionTreeRegressionOp. This bug means that any DecisionTreeClassifier or DecisionTreeRegression models that are serialized using Mleap Runtime will use the "nodes" file name and thus are unable to be deserialized because both Mleap Runtime and Mleap Spark expect the "tree" file name during deserialization (because they were correctly updated in that PR).

In this PR, I am fixing the two missed instances from the original PR and adding a test case for each of these instances. I have verified that the test case failed before the change and succeeds after the change.

Demonstrating that the two new tests failed before fixing the file name:

$ sbt "project mleap-runtime” test
…
[info] DecisionTreeRegressionSpec:
…
[info] save/load model
[info] - correctly reproduces the model when saved and loaded *** FAILED ***
[info]   java.nio.file.NoSuchFileException: /root/tree.json
[info]   at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem.newInputStream(ZipFileSystem.java:572)
[info]   at jdk.zipfs/jdk.nio.zipfs.ZipPath.newInputStream(ZipPath.java:708)
[info]   at jdk.zipfs/jdk.nio.zipfs.ZipFileSystemProvider.newInputStream(ZipFileSystemProvider.java:273)
[info]   at java.base/java.nio.file.Files.newInputStream(Files.java:158)
[info]   at ml.combust.bundle.tree.decision.TreeSerializer.$anonfun$read$1(TreeSerializer.scala:107)
[info]   at resource.DefaultManagedResource.open(AbstractManagedResource.scala:110)
[info]   at resource.AbstractManagedResource.acquireFor(AbstractManagedResource.scala:87)
[info]   at resource.ManagedResourceOperations.apply(ManagedResourceOperations.scala:26)
[info]   at resource.ManagedResourceOperations.apply$(ManagedResourceOperations.scala:26)
[info]   at resource.AbstractManagedResource.apply(AbstractManagedResource.scala:50)
…
[info] DecisionTreeClassifierSpec:
…
[info] save/load model
[info] - correctly reproduces the model when saved and loaded *** FAILED ***
[info]   java.nio.file.NoSuchFileException: /root/tree.json
[info]   at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem.newInputStream(ZipFileSystem.java:572)
[info]   at jdk.zipfs/jdk.nio.zipfs.ZipPath.newInputStream(ZipPath.java:708)
[info]   at jdk.zipfs/jdk.nio.zipfs.ZipFileSystemProvider.newInputStream(ZipFileSystemProvider.java:273)
[info]   at java.base/java.nio.file.Files.newInputStream(Files.java:158)
[info]   at ml.combust.bundle.tree.decision.TreeSerializer.$anonfun$read$1(TreeSerializer.scala:107)
[info]   at resource.DefaultManagedResource.open(AbstractManagedResource.scala:110)
[info]   at resource.AbstractManagedResource.acquireFor(AbstractManagedResource.scala:87)
[info]   at resource.ManagedResourceOperations.apply(ManagedResourceOperations.scala:26)
[info]   at resource.ManagedResourceOperations.apply$(ManagedResourceOperations.scala:26)
[info]   at resource.AbstractManagedResource.apply(AbstractManagedResource.scala:50)
…
[info] Run completed in 3 seconds, 652 milliseconds.
[info] Total number of tests run: 274
[info] Suites: completed 72, aborted 0
[info] Tests: succeeded 272, failed 2, canceled 0, ignored 0, pending 0
[info] *** 2 TESTS FAILED ***
[error] Failed tests:
[error]     ml.combust.mleap.runtime.transformer.regression.DecisionTreeRegressionSpec
[error]     ml.combust.mleap.runtime.transformer.classification.DecisionTreeClassifierSpec
[error] (Test / test) sbt.TestsFailedException: Tests unsuccessful

Demonstrating that the two new tests succeed after fixing the file name:

$ sbt "project mleap-runtime” test
…
[info] DecisionTreeRegressionSpec:
…
[info] save/load model
[info] - correctly reproduces the model when saved and loaded
…
[info] DecisionTreeClassifierSpec:
…
[info] save/load model
[info] - correctly reproduces the model when saved and loaded
…
[info] Run completed in 4 seconds, 104 milliseconds.
[info] Total number of tests run: 274
[info] Suites: completed 72, aborted 0
[info] Tests: succeeded 274, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
AllanaEvans commented 1 year ago

@jsleight @WeichenXu123 Hi! Could I get a review on this PR? This bug is impacting my team. Thanks!

AllanaEvans commented 1 year ago

@jsleight Thank you!