This PR introduced a bug in the Mleap Runtime serialization for DecisionTree-type models (DecisionTreeClassifier and DecisionTreeRegression). That PR intended to update how the trees in the DecisionTree are serialized, including switching the file name from "nodes" to "tree". However, two instances were missed, one in the runtime DecisionTreeClassifierOp and one in the runtime DecisionTreeRegressionOp. This bug means that any DecisionTreeClassifier or DecisionTreeRegression models that are serialized using Mleap Runtime will use the "nodes" file name and thus are unable to be deserialized because both Mleap Runtime and Mleap Spark expect the "tree" file name during deserialization (because they were correctly updated in that PR).
In this PR, I am fixing the two missed instances from the original PR and adding a test case for each of these instances. I have verified that the test case failed before the change and succeeds after the change.
Demonstrating that the two new tests failed before fixing the file name:
$ sbt "project mleap-runtime” test
…
[info] DecisionTreeRegressionSpec:
…
[info] save/load model
[info] - correctly reproduces the model when saved and loaded *** FAILED ***
[info] java.nio.file.NoSuchFileException: /root/tree.json
[info] at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem.newInputStream(ZipFileSystem.java:572)
[info] at jdk.zipfs/jdk.nio.zipfs.ZipPath.newInputStream(ZipPath.java:708)
[info] at jdk.zipfs/jdk.nio.zipfs.ZipFileSystemProvider.newInputStream(ZipFileSystemProvider.java:273)
[info] at java.base/java.nio.file.Files.newInputStream(Files.java:158)
[info] at ml.combust.bundle.tree.decision.TreeSerializer.$anonfun$read$1(TreeSerializer.scala:107)
[info] at resource.DefaultManagedResource.open(AbstractManagedResource.scala:110)
[info] at resource.AbstractManagedResource.acquireFor(AbstractManagedResource.scala:87)
[info] at resource.ManagedResourceOperations.apply(ManagedResourceOperations.scala:26)
[info] at resource.ManagedResourceOperations.apply$(ManagedResourceOperations.scala:26)
[info] at resource.AbstractManagedResource.apply(AbstractManagedResource.scala:50)
…
[info] DecisionTreeClassifierSpec:
…
[info] save/load model
[info] - correctly reproduces the model when saved and loaded *** FAILED ***
[info] java.nio.file.NoSuchFileException: /root/tree.json
[info] at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem.newInputStream(ZipFileSystem.java:572)
[info] at jdk.zipfs/jdk.nio.zipfs.ZipPath.newInputStream(ZipPath.java:708)
[info] at jdk.zipfs/jdk.nio.zipfs.ZipFileSystemProvider.newInputStream(ZipFileSystemProvider.java:273)
[info] at java.base/java.nio.file.Files.newInputStream(Files.java:158)
[info] at ml.combust.bundle.tree.decision.TreeSerializer.$anonfun$read$1(TreeSerializer.scala:107)
[info] at resource.DefaultManagedResource.open(AbstractManagedResource.scala:110)
[info] at resource.AbstractManagedResource.acquireFor(AbstractManagedResource.scala:87)
[info] at resource.ManagedResourceOperations.apply(ManagedResourceOperations.scala:26)
[info] at resource.ManagedResourceOperations.apply$(ManagedResourceOperations.scala:26)
[info] at resource.AbstractManagedResource.apply(AbstractManagedResource.scala:50)
…
[info] Run completed in 3 seconds, 652 milliseconds.
[info] Total number of tests run: 274
[info] Suites: completed 72, aborted 0
[info] Tests: succeeded 272, failed 2, canceled 0, ignored 0, pending 0
[info] *** 2 TESTS FAILED ***
[error] Failed tests:
[error] ml.combust.mleap.runtime.transformer.regression.DecisionTreeRegressionSpec
[error] ml.combust.mleap.runtime.transformer.classification.DecisionTreeClassifierSpec
[error] (Test / test) sbt.TestsFailedException: Tests unsuccessful
Demonstrating that the two new tests succeed after fixing the file name:
$ sbt "project mleap-runtime” test
…
[info] DecisionTreeRegressionSpec:
…
[info] save/load model
[info] - correctly reproduces the model when saved and loaded
…
[info] DecisionTreeClassifierSpec:
…
[info] save/load model
[info] - correctly reproduces the model when saved and loaded
…
[info] Run completed in 4 seconds, 104 milliseconds.
[info] Total number of tests run: 274
[info] Suites: completed 72, aborted 0
[info] Tests: succeeded 274, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
This PR introduced a bug in the Mleap Runtime serialization for DecisionTree-type models (DecisionTreeClassifier and DecisionTreeRegression). That PR intended to update how the trees in the DecisionTree are serialized, including switching the file name from "nodes" to "tree". However, two instances were missed, one in the runtime DecisionTreeClassifierOp and one in the runtime DecisionTreeRegressionOp. This bug means that any DecisionTreeClassifier or DecisionTreeRegression models that are serialized using Mleap Runtime will use the "nodes" file name and thus are unable to be deserialized because both Mleap Runtime and Mleap Spark expect the "tree" file name during deserialization (because they were correctly updated in that PR).
In this PR, I am fixing the two missed instances from the original PR and adding a test case for each of these instances. I have verified that the test case failed before the change and succeeds after the change.
Demonstrating that the two new tests failed before fixing the file name:
Demonstrating that the two new tests succeed after fixing the file name: