Closed z3d1k closed 1 year ago
Thanks for the detailed measurements! We did see a performance hit elsewhere (RCFCast, in forecasting), and a PR that fixes the regression, https://github.com/aws/random-cut-forest-by-aws/pull/378, has been out for a couple of weeks. This solution is better, though, so let us see if we can ship it as a 3.6 release.
Please check out the new version. There is an example in core/src/test/java/com/amazon/randomcutforest/state/RandomCutForestMapperTest.java; see the test `benchmarkMappers()`.
The entire round trip (deserialize, evaluate, serialize) seems to take about 36 ms on a commodity Mac with `mapper.setSaveTreeStateEnabled(true)`. The default is false.
Thanks! I've tested the new version, the issue is fixed.
Thank you for the pointer to `setSaveTreeStateEnabled`.
I've noticed a significant drop in performance in the Java library when restoring from `RandomCutForestState`.

Benchmark data:
Profiling data shows that most of the time is spent performing `.toString` operations:

![flamegraph](https://user-images.githubusercontent.com/2481047/229477500-a8408ea0-d42e-4405-a43c-5788878d7687.png)

It appears that the source of the problem is this operation, which performs heavy string construction on every iteration:
This can be resolved either by using a constant string as the error message for `checkArgument`, or, to preserve the detailed error, by computing the error message lazily, only when an error is detected, e.g.
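A minimal sketch of the lazy approach follows. It is not the library's actual code: the `LazyCheck` class, both `checkArgument` overloads, the `describe` helper, and the `messagesBuilt` counter are hypothetical names introduced only for illustration. The idea is to pass a `Supplier<String>` so that the expensive string construction runs only on the failure path.

```java
import java.util.Arrays;
import java.util.function.Supplier;

public class LazyCheck {
    // Hypothetical counter, used only to show how often a message is built.
    static int messagesBuilt = 0;

    // Eager variant (assumed shape): the caller has to build the message
    // before the call, even when the condition holds.
    static void checkArgument(boolean condition, String message) {
        if (!condition) {
            throw new IllegalArgumentException(message);
        }
    }

    // Lazy variant (assumed shape): the supplier is invoked only on failure.
    static void checkArgument(boolean condition, Supplier<String> messageSupplier) {
        if (!condition) {
            throw new IllegalArgumentException(messageSupplier.get());
        }
    }

    // Stand-in for the heavy string construction seen in the profile.
    static String describe(float[] point) {
        messagesBuilt++;
        return "unexpected point: " + Arrays.toString(point);
    }

    public static void main(String[] args) {
        float[] point = {1f, 2f};
        // The check passes on every iteration, so describe() is never called.
        for (int i = 0; i < 1_000_000; i++) {
            checkArgument(point.length == 2, () -> describe(point));
        }
        System.out.println("messages built: " + messagesBuilt);
    }
}
```

With the eager overload, the same loop would call `describe` on every iteration even though no check fails. The constant-string alternative avoids the lambda as well, at the cost of a less detailed error.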