john-waczak closed this issue 2 years ago.
@john-waczak Thanks for reporting! Good to know about this.
A complete minimum working example might speed up resolution, ideally without the MLJ wrapper.
Okay, here's an MWE. Julia crashes when I run the following on an Ubuntu 21.04 machine with 16 GB of RAM and a 4-core i7-7700HQ @ 2.80GHz:
```julia
using EvoTrees

# Simple regression demo
n = 2000;
X = 2 .* (rand(n, 2) .- 0.5);   # 2000×2 features drawn uniformly from [-1, 1)
y = X[:, 1].^5 .+ X[:, 2].^4 .- X[:, 1].^4 .- X[:, 2].^3;
size(X)   # (2000, 2)
size(y)   # (2000,)

# train for the first time with default settings
params1 = EvoTreeRegressor()
model = fit_evotree(params1, X, y)

# train with increased max_depth
# this causes Julia to crash
params2 = EvoTreeRegressor(max_depth=20)
model = fit_evotree(params2, X, y)
```
Here's the output of `Pkg.status`:
```
(evoTree_bug) pkg> status
      Status `~/gitRepos/evoTree_bug/Project.toml`
  [f6006082] EvoTrees v0.8.4
```
Here's a screenshot of my memory usage:
Thanks for reporting! From what I can tell, this isn't a bug or a memory leak per se, but a consequence of design choices geared toward fitting speed, which result in significant memory pre-allocation. Specifically, a histogram is pre-allocated for each tree node, and with a depth of 20 there are over 500K such nodes. What looks like a memory leak is actually a long pre-allocation process.
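As a rough back-of-the-envelope illustration of why pre-allocation explodes with depth, the Julia sketch below counts the nodes of a full binary tree and multiplies by an assumed per-node histogram size. The helpers `tree_nodes` and `hist_bytes` are hypothetical. The bin count (64), the three accumulated statistics per bin (for example gradient, hessian and weight) and `Float32` storage are illustrative assumptions, not EvoTrees' documented internals.

```julia
# Hypothetical helper: node count of a full binary tree.
# Assumes a depth-1 tree is a single root, so depth d has 2^d - 1 nodes.
tree_nodes(depth::Integer) = 2^depth - 1

# Hypothetical helper: assumed layout of one histogram per node, per feature,
# per bin, each bin holding `nstats` values of type `T` (defaults are illustrative only).
function hist_bytes(depth::Integer; nfeats::Integer, nbins::Integer = 64,
                    nstats::Integer = 3, T::Type = Float32)
    return tree_nodes(depth) * nfeats * nbins * nstats * sizeof(T)
end

# Compare a typical depth, the depth where trouble started, and the failing depth.
for d in (6, 10, 20)
    gib = hist_bytes(d; nfeats = 2) / 2^30
    println("depth $d: ", tree_nodes(d), " nodes, ~", round(gib; digits = 4), " GiB")
end
```

Each extra level roughly doubles the node count, so under these assumptions the histograms take about 95 KB at depth 6 but about 1.5 GiB at depth 20, even with only two features. The real footprint depends on the EvoTrees version and its other per-node buffers, so the numbers only indicate the 2^depth scaling, not the actual allocation.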
However, in a gradient-boosted model each tree acts as a weak learner, and as such I'm not aware of situations where a depth much greater than 10 was of any value. Typically, a depth in the 3-8 range performs best. Let me know if you are in a situation where greater depth is needed. I'm afraid, though, that supporting such scenarios would require a significantly different, and potentially less efficient, design.
@jeremiedb Thanks for your reply! That makes a lot of sense. I should be more than fine with a smaller `max_depth`. I was trying some hyperparameter variations just to see what would happen and noticed the script kept dying once `max_depth` got past 10 or so.
I have been able to train an `EvoTreeRegressor` with the default parameters successfully. When I try to increase the `max_depth` parameter beyond 10, my memory usage suddenly spikes and Julia dies. Here's a snippet from the REPL: