Open CameronBieganek opened 4 years ago
You could cast the features to a concrete type (i.e. `X = Int.(X)`) as opposed to using the `Any` type, which is quite heavy. That should help a little bit. But otherwise, we need a new implementation of the `Leaf` type (see #90), which requires a significant amount of work.
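To illustrate the suggestion above (my own sketch, not from the thread): an `Any`-typed matrix stores a pointer to a boxed value in every cell, while the concrete cast stores the raw values inline, so `Base.summarysize` reports a much smaller footprint for the concrete array.

```julia
# The same integers stored with an `Any` element type vs. a concrete one.
X_any = Matrix{Any}(rand(1:10, 1_000, 10))
X_int = Int.(X_any)   # the suggested cast to a concrete element type

println(Base.summarysize(X_any))   # boxes + pointers per cell
println(Base.summarysize(X_int))   # roughly 1_000 * 10 * 8 bytes
```

The same cast applies to `Float64` or any other concrete element type; the point is only that `eltype(X)` should not be `Any`.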
> You could cast the features to a concrete type (i.e. `X = Int.(X)`) as opposed to using the `Any` type, which is quite heavy. That should help a little bit.
The features matrix in my example had `typeof(X) == Array{Float64,2}`, so I think I dodged that bullet.
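As an aside (my own sketch, not something raised in the thread): since most of the 243 columns here are one-hot indicators, storing them as `Bool` instead of `Float64` would shrink the input matrix roughly 8x, one byte per entry instead of eight. Whether the downstream tree builder accepts a `Bool` feature matrix is a separate question.

```julia
# Illustrative sizes for a one-hot block of the thread's dimensions.
rows, cols = 87_390, 200
onehot_f64  = zeros(Float64, rows, cols)  # 8 bytes per entry
onehot_bool = zeros(Bool, rows, cols)     # 1 byte per entry

println(Base.summarysize(onehot_f64))   # ≈ rows * cols * 8 bytes
println(Base.summarysize(onehot_bool))  # ≈ rows * cols * 1 byte
```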
I have a data set of dimensions (87390, 243). Most of the columns are categorical variables that have been one-hot encoded. The size of the data set in memory is ~160 MB. I compared the memory usage for DecisionTree.jl and R's ranger package.

DecisionTree.jl

Memory consumption:

ranger

Memory consumption:
Conclusion
Thus, it appears that DecisionTree.jl is using 2.4x as much memory as ranger for this model. Is it possible to reduce the memory footprint of DecisionTree.jl? I can provide a scrubbed version of my data set if that helps.
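For what it's worth, one way to quantify the Julia-side allocation directly is the `@allocated` macro (a sketch using a small random stand-in for the real data; it assumes DecisionTree.jl's exported `build_forest(labels, features)` entry point):

```julia
using DecisionTree

# Small random stand-in for the real (87390, 243) one-hot matrix.
X = rand(1_000, 50)
y = rand(["a", "b"], 1_000)

build_forest(y, X)                     # warm-up call to exclude compilation
bytes = @allocated build_forest(y, X)  # bytes allocated while building
println(round(bytes / 2^20; digits = 1), " MiB allocated")
```

Note that `@allocated` counts total bytes allocated during the call, not the peak resident size, so it is an upper bound on what the GC has to manage rather than a direct analogue of the process-level numbers above.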