JuliaAI / DecisionTree.jl

Julia implementation of Decision Tree (CART) and Random Forest algorithms

VFDTs based on this package? #55

Open robertfeldt opened 6 years ago

robertfeldt commented 6 years ago

Hi, I'm interested in a Julia implementation of Domingos and Hulten's VFDTs (Very Fast Decision Trees), a.k.a. "Hoeffding Trees"; see, for example:

http://weka.sourceforge.net/doc.dev/weka/classifiers/trees/HoeffdingTree.html

This is a streaming algorithm for learning decision trees and might be very useful for modelling "big data" such as logs etc.
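For readers unfamiliar with the idea, the core of a Hoeffding tree is a statistical test that decides, after seeing only n examples at a leaf, whether the best split attribute is reliably better than the runner-up. The following is a minimal sketch in Python of that test only, not a full VFDT and not anything provided by DecisionTree.jl or OnlineStats.jl. The function names `hoeffding_bound` and `should_split` are hypothetical, and the sketch assumes information gain as the split criterion, so the range R of the criterion is taken to be log2 of the number of classes.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2 * n)).

    With probability at least 1 - delta, the true mean of a variable with
    range R lies within epsilon of the mean observed over n samples.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float,
                 n_classes: int, delta: float, n: int) -> bool:
    """Hypothetical helper: split a leaf once the observed gap between the
    two best attributes exceeds the Hoeffding bound.

    Assumes information gain, whose range is log2(n_classes).
    """
    value_range = math.log2(n_classes)
    epsilon = hoeffding_bound(value_range, delta, n)
    return (best_gain - second_gain) > epsilon


if __name__ == "__main__":
    # The same observed gap of 0.15 is not enough evidence after 100
    # examples, but it is after 1000, because epsilon shrinks as n grows.
    for n in (100, 1000):
        print(n, should_split(0.30, 0.15, n_classes=2, delta=1e-7, n=n))
```

Because the test needs only per-leaf counts rather than the stored examples, each example can be discarded once it has updated the statistics at its leaf, which is what makes the method suitable for streams.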

Are there any plans to implement streaming algorithms within this package? If not, do you think it would be feasible on top of the infrastructure provided here, or would a clean, separate implementation/package be better?

Thanks for any input you might have.

ValdarT commented 6 years ago

I have also been missing something like that in Julia. But perhaps it would fit better in OnlineStats.jl?

robertfeldt commented 6 years ago

Agreed, although my feeling is that the types and methods available in this package (DecisionTree.jl) might be needed for a VFDT implementation. Also, it is not clear that a VFDT satisfies an O(1) memory requirement, since the tree might grow very large.

ValdarT commented 6 years ago

What is your opinion on this, @joshday? Would VFDT fit the scope of OnlineStats.jl, or would it fit better elsewhere?

joshday commented 6 years ago

Yes, it would fit in the scope of OnlineStats. I actually started working on it recently, but @robertfeldt makes a good point that before it's done I'll probably be reinventing some data structures that exist here. I think that's fine at least in the short term.