Incremental CART decision tree based on the Hoeffding tree, i.e. the Very Fast Decision Tree (VFDT) proposed in "Mining High-Speed Data Streams" by Domingos & Hulten (2000), and on a newer extension, the "Extremely Fast Decision Tree" (EFDT) by Manapragada, Webb & Salehi (2018). A new Random Forest implementation has also been added.
update and predict now take sklearn-style arrays (+ PEP8) #2
I've been using your VFDT code, thank you a lot, it's very clear and well-made.
I'm using VFDT in a pipeline where I sometimes want to pass it arrays to update/predict at once, so:
I renamed the VFDT class's `update` and `predict` (which take single vectors) to `__update` and `__predict`, and created new `update` and `predict` methods that call `check_array` and `check_X_y` from `sklearn.utils` before calling your own functions.
I also wanted to try the algorithm without tie-breaking, so I added a condition that disables tie-breaking when `tau == 0`. It was a terrible idea, the algorithm does not converge, but still, it's a feature.
Finally, the formatter/linter in my text editor is very strict about PEP8, so it split every line longer than 80 characters into several lines. This is purely cosmetic.
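For readers who have not looked at the diff, the wrapping pattern described above could look roughly like the sketch below. It is not the code from this pull request: `StreamingClassifierSketch` and its per-sample learner (a simple label counter) are hypothetical stand-ins for the VFDT class, and the private methods use a single underscore here rather than the `__update`/`__predict` names used in the PR. Only `check_X_y` and `check_array` from `sklearn.utils` are real library calls.

```python
import numpy as np
from sklearn.utils import check_array, check_X_y


class StreamingClassifierSketch:
    """Hypothetical stand-in for the VFDT class (assumption, not the repo's code)."""

    def __init__(self):
        self.class_counts = {}

    def _update_one(self, x, y):
        # Placeholder single-sample learner: it only counts labels.
        # In the real class this would route x down the tree.
        self.class_counts[y] = self.class_counts.get(y, 0) + 1

    def _predict_one(self, x):
        # Placeholder single-sample prediction: majority label seen so far.
        return max(self.class_counts, key=self.class_counts.get)

    def update(self, X, y):
        # Validate a batch (2-D X, matching 1-D y), then feed it one row at a time.
        X, y = check_X_y(X, y)
        for x_i, y_i in zip(X, y):
            self._update_one(x_i, y_i)
        return self

    def predict(self, X):
        # Validate a batch and return one prediction per row.
        X = check_array(X)
        return np.array([self._predict_one(x_i) for x_i in X])
```

With this shape, a single sample has to be passed as a one-row batch (`[x]`), because `check_array` rejects 1-D input by default.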
(Your example at the bottom of vfdt.py still works.)
(I didn't open the other files.)
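As background for the `tau == 0` change mentioned above, here is a minimal sketch of the usual Hoeffding-tree split test with tie-breaking, following the rule from Domingos & Hulten. The function names and arguments (`hoeffding_bound`, `should_split`, `value_range`) are illustrative assumptions and may not match what `vfdt.py` actually does.

```python
import math


def hoeffding_bound(value_range, delta, n):
    # epsilon = sqrt(R^2 * ln(1/delta) / (2n)), with R the range of the
    # split metric, delta the allowed error probability, n the samples seen.
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, value_range, delta, n, tau):
    eps = hoeffding_bound(value_range, delta, n)
    # Normal case: the best attribute beats the runner-up by more than epsilon.
    if best_gain - second_gain > eps:
        return True
    # Tie-breaking: once epsilon drops below tau, split anyway.
    # With tau == 0 this branch is never taken, i.e. tie-breaking is disabled.
    return tau > 0 and eps < tau
```

Without tie-breaking, a leaf whose two best attributes have nearly equal gain can wait a very long time before splitting, which may be related to the non-convergence reported above.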
In case you are interested, have this pull request, with regards,
Adrien Luxey
PS: If you want to be even more sklearn-compatible, you could simply rename `update` to `fit`. You could also add a `fit_predict` function that does both, as in e.g. KMeans.
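A rough sketch of what the PS suggests, assuming the class already exposes batch `update(X, y)` and `predict(X)`: `FitAliasMixin` and `MajorityClassSketch` are hypothetical names used only for illustration. Note that `KMeans.fit_predict` is unsupervised and ignores `y`, whereas a classifier version needs the labels.

```python
import numpy as np


class FitAliasMixin:
    """Hypothetical mixin adding sklearn-style names on top of update/predict."""

    def fit(self, X, y):
        # Alias for update; returns self so calls can be chained.
        self.update(X, y)
        return self

    def fit_predict(self, X, y):
        # Learn from the batch, then predict on the same batch.
        return self.fit(X, y).predict(X)


class MajorityClassSketch(FitAliasMixin):
    """Toy incremental learner standing in for VFDT (assumption)."""

    def __init__(self):
        self.counts = {}

    def update(self, X, y):
        # Incrementally count labels; X is ignored by this toy model.
        for label in y:
            self.counts[label] = self.counts.get(label, 0) + 1

    def predict(self, X):
        # Predict the majority label for every row of X.
        majority = max(self.counts, key=self.counts.get)
        return np.full(len(X), majority)
```

Because `fit` here only forwards to `update`, calling it repeatedly keeps learning incrementally, which differs from the usual sklearn convention that `fit` retrains from scratch; sklearn's name for incremental training is `partial_fit`.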