Xtra-Computing / thundergbm

ThunderGBM: Fast GBDTs and Random Forests on GPUs
Apache License 2.0

Could you implement cross-validation with callbacks to decide when to stop? #17

Open · Denisevi4 opened 5 years ago

Denisevi4 commented 5 years ago

and/or support training tree by tree, with a callback deciding whether to stop training?

That would be nice.

It doesn't have to be a Python callback; a C++ callback is fine if you don't want to lose speed.
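A minimal sketch of what such a callback hook could look like; none of these names (`train_with_callbacks`, `train_one_round`, `early_stopping`) exist in ThunderGBM's API, they only illustrate the requested feature:

```python
# Hypothetical round-by-round training loop with stop callbacks.
# Nothing here is ThunderGBM API; it is a sketch of the idea.
from typing import Callable, List

def train_with_callbacks(
    n_rounds: int,
    train_one_round: Callable[[int], float],              # builds one tree, returns validation loss
    callbacks: List[Callable[[int, List[float]], bool]],  # a callback returns True to stop training
) -> List[float]:
    history: List[float] = []
    for rnd in range(n_rounds):
        history.append(train_one_round(rnd))
        if any(cb(rnd, history) for cb in callbacks):
            break  # a callback asked to stop; keep the trees built so far
    return history

def early_stopping(patience: int) -> Callable[[int, List[float]], bool]:
    """Stop when the best validation loss is more than `patience` rounds old."""
    def cb(rnd: int, history: List[float]) -> bool:
        best_round = history.index(min(history))
        return rnd - best_round >= patience
    return cb

# Toy demo: the loss improves, then plateaus, so training stops early.
losses = iter([0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.64, 0.65])
print(train_with_callbacks(8, lambda rnd: next(losses), [early_stopping(3)]))
```

With `patience=3`, the demo stops after round 5 because the best loss (round 2) is three rounds old, rather than running all 8 rounds.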

jiahuanluo commented 5 years ago

Thanks for your advice. We are working on this.

Denisevi4 commented 5 years ago

Just a suggestion: XGBoost creates K independent copies of the dataset for K-fold CV. You could save a lot of memory and make it faster if you instead stored a single fold id per sample, indicating in which fold that sample is used for testing.

For example, if you pass the vector [0, 1, 2, 3, 4], then the first row would be used as test data only in the first fold.

Or something like that. Do you understand what I mean?
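A minimal numpy sketch of that fold-id idea; the dataset is stored once and each fold only needs a boolean mask:

```python
import numpy as np

# The dataset is stored once; each fold needs only a boolean mask.
# (Inside a GPU library the masks could select rows in place instead of
# materialising K copies of the data, which is where the memory saving is.)
X = np.random.rand(10, 3)
y = np.random.rand(10)
fold_id = np.array([0, 1, 2, 3, 4, 0, 1, 2, 3, 4])  # per-row test-fold assignment

for k in range(5):
    test = fold_id == k
    X_train, y_train = X[~test], y[~test]   # row 0 is test data only when k == 0
    X_test, y_test = X[test], y[test]
    # train on (X_train, y_train), evaluate on (X_test, y_test)
    print(f"fold {k}: {test.sum()} test rows, {(~test).sum()} train rows")
```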

For example, CatBoost planned something similar: https://github.com/catboost/catboost/issues/765

zhuyw05 commented 5 years ago

Seconding the above. I'm really hoping for callback support and the ability to specify a validation dataset; with those, this could replace LightGBM. The speed really is much, much faster, and it would be a shame for such a good project to be left unfinished! Also, while using it I noticed that transferring the data to the GPU takes considerably longer than the training itself; a built-in grid search would make it even better.
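Until a built-in grid search exists, one workaround is scikit-learn's `GridSearchCV` over ThunderGBM's scikit-learn-style wrapper. This is a hedged sketch: `TGBMRegressor` comes from the README, but the parameter names (`depth`, `n_trees`) are assumptions, and it relies on the wrapper supporting `get_params`/`set_params` for cloning. It also does not avoid the CPU-to-GPU transfer on every fit, which is the overhead mentioned above:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from thundergbm import TGBMRegressor  # scikit-learn-style wrapper from the README

X, y = np.random.rand(200, 10), np.random.rand(200)

# Parameter names below are assumptions; verify them against the documentation.
param_grid = {
    "depth": [4, 6, 8],
    "n_trees": [40, 100],
}

# Explicit scoring, in case the wrapper does not provide a score() method.
search = GridSearchCV(TGBMRegressor(), param_grid, cv=3,
                      scoring="neg_mean_squared_error")
search.fit(X, y)  # note: every fit re-transfers the data to the GPU
print(search.best_params_, search.best_score_)
```

Because each candidate re-transfers the data to the GPU, keeping the grid coarse matters more here than with CPU libraries.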

zhuyw05 commented 5 years ago

I'm not able to help with the coding, but if you post an Alipay account we'd chip in some coffee money.

zeyiwen commented 5 years ago

@zhuyw05 Thank you for the feedback! We will work on that. Please stay tuned.

QinbinLi commented 5 years ago

Hi, we have added a `cv` function for cross-validation, which supports specifying a validation set. Please refer to the documentation for its parameters. We're still improving the Python interface and will add callbacks such as early stopping in the future. Thanks.
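For readers who want K-fold numbers before the callback work lands, a minimal sketch using scikit-learn's `cross_val_score` with the `TGBMRegressor` wrapper (the exact signature of ThunderGBM's own `cv` function is in the project documentation and not reproduced here). The parameter names are assumptions, and the sketch assumes the wrapper is scikit-learn compatible:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from thundergbm import TGBMRegressor  # scikit-learn-style wrapper

X, y = np.random.rand(500, 20), np.random.rand(500)

# depth and n_trees are assumed parameter names; check the ThunderGBM docs.
model = TGBMRegressor(depth=6, n_trees=40)
scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
print("mean CV MSE:", -scores.mean())
```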