Denisevi4 opened 5 years ago
Thanks for your advice. We are working on this.
Just a suggestion: XGBoost creates K independent datasets for K-fold CV. You could save a lot of memory and make it faster if you instead assigned each sample an ID indicating in which fold it is used for testing.
For example, if you pass the vector [0,1,2,3,4], then the first row would be used as test data in CV only in the first fold (fold 0).
Or something like that — see the sketch below. Do you understand what I mean?
For example, CatBoost planned to do something similar: https://github.com/catboost/catboost/issues/765
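To illustrate the suggestion, here is a minimal sketch of what fold-ID-driven CV could look like. Everything here is hypothetical: `train_and_eval` stands in for the library's training routine, and none of the names come from the actual codebase.

```python
import numpy as np

# Sketch of CV driven by a per-sample fold-ID vector: one shared dataset
# plus one integer per row, instead of K independent dataset copies.
# `train_and_eval` is a hypothetical stand-in for the library's trainer.
def cv_with_fold_ids(X, y, fold_ids, train_and_eval):
    scores = []
    for k in np.unique(fold_ids):
        test = fold_ids == k  # row i is test data only in fold fold_ids[i]
        scores.append(train_and_eval(X[~test], y[~test], X[test], y[test]))
    return scores

fold_ids = np.array([0, 1, 2, 3, 4])  # row 0 is tested only in the first fold
```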
+1 to the above. I'm really hoping callbacks and the ability to specify a validation dataset get added; with those, this could replace lightgbm. It really is much, much faster, and it would be a shame for such a good project to be left unfinished! Also, while using it I found that transferring the data to the GPU takes considerably longer than the training itself; a built-in grid search would make it even better.
I don't have the coding skills to help, but if you post an Alipay account we would gladly chip in for some coffee money.
@zhuyw05 Thank you for the feedback! We will work on that. Please stay tuned.
Hi, we have added a function "cv" for cross-validation, which supports setting the validation set. Please refer to the documentation for the parameters. We are still improving the Python interface and will add callbacks such as early stopping in the future. Thanks.
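For anyone who needs this before the built-in helper covers their case, the same behavior can be reproduced by hand with scikit-learn's `KFold`. In the sketch below, `Model` is a hypothetical stand-in for the library's estimator class; check the documentation for the real `cv` signature.

```python
import numpy as np
from sklearn.model_selection import KFold

# Manual equivalent of a cv() helper, built on scikit-learn's KFold.
# `Model` is a hypothetical stand-in for the library's estimator class.
def manual_cv(Model, params, X, y, n_folds=5):
    losses = []
    for train_idx, test_idx in KFold(n_splits=n_folds, shuffle=True).split(X):
        m = Model(**params)
        m.fit(X[train_idx], y[train_idx])
        pred = m.predict(X[test_idx])
        losses.append(np.mean((pred - y[test_idx]) ** 2))  # MSE per fold
    return losses
```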
And/or train tree by tree, with a callback deciding whether to stop training?
That'd be nice.
It doesn't have to be a Python callback; C++ is fine if you don't want to lose speed. (A rough sketch of the idea is below.)
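Presumably the control flow being asked for is something like the following. This is only a sketch of early stopping, not the project's API: `boost_one_round` and `validation_loss` are hypothetical stand-ins for the library's internals.

```python
# Sketch of tree-by-tree training with an early-stopping check after
# each boosting round. All callables here are hypothetical stand-ins.
def train_with_early_stopping(boost_one_round, validation_loss,
                              n_rounds, patience=10):
    best, best_round = float("inf"), 0
    for r in range(n_rounds):
        boost_one_round()           # grow one more tree
        score = validation_loss()   # evaluate on the validation set
        if score < best:
            best, best_round = score, r
        elif r - best_round >= patience:
            break                   # no improvement for `patience` rounds
    return best_round
```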