dmlc / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
https://xgboost.readthedocs.io/en/stable/
Apache License 2.0

[RFC] Making R interface more idiomatic #7906

Open david-cortes opened 2 years ago

david-cortes commented 2 years ago

I notice that a version 2.0 of xgboost is planned which, among other things, is expected to include support for categorical features in the R interface.

Given that this is a major version release and as such is expected to introduce potentially breaking changes, I think this is a good opportunity to make the R interface more in line with base R and core/popular R modeling packages. Many people (including myself) find the R interface of xgboost to be inconvenient and unidiomatic, but changing the interface for xgboost() from its current state would be a rather big breaking change and would probably break lots of user scripts that depend on xgboost().

In short, xgboost() does not work with the most common data types used in R (data.frame) and does not follow R conventions in terms of e.g. function arguments. For people who are familiar with base R and with other R packages, there are many ways in which the R interface of xgboost could be improved for a better end-user experience, such as:

Among many others.
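To illustrate the data.frame point above, here is a minimal sketch using only base R. The toy data and the column names `x1`, `x2`, `y` are made up for illustration. It shows the manual conversion a data.frame with a factor column needs before it can be handed to a matrix-only fitting function, next to a base R model that accepts the data.frame directly. The commented-out `xgboost()` call assumes the 1.x-era signature (`data`, `label`, `nrounds`) and is shown only for orientation.

```r
# Toy data (hypothetical): one numeric feature, one factor feature, one target.
df <- data.frame(
  x1 = c(1.5, 2.0, 3.5, 4.0, 5.5, 6.0),
  x2 = factor(c("a", "b", "a", "c", "b", "c")),
  y  = c(0.1, 1.2, 0.3, 1.9, 1.1, 2.2)
)

# A matrix-only interface needs a numeric matrix, so the factor column has to
# be encoded by hand. model.matrix() one-hot encodes it; "- 1" drops the
# intercept column.
X <- model.matrix(~ . - 1, data = df[, c("x1", "x2")])
y <- df$y

# Assumed 1.x-style call, left commented out so the sketch runs without xgboost:
# model <- xgboost::xgboost(data = X, label = y, nrounds = 10)

# By contrast, a typical base R modeling function takes the data.frame and a
# formula, and handles the factor column itself.
fit <- lm(y ~ x1 + x2, data = df)
```

The contrast is in the last line: the formula-plus-data.frame style moves the encoding step from the user's script into the fitting function.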

Would this project accept big breaking PRs for the R interface (particularly for xgboost() and predict.xgb.Booster()) for the 2.0 release that would make it more similar to base R and other R packages?

RAMitchell commented 2 years ago

I don't think any of the currently active maintainers are big R users, so we welcome input. Could we just build a new interface behind a different namespace until it's ready? I don't think there's a need to replace the old interface immediately.

trivialfis commented 2 years ago

> Would this project accept big breaking PRs for the R interface (particularly for xgboost() and predict.xgb.Booster()) for the 2.0 release that would make it more similar to base R and other R packages?

I would like to welcome these changes. The concern about breaking changes can be handled by running reverse dependency checks.

mayer79 commented 2 years ago

I suggest keeping xgboost() and predict() as they are and giving the new functions different names instead, e.g. xgboost2() and predict2(). Too much code would break if the main functions were changed.

Otherwise, great work @david-cortes.