dmlc / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
https://xgboost.readthedocs.io/en/stable/
Apache License 2.0

[Feature proposal] Expose xgboost/c_api.h in R package for LinkingTo #5979

Open jeffreyhanson opened 4 years ago

jeffreyhanson commented 4 years ago

I am working on an R package that runs many models on various inputs/datasets and post-processes their outputs. Since speed is an important consideration, I am writing most of the code in C++ (embedded in the R package using the Rcpp package) and interfacing directly with the xgboost C API (via <xgboost/c_api.h>).

Would it be possible to update the xgboost R package so that it provides a header file (in the inst/include folder) that other R packages can link to, so they can use the functions in <xgboost/c_api.h>? At the moment, I have just copied and pasted the xgboost code into the src folder of my R package (after building the xgboost R package tarball), but this is not ideal and most likely means that I won't be able to release the package on CRAN when it's finished.

For example, the RcppGSL package provides linking to GSL so that other R packages can include GSL by adding LinkingTo: RcppGSL to their DESCRIPTION file (e.g. see the smam R package, https://github.com/ChaoranHu/smam, and its DESCRIPTION file, https://github.com/ChaoranHu/smam/blob/master/DESCRIPTION).
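
To make the request concrete, here is a rough sketch of what a downstream package's DESCRIPTION file might look like if the xgboost R package shipped its C API header under inst/include. This does not work today: the package name `mypkg` and its title are hypothetical, and `xgboost` in the LinkingTo field is the proposed behaviour, not something the current R package supports.

```
Package: mypkg
Title: Hypothetical Downstream Package (Illustration Only)
Imports: Rcpp
LinkingTo: Rcpp, xgboost
```

With such a setup, C++ sources in the downstream package's src folder could use `#include <xgboost/c_api.h>` without bundling a copy of the header. Note that LinkingTo only adds the listed package's inst/include directory to the compiler's include path. It does not by itself link against compiled code, so the xgboost library symbols would still have to be made available some other way, which is likely the harder part of this proposal.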

trivialfis commented 4 years ago

I'm not entirely sure how to do that. Just out of curiosity, since you are using multiple languages and each of them has its own dependencies, why use an R-specific package manager rather than conda or something similar?

trivialfis commented 4 years ago

Of course, if you want to help add optional support for this in XGBoost, we would love to review and help. But in my experience, asking a language-specific dependency manager to manage libraries from other languages is quite hacky and difficult to maintain.

jeffreyhanson commented 4 years ago

Excellent - I'll look further into this and see if there's a simple solution - thanks @trivialfis! Yeah, the reason I'm writing this as an R package is that I need to fit these analyses within a broader analysis (written in R, because R has excellent support for data wrangling, statistical analysis, visualization, and spatial data operations). Also, the Rcpp package makes it trivially easy to embed C++ code within an R package and provides functionality to seamlessly convert R objects to native C++ data types (e.g. std::vector<double>) or common C++ matrix libraries (e.g. Eigen and Armadillo via RcppEigen and RcppArmadillo). If you're interested in learning about using Rcpp in R packages, there's a really helpful introduction in Hadley Wickham's R Packages book (available online for free).

privefl commented 2 years ago

This would come in very handy for me as well.

aosakwe commented 7 months ago

I would also be interested in this feature. Has there been any update on this?

jeffreyhanson commented 7 months ago

I'm sorry for my lack of progress/updates on this. I did try putting together a PR a few years ago, but I wasn't able to get it working correctly, so I gave up.