dmlc / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
https://xgboost.readthedocs.io/en/stable/
Apache License 2.0
26.18k stars 8.71k forks source link

[R] More compact Roxygen #9875

Closed mayer79 closed 1 month ago

mayer79 commented 10 months ago

Roxygen is now closer to Markdown. E.g., instead of the verbose

 \itemize{
   \item \code{Features}
   ...
}

one can write


- `Features`
- ...

This makes the docstrings easier to read.

I can work on this, if nobody has any objections.

trivialfis commented 10 months ago

I can work on this, if nobody has any objections.

Thank you for volunteering! Feel free to open a PR.

mayer79 commented 10 months ago

@trivialfis : Beginners question: How do you generate the rd help files in a convenient way?

For other packages, I usually run devtools::document() or devtools::document("R-package").. But that starts building the library and stops with

"fatal error: common.h: No such file or directory

include <dmlc/common.h"

trivialfis commented 10 months ago

@mayer79 Try to use this in the XGB repo, it's using roxygen:

python ./tests/ci_build/test_r_package.py --task=doc

Inside, it runs:

roxygen2::roxygenize()

Yes, it does build the package.

david-cortes commented 10 months ago

@trivialfis : Beginners question: How do you generate the rd help files in a convenient way?

For other packages, I usually run devtools::document() or devtools::document("R-package").. But that starts building the library and stops with

"fatal error: common.h: No such file or directory #include <dmlc/common.h"

I think you might have cloned the repository without the submodules. You need to use clone --recursive.

Maybe you could try something like this:

git submodule update --init --recursive
jameslamb commented 10 months ago

python ./tests/ci_build/test_r_package.py --task=doc

You can avoid needing to rebuild + reinstall every time by passing load = 'installed', like this:

# build and install the library for the first time
R CMD INSTALL --with-keep.source ./R-package

# re-generate the docs
Rscript -e "roxygen2::roxygenize('R-package/', load = 'installed')"

# (go manually change some roxygen comments)

# re-generate the docs again (no reinstall)
Rscript -e "roxygen2::roxygenize('R-package/', load = 'installed')"

(all commands run from the root of the repo)

For iterating during local development, you'll probably find that quicker than running devtools::document() or roxygen2::roxygenize() in ways that lead to reinstallation.

mayer79 commented 10 months ago

Both hints (almost) did the trick, thanks a lot!

Remaining problem: The first step of @jameslamb fails after some time with a

D:\xgboost>R CMD INSTALL --with-keep.source ./R-package
* installing to library 'C:/Users/Michael/AppData/Local/R/win-library/4.3'
* installing *source* package 'xgboost' ...
** using staged installation
** libs
Makevars.win:19: -DXGBOOST_STRICT_R_MODE=1
Makevars.win:19: -DDMLC_LOG_BEFORE_THROW=0
Makevars.win:19: -DDMLC_ENABLE_STD_THREAD=0
Makevars.win:19: -DDMLC_DISABLE_STDIN=1
Makevars.win:19: -DDMLC_LOG_CUSTOMIZE=1
Makevars.win:19: -DDMLC_CXX11_THREAD_LOCAL=0
using C compiler: 'gcc.exe (x86_64-posix-seh, Built by strawberryperl.com project) 8.3.0'
using C++ compiler: 'g++.exe (x86_64-posix-seh, Built by strawberryperl.com project) 8.3.0'
using C++17
Makevars.win:19: -DXGBOOST_STRICT_R_MODE=1
Makevars.win:19: -DDMLC_LOG_BEFORE_THROW=0
Makevars.win:19: -DDMLC_ENABLE_STD_THREAD=0

[...]

../..//include/xgboost/collective/socket.h:715:7: error: there are no arguments to 'inet_ntop' that depend on a template parameter, so a declaration of 'inet_ntop' must be available [-fpermissive]
       inet_ntop(AF_INET, addr, str, INET_ADDRSTRLEN);
       ^~~~~~~~~
../..//include/xgboost/collective/socket.h:715:7: note: (if you use '-fpermissive', G++ will accept your code, but allowing the use of an undeclared name is deprecated)
../..//include/xgboost/collective/socket.h:722:7: error: there are no arguments to 'inet_ntop' that depend on a template parameter, so a declaration of 'inet_ntop' must be available [-fpermissive]
       inet_ntop(AF_INET6, addr, str, INET6_ADDRSTRLEN);
       ^~~~~~~~~
make: *** [C:/PROGRA~1/R/R-4.3.0/etc/x64/Makeconf:270: ../..//src/collective/allgather.o] Error 1
ERROR: compilation failed for package 'xgboost'
* removing 'C:/Users/Michael/AppData/Local/R/win-library/4.3/xgboost'
* restoring previous 'C:/Users/Michael/AppData/Local/R/win-library/4.3/xgboost'

Do we have an up-to-date docu on how to build the DLL?

trivialfis commented 10 months ago

Do we have an up-to-date docu on how to build the DLL?

Can you build it with:

cd R-package
R CMD INSTALL .
mayer79 commented 10 months ago

Thanks for the support, and sorry for the troubles. This gives the same:

D:\xgboost\R-package>R CMD INSTALL .
* installing to library 'C:/Users/Michael/AppData/Local/R/win-library/4.3'
* installing *source* package 'xgboost' ...
** using staged installation
** libs
Makevars.win:19: -DXGBOOST_STRICT_R_MODE=1
Makevars.win:19: -DDMLC_LOG_BEFORE_THROW=0
Makevars.win:19: -DDMLC_ENABLE_STD_THREAD=0
Makevars.win:19: -DDMLC_DISABLE_STDIN=1
Makevars.win:19: -DDMLC_LOG_CUSTOMIZE=1
Makevars.win:19: -DDMLC_CXX11_THREAD_LOCAL=0
using C compiler: 'gcc.exe (x86_64-posix-seh, Built by strawberryperl.com project) 8.3.0'
using C++ compiler: 'g++.exe (x86_64-posix-seh, Built by strawberryperl.com project) 8.3.0'
using C++17
Makevars.win:19: -DXGBOOST_STRICT_R_MODE=1
Makevars.win:19: -DDMLC_LOG_BEFORE_THROW=0
Makevars.win:19: -DDMLC_ENABLE_STD_THREAD=0
Makevars.win:19: -DDMLC_DISABLE_STDIN=1
Makevars.win:19: -DDMLC_LOG_CUSTOMIZE=1
g++  -std=gnu++17 -I"C:/PROGRA~1/R/R-4.3.0/include" -DNDEBUG -I../..//include -I../..//dmlc-core/include -I../..//rabit/include -I../../ -DXGBOOST_STRICT_R_MODE=1 -DDMLC_LOG_BEFORE_THROW=0 -DDMLC_ENABLE_STD_THREAD=0 -DDMLC_DISABLE_STDIN=1 -DDMLC_LOG_CUSTOMIZE=1    -I"c:/rtools43/x86_64-w64-mingw32.static.posix/include"  -fopenmp -DDMLC_CMAKE_LITTLE_ENDIAN=1 -pthread    -O2 -Wall  -mfpmath=sse -msse2 -mstackrealign  -c ../..//src/collective/allgather.cc -o ../..//src/collective/allgather.o
In file included from ../..//src/collective/loop.h:17,
                 from ../..//src/collective/comm.h:14,
                 from ../..//src/collective/allgather.h:13,
                 from ../..//src/collective/allgather.cc:4:
../..//include/xgboost/collective/socket.h: In function 'xgboost::collective::Result xgboost::collective::INetNToP(const H&, std::__cxx11::string*)':
../..//include/xgboost/collective/socket.h:715:7: error: there are no arguments to 'inet_ntop' that depend on a template parameter, so a declaration of 'inet_ntop' must be available [-fpermissive]
       inet_ntop(AF_INET, addr, str, INET_ADDRSTRLEN);
       ^~~~~~~~~
../..//include/xgboost/collective/socket.h:715:7: note: (if you use '-fpermissive', G++ will accept your code, but allowing the use of an undeclared name is deprecated)
../..//include/xgboost/collective/socket.h:722:7: error: there are no arguments to 'inet_ntop' that depend on a template parameter, so a declaration of 'inet_ntop' must be available [-fpermissive]
       inet_ntop(AF_INET6, addr, str, INET6_ADDRSTRLEN);
       ^~~~~~~~~
make: *** [C:/PROGRA~1/R/R-4.3.0/etc/x64/Makeconf:270: ../..//src/collective/allgather.o] Error 1
ERROR: compilation failed for package 'xgboost'
* removing 'C:/Users/Michael/AppData/Local/R/win-library/4.3/xgboost'
* restoring previous 'C:/Users/Michael/AppData/Local/R/win-library/4.3/xgboost'

Maybe try with a different compiler?

trivialfis commented 10 months ago

That's interesting, we run the command on CI https://github.com/dmlc/xgboost/blob/ae32936ba242e3fe083f79e8592b337220c1cb73/tests/ci_build/test_r_package.py#L221

trivialfis commented 10 months ago

using C++ compiler: 'g++.exe (x86_64-posix-seh, Built by strawberryperl.com project) 8.3.0'

Where did you get the R distribution? (apologies for my lack of familiarity with Windows R).

trivialfis commented 10 months ago

The R distribution I use on Windows is downloaded from Rtools official site. https://cran.r-project.org/bin/windows/Rtools/rtools43/rtools.html

mayer79 commented 10 months ago

You got me... I was still on rtools40 😇

trivialfis commented 10 months ago

That's even more interesting, you have a line in the log:

g++ -std=gnu++17 -I"C:/PROGRA~1/R/R-4.3.0/include" -DNDEBUG -I../..//include -I../..//dmlc-core/include -I../..//rabit/include -I../../ -DXGBOOST_STRICT_R_MODE=1 -DDMLC_LOG_BEFORE_THROW=0 -DDMLC_ENABLE_STD_THREAD=0 -DDMLC_DISABLE_STDIN=1 -DDMLC_LOG_CUSTOMIZE=1 -I"c:/rtools43/x86_64-w64-mingw32.static.posix/include" -fopenmp -DDMLC_CMAKE_LITTLE_ENDIAN=1 -pthread -O2 -Wall -mfpmath=sse -msse2 -mstackrealign -c ../..//src/collective/allgather.cc -o ../..//src/collective/allgather.o

which shows rtools43.

mayer79 commented 10 months ago

At least the path variable was pointing to C:/rtools40... let's see what I get with clean environment variables.

mayer79 commented 10 months ago

A different error message pops up: Is g++ version too high?

D:\xgboost>R CMD INSTALL --with-keep.source ./R-package
* installing to library 'C:/Users/Michael/AppData/Local/R/win-library/4.3'
* installing *source* package 'xgboost' ...
** using staged installation
** libs
Makevars.win:19: -DXGBOOST_STRICT_R_MODE=1
Makevars.win:19: -DDMLC_LOG_BEFORE_THROW=0
Makevars.win:19: -DDMLC_ENABLE_STD_THREAD=0
Makevars.win:19: -DDMLC_DISABLE_STDIN=1
Makevars.win:19: -DDMLC_LOG_CUSTOMIZE=1
Makevars.win:19: -DDMLC_CXX11_THREAD_LOCAL=0
using C compiler: 'gcc.exe (GCC) 12.3.0'
using C++ compiler: 'g++.exe (GCC) 12.3.0'
using C++17
Makevars.win:19: -DXGBOOST_STRICT_R_MODE=1
Makevars.win:19: -DDMLC_LOG_BEFORE_THROW=0
Makevars.win:19: -DDMLC_ENABLE_STD_THREAD=0
Makevars.win:19: -DDMLC_DISABLE_STDIN=1
Makevars.win:19: -DDMLC_LOG_CUSTOMIZE=1
g++ -shared -s -static-libgcc -o xgboost.dll xgboost-win.def ./xgboost_R.o ./xgboost_custom.o ./init.o ../..//src/metric/metric.o ../..//src/metric/elementwise_metric.o ../..//src/metric/multiclass_metric.o ../..//src/metric/rank_metric.o ../..//src/metric/auc.o ../..//src/metric/survival_metric.o ../..//src/objective/objective.o ../..//src/objective/regression_obj.o ../..//src/objective/multiclass_obj.o ../..//src/objective/lambdarank_obj.o ../..//src/objective/hinge.o ../..//src/objective/aft_obj.o ../..//src/objective/adaptive.o ../..//src/objective/init_estimation.o ../..//src/objective/quantile_obj.o ../..//src/gbm/gbm.o ../..//src/gbm/gbtree.o ../..//src/gbm/gbtree_model.o ../..//src/gbm/gblinear.o ../..//src/gbm/gblinear_model.o ../..//src/data/adapter.o ../..//src/data/simple_dmatrix.o ../..//src/data/data.o ../..//src/data/sparse_page_raw_format.o ../..//src/data/ellpack_page.o ../..//src/data/file_iterator.o ../..//src/data/gradient_index.o ../..//src/data/gradient_index_page_source.o ../..//src/data/gradient_index_format.o ../..//src/data/sparse_page_dmatrix.o ../..//src/data/proxy_dmatrix.o ../..//src/data/iterative_dmatrix.o ../..//src/predictor/predictor.o ../..//src/predictor/cpu_predictor.o ../..//src/predictor/cpu_treeshap.o ../..//src/tree/constraints.o ../..//src/tree/param.o ../..//src/tree/fit_stump.o ../..//src/tree/tree_model.o ../..//src/tree/multi_target_tree_model.o ../..//src/tree/tree_updater.o ../..//src/tree/updater_approx.o ../..//src/tree/updater_colmaker.o ../..//src/tree/updater_prune.o ../..//src/tree/updater_quantile_hist.o ../..//src/tree/updater_refresh.o ../..//src/tree/updater_sync.o ../..//src/tree/hist/param.o ../..//src/tree/hist/histogram.o ../..//src/linear/linear_updater.o ../..//src/linear/updater_coordinate.o ../..//src/linear/updater_shotgun.o ../..//src/learner.o ../..//src/context.o ../..//src/logging.o ../..//src/global_config.o ../..//src/collective/allgather.o ../..//src/collective/allreduce.o ../..//src/collective/broadcast.o ../..//src/collective/comm.o ../..//src/collective/coll.o ../..//src/collective/tracker.o ../..//src/collective/communicator.o ../..//src/collective/in_memory_communicator.o ../..//src/collective/in_memory_handler.o ../..//src/collective/loop.o ../..//src/collective/socket.o ../..//src/common/charconv.o ../..//src/common/column_matrix.o ../..//src/common/common.o ../..//src/common/error_msg.o ../..//src/common/hist_util.o ../..//src/common/host_device_vector.o ../..//src/common/io.o ../..//src/common/json.o ../..//src/common/numeric.o ../..//src/common/pseudo_huber.o ../..//src/common/quantile.o ../..//src/common/random.o ../..//src/common/stats.o ../..//src/common/survival_util.o ../..//src/common/threading_utils.o ../..//src/common/ranking_utils.o ../..//src/common/quantile_loss_utils.o ../..//src/common/timer.o ../..//src/common/version.o ../..//src/c_api/c_api.o ../..//src/c_api/c_api_error.o ../..//amalgamation/dmlc-minimum0.o ../..//rabit/src/engine.o ../..//rabit/src/rabit_c_api.o ../..//rabit/src/allreduce_base.o -fopenmp -DDMLC_CMAKE_LITTLE_ENDIAN=1 -pthread -lwsock32 -lws2_32 -LC:/rtools43/x86_64-w64-mingw32.static.posix/lib/x64 -LC:/rtools43/x86_64-w64-mingw32.static.posix/lib -LC:/PROGRA~1/R/R-4.3.0/bin/x64 -lR
C:\rtools43\x86_64-w64-mingw32.static.posix\bin/ld.exe: c:/rtools43/x86_64-w64-mingw32.static.posix/bin/../lib/gcc/x86_64-w64-mingw32.static.posix/12.3.0/libstdc++.a(fs_path.o):(.text$_ZNSt10filesystem7__cxx1116filesystem_errorC2ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt10error_code+0x0): multiple definition of `std::filesystem::__cxx11::filesystem_error::filesystem_error(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::error_code)'; ../..//src/data/file_iterator.o:file_iterator.cc:(.text$_ZNSt10filesystem7__cxx1116filesystem_errorC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt10error_code[_ZNSt10filesystem7__cxx1116filesystem_errorC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt10error_code]+0x0): first defined here
C:\rtools43\x86_64-w64-mingw32.static.posix\bin/ld.exe: ../..//src/data/file_iterator.o:file_iterator.cc:(.text+0x1c03): undefined reference to `std::codecvt<wchar_t, char, int>::codecvt(unsigned long long)'
C:\rtools43\x86_64-w64-mingw32.static.posix\bin/ld.exe: ../..//src/data/file_iterator.o:file_iterator.cc:(.text+0x1d49): undefined reference to `std::__codecvt_utf8_base<wchar_t>::do_in(int&, char const*, char const*, char const*&, wchar_t*, wchar_t*, wchar_t*&) const'
C:\rtools43\x86_64-w64-mingw32.static.posix\bin/ld.exe: ../..//src/data/file_iterator.o:file_iterator.cc:(.text$_ZNSt10filesystem7__cxx1116filesystem_errorC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt10error_code[_ZNSt10filesystem7__cxx1116filesystem_errorC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt10error_code]+0x1f0): undefined reference to `std::filesystem::__cxx11::filesystem_error::_M_gen_what()'
C:\rtools43\x86_64-w64-mingw32.static.posix\bin/ld.exe: ../..//src/data/file_iterator.o:file_iterator.cc:(.text$_ZNSt10filesystem7__cxx114path14_S_str_convertIcSt11char_traitsIcESaIcEEENSt7__cxx1112basic_stringIT_T0_T1_EERKNS7_IwS3_IwESaIwEEERKSA_[_ZNSt10filesystem7__cxx114path14_S_str_convertIcSt11char_traitsIcESaIcEEENSt7__cxx1112basic_stringIT_T0_T1_EERKNS7_IwS3_IwESaIwEEERKSA_]+0x76): undefined reference to `std::codecvt<wchar_t, char, int>::codecvt(unsigned long long)'
C:\rtools43\x86_64-w64-mingw32.static.posix\bin/ld.exe: ../..//src/data/file_iterator.o:file_iterator.cc:(.text$_ZNSt10filesystem7__cxx114path14_S_str_convertIcSt11char_traitsIcESaIcEEENSt7__cxx1112basic_stringIT_T0_T1_EERKNS7_IwS3_IwESaIwEEERKSA_[_ZNSt10filesystem7__cxx114path14_S_str_convertIcSt11char_traitsIcESaIcEEENSt7__cxx1112basic_stringIT_T0_T1_EERKNS7_IwS3_IwESaIwEEERKSA_]+0x19d): undefined reference to `std::__codecvt_utf8_base<wchar_t>::do_out(int&, wchar_t const*, wchar_t const*, wchar_t const*&, char*, char*, char*&) const'
C:\rtools43\x86_64-w64-mingw32.static.posix\bin/ld.exe: ../..//src/data/file_iterator.o:file_iterator.cc:(.rdata$_ZTVSt12codecvt_utf8IwLm1114111ELSt12codecvt_mode0EE[_ZTVSt12codecvt_utf8IwLm1114111ELSt12codecvt_mode0EE]+0x20): undefined reference to `std::__codecvt_utf8_base<wchar_t>::do_out(int&, wchar_t const*, wchar_t const*, wchar_t const*&, char*, char*, char*&) const'
C:\rtools43\x86_64-w64-mingw32.static.posix\bin/ld.exe: ../..//src/data/file_iterator.o:file_iterator.cc:(.rdata$_ZTVSt12codecvt_utf8IwLm1114111ELSt12codecvt_mode0EE[_ZTVSt12codecvt_utf8IwLm1114111ELSt12codecvt_mode0EE]+0x28): undefined reference to `std::__codecvt_utf8_base<wchar_t>::do_unshift(int&, char*, char*, char*&) const'
C:\rtools43\x86_64-w64-mingw32.static.posix\bin/ld.exe: ../..//src/data/file_iterator.o:file_iterator.cc:(.rdata$_ZTVSt12codecvt_utf8IwLm1114111ELSt12codecvt_mode0EE[_ZTVSt12codecvt_utf8IwLm1114111ELSt12codecvt_mode0EE]+0x30): undefined reference to `std::__codecvt_utf8_base<wchar_t>::do_in(int&, char const*, char const*, char const*&, wchar_t*, wchar_t*, wchar_t*&) const'
C:\rtools43\x86_64-w64-mingw32.static.posix\bin/ld.exe: ../..//src/data/file_iterator.o:file_iterator.cc:(.rdata$_ZTVSt12codecvt_utf8IwLm1114111ELSt12codecvt_mode0EE[_ZTVSt12codecvt_utf8IwLm1114111ELSt12codecvt_mode0EE]+0x48): undefined reference to `std::__codecvt_utf8_base<wchar_t>::do_length(int&, char const*, char const*, unsigned long long) const'
collect2.exe: error: ld returned 1 exit status
make: *** [C:/PROGRA~1/R/R-4.3.0/share/make/winshlib.mk:16: xgboost.dll] Error 1
ERROR: compilation failed for package 'xgboost'
* removing 'C:/Users/Michael/AppData/Local/R/win-library/4.3/xgboost'
* restoring previous 'C:/Users/Michael/AppData/Local/R/win-library/4.3/xgboost'
trivialfis commented 10 months ago

A different error message pops up: Is g++ version too high?

No, I don't think the compiler version can be "too high" for C++ code. Judging from:

C:\rtools43\x86_64-w64-mingw32.static.posix\bin/ld.exe: ../..//src/data/file_iterator.o:file_iterator.cc:(.rdata$_ZTVSt12codecvt_utf8IwLm1114111ELSt12codecvt_mode0EE[_ZTVSt12codecvt_utf8IwLm1114111ELSt12codecvt_mode0EE]+0x20): undefined reference to `std::__codecvt_utf8_base::do_out(int&, wchar_t const, wchar_t const, wchar_t const&, char, char, char&) const'

I think you have a mix of R/C++ environments and they got messed together, the compiler is using the wrong standard library.

mayer79 commented 10 months ago

@trivialfis Thanks your your patient assistance! A simple --preclean was now sufficient to build the library :-).

R CMD INSTALL --preclean --with-keep.source ./R-package
trivialfis commented 10 months ago

Excellent! That's probably caused by the binaries compiled before you upgrade your toolchain to 4.3. If I guess right, the second run after the first successful compilation should not require the preclean option.

mayer79 commented 1 month ago

Implemented in #9906, #9919, #10674, and #10733