dmlc / XGBoost.jl

XGBoost Julia Package
Other
288 stars 111 forks source link

validate parameter names by default #159

Closed ExpandingMan closed 1 year ago

ExpandingMan commented 1 year ago

I don't really like changing the default parameters will-he nill-he but multiple of us have gotten pretty annoyed by this so this PR turns on warnings for invalid parameter names by default.

It suffers from the usual issue of having its own stderr output so we can't run it through the Julia logger.

ExpandingMan commented 1 year ago

Of course this will also be blocked now thanks to windows. Sigh.

trivialfis commented 1 year ago

Hi, xgb has a logging callback that can be used to redirect logging to bindings

trivialfis commented 1 year ago

https://github.com/dmlc/xgboost/blob/72ec0c54841c03f9c7a81b72137234f2519273ab/python-package/xgboost/core.py#L234

trivialfis commented 1 year ago

A little bit context on the default.

The default is set to false not because there's potential danger in the validation implementation. Some bindings copy the internal default values (old scala binding) for training parameters like max leave and pass them into libxgboost even if it's not used. The default is just to avoid breaking those bindings. Once we are sure that all existing bindings out there can pass only those parameters that are needed, we can remove the validate_parameters altogether and just carry out the validation.

Of course this will also be blocked now thanks to windows. Sigh.

Been there. Nothing can stop the fun of knowing Windows CI have just failed.

ExpandingMan commented 1 year ago

Thanks! Can't believe I never noticed XGBRegisterLogCallback. This PR now also calls that on __init__ so logging output from libxgboost now gets redirected through the Julia logger.

devmotion commented 1 year ago

This PR broke our tests: https://github.com/TuringLang/MCMCDiagnosticTools.jl/actions/runs/3914362811/jobs/6691330173#step:4:510 We are not doing anything fancy with XGBoost (in particular not passing any parameters that should be validated), but since we use XGBoost through MLJ I assume that the problem is the list of parameters here: https://github.com/JuliaAI/MLJXGBoostInterface.jl/blob/ab061e4b80069cd7884ee23e4d7bbe4834bdd7e9/src/MLJXGBoostInterface.jl#L54 It seems the different XGBoost models (classifier, regressor, ...) should use different sets of parameters and not every parameter is needed for every model.

ExpandingMan commented 1 year ago

Not sure why your tests are failing on a warning, however, the way MLJXGBoostInterface.jl works is not ideal. In retrospect the choice to always have every model hold every possible parameter was probably not a great one and is just asking for these kinds of problems.

For the time being, I have made a PR to disable parameter validation in the MLJ interface by default.

devmotion commented 1 year ago

It's an example in the docstrings, and different output (e.g., due to logging) causes doctests to fail.