dmlc / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
https://xgboost.readthedocs.io/en/stable/
Apache License 2.0

Decreased predictiveness when converting integers to floats #10683

Open · tsmith-perchgroup opened this issue 1 month ago

tsmith-perchgroup commented 1 month ago

I am running some experiments on a dataset with roughly 300 features and around 300k datapoints. About 50 of the features are integer variables, representing a mix of one-hot encoded, label-encoded, and ordinal numerical data.

When I convert all of the integer columns to float before fitting my model, I see a significant reduction in model predictiveness on the test set.

Can anyone shed some light on why this might be? I haven't been able to find anything about it from a web search. I'm running XGBoost 2.0.3 using the sklearn API.
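
Roughly, the comparison looks like the sketch below. The DataFrame, column names, and synthetic target are placeholder stand-ins for my real data, which I can't share; on this toy data the two runs should score about the same, and the gap only shows up on my actual dataset.

```python
# Minimal sketch of the int-vs-float comparison. All data below is synthetic;
# the column names and target are placeholders, not the real dataset.
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
n = 300_000

# Mix of integer-coded (one-hot / label / ordinal) and continuous features,
# loosely mirroring the description above.
X = pd.DataFrame({
    "onehot_a": rng.integers(0, 2, n),
    "label_b": rng.integers(0, 10, n),
    "ordinal_c": rng.integers(1, 6, n),
    "cont_d": rng.normal(size=n),
})
y = (X["cont_d"] + 0.3 * X["ordinal_c"] + rng.normal(size=n) > 2).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def fit_and_score(X_tr, X_te):
    """Fit with default hyperparameters and report test-set AUC."""
    model = XGBClassifier(n_estimators=100, random_state=0)
    model.fit(X_tr, y_tr)
    return roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])

# Run once on the original dtypes, then again with every integer column
# cast to float64 before fitting.
int_cols = [c for c in X.columns if pd.api.types.is_integer_dtype(X[c])]
to_float = {c: "float64" for c in int_cols}

print(f"AUC, int columns:     {fit_and_score(X_tr, X_te):.4f}")
print(f"AUC, cast to float64: {fit_and_score(X_tr.astype(to_float), X_te.astype(to_float)):.4f}")
```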

trivialfis commented 1 month ago

Hi, could you please share a reproducible example? It's hard to make a guess based on the description alone.