dmlc / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
https://xgboost.readthedocs.io/en/stable/
Apache License 2.0

Decreased predictiveness when converting integers to floats #10683

Open · tsmith-perchgroup opened this issue 1 month ago

tsmith-perchgroup commented 1 month ago

I am running some experiments on a dataset with roughly 300 features and around 300k datapoints. About 50 of the features are integer variables, representing a mix of one-hot encoded, label-encoded, and ordinal numerical data.

When I convert all of the integer columns to float before fitting my model, I see a significant reduction in model predictiveness on the test set.

Can anyone shed some light on why this might be? I haven't been able to find anything about it from a web search. I'm running XGBoost 2.0.3 using the sklearn API.
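
Roughly, the comparison looks like the sketch below. The DataFrame, column names, and synthetic target are placeholder stand-ins for my real data, which I can't share; on this toy data the two runs should score about the same, and the gap only shows up on my actual dataset.

```python
# Minimal sketch of the int-vs-float comparison. All data below is synthetic;
# the column names and target are placeholders, not the real dataset.
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
n = 300_000

# Mix of integer-coded (one-hot / label / ordinal) and continuous features,
# loosely mirroring the description above.
X = pd.DataFrame({
    "onehot_a": rng.integers(0, 2, n),
    "label_b": rng.integers(0, 10, n),
    "ordinal_c": rng.integers(1, 6, n),
    "cont_d": rng.normal(size=n),
})
y = (X["cont_d"] + 0.3 * X["ordinal_c"] + rng.normal(size=n) > 2).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def fit_and_score(X_tr, X_te):
    """Fit with default hyperparameters and report test-set AUC."""
    model = XGBClassifier(n_estimators=100, random_state=0)
    model.fit(X_tr, y_tr)
    return roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])

# Run once on the original dtypes, then again with every integer column
# cast to float64 before fitting.
int_cols = [c for c in X.columns if pd.api.types.is_integer_dtype(X[c])]
to_float = {c: "float64" for c in int_cols}

print(f"AUC, int columns:     {fit_and_score(X_tr, X_te):.4f}")
print(f"AUC, cast to float64: {fit_and_score(X_tr.astype(to_float), X_te.astype(to_float)):.4f}")
```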

trivialfis commented 1 month ago

Hi, could you please share a reproducible example? It's hard to make a guess based on the description alone.