LeoGrin / tabular-benchmark


How to find a large dataset, e.g. 10 million rows and 80 features? #6

Open Sandy4321 opened 2 years ago

Sandy4321 commented 2 years ago

Great work on "Why do tree-based models still outperform deep learning on tabular data?"

But can you recommend a dataset with mixed continuous and categorical features for binary classification at a large scale, say 10 million rows and 80 features?

Ideally one where:
1. features are not independent, for example some features depend on several other features;
2. the data is unbalanced, with many more NO labels than YES labels.

For example, something like https://www.kaggle.com/competitions/amex-default-prediction/data

https://github.com/jxzly/Kaggle-American-Express-Default-Prediction-1st-solution
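
To make the requested properties concrete (mixed feature types, features that depend on other features, a rare positive class), here is a minimal synthetic sketch. It is only an illustration, not a real dataset and not part of the benchmark: the helper name `make_imbalanced_mixed`, the column names, the dependency formulas, and the 3% positive rate are all assumptions chosen for the example, and it assumes NumPy is installed.

```python
import numpy as np


def make_imbalanced_mixed(n_rows=100_000, n_cont=6, n_cat=4, pos_rate=0.03, seed=0):
    """Hypothetical helper: synthetic binary-classification data with mixed
    feature types, dependent features, and a rare positive class."""
    rng = np.random.default_rng(seed)

    # Independent continuous base features.
    base = rng.normal(size=(n_rows, n_cont))
    columns = {f"cont_{i}": base[:, i] for i in range(n_cont)}

    # A continuous feature that depends on several others (plus noise).
    columns["cont_dep"] = (
        0.7 * base[:, 0]
        - 0.5 * base[:, 1] * base[:, 2]
        + rng.normal(scale=0.1, size=n_rows)
    )

    # Independent categorical features with 5 levels, stored as integer codes.
    for j in range(n_cat):
        columns[f"cat_{j}"] = rng.integers(0, 5, size=n_rows)

    # A categorical feature derived from two continuous ones (4 levels).
    columns["cat_dep"] = (base[:, 0] > 0).astype(int) * 2 + (base[:, 3] > 0).astype(int)

    # Label: threshold a noisy score so that roughly pos_rate of rows are YES.
    score = columns["cont_dep"] + 0.5 * (columns["cat_0"] == 0) + rng.normal(size=n_rows)
    threshold = np.quantile(score, 1.0 - pos_rate)
    y = (score > threshold).astype(np.int8)
    return columns, y


columns, y = make_imbalanced_mixed()
print(len(columns), "features,", len(y), "rows, positive rate", y.mean())
```

Scaling this toward 10 million rows and 80 features is mostly a memory question: 80 float64 columns at that row count come to about 6.4 GB before any copies, so smaller dtypes or chunked generation would likely be needed.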