Models don't perform as expected when trained on a large train set

atkm / avazu-ctr

Models don't perform as expected when trained on a large train set #4

Open atkm opened 5 years ago

atkm commented 5 years ago

From Kaggle submissions (private leaderboard scores; the competition metric is log loss, so lower is better):

Model 3:

Half of train.csv: 0.4241803
train_small.csv: 0.4123864

Model 1:

Half of train.csv: 0.4172254
train_small.csv: 0.4168748

Run an experiment to pick an optimal training set size. One way to reduce the training set is to remove rows whose (device_id, site/app_id) pair does not appear in the test set.
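A rough sketch of that filtering step, assuming the data is loaded into pandas DataFrames. The helper name `keep_rows_seen_in_test` is hypothetical, and pairing `device_id` with `site_id` is only one reading of "site/app_id"; the same call with `["device_id", "app_id"]` would cover the other.

```python
import pandas as pd


def keep_rows_seen_in_test(train, test, key_cols):
    """Keep only the train rows whose key_cols combination also occurs in test.

    Hypothetical helper, not taken from the repository.
    """
    train_keys = pd.MultiIndex.from_frame(train[key_cols])
    test_keys = pd.MultiIndex.from_frame(test[key_cols])
    # Boolean mask: True where the train row's key tuple is present in test.
    mask = train_keys.isin(test_keys)
    return train[mask]


# Toy data standing in for train.csv and test.csv.
train = pd.DataFrame({
    "device_id": ["a", "b", "c", "a"],
    "site_id": ["s1", "s1", "s2", "s2"],
    "click": [0, 1, 0, 1],
})
test = pd.DataFrame({
    "device_id": ["a", "c", "a"],
    "site_id": ["s1", "s2", "s1"],
})

filtered = keep_rows_seen_in_test(train, test, ["device_id", "site_id"])
```

On the toy data only the (a, s1) and (c, s2) rows survive. On the real files it is probably worth checking how many rows remain before training, since a very aggressive filter could shrink the set more than intended.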

atkm commented 5 years ago

Maybe the model needs a different regularization parameter when fitting to a larger dataset?
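One way to check this is to sweep the training set size and the regularization strength together and compare validation log loss. The sketch below assumes scikit-learn's `LogisticRegression`, which this thread does not confirm is the model in use; `sweep_size_and_c` and the synthetic data are hypothetical stand-ins. In scikit-learn, `C` multiplies the summed per-row loss, so a fixed `C` means relatively weaker regularization as rows are added, which is one reason the best value could shift with dataset size.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss


def sweep_size_and_c(X_train, y_train, X_val, y_val, sizes, c_values):
    """Return {(n_rows, C): validation log loss} for every combination.

    Hypothetical helper; uses the first n_rows of the training data.
    """
    results = {}
    for n in sizes:
        for c in c_values:
            model = LogisticRegression(C=c, max_iter=1000)
            model.fit(X_train[:n], y_train[:n])
            proba = model.predict_proba(X_val)[:, 1]
            results[(n, c)] = log_loss(y_val, proba, labels=[0, 1])
    return results


# Synthetic stand-in data (assumption: NOT the Avazu features).
rng = np.random.default_rng(0)
X = rng.normal(size=(3000, 10))
w = rng.normal(size=10)
y = (rng.random(3000) < 1 / (1 + np.exp(-X @ w))).astype(int)

results = sweep_size_and_c(
    X[:2000], y[:2000], X[2000:], y[2000:],
    sizes=[200, 2000], c_values=[0.01, 1.0],
)
# (n_rows, C) pair with the lowest validation log loss.
best = min(results, key=results.get)
```

If the best `C` differs noticeably between the small and large sizes, that would support retuning regularization for the larger training set rather than reusing the value found on train_small.csv.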

atkm commented 5 years ago

A simple classifier like logistic regression shouldn't require as much data as a more complex one like decision trees.