atkm / avazu-ctr

Models don't perform as expected when trained on a large train set #4

Open atkm opened 5 years ago

atkm commented 5 years ago

From Kaggle submissions (private scores):

Model 3:

Model 1:

Run an experiment to pick an optimal training set size. One way to reduce the training set is to remove rows whose (device_id, site/app_id) pairs do not appear in the test set.
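
A rough sketch of that filtering step, in case it helps. It assumes pandas DataFrames with `device_id`, `site_id` and `app_id` columns, and it reads "site/app_id" as the combination of both id columns, which is only one possible interpretation. The helper name `keep_rows_seen_in_test` and the toy frames are made up for illustration.

```python
import pandas as pd

def keep_rows_seen_in_test(train, test, keys=("device_id", "site_id", "app_id")):
    """Keep only the train rows whose key combination also occurs in test.

    The default `keys` are an assumption about what "(device_id, site/app_id)"
    means; pass a different tuple to try another reading.
    """
    keys = list(keys)
    # De-duplicate the test keys so the inner merge cannot multiply train rows.
    seen = test[keys].drop_duplicates()
    return train.merge(seen, on=keys, how="inner")

# Toy data, not taken from the real dataset.
train = pd.DataFrame({
    "device_id": ["d1", "d1", "d2", "d3"],
    "site_id": ["s1", "s2", "s1", "s9"],
    "app_id": ["a0", "a0", "a0", "a0"],
    "click": [0, 1, 0, 1],
})
test = pd.DataFrame({
    "device_id": ["d1", "d2", "d2"],
    "site_id": ["s1", "s1", "s1"],
    "app_id": ["a0", "a0", "a0"],
})

reduced = keep_rows_seen_in_test(train, test)
```

Passing `keys=("device_id", "site_id")` or `keys=("device_id", "app_id")` would give the narrower pair-based variants, so the reduced set sizes can be compared.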

atkm commented 5 years ago

Maybe the model needs a different regularization parameter when fitting to a large dataset?
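
One way to check this would be to sweep the regularization strength separately for each training set size and compare validation log loss. The sketch below assumes scikit-learn's `LogisticRegression`, which may not be what the models in this repo use, and runs on synthetic data. In scikit-learn, `C` multiplies the summed per-sample loss, so with a fixed `C` the penalty carries relatively less weight as the number of rows grows. `sweep_C` and the size values are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real features (assumption: binary click labels).
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))
true_w = rng.normal(size=10)
y = (rng.random(2000) < 1 / (1 + np.exp(-X @ true_w))).astype(int)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

def sweep_C(X_tr, y_tr, X_va, y_va, Cs=(0.001, 0.01, 0.1, 1.0, 10.0)):
    """Return {C: validation log loss} for an L2-regularized logistic regression."""
    losses = {}
    for C in Cs:
        model = LogisticRegression(C=C, max_iter=1000)
        model.fit(X_tr, y_tr)
        losses[C] = log_loss(y_va, model.predict_proba(X_va), labels=[0, 1])
    return losses

# Repeat the sweep on a small and a large slice of the training data.
losses_by_size = {
    n: sweep_C(X_train[:n], y_train[:n], X_val, y_val) for n in (100, 1500)
}
best_C_by_size = {n: min(d, key=d.get) for n, d in losses_by_size.items()}
```

If `best_C_by_size` differs between the small and large slice on the real data, that would support retuning the parameter for the large training set.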

atkm commented 5 years ago

A simple classifier like logistic regression shouldn't require as much data as a more complex one like decision trees.
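
A learning curve would be one way to test this claim and to pick a training set size. The sketch below assumes scikit-learn and uses synthetic data; the model settings and the training sizes are placeholders, not the ones used for the submissions above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption); swap in the real feature matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(3000, 10))
w = rng.normal(size=10)
y = (rng.random(3000) < 1 / (1 + np.exp(-X @ w))).astype(int)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(
        max_depth=8, min_samples_leaf=20, random_state=0
    ),
}

curves = {}
for name, model in models.items():
    sizes, _, val_scores = learning_curve(
        model,
        X,
        y,
        train_sizes=[200, 600, 1800],  # absolute numbers of training rows
        cv=3,
        scoring="neg_log_loss",
    )
    # Flip the sign so that lower is better, averaged over the CV folds.
    curves[name] = dict(zip(sizes.tolist(), (-val_scores.mean(axis=1)).tolist()))
```

Each entry of `curves` maps a training set size to the mean validation log loss, so the point where a model's curve flattens suggests how much data it actually needs.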