Also, do you have any pointers for hyperparameters while scaling up? The full dataset I am trying to run is ~40M rows; I'm tuning the hyperparameters on this 10% sample before applying them to the full dataset.
Your train/val plot looks suspicious to me: it is strange that train and valid have exactly the same scores at every epoch.
Have you tried learning rate decay?
I'm using OneCycleLR right now; I'm open to suggestions (scheduler or LR values) and can follow up with results here.
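For reference, this is roughly how it's wired up (a minimal sketch with toy placeholder data and illustrative `lr`/`max_lr` values; it assumes a pytorch-tabnet version whose `scheduler_params` accepts `is_batch_level` so the scheduler steps every batch):

```python
import numpy as np
import torch
from pytorch_tabnet.tab_model import TabNetRegressor

# Toy placeholder data; substitute the real ~4M x 150 arrays.
X_train = np.random.rand(10_000, 150).astype(np.float32)
y_train = np.random.rand(10_000, 1).astype(np.float32)  # regressor expects 2D targets

batch_size, max_epochs = 1024, 100

model = TabNetRegressor(
    optimizer_fn=torch.optim.Adam,
    optimizer_params=dict(lr=2e-2),
    scheduler_fn=torch.optim.lr_scheduler.OneCycleLR,
    scheduler_params=dict(
        is_batch_level=True,  # step the scheduler per batch, not per epoch
        max_lr=5e-2,
        steps_per_epoch=X_train.shape[0] // batch_size + 1,
        epochs=max_epochs,
        pct_start=0.15,
    ),
)
model.fit(X_train, y_train, max_epochs=max_epochs, batch_size=batch_size)
```

A plain epoch-level decay such as `torch.optim.lr_scheduler.StepLR` (e.g. `scheduler_params=dict(step_size=20, gamma=0.9)`, without `is_batch_level`) would be a simpler alternative to try if OneCycleLR turns out to be part of the problem.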
Also, I have been using log-cosh loss as my objective and MASE as my eval metric. My regression target is heavily right-skewed, so I recently tried RMSLE, but that didn't change the dynamic you see above.
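In case it helps, this is roughly how the custom objective and metric are plugged in (a sketch; the predict-the-mean scaling inside MASE is my baseline choice for tabular data, not a library built-in):

```python
import math

import numpy as np
import torch
from pytorch_tabnet.metrics import Metric

def log_cosh_loss(y_pred, y_true):
    # Numerically stable form: log(cosh(x)) = x + softplus(-2x) - log(2)
    x = y_pred - y_true
    return torch.mean(x + torch.nn.functional.softplus(-2.0 * x) - math.log(2.0))

class MASE(Metric):
    # Mean absolute scaled error. Scaling by the MAE of a predict-the-mean
    # baseline is an assumption here; classic MASE scales by the in-sample
    # MAE of a naive forecast on time series.
    def __init__(self):
        self._name = "mase"
        self._maximize = False

    def __call__(self, y_true, y_score):
        mae = np.mean(np.abs(y_true - y_score))
        naive_mae = np.mean(np.abs(y_true - np.mean(y_true)))
        return mae / naive_mae

# Wired into training as:
# model.fit(..., loss_fn=log_cosh_loss, eval_metric=[MASE])
```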
Here is an example where I trained with log-cosh loss as my objective and MAE as my eval metric (ignore the legend); this is a bit better but still pretty volatile.
I tried reducing the number of epochs and pct_start in OneCycleLR and got the following:
Much more stable, but I'm still not seeing the training MASE get much better than validation.
More experimenting yielded more of the same. XGBoost can often get below 0.4 MASE, but I can't seem to get TabNet below ~0.45.
A large batch size often acts as a regularizer because of the batch norm used during training. At the cost of longer training time, you can try significantly lowering batch_size and virtual_batch_size (e.g. to 64). I'm not sure you'll get better validation performance, but you should be able to see some overfitting.
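Something like this, reusing the placeholder arrays from the sketch above (illustrative values only):

```python
from pytorch_tabnet.tab_model import TabNetRegressor

# Fresh model with default Adam and no scheduler, so the batch-level
# OneCycleLR settings above don't conflict with the new batch count.
small_batch_model = TabNetRegressor(optimizer_params=dict(lr=2e-2))

X_valid, y_valid = X_train[-2_000:], y_train[-2_000:]  # placeholder split

small_batch_model.fit(
    X_train, y_train,
    eval_set=[(X_train, y_train), (X_valid, y_valid)],
    eval_name=["train", "valid"],
    eval_metric=["mae"],
    max_epochs=200,
    patience=0,             # 0 disables early stopping while probing for overfit
    batch_size=64,          # much smaller real batches...
    virtual_batch_size=64,  # ...and a matching ghost-batch-norm size
)
```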
Doing this, I was still unable to get the model to overfit... Are there any other hyperparameters I should be looking at to help with this?
Larger n_d and n_a, plus a larger number of steps: more model capacity should make it possible to overfit.
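For instance (illustrative values, not tuned recommendations; zeroing lambda_sparse is an additional lever, since the sparsity penalty also acts as a regularizer):

```python
from pytorch_tabnet.tab_model import TabNetRegressor

# Deliberately over-parameterized settings to probe for overfitting.
big_model = TabNetRegressor(
    n_d=64,             # default 8; width of the decision layers
    n_a=64,             # default 8; width of the attention layers
    n_steps=7,          # default 3; more sequential steps = more capacity
    gamma=1.5,
    lambda_sparse=0.0,  # drop the sparsity penalty, one less regularizer
)
```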
Describe the bug
Can't get TabNetRegressor to overfit or stabilize.

What is the current behavior?
The dataset is 4M rows and ~150 features, with a 60/20/20 train/val/test split. I have tried increasing the model complexity and still cannot get it to work as intended, or even to overfit intentionally.
Here is the Train/Val eval graph:
and here is the same data run on XGBoost (note that the first 20 boosting rounds were removed to highlight that the model is overfitting the train set):
The hyperparameters I used for this TabNet run, using OneCycleLR and found via a Hyperopt run (which didn't help much), are below:

- `cat_emb_dim`: [8, 4, 1]
- `gamma`: 1.5
- `learning_rate`: 0.0076031329945881925 (I see this behavior whether this is big or small)
- `mask_type`: 'sparsemax'
- `max_lr`: 0.15911336214731228 (I see this behavior whether this is big or small)
- `n_d_a`: 8
- `n_steps`: 3.0
- `pct_start`: 0.15
I've read through other tickets on here but haven't seen anyone else struggling to get overfitting; it's usually the opposite. Any advice is appreciated.
If the current behavior is a bug, please provide the steps to reproduce.
N/A

Expected behavior
Overfitting.

Screenshots
See above.
Other relevant information:
- poetry version:
- python version: 3.7
- Operating System:
- Additional tools:

Additional context
N/A