Closed Kayne88 closed 2 years ago
The consumption should attain a peak at the end of an epoch.
Do you manage to get the pearson correlation score for the first epoch?
Does it work if you reduce the batch size?
> Do you manage to get the pearson correlation score for the first epoch?

No, I can't see the evaluation of the first epoch.

> Does it work if you reduce the batch size?

I initially tried with a batch size of 256 and still ran out of RAM.
Is it GPU OOM or RAM OOM ?
RAM OOM. Consumption basically jumps from 30 GB to over 52 GB.
Could this be related to the custom metric? Might it help if I implement the metric with torch rather than np, so it can use the GPU?
> Could this be related to the custom metric?
I think it's unlikely, but you can try rmse and see if it solves the problem.
What is the size of your train/test, in number of rows and columns?
TRAIN (1914562, 1214) - TEST (476390, 1214)
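For context, those shapes alone imply a sizeable memory footprint for a dense array; a quick back-of-the-envelope check (the dtype assumptions are illustrative):

```python
# Rough RAM footprint of the train matrix as a dense array.
# A single implicit upcast to float64 (e.g. inside a metric) doubles it.
rows, cols = 1_914_562, 1_214
gib_f32 = rows * cols * 4 / 2**30  # float32
gib_f64 = rows * cols * 8 / 2**30  # float64
print(f"float32: {gib_f32:.1f} GiB, float64: {gib_f64:.1f} GiB")
```

So one extra full-size float64 copy during evaluation is already in the ~17 GiB range, which is consistent with the reported jump in RAM consumption.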
RMSE actually works :)
I'd be happy to know if you get competitive results on your dataset with tabnet. Please leave a comment if you can :)
With pleasure. However, I first need to make the corr metric work; RMSE is not appropriate for my problem. Also, I would really like to use a custom loss eventually.
That said, I will gladly compare tabnet against my current catboost benchmark scores.
Can't you use a simple pearson correlation?
https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.pearsonr.html
I tried to use the sklearn r2_score, also OOM. I suspect the problem is that during eval-metric calculation the model, tensors and data are moved to the CPU. One way could be to explicitly transfer them to CUDA in the metric calculation.
What is working for me now is the implementation here: https://torchmetrics.readthedocs.io/en/stable/regression/pearson_corr_coef.html
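For reference, a streaming Pearson metric can also be written in plain NumPy by accumulating sufficient statistics chunk by chunk, so it never allocates large temporaries over the full prediction array (a hypothetical helper, not the torchmetrics implementation):

```python
import numpy as np

def pearson_corr(y_true, y_score, chunk_size=100_000):
    # Accumulate running sums in float64 so peak memory is bounded
    # by chunk_size rather than by the full dataset length.
    y_true = np.asarray(y_true).ravel()
    y_score = np.asarray(y_score).ravel()
    n = len(y_true)
    s_x = s_y = s_xx = s_yy = s_xy = 0.0
    for start in range(0, n, chunk_size):
        x = y_true[start:start + chunk_size].astype(np.float64)
        y = y_score[start:start + chunk_size].astype(np.float64)
        s_x += x.sum()
        s_y += y.sum()
        s_xx += (x * x).sum()
        s_yy += (y * y).sum()
        s_xy += (x * y).sum()
    cov = s_xy - s_x * s_y / n
    var_x = s_xx - s_x ** 2 / n
    var_y = s_yy - s_y ** 2 / n
    return cov / np.sqrt(var_x * var_y)
```

The same function could then be wrapped in whatever custom-metric interface the training loop expects.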
First runs look very promising, after only 15 epochs I come close to catboost performance (which is hyperparam optimized). Real comparison will come on full validation (separate from train and test) which is almost as large as whole train.
One drawback of tabnet is that hyperparam optimization (with optuna) will take a very long time already for 100 trials. I need to see how to best approach that topic.
I'll keep you updated.
PS: What I observe during training with a fixed LR is that the eval metric "oscillates" for 1-2 epochs and then makes a significant improvement. I am not very experienced with LR schedulers but decided to give OneCycleLR a try. Maybe it smooths the training.
Yes, I would advise decaying with OneCycleLR. This will make the model converge in fewer epochs.
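For illustration, here is the shape of the OneCycleLR schedule on a dummy optimizer; `max_lr`, epoch and step counts are placeholders, not recommended values:

```python
import torch

# Illustrative OneCycleLR setup: the LR warms up to max_lr and then
# anneals far below the starting point over one cycle.
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=0.001)
epochs, steps_per_epoch = 10, 50  # placeholder counts
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=0.02, epochs=epochs, steps_per_epoch=steps_per_epoch
)
lrs = []
for _ in range(epochs * steps_per_epoch):
    optimizer.step()       # real training would compute a loss/backward first
    scheduler.step()       # OneCycleLR is stepped per batch, not per epoch
    lrs.append(optimizer.param_groups[0]["lr"])
```

When passing such a scheduler to a training framework, the key point is that the step counts must match the actual number of batches, otherwise the cycle ends too early or too late.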
Thanks for the updates!
Here are some intermediate results and a comparison with the catboost benchmark. I've applied shallow hyperparam optimization to tabnet. Things to note: the dataset has a very low signal-to-noise ratio; it's from the financial context, where the target is some performance measure of an asset to be predicted. Adequate basic metrics for such a problem are different kinds of correlations. The comparisons are done on a large validation set, which is almost the size of the train set. The task is regression.
PREDS
- pearson correlation 0.031141676801666244
- feature neutral correlation 0.02642358893294882

PREDS NEUTRALIZED
- pearson correlation 0.028844560064221897
- feature neutral correlation 0.026562891162170366

PREDS
- pearson correlation 0.02533170902252626
- spearman corr 0.02516739791397788
- fnc 0.021378226358012287

PREDS NEUTRALIZED
- pearson correlation 0.021450115592071817
- spearman corr 0.020884041941114838
- fnc 0.020596744887857364
We can see that the metrics fall off by quite some margin; however, tabnet achieves the best performance among the other deep-learning architectures (tabtransformer, resnet). Another thing to note is that the pearson correlation between the catboost predictions and the tabnet predictions is roughly 0.66, which is not tremendously high. So it seems that tabnet learns a different signal than catboost.
Current flaws:
Current hyperparam grid:
```python
param_grid = {
    "optimizer_fn": torch.optim.AdamW,
    "optimizer_params": dict(lr=0.017),
    "scheduler_fn": torch.optim.lr_scheduler.CosineAnnealingWarmRestarts,
    "scheduler_params": dict(T_0=200, T_mult=1, eta_min=1e-4, last_epoch=-1, verbose=False),
    "n_d": 8,
    "n_a": 8,
    "n_steps": 7,
    "gamma": 2.0,
    "n_independent": 4,
    "n_shared": 3,
    "momentum": 0.17,
    "lambda_sparse": 0,
    "verbose": 1,
    "mask_type": "entmax",
}
```
@Optimox
@Kayne88 thank you very much for sharing your results.
The model learns to pay attention to specific features in order to minimize the loss function. Some features might end up masked out if they correlate too much with a better feature; however, you'll have no guarantee that this is the case. You could simply remove those features before training.
However you can play with hyperparameters to get closer to what you want:
- `lambda_sparse`: the bigger this is, the sparser your mask will be. So setting this to a value > 0 might ensure that the model won't look at two correlated features.
- `gamma`: a large gamma (I'd recommend keeping gamma between 1 and 5 max) will forbid the model from reusing the same features at different steps. So if you don't want weakly correlated features to be used by the model, you can set a high gamma.
- `n_steps`: the more steps, the more features your model will be able to pick at some point.

All these recommendations have no guarantee of working. This is just my general understanding, but you should experiment with them and see how it goes.
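Sketching those three knobs as keyword arguments for a TabNet model (the specific values are illustrative guesses, not recommendations):

```python
# Hypothetical sparsity-oriented settings following the advice above.
sparse_params = dict(
    lambda_sparse=1e-3,  # > 0 pushes the attention masks to be sparser
    gamma=1.8,           # discourages reusing features across steps; keep within [1, 5]
    n_steps=5,           # more steps let the model attend to more features overall
)
```

These would be merged into the model's constructor arguments and tuned like any other hyperparameter.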
Good luck!
When training with custom eval metric (pearson corr), after first evaluation my colab session runs out of memory.
What is the current behavior?
Training of TabNetRegressor starts fine, and after the first evaluation round I run out of memory. I am training the model on a 16 GB GPU and free RAM is approx. 40 GB. RAM consumption steadily increases during training. I am training on a pretty large dataset (11 GB).
Expected behavior
I would expect that the RAM consumption is more or less constant during training, once the model is initialized.
Other relevant information:
- poetry version: ?
- python version: 3.8
- Operating System: Ubuntu
- Additional tools: