Closed melfm closed 1 year ago
When training sequential models such as `CL4SRec`, `batch_loss` and `rec_loss` become NaN after a few epochs of training. For instance, see the output below:
## CL4SRec
Epoch: 2, Hit Ratio:0.02066 | Precision:0.00103 | Recall:0.02066 | NDCG:0.00784
*Best Performance*
Epoch: 2, Hit Ratio:0.02066 | NDCG:0.00784
------------------------------------------------------------------------------------------------------------------------
training: 3 batch 50 batch_loss: 0.5157323479652405 rec_loss: 0.4582507908344269
Evaluating the model...
Progress: [++++++++++++++++++++++++++++++++++++++++++++++++++]100%
------------------------------------------------------------------------------------------------------------------------
Real-Time Ranking Performance (Top-20 Item Recommendation)
*Current Performance*
Epoch: 3, Hit Ratio:0.03372 | Precision:0.00169 | Recall:0.03372 | NDCG:0.01302
*Best Performance*
Epoch: 3, Hit Ratio:0.03372 | NDCG:0.01302
------------------------------------------------------------------------------------------------------------------------
training: 4 batch 50 batch_loss: 0.4821103513240814 rec_loss: 0.4299513101577759
Evaluating the model...
Progress: [++++++++++++++++++++++++++++++++++++++++++++++++++]100%
------------------------------------------------------------------------------------------------------------------------
Real-Time Ranking Performance (Top-20 Item Recommendation)
*Current Performance*
Epoch: 4, Hit Ratio:0.04212 | Precision:0.00211 | Recall:0.04212 | NDCG:0.01622
*Best Performance*
Epoch: 4, Hit Ratio:0.04212 | NDCG:0.01622
------------------------------------------------------------------------------------------------------------------------
training: 5 batch 50 batch_loss: nan rec_loss: nan
Any ideas what could be causing this?
P.S. This is with the `amazon-beauty` dataset; some of the other datasets don't load with this model.
No official implementation of this model has been released. I will maintain the code over the coming months.
Problem addressed.
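For anyone hitting the same symptom, a first step is to pin down exactly where the loss first stops being finite. The sketch below is only an illustration, not part of the repository: it assumes the training output has been saved as text in the `training: <epoch> batch <n> batch_loss: <x> rec_loss: <y>` format shown above, and `first_non_finite` is a hypothetical helper name.

```python
import math
import re

# Matches log entries such as:
#   training: 5 batch 50 batch_loss: nan rec_loss: nan
# (format taken from the output quoted in this thread)
LOSS_LINE = re.compile(
    r"training:\s*(\d+)\s+batch\s+(\d+)\s+"
    r"batch_loss:\s*(\S+)\s+rec_loss:\s*(\S+)"
)


def first_non_finite(log_text):
    """Return (epoch, batch, batch_loss, rec_loss) for the first logged
    entry whose batch_loss or rec_loss is NaN or infinite, else None.

    Hypothetical helper for illustration only.
    """
    for match in LOSS_LINE.finditer(log_text):
        epoch = int(match.group(1))
        batch = int(match.group(2))
        batch_loss = float(match.group(3))  # float("nan") parses fine
        rec_loss = float(match.group(4))
        if not (math.isfinite(batch_loss) and math.isfinite(rec_loss)):
            return epoch, batch, batch_loss, rec_loss
    return None
```

Applied to the output above, this would point at epoch 5, batch 50 as the first logged non-finite step. If the training code is PyTorch-based (an assumption here), enabling `torch.autograd.set_detect_anomaly(True)` around that step can help trace which operation produced the NaN.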