Coder-Yu / SELFRec

An open-source framework for self-supervised recommender systems.

Sequential models loss outputting nan after a few epochs #21

Closed by melfm 1 year ago

melfm commented 1 year ago

When training sequential models such as CL4SRec, I get NaN values for batch_loss and rec_loss after a few epochs. For instance, see the output below:

## CL4SRec
Epoch: 2, Hit Ratio:0.02066  |  Precision:0.00103  |  Recall:0.02066  |  NDCG:0.00784
*Best Performance* 
Epoch: 2, Hit Ratio:0.02066  |  NDCG:0.00784
------------------------------------------------------------------------------------------------------------------------
training: 3 batch 50 batch_loss: 0.5157323479652405 rec_loss: 0.4582507908344269
Evaluating the model...
Progress: [++++++++++++++++++++++++++++++++++++++++++++++++++]100%
------------------------------------------------------------------------------------------------------------------------
Real-Time Ranking Performance  (Top-20 Item Recommendation)
*Current Performance*
Epoch: 3, Hit Ratio:0.03372  |  Precision:0.00169  |  Recall:0.03372  |  NDCG:0.01302
*Best Performance* 
Epoch: 3, Hit Ratio:0.03372  |  NDCG:0.01302
------------------------------------------------------------------------------------------------------------------------
training: 4 batch 50 batch_loss: 0.4821103513240814 rec_loss: 0.4299513101577759
Evaluating the model...
Progress: [++++++++++++++++++++++++++++++++++++++++++++++++++]100%
------------------------------------------------------------------------------------------------------------------------
Real-Time Ranking Performance  (Top-20 Item Recommendation)
*Current Performance*
Epoch: 4, Hit Ratio:0.04212  |  Precision:0.00211  |  Recall:0.04212  |  NDCG:0.01622
*Best Performance* 
Epoch: 4, Hit Ratio:0.04212  |  NDCG:0.01622
------------------------------------------------------------------------------------------------------------------------
training: 5 batch 50 batch_loss: nan rec_loss: nan

Any ideas what could be causing this?

P.S. This is with the amazon-beauty dataset; some of the other datasets don't load with this model.

Coder-Yu commented 1 year ago

This model has no official implementation released. I will maintain the code over the coming months.

Coder-Yu commented 1 year ago

Problem addressed.
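
For readers who hit the same symptom: the thread does not say what the fix was, so the following is only a minimal diagnostic sketch, not the change made in SELFRec. It stops training at the first batch whose loss terms are no longer finite, so that the offending batch can be inspected instead of being discovered at the end of an epoch. The helper names `non_finite_losses` and `check_batch` are hypothetical and are not part of the framework.

```python
import math


def non_finite_losses(losses):
    """Return the names of loss terms whose value is NaN or +/-inf."""
    return [name for name, value in losses.items() if not math.isfinite(value)]


def check_batch(epoch, batch, losses):
    """Raise as soon as any loss term stops being finite."""
    bad = non_finite_losses(losses)
    if bad:
        raise FloatingPointError(
            f"epoch {epoch}, batch {batch}: non-finite loss terms: {', '.join(bad)}"
        )


# Values taken from the log in the report: epoch 4 is finite, epoch 5 is not.
check_batch(4, 50, {"batch_loss": 0.4821103513240814, "rec_loss": 0.4299513101577759})
try:
    check_batch(5, 50, {"batch_loss": float("nan"), "rec_loss": float("nan")})
except FloatingPointError as err:
    print(err)
```

In a PyTorch training loop the values passed in would typically be plain floats obtained with `loss.item()`, and the check would sit right after the loss is computed and before the optimizer step.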