chengkai-liu / Mamba4Rec

[RelKD'24] Mamba4Rec: Towards Efficient Sequential Recommendation with Selective State Space Models
https://arxiv.org/abs/2403.03900
MIT License

On Memory Efficiency on MovieLens 1M #11

Open · Leonardo16AM opened this issue 3 months ago

Leonardo16AM commented 3 months ago

In the paper, the following table is presented:

| Method    | GPU memory | Training time | Inference time |
|-----------|-------------|---------------|----------------|
| SASRec    | 14.98GB     | 131.24s       | 0.27s          |
| BERT4Rec  | 15.48GB     | 207.60s       | 0.90s          |
| Mamba4Rec |  4.82GB     |  75.52s       | 0.14s          |

However, when I run the code (on Google Colab, with only 2 epochs), I get the following results:

Trainable parameters: 290944
07 Jul 22:31    INFO  FLOPs: 10126400.0
Train     0: 100%|███████████████████████| 480/480 [03:14<00:00,  2.47it/s, GPU RAM: 4.06 G/14.75 G]
07 Jul 22:35    INFO  epoch 0 training [time: 194.54s, train loss: 3305.3605]
Evaluate   : 100%|███████████████████████████| 2/2 [00:00<00:00,  4.52it/s, GPU RAM: 6.02 G/14.75 G]
07 Jul 22:35    INFO  epoch 0 evaluating [time: 0.53s, valid_score: 0.085300]
07 Jul 22:35    INFO  valid result: 
hit@10 : 0.1637    ndcg@10 : 0.0853    mrr@10 : 0.0616
07 Jul 22:35    INFO  Saving current: saved/Mamba4Rec-Jul-07-2024_22-31-48.pth
07 Jul 22:35    INFO  Loading model structure and parameters from saved/Mamba4Rec-Jul-07-2024_22-31-48.pth
Evaluate   : 100%|███████████████████████████| 2/2 [00:00<00:00,  5.48it/s, GPU RAM: 6.02 G/14.75 G]
07 Jul 22:35    INFO  The running environment of this training is as follows:
+-------------+----------------+
| Environment |     Usage      |
+=============+================+
| CPU         |     2.50 %     |
+-------------+----------------+
| GPU         | 6.02 G/14.75 G |
+-------------+----------------+
| Memory      | 6.13 G/12.67 G |
+-------------+----------------+

I believe the final 6.02 GB figure includes the model plus the memory occupied by the evaluation batches. During the first training epoch, before any evaluation batch is loaded, usage is only 4.06 GB, which is even lower than the 4.82 GB the paper reports for Mamba4Rec.
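In case it helps pin down what the progress bar is reporting, here is a minimal sketch of how peak allocated vs. reserved memory could be read directly from `torch.cuda`, independently of RecBole's readout. `run_one_epoch()` is a placeholder, not a function from this repo:

```python
import torch

# Sketch: reset the peak-memory counters, run one epoch, then read the
# peaks back. Replace run_one_epoch() with whatever actually drives
# training (e.g. the repo's run script or trainer loop).
torch.cuda.reset_peak_memory_stats()

run_one_epoch()  # placeholder, not a real function in this repo

gib = 1024 ** 3
print(f"peak allocated: {torch.cuda.max_memory_allocated() / gib:.2f} GiB")
print(f"peak reserved:  {torch.cuda.max_memory_reserved() / gib:.2f} GiB")
```

Allocated memory is what live tensors actually occupy, while reserved memory also includes the caching allocator's pool. If the 6.02 G readout reflects reserved (cached) memory rather than allocated, part of the gap versus the paper's numbers could simply be allocator caching.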

Are you using different hyperparameters than those provided in the repository? Is there something I might be overlooking?
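For reference, this is a minimal sketch of how I would dump the effective configuration to diff against the paper's settings; the `mamba4rec` module, `Mamba4Rec` class, and `config.yaml` names are assumptions based on the repo layout:

```python
from recbole.config import Config
from mamba4rec import Mamba4Rec  # assumed module/class name from this repo

# Build the same Config the run script would see, then print every
# effective parameter (train_batch_size / eval_batch_size in particular
# drive peak GPU memory).
config = Config(
    model=Mamba4Rec,
    dataset="ml-1m",
    config_file_list=["config.yaml"],  # assumed config file in the repo
)
print(config)
```

That would at least rule out a silent difference in batch sizes between my run and the one used for the paper's table.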

Thank you in advance. : )