LincanLi98 / STG-Mamba

Official Implementation of STG-Mamba: Spatial-Temporal Graph Learning via Selective State Space Model.
144 stars 25 forks

CUDA out of memory #1

Open houchenfeng opened 4 months ago

houchenfeng commented 4 months ago

Impressive work!

While running the training code, I encountered a CUDA out of memory error.

Could you please advise on any settings that could reduce the memory requirements?

rginjapan commented 4 months ago

You can reduce the batch_size; the default is 48.

leiershuai commented 4 months ago

I'm having the same issue: during training, GPU memory keeps increasing until it runs out.

ohhhh2022 commented 4 months ago

Me too. During training, GPU memory usage climbs higher and higher no matter what batch size I use. Could it be that stale tensors produced during training are not being freed in time?
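For anyone debugging this: a very common cause of steadily growing GPU memory in PyTorch training loops (this is a general sketch, not necessarily what STG-Mamba's training code does) is accumulating the raw loss tensor across iterations, which keeps every step's autograd history alive. The hypothetical `train` function below contrasts the leaky pattern with the fix (`loss.item()`):

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.01)

def train(n_steps, leak):
    total = 0.0
    for _ in range(n_steps):
        x = torch.randn(8, 4)
        loss = model(x).pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
        if leak:
            # Accumulating the tensor itself chains every step's graph
            # onto `total`, so nothing from earlier steps can be freed.
            total = total + loss
        else:
            # .item() converts to a plain Python float, releasing the
            # step's autograd history immediately.
            total = total + loss.item()
    return total

leaky = train(3, leak=True)    # a tensor still attached to a graph
safe = train(3, leak=False)    # a plain float, no graph retained
```

If the training script logs a running loss this way, switching to `.item()` (or `loss.detach()`) usually stops the per-epoch memory growth.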

LincanLi98 commented 4 months ago

Hi, I just ran a quick test on an RTX 4090 (24GB) GPU and did not encounter the problem you mentioned at any point during the training session. The original work was trained on one A100 GPU. Throughout the session, the monitored memory consumption stayed around 27%-37% of total GPU memory. Here is the hardware I tested on:

- CUDA 11.3
- GPU: RTX 4090 (24GB) x 1
- CPU: 12 vCPU Intel(R) Xeon(R) Platinum 8352V @ 2.10GHz
- Memory: 90GB

Hope it helps.
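To compare your run against the 27%-37% figure above, it may help to log allocated GPU memory once per epoch. A minimal helper (assumed names, CPU-safe so it also runs on machines without CUDA):

```python
import torch

def gpu_mem_mb():
    # Memory currently allocated to tensors on the active CUDA device,
    # in megabytes; returns 0.0 on CPU-only machines.
    if not torch.cuda.is_available():
        return 0.0
    return torch.cuda.memory_allocated() / 1024**2

# Example: call at the end of each epoch in the training loop, e.g.
#   print(f"epoch {epoch}: {gpu_mem_mb():.1f} MB allocated")
mem = gpu_mem_mb()
```

If the logged number grows every epoch rather than plateauing after the first few, something is holding references across iterations (see the loss-accumulation discussion above).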

[Screenshot attached, 2024-06-04 21:18]
rginjapan commented 3 months ago

@flww213 @leiershuai @ohhhh2022 I also see VRAM increasing during training... did you figure out the reason?

ReedGAOOO commented 2 months ago

[Animation 1 attached] The GPU memory cost gets higher and higher, and my 8GB 4060 laptop finally stopped at gen 129/199. QAQ Then I tried training on a cloud GPU with no shared GPU memory: a 3090 (24GB) died at gen 135/199, only slightly further than my 4060 laptop with 8GB + 16GB shared memory.