Train error - Cuda out of memory

Seokju-Cho / Volumetric-Aggregation-Transformer

Official Implementation of VAT

MIT License

153 stars 14 forks source link

Train error - Cuda out of memory #3

Closed moonsh closed 2 years ago

moonsh commented 2 years ago

Hi, First, thank you for sharing your work.

I tried the following command to start training. python train.py --config "config/pascal_resnet50/pascal_resnet50_fold0/config.yaml"

However, strangely training with base config file is not possible because of out of memory even though I use 3090 GPU.

I don't know the reason but somewhere takes lots of GPU memory. Training only started when I reduce batchsize to 4 from 16.

and..., This command doesn't work. Could you check ? conda env create -f environment.yaml

SunghwanHong commented 2 years ago

Hi, the memory issue you faced is actually exactly what you would experience if you trained our code right. As stated in our paper, we used 3090 GPU to train and reported the required memory, which if summed up, batch size of 4 is the maximum value for a single 3090. In practice, we used 4 3090 GPU for training, and this is why we set the batch size to 16.

We re-checked the command by following the instruction, but we did not face the same issue you experienced. We did not find any errors entering the particular command line. I don't think we can help you with this issue as we are not sure what could be the reason is.

moonsh commented 2 years ago

Yes. Thank you. All issues are resolved.