SoccerNet / sn-spotting

Repository containing all the code needed to get started on the SoccerNet Action Spotting challenge. This repository also contains several benchmark methods.

CALF model training error ! #6

Closed Tortoise17 closed 2 years ago

Tortoise17 commented 2 years ago

@SilvioGiancola I want to ask why the CALF model training is throwing this error:


RuntimeError: CUDA out of memory. Tried to allocate 7.38 GiB (GPU 0; 10.76 GiB total capacity; 7.45 GiB already allocated; 2.19 GiB free; 7.45 GiB reserved in total by PyTorch)
Exception in thread Thread-2:

while there is no error at all with the temporally-aware pooling model NetVLAD++.

Can you or anyone else explain why this happens? I tried freeing the cache, but that didn't work. I also tried a smaller batch_size, and that did not resolve it either. I would appreciate any help.

SilvioGiancola commented 2 years ago

Hi @Tortoise17, I am not sure why you get this error. As far as I remember, we were not using a fancy GPU with more than 11 GB of memory, most probably a GTX 1080 Ti or a 2080. When do you get this error? Is it at training time or testing time? It looks like memory on your GPU is already allocated when it throws this error. Are you running two trainings at the same time? Maybe this implementation does not release the memory after training, so I would advise you to first train only, and then test only.
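
To make the numbers in that message easier to compare, here is a minimal sketch that pulls them out of the error text quoted above and computes how far the request exceeds the free memory. The helper `parse_oom` and its regular expression are hypothetical, written only against the wording of this one message, and are not part of sn-spotting or PyTorch.

```python
import re

# Error text copied from the traceback quoted earlier in this thread.
MESSAGE = (
    "RuntimeError: CUDA out of memory. Tried to allocate 7.38 GiB "
    "(GPU 0; 10.76 GiB total capacity; 7.45 GiB already allocated; "
    "2.19 GiB free; 7.45 GiB reserved in total by PyTorch)"
)

# Assumption: the pattern targets only the wording shown above.
# Other PyTorch versions may phrase the message differently.
PATTERN = re.compile(
    r"Tried to allocate (?P<requested>[\d.]+) GiB .*?"
    r"(?P<total>[\d.]+) GiB total capacity; "
    r"(?P<allocated>[\d.]+) GiB already allocated; "
    r"(?P<free>[\d.]+) GiB free"
)


def parse_oom(message):
    """Return the GiB figures from an out-of-memory message as floats."""
    match = PATTERN.search(message)
    if match is None:
        raise ValueError("message does not match the expected format")
    return {name: float(value) for name, value in match.groupdict().items()}


stats = parse_oom(MESSAGE)
# How much more memory the failed allocation needed than was free.
shortfall = stats["requested"] - stats["free"]
print(
    f"requested {stats['requested']} GiB, "
    f"free {stats['free']} GiB, "
    f"short by {shortfall:.2f} GiB"
)
```

For this message the single request (7.38 GiB) exceeds the free memory (2.19 GiB) by about 5.19 GiB.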

Tortoise17 commented 2 years ago

@SilvioGiancola this happens right when I start training. Can you check once on your end? I am running only a single training. Should I pass the flag --train_only?

SilvioGiancola commented 2 years ago

I have personally checked all those methods when we published this repository. Can you maybe please try with the flag --train_only?

Tortoise17 commented 2 years ago

@SilvioGiancola There is no train_only flag option.

SilvioGiancola commented 2 years ago

@Tortoise17 I was able to run a training following the instructions here using as little as 1GB of RAM for training and evaluation.

python src/main.py --SoccerNet_path=path/to/SoccerNet/ \
--features=ResNET_TF2_PCA512.npy \
--num_features=512 \
--model_name=CALF_v2 \
--batch_size 32 \
--evaluation_frequency 20 \
--chunks_per_epoch 18000
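
Note that the command above mixes the `--flag=value` and `--flag value` spellings. If the script reads its options with Python's standard `argparse`, both forms are accepted. A minimal sketch, assuming a stand-in parser that declares a few of the flags above (the parser and the argument types here are illustrative assumptions, not the actual definitions in src/main.py):

```python
import argparse

# Hypothetical stand-in parser; the real script may declare these differently.
parser = argparse.ArgumentParser()
parser.add_argument("--SoccerNet_path", type=str)
parser.add_argument("--num_features", type=int)
parser.add_argument("--batch_size", type=int)

# Same mix of spellings as in the command: "=" for some flags, a space for others.
args = parser.parse_args(
    [
        "--SoccerNet_path=path/to/SoccerNet/",
        "--num_features=512",
        "--batch_size",
        "32",
    ]
)

print(args.SoccerNet_path, args.num_features, args.batch_size)
```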

If you ran into a CUDA out of memory error, it means you were doing something different, maybe using a different set of features or a different set of data. If that is the case, then you need a larger GPU, and I cannot do anything for you.
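
As a rough illustration of why the feature set matters, here is a back-of-the-envelope sketch of the size of one input batch stored as 32-bit floats. The chunk length of 240 frames and the 2048-dimensional alternative are assumptions chosen for illustration, not values confirmed in this thread, and the estimate covers the input tensor only, not model weights, activations, or gradients.

```python
MIB = 1024 ** 2


def batch_bytes(batch_size, frames_per_chunk, num_features, bytes_per_value=4):
    """Bytes needed for one batch of per-frame features (input tensor only)."""
    return batch_size * frames_per_chunk * num_features * bytes_per_value


# Assumption: 240 frames per chunk, purely for illustration.
FRAMES_PER_CHUNK = 240

# --batch_size 32 and --num_features=512, as in the command above.
pca_batch = batch_bytes(32, FRAMES_PER_CHUNK, 512)
# Hypothetical wider feature set with 2048 values per frame.
wide_batch = batch_bytes(32, FRAMES_PER_CHUNK, 2048)

print(f"512-dim batch:  {pca_batch / MIB:.1f} MiB")
print(f"2048-dim batch: {wide_batch / MIB:.1f} MiB")
```

Under these assumptions the input grows linearly with the feature dimension, so features four times wider need four times the input memory per batch.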