SilvioGiancola / SoccerNetv2-DevKit

Development Kit for the SoccerNet Challenge
MIT License

Training NetVLAD++ process killed #28

Closed GriesserP closed 3 years ago

GriesserP commented 3 years ago

Hello, when I run python src/main.py --SoccerNet_path=my/path/to/soccernet from the SoccerNetv2-DevKit/Task1-ActionSpotting/TemporallyAwarePooling folder, the process receives an out-of-memory kill signal from the kernel. This doesn't happen with the reduced features, i.e. with --features ResNET152_TF2_PCA512.npy. The signal is sent while executing line 131 of SoccerNetv2-DevKit/Task1-ActionSpotting/TemporallyAwarePooling/src/dataset.py, which is called from line 24 of SoccerNetv2-DevKit/Task1-ActionSpotting/TemporallyAwarePooling/src/main.py.

Here is the log from my /var/log/syslog:

Sep 21 11:26:03 MS-7B79 kernel: [440320.649437] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1004.slice/session-752.scope,task=python,pid=126284,uid=1004
Sep 21 11:26:03 MS-7B79 kernel: [440320.649479] Out of memory: Killed process 126284 (python) total-vm:55748164kB, anon-rss:31574096kB, file-rss:0kB, shmem-rss:4kB, UID:1004 pgtables:63392kB oom_score_adj:0
Sep 21 11:26:03 MS-7B79 kernel: [440321.292489] oom_reaper: reaped process 126284 (python), now anon-rss:0kB, file-rss:0kB, shmem-rss:4kB
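To read the kernel line more easily, the kB counters can be converted to GiB with a few lines of Python. This is only a sketch: the helper names parse_oom_kb and kb_to_gib are made up for illustration, and the log line is the "Out of memory" line copied from the syslog excerpt in this comment.

```python
import re

# The "Out of memory" line from the syslog excerpt, without the timestamp prefix.
OOM_LINE = (
    "Out of memory: Killed process 126284 (python) "
    "total-vm:55748164kB, anon-rss:31574096kB, file-rss:0kB, "
    "shmem-rss:4kB, UID:1004 pgtables:63392kB oom_score_adj:0"
)


def parse_oom_kb(line):
    """Collect every '<name>:<number>kB' field of an OOM-killer line (hypothetical helper)."""
    return {name: int(value) for name, value in re.findall(r"([\w-]+):(\d+)kB", line)}


def kb_to_gib(kb):
    """Convert kibibytes, the unit the kernel reports, to GiB."""
    return kb / 1024 ** 2


fields = parse_oom_kb(OOM_LINE)
for name, kb in fields.items():
    print(f"{name}: {kb_to_gib(kb):.2f} GiB")
```

The anon-rss value comes out at roughly 30 GiB, so the Python process alone had filled nearly all of the 32 GB of RAM when it was killed.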

My config is:

OS: Ubuntu 20.04.3 LTS (GNU/Linux 5.4.0-84-generic x86_64)
RAM: 32 GB
GPU: NVIDIA GeForce RTX 2080

Maybe I simply don't have enough RAM? Do you have any suggestions?

SilvioGiancola commented 3 years ago

Hi @GriesserP ,

I believe your issue is simply that you do not have enough RAM. I was running with 60 or 90 GB of RAM; I cannot recall exactly.
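A rough back-of-envelope estimate shows why the feature dimension matters so much here. The sketch below is not taken from the repository: the function feature_memory_gib is hypothetical, and the game count, frames per half, and the 2048-dimensional size of the full features are all illustrative assumptions. Only the 512 comes from the PCA512 file name mentioned above.

```python
def feature_memory_gib(n_games, frames_per_half, feature_dim,
                       bytes_per_value=4, halves=2):
    """Estimate the RAM needed to hold all frame features at once.

    Every argument is an assumption supplied by the caller; float32
    (4 bytes per value) is assumed by default.
    """
    total_bytes = n_games * halves * frames_per_half * feature_dim * bytes_per_value
    return total_bytes / 1024 ** 3


# Illustrative numbers only: 300 games, 45-minute halves sampled at 2 frames/s.
N_GAMES = 300
FRAMES_PER_HALF = 45 * 60 * 2

full = feature_memory_gib(N_GAMES, FRAMES_PER_HALF, feature_dim=2048)
reduced = feature_memory_gib(N_GAMES, FRAMES_PER_HALF, feature_dim=512)
print(f"full features:    {full:.1f} GiB")
print(f"reduced features: {reduced:.1f} GiB")
print(f"ratio:            {full / reduced:.0f}x")
```

Under these assumptions the raw features alone differ by a factor of four, before counting any extra copies made while extracting clips, which would be consistent with the reduced features fitting in 32 GB while the full ones do not.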

A first thing to try would be reducing the batch size to something smaller (--batch_size=256 by default).

Another source of such high memory consumption may be the dataloader, which pre-processes all games by extracting the clips to train on (see that class). You could instead read those clips in __getitem__, but that will be extremely slow. Alternatively, you could train on fewer games.
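The __getitem__ option could look roughly like the sketch below. It is not the dataset class from this repository: LazyClipDataset, its arguments, and the demo file are hypothetical, labels are left out, and it assumes each half is stored as a 2-D .npy array of shape (frames, feature_dim). It relies on np.load with mmap_mode="r", so that only the requested window is copied into RAM.

```python
import os
import tempfile

import numpy as np


class LazyClipDataset:
    """Hypothetical map-style dataset that slices clips from disk on demand."""

    def __init__(self, feature_paths, window_size, stride=None):
        self.feature_paths = list(feature_paths)
        self.window_size = window_size
        self.stride = stride or window_size
        # Build an index of (file number, start frame). Memory-mapping the
        # file here exposes its shape without loading the feature values.
        self.index = []
        for file_id, path in enumerate(self.feature_paths):
            n_frames = np.load(path, mmap_mode="r").shape[0]
            for start in range(0, n_frames - window_size + 1, self.stride):
                self.index.append((file_id, start))

    def __len__(self):
        return len(self.index)

    def __getitem__(self, i):
        file_id, start = self.index[i]
        feats = np.load(self.feature_paths[file_id], mmap_mode="r")
        # np.array copies just this window out of the memory-mapped file.
        return np.array(feats[start:start + self.window_size])


# Tiny demo with a made-up feature file: 10 frames of 4-dimensional features.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "half1.npy")
np.save(demo_path, np.arange(40, dtype=np.float32).reshape(10, 4))

dataset = LazyClipDataset([demo_path], window_size=3)
clip = dataset[1]
print(len(dataset), clip.shape)
```

Because the class exposes __len__ and __getitem__, it follows the same protocol as a map-style PyTorch dataset. The trade-off is the one mentioned above: every clip costs a disk read, so training will be slower than with everything preloaded.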

I hope that helps, and sorry that NetVLAD++ is so RAM-hungry :)

GriesserP commented 3 years ago

Thank you very much for your answer! I will try to work around the issue with your suggestions.