What is the minimum GPU memory?

shawngiese commented 1 year ago

Hi, everything installed and built fine in my Win 11 (WSL2 Ubuntu 20.04) environment. However, when I tried to load a single agent in self-play, I got:

23:59:13 | Using CUDA
23:59:13 | loading dictionary from models/draw_classifier.dict
23:59:13 | num words = 50349
23:59:13 | BartClassifier: full interactive mode on.
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 8.00 GiB total capacity; 7.15 GiB already allocated; 0 bytes free; 8.00 GiB allowed; 7.37 GiB reserved in total by PyTorch)

The command I ran was:

python run.py --adhoc --cfg conf/c01_ag_cmp/cmp.prototxt Iagent_one=agents/cicero.prototxt use_shared_agent=1 power_one=TURKEY

$ nvcc.exe --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_22:08:44_Pacific_Standard_Time_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0

I switched to torch 1.8 because it supports torch.cuda.set_per_process_memory_fraction(1.0, 0), which caps the fraction of GPU memory the process may use. Lowering it made no difference, since I was apparently already below the required memory. Someone with a running system could lower theirs to 0.9, 0.7, and so on to see when a memory error appears; that would be the minimum memory. I'll see if I can get my hands on a 12GB card to see if that is enough.
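In case it helps whoever tries this, here is a rough sketch of that sweep. Only torch.cuda.set_per_process_memory_fraction is a real torch call (available from 1.8); the helper names and the load_and_run callback are hypothetical placeholders for whatever starts the agent, and each trial is probably best run in a fresh process so that memory left over from an earlier trial does not skew the result.

```python
from typing import Callable, Iterable, Optional


def cap_gpu_memory(fraction: float, device: int = 0) -> None:
    """Limit this process to `fraction` of the device's total memory."""
    import torch  # lazy import so the search helper below works without torch

    torch.cuda.set_per_process_memory_fraction(fraction, device)


def runs_under_cap(fraction: float, load_and_run: Callable[[], None]) -> bool:
    """Apply the cap, run the (hypothetical) workload, report whether it fit."""
    cap_gpu_memory(fraction)
    try:
        load_and_run()
    except RuntimeError as err:
        if "out of memory" in str(err):
            return False
        raise  # any other failure is not a memory result
    return True


def smallest_working_fraction(
    try_run: Callable[[float], bool],
    fractions: Iterable[float] = (1.0, 0.9, 0.8, 0.7, 0.6, 0.5),
) -> Optional[float]:
    """Walk the fractions from high to low; return the last one that still ran."""
    last_ok = None
    for fraction in sorted(fractions, reverse=True):
        if not try_run(fraction):
            break
        last_ok = fraction
    return last_ok


if __name__ == "__main__":
    # Simulated trial with made-up numbers: a 32 GiB card and a workload
    # that needs 20 GiB. Replace the lambda with a real runs_under_cap call.
    total_gib, needed_gib = 32.0, 20.0
    print(smallest_working_fraction(lambda f: f * total_gib >= needed_gib))
```

Multiplying the returned fraction by the card's total memory gives an approximate upper bound on the minimum, accurate only to the step size of the sweep.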

c-flaherty commented 1 year ago

Hm, I usually used a 32 GB GPU for testing. I can try some lower memory settings and get back to you about what works and what doesn't.

shawngiese commented 1 year ago

Thanks. I tried again with a 12GB GPU and it was not enough to run agents/cicero.prototxt with use_shared_agent=1. I might need to move that to the cloud if it needs more GPU memory. Otherwise:

agents/diplodocus_low >> runs fine
agents/base_strategy_model >> takes a little more than a minute
agents/diplodocus_high >> runs fine
agents/searchbot >> runs fine, finished in around 4 hours
agents/repro >> failed with TypeError: expected str, bytes or os.PathLike object, not FrozenReproAgent

Are there any other agents we can use? The docs mentioned the following agent but I could not find it:

agents/bqre1p_parlai_20220819_cicero_2.prototxt

Otherwise all the tests passed... 348 passed, 23 warnings in 27.47s

$ free -m
              total        used        free      shared  buff/cache   available
Mem:          32048        3039       26948          72        2059       28543
Swap:         16384           0       16384

NOTE: If anyone gets a "Killed" error when using Windows WSL, that is probably because WSL by default limits itself to half of your system RAM. The limit is configurable.
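A quick way to confirm how much RAM the WSL side actually sees is to read MemTotal from /proc/meminfo, which should match the "total" column of free -m. The sketch below assumes a standard Linux /proc/meminfo layout, and the function name is just illustrative. If the number is lower than expected, the limit can be raised with the memory key under [wsl2] in the Windows-side .wslconfig file.

```python
import re
from pathlib import Path


def mem_total_mib(meminfo_text: str) -> int:
    """Return MemTotal from /proc/meminfo-style text, converted from kB to MiB."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal line not found")
    return int(match.group(1)) // 1024


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():
        total = mem_total_mib(meminfo.read_text())
        print(f"RAM visible to this Linux environment: {total} MiB")
```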

c-flaherty commented 1 year ago

agents/bqre1p_parlai_20220819_cicero_2.prototxt does not exist, so ignore that. It's a typo in the docs.

Maybe try moving to V100s or A100s on AWS or LambdaLabs?

facebookresearch / diplomacy_cicero, issue #13