Closed shawngiese closed 1 year ago
Hm, I was usually using a 32 GB GPU for testing. I can try some lower memory settings and get back to you about what works and what doesn't.
Thanks. I tried again with a 12 GB GPU, and it was not enough to run agents/cicero.prototxt with use_shared_agent=1. I might need to move that to the cloud if it needs more GPU memory. Otherwise:
Are there any other agents we can use? The docs mentioned the following agent but I could not find it:
agents/bqre1p_parlai_20220819_cicero_2.prototxt
Apart from that, all the tests passed: 348 passed, 23 warnings in 27.47s
$ free -m
              total        used        free      shared  buff/cache   available
Mem:          32048        3039       26948          72        2059       28543
Swap:         16384           0       16384
NOTE: If anyone gets a "Killed" error when using Windows WSL, that is probably because WSL by default limits itself to half of your system RAM (the limit is configurable).
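To check whether that limit applies, it can help to compare the RAM visible inside WSL with the RAM installed on the Windows host. The following is a minimal sketch, assuming a Linux environment that exposes /proc/meminfo; the helper name `mem_total_mib` is made up for illustration and is not part of the project.

```python
import os


def mem_total_mib(meminfo_text: str) -> int:
    """Return MemTotal from /proc/meminfo-style text, converted to MiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # The line looks like "MemTotal:       32817152 kB"; the value is in KiB.
            kib = int(line.split()[1])
            return kib // 1024
    raise ValueError("MemTotal not found in the given text")


# Only read the real file when it exists (i.e. on Linux or WSL).
if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        total = mem_total_mib(f.read())
    print(f"RAM visible to this Linux environment: {total} MiB")
```

If the printed value is roughly half of the host's physical RAM, the default cap is probably in effect, and raising it in the WSL configuration may avoid the "Killed" error.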
agents/bqre1p_parlai_20220819_cicero_2.prototxt does not exist, so ignore that. It's a typo.
Maybe try moving to V100s or A100s on AWS or Lambda Labs?
Hi, everything installed and built fine in my Windows 11 (WSL2, Ubuntu 20.04) environment. However, when I tried to load a single agent in self-play, I got:
23:59:13 | Using CUDA
23:59:13 | loading dictionary from models/draw_classifier.dict
23:59:13 | num words = 50349
23:59:13 | BartClassifier: full interactive mode on.
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 8.00 GiB total capacity; 7.15 GiB already allocated; 0 bytes free; 8.00 GiB allowed; 7.37 GiB reserved in total by PyTorch)
python run.py --adhoc --cfg conf/c01_ag_cmp/cmp.prototxt Iagent_one=agents/cicero.prototxt use_shared_agent=1 power_one=TURKEY
$ nvcc.exe --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_22:08:44_Pacific_Standard_Time_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0
I switched to torch 1.8 because it supports torch.cuda.set_per_process_memory_fraction(1.0, 0), which lets me set the fraction of GPU memory the process may use. Lowering it made no difference, since I was apparently already below the required memory. Someone with a running system could lower theirs to 0.9, 0.7, and so on, to see when a memory error appears. The lowest fraction that still works would give the minimum memory. I'll see if I can get my hands on a 12 GB card to check whether that is enough.
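The probing idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: `find_min_fraction` and `make_torch_probe` are hypothetical helpers, not part of the project, and `workload` stands for whatever loads and runs the agent on a machine that has enough memory. The only PyTorch calls used are torch.cuda.set_per_process_memory_fraction and torch.cuda.empty_cache.

```python
from typing import Callable, Optional, Sequence


def find_min_fraction(
    try_run: Callable[[float], bool], fractions: Sequence[float]
) -> Optional[float]:
    """Try fractions from largest to smallest and return the smallest one
    that still succeeds before the first failure, or None if none succeed."""
    smallest = None
    for frac in sorted(fractions, reverse=True):
        if not try_run(frac):
            break
        smallest = frac
    return smallest


def make_torch_probe(workload: Callable[[], None], device: int = 0) -> Callable[[float], bool]:
    """Build a try_run function that caps GPU memory, then runs `workload`.

    `workload` is a placeholder for loading and running the agent.
    For a clean measurement each attempt is better run in a fresh process,
    because memory held by earlier attempts also counts against the cap.
    """
    import torch  # imported lazily so the search helper works without torch

    def try_run(fraction: float) -> bool:
        torch.cuda.set_per_process_memory_fraction(fraction, device)
        try:
            workload()
        except RuntimeError as err:
            # CUDA OOM surfaces as a RuntimeError mentioning "out of memory".
            if "out of memory" in str(err):
                return False
            raise
        finally:
            torch.cuda.empty_cache()
        return True

    return try_run
```

Multiplying the returned fraction by the card's total memory gives a rough upper bound on what the agent needs; finer steps between the last success and the first failure narrow it down.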