Running the following command:

parlai interactive -mf zoo:blenderbot2/blenderbot2_3B/model --search-server relevant_search_server --gpu 0

results in the following error:
00:22:51 | Building Memory Decoder from file: /home/[user]/ParlAI/data/models/blenderbot2/memory_decoder/model
Traceback (most recent call last):
File "/home/[user]/ParlAI/venv/bin/parlai", line 33, in <module>
sys.exit(load_entry_point('parlai', 'console_scripts', 'parlai')())
File "/home/[user]/ParlAI/parlai/__main__.py", line 14, in main
superscript_main()
File "/home/[user]/ParlAI/parlai/core/script.py", line 325, in superscript_main
return SCRIPT_REGISTRY[cmd].klass._run_from_parser_and_opt(opt, parser)
File "/home/[user]/ParlAI/parlai/core/script.py", line 108, in _run_from_parser_and_opt
return script.run()
File "/home/[user]/ParlAI/parlai/scripts/interactive.py", line 118, in run
return interactive(self.opt)
File "/home/[user]/ParlAI/parlai/scripts/interactive.py", line 84, in interactive
agent = create_agent(opt, requireModelExists=True)
File "/home/[user]/ParlAI/parlai/core/agents.py", line 468, in create_agent
model = create_agent_from_opt_file(opt)
File "/home/[user]/ParlAI/parlai/core/agents.py", line 421, in create_agent_from_opt_file
return model_class(opt_from_file)
File "/home/[user]/ParlAI/parlai/agents/rag/rag.py", line 186, in __init__
self._generation_agent.__init__(self, opt, shared) # type: ignore
File "/home/[user]/ParlAI/parlai/core/torch_generator_agent.py", line 537, in __init__
self.model = ph.make_parallel(self.model)
File "/home/[user]/ParlAI/parlai/utils/torch.py", line 370, in make_parallel
model.apply(self._place_modulelist)
File "/home/[user]/ParlAI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 668, in apply
module.apply(fn)
File "/home/[user]/ParlAI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 668, in apply
module.apply(fn)
File "/home/[user]/ParlAI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 669, in apply
fn(self)
File "/home/[user]/ParlAI/parlai/utils/torch.py", line 418, in _place_modulelist
layers[layer_no] = layer.to(layer_gpu)
File "/home/[user]/ParlAI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 927, in to
return self._apply(convert)
File "/home/[user]/ParlAI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 579, in _apply
module._apply(fn)
File "/home/[user]/ParlAI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 579, in _apply
module._apply(fn)
File "/home/[user]/ParlAI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 602, in _apply
param_applied = fn(param)
File "/home/[user]/ParlAI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 925, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: CUDA out of memory. Tried to allocate 50.00 MiB (GPU 1; 2.94 GiB total capacity; 2.32 GiB already allocated; 37.56 MiB free; 2.40 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Some context:
$ nvidia-smi
Sat Jan 7 00:29:37 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.60.11 Driver Version: 525.60.11 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla M40 24GB Off | 00000000:25:00.0 Off | 416 |
| N/A 40C P8 14W / 250W | 0MiB / 23040MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce ... Off | 00000000:26:00.0 Off | N/A |
| 26% 34C P8 7W / 120W | 2MiB / 3072MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
It appears https://github.com/facebookresearch/ParlAI/blob/main/parlai/utils/torch.py is trying to parallelize the model across multiple GPUs even when a single GPU is specified with --gpu 0: the traceback shows _place_modulelist moving layers onto GPU 1, which only has 3 GiB of memory. I believe the correct behaviour would be to use only the single specified GPU.
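As a possible workaround (a sketch, not verified on this setup): hiding the second GPU from PyTorch with CUDA_VISIBLE_DEVICES should leave make_parallel with no second device to spread layers onto, and overriding the model-parallel setting that the zoo model's opt file may enable could have the same effect:

# Expose only GPU 0 (the 24 GB M40) so PyTorch cannot place layers on GPU 1.
# With CUDA_VISIBLE_DEVICES=0, device index 0 refers to the M40.
CUDA_VISIBLE_DEVICES=0 parlai interactive -mf zoo:blenderbot2/blenderbot2_3B/model --search-server relevant_search_server --gpu 0

# Alternatively, explicitly disable model parallelism on the command line
# (assuming the agent honours this flag over the saved opt file):
parlai interactive -mf zoo:blenderbot2/blenderbot2_3B/model --search-server relevant_search_server --gpu 0 --model-parallel false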