Running the following command:

parlai interactive -mf zoo:blenderbot2/blenderbot2_3B/model --search-server relevant_search_server --gpu 0

results in the following error:
00:22:51 | Building Memory Decoder from file: /home/[user]/ParlAI/data/models/blenderbot2/memory_decoder/model
Traceback (most recent call last):
File "/home/[user]/ParlAI/venv/bin/parlai", line 33, in <module>
sys.exit(load_entry_point('parlai', 'console_scripts', 'parlai')())
File "/home/[user]/ParlAI/parlai/__main__.py", line 14, in main
superscript_main()
File "/home/[user]/ParlAI/parlai/core/script.py", line 325, in superscript_main
return SCRIPT_REGISTRY[cmd].klass._run_from_parser_and_opt(opt, parser)
File "/home/[user]/ParlAI/parlai/core/script.py", line 108, in _run_from_parser_and_opt
return script.run()
File "/home/[user]/ParlAI/parlai/scripts/interactive.py", line 118, in run
return interactive(self.opt)
File "/home/[user]/ParlAI/parlai/scripts/interactive.py", line 84, in interactive
agent = create_agent(opt, requireModelExists=True)
File "/home/[user]/ParlAI/parlai/core/agents.py", line 468, in create_agent
model = create_agent_from_opt_file(opt)
File "/home/[user]/ParlAI/parlai/core/agents.py", line 421, in create_agent_from_opt_file
return model_class(opt_from_file)
File "/home/[user]/ParlAI/parlai/agents/rag/rag.py", line 186, in __init__
self._generation_agent.__init__(self, opt, shared) # type: ignore
File "/home/[user]/ParlAI/parlai/core/torch_generator_agent.py", line 537, in __init__
self.model = ph.make_parallel(self.model)
File "/home/[user]/ParlAI/parlai/utils/torch.py", line 370, in make_parallel
model.apply(self._place_modulelist)
File "/home/[user]/ParlAI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 668, in apply
module.apply(fn)
File "/home/[user]/ParlAI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 668, in apply
module.apply(fn)
File "/home/[user]/ParlAI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 669, in apply
fn(self)
File "/home/[user]/ParlAI/parlai/utils/torch.py", line 418, in _place_modulelist
layers[layer_no] = layer.to(layer_gpu)
File "/home/[user]/ParlAI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 927, in to
return self._apply(convert)
File "/home/[user]/ParlAI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 579, in _apply
module._apply(fn)
File "/home/[user]/ParlAI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 579, in _apply
module._apply(fn)
File "/home/[user]/ParlAI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 602, in _apply
param_applied = fn(param)
File "/home/[user]/ParlAI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 925, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: CUDA out of memory. Tried to allocate 50.00 MiB (GPU 1; 2.94 GiB total capacity; 2.32 GiB already allocated; 37.56 MiB free; 2.40 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Some context:
$ nvidia-smi
Sat Jan 7 00:29:37 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.60.11 Driver Version: 525.60.11 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla M40 24GB Off | 00000000:25:00.0 Off | 416 |
| N/A 40C P8 14W / 250W | 0MiB / 23040MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce ... Off | 00000000:26:00.0 Off | N/A |
| 26% 34C P8 7W / 120W | 2MiB / 3072MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
It appears https://github.com/facebookresearch/ParlAI/blob/main/parlai/utils/torch.py is trying to parallelize the model across multiple GPUs even when a single GPU is specified with --gpu 0: the traceback shows _place_modulelist moving layers onto GPU 1, which only has 3 GiB of memory. I believe the correct behaviour would be to use only the single specified GPU.
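As a possible workaround (a sketch, not verified on this setup): hiding the second GPU from PyTorch with CUDA_VISIBLE_DEVICES should leave make_parallel with no second device to spread layers onto, and overriding the model-parallel setting that the zoo model's opt file may enable could have the same effect:

# Expose only GPU 0 (the 24 GB M40) so PyTorch cannot place layers on GPU 1.
# With CUDA_VISIBLE_DEVICES=0, device index 0 refers to the M40.
CUDA_VISIBLE_DEVICES=0 parlai interactive -mf zoo:blenderbot2/blenderbot2_3B/model --search-server relevant_search_server --gpu 0

# Alternatively, explicitly disable model parallelism on the command line
# (assuming the agent honours this flag over the saved opt file):
parlai interactive -mf zoo:blenderbot2/blenderbot2_3B/model --search-server relevant_search_server --gpu 0 --model-parallel false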