CUDA out of memory error

RodriMora commented 3 months ago

Hi!

I'm getting a "CUDA out of memory" error even though I'm only benchmarking a small model on 4x3090 (96GB VRAM in total):

Error:

Start to evaluate qwen_15_7b_chat's close_freeform_hard split.

Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████| 4/4 [00:04<00:00,  1.03s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading close-ended freeform hard data.
Sorting data based on input length.
Finished evaluating qwen_15_7b_chat's close_freeform_hard split. Used 8.58 minutes.

Start to evaluate qwen_15_7b_chat's close_multichoice_hard split.

Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████| 4/4 [00:03<00:00,  1.01it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading close-ended multichoice hard data.
Sorting data based on input length.
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/MixEval/mix_eval/evaluate.py", line 252, in <module>
    raise e
  File "/home/ubuntu/MixEval/mix_eval/evaluate.py", line 243, in <module>
    eval(args)
  File "/home/ubuntu/MixEval/mix_eval/evaluate.py", line 235, in eval
    _eval(args)
  File "/home/ubuntu/MixEval/mix_eval/evaluate.py", line 200, in _eval
    model.get_responses(batch, response_file)
  File "/home/ubuntu/MixEval/mix_eval/models/base.py", line 212, in get_responses
    return self.get_closeended_responses(batch, response_file)
  File "/home/ubuntu/MixEval/mix_eval/models/base.py", line 236, in get_closeended_responses
    responses = self.chunk_generate(
  File "/home/ubuntu/MixEval/mix_eval/models/base.py", line 164, in chunk_generate
    outputs = model.generate(
  File "/home/ubuntu/MixEval/.venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/MixEval/.venv/lib/python3.10/site-packages/transformers/generation/utils.py", line 1736, in generate
    result = self._sample(
  File "/home/ubuntu/MixEval/.venv/lib/python3.10/site-packages/transformers/generation/utils.py", line 2375, in _sample
    outputs = self(
  File "/home/ubuntu/MixEval/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ubuntu/MixEval/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/MixEval/.venv/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/ubuntu/MixEval/.venv/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 1163, in forward
    logits = logits.float()
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 13.63 GiB. GPU 3 has a total capacty of 23.68 GiB of which 7.94 GiB is free. Including non-PyTorch memory, this process has 15.74 GiB memory in use. Of the allocated memory 14.25 GiB is allocated by PyTorch, and 1.18 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

The command I'm using:

python -m mix_eval.evaluate \
    --model_name qwen_15_7b_chat \
    --benchmark mixeval_hard \
    --version 2024-06-01 \
    --batch_size 20 \
    --max_gpu_memory 96GiB \
    --output_dir mix_eval/data/model_responses/ \
    --api_parallel_num 20

System: EPYC 7402 512GB RAM 4x3090's

I believe that should be enough VRAM to run the benchmark?

Psycoy commented 3 months ago

You should set --max_gpu_memory to 5GiB. This flag is the maximum memory used to store the model weights, not the total VRAM available.
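
For context, here is a minimal sketch of how a cap like this is commonly applied. It assumes that MixEval forwards --max_gpu_memory as a per-GPU limit to the `max_memory` argument of Hugging Face `from_pretrained(..., device_map="auto")`; that is an assumption and is not confirmed in this thread. The helper name `build_max_memory` is hypothetical.

```python
def build_max_memory(num_gpus: int, per_gpu_limit: str) -> dict:
    """Build a mapping of GPU index -> weight-memory cap.

    This is the dictionary shape accepted by the `max_memory` argument of
    transformers' `from_pretrained` when `device_map="auto"` is used.
    Whether MixEval builds it exactly like this is an assumption.
    """
    return {gpu_index: per_gpu_limit for gpu_index in range(num_gpus)}


# With 4 GPUs and a 5GiB cap, each card holds at most 5GiB of weights.
max_memory = build_max_memory(num_gpus=4, per_gpu_limit="5GiB")
print(max_memory)
```

Under that assumption, a value such as 96GiB is larger than a single 24 GiB card, so it places no useful limit on where the weights go; a small cap spreads the weights across the GPUs and leaves the remainder of each card for activations and logits.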

RodriMora commented 3 months ago

It seems I'm running into the same problem with 5GiB:

python -m mix_eval.evaluate \
    --model_name qwen_2_7b_instruct \
    --benchmark mixeval_hard \
    --version 2024-06-01 \
    --batch_size 20 \
    --max_gpu_memory 5GiB \
    --output_dir mix_eval/data/model_responses/ \
    --api_parallel_num 20

Start to evaluate qwen_2_7b_instruct's close_freeform_hard split.

Loading checkpoint shards: 100%|██████████████████████████████████████████| 4/4 [00:03<00:00,  1.02it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading close-ended freeform hard data.
Sorting data based on input length.
Finished evaluating qwen_2_7b_instruct's close_freeform_hard split. Used 7.72 minutes.

Start to evaluate qwen_2_7b_instruct's close_multichoice_hard split.

Loading checkpoint shards: 100%|██████████████████████████████████████████| 4/4 [00:03<00:00,  1.04it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading close-ended multichoice hard data.
Sorting data based on input length.
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/ubuntu/MixEval/mix_eval/evaluate.py", line 252, in <module>
    raise e
  File "/home/ubuntu/MixEval/mix_eval/evaluate.py", line 243, in <module>
    eval(args)
  File "/home/ubuntu/MixEval/mix_eval/evaluate.py", line 235, in eval
    _eval(args)
  File "/home/ubuntu/MixEval/mix_eval/evaluate.py", line 200, in _eval
    model.get_responses(batch, response_file)
  File "/home/ubuntu/MixEval/mix_eval/models/base.py", line 212, in get_responses
    return self.get_closeended_responses(batch, response_file)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/MixEval/mix_eval/models/base.py", line 236, in get_closeended_responses
    responses = self.chunk_generate(
                ^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/MixEval/mix_eval/models/base.py", line 164, in chunk_generate
    outputs = model.generate(
              ^^^^^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/envs/MixEval/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/envs/MixEval/lib/python3.11/site-packages/transformers/generation/utils.py", line 1736, in generate
    result = self._sample(
             ^^^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/envs/MixEval/lib/python3.11/site-packages/transformers/generation/utils.py", line 2375, in _sample
    outputs = self(
              ^^^^^
  File "/home/ubuntu/miniconda3/envs/MixEval/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/envs/MixEval/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/envs/MixEval/lib/python3.11/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/envs/MixEval/lib/python3.11/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 1163, in forward
    logits = logits.float()
             ^^^^^^^^^^^^^^
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 13.64 GiB. GPU 3 has a total capacty of 23.68 GiB of which 11.80 GiB is free. Including non-PyTorch memory, this process has 11.88 GiB memory in use. Of the allocated memory 10.99 GiB is allocated by PyTorch, and 598.64 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Psycoy commented 2 months ago

Try setting --batch_size to 5 or 10. The batch size of 20 was tested on 4 A100 40GB GPUs.
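
A rough way to see why the batch size matters: both tracebacks fail at `logits = logits.float()`, which materializes a float32 tensor of shape (batch size, sequence length, vocabulary size). The sketch below estimates the size of that tensor. The vocabulary sizes (151936 for Qwen1.5-7B-Chat, 152064 for Qwen2-7B-Instruct) are assumed from the public model configs, and the padded length of 1204 tokens is an inferred assumption, chosen because it reproduces the 13.63 GiB and 13.64 GiB figures in the two error messages.

```python
BYTES_PER_FP32 = 4
GIB = 2**30

# Assumptions, not values reported in this thread:
QWEN15_VOCAB = 151936   # assumed vocab_size of Qwen1.5-7B-Chat
QWEN2_VOCAB = 152064    # assumed vocab_size of Qwen2-7B-Instruct
ASSUMED_SEQ_LEN = 1204  # inferred padded prompt length of the failing batch


def fp32_logits_gib(batch_size: int, seq_len: int, vocab_size: int) -> float:
    """Size in GiB of a float32 logits tensor of shape (batch, seq, vocab)."""
    return batch_size * seq_len * vocab_size * BYTES_PER_FP32 / GIB


# Compare the single-tensor cost for the batch sizes discussed above.
for batch_size in (20, 10, 5):
    size = fp32_logits_gib(batch_size, ASSUMED_SEQ_LEN, QWEN2_VOCAB)
    print(f"batch_size={batch_size}: {size:.2f} GiB")
```

Under these assumptions, batch size 20 needs about 13.64 GiB for this one tensor, batch size 10 about 6.82 GiB, and batch size 5 about 3.41 GiB. The second error message reports 11.80 GiB free on GPU 3, so halving the batch would be expected to fit where 20 does not.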

RodriMora commented 2 months ago

Hi! It seems to be working now with --batch_size 10 on the 4x3090s. Thanks!

To run the parser, I believe I need an OpenAI key:

openai.OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable

But according to the README, "Open-source model parsers are also supported." How can I use an open-source model? Does that mean I can use a local model?

Psycoy commented 2 months ago

Hi, yes. Currently we have only implemented the GPT-3.5 parser, so you need an OpenAI key to run the eval. It is faster and does not require a GPU.

We will implement the open-source model parsers soon (probably using Llama 3 8B or Qwen2 7B).
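
Until then, a small pre-flight check can catch a missing key before a long run starts. This is only a sketch: it relies on the `OPENAI_API_KEY` environment variable named in the error message above, and the helper name `get_openai_api_key` is hypothetical rather than part of MixEval.

```python
import os


def get_openai_api_key(env=os.environ) -> str:
    """Return the OpenAI key from the environment, or fail with a clear message.

    `env` defaults to the real process environment; a plain dict can be
    passed instead for testing.
    """
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running the parser."
        )
    return key


# Demonstration with a placeholder mapping instead of the real environment.
print(get_openai_api_key({"OPENAI_API_KEY": "sk-placeholder"}))
```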

Psycoy / MixEval

CUDA out of memory error #2