NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0

When converting a checkpoint from Hugging Face, the checkpoint format conversion keeps running out of CUDA memory #10679

Open · changg10 opened this issue 1 month ago

changg10 commented 1 month ago
python3 /opt/NeMo/scripts/checkpoint_converters/convert_llava_hf_to_nemo.py \
    --input_name_or_path llava-hf/llava-1.5-7b-hf \
    --output_path /workspace/checkpoints/llava-7b.nemo \
    --tokenizer_path /workspace/checkpoints/vicuna-7b-v1.5/tokenizer_neva.model

It keeps reporting this error:

[NeMo I 2024-09-30 05:28:51 convert_llava_hf_to_nemo:288] Running verifications ['query: how much protein should a female eat'] ...
[rank0]: Traceback (most recent call last):
[rank0]:   File "/opt/NeMo/scripts/checkpoint_converters/convert_llava_hf_to_nemo.py", line 331, in <module>
[rank0]:     convert(args)
[rank0]:   File "/opt/NeMo/scripts/checkpoint_converters/convert_llava_hf_to_nemo.py", line 295, in convert
[rank0]:     model = model.cuda().eval()
[rank0]:   File "/opt/NeMo/nemo/core/classes/modelPT.py", line 1963, in cuda
[rank0]:     return super().cuda(device=device)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/lightning_fabric/utilities/device_dtype_mixin.py", line 76, in cuda
[rank0]:     return super().cuda(device=device)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 911, in cuda
[rank0]:     return self._apply(lambda t: t.cuda(device))
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 802, in _apply
[rank0]:     module._apply(fn)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 802, in _apply
[rank0]:     module._apply(fn)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 802, in _apply
[rank0]:     module._apply(fn)
[rank0]:   [Previous line repeated 3 more times]
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 825, in _apply
[rank0]:     param_applied = fn(param)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 911, in <lambda>
[rank0]:     return self._apply(lambda t: t.cuda(device))
[rank0]: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 192.00 MiB. GPU 0 has a total capacity of 47.29 GiB of which 169.00 MiB is free. Process 573844 has 46.78 GiB memory in use. Of the allocated memory 46.42 GiB is allocated by PyTorch, and 17.42 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
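
The last line of the message suggests a mitigation for allocator fragmentation. As a sketch only (it reduces fragmentation but adds no capacity, so a model that simply does not fit will still OOM), the converter can be relaunched with that setting applied; the command and paths are the ones from above:

import os
import subprocess

# Hedged mitigation taken from the OOM message itself: enable expandable
# segments in the CUDA caching allocator. This helps when "reserved but
# unallocated" memory is large; it cannot help if the model does not fit.
env = dict(os.environ, PYTORCH_CUDA_ALLOC_CONF="expandable_segments:True")
subprocess.run(
    [
        "python3",
        "/opt/NeMo/scripts/checkpoint_converters/convert_llava_hf_to_nemo.py",
        "--input_name_or_path", "llava-hf/llava-1.5-7b-hf",
        "--output_path", "/workspace/checkpoints/llava-7b.nemo",
        "--tokenizer_path", "/workspace/checkpoints/vicuna-7b-v1.5/tokenizer_neva.model",
    ],
    env=env,
    check=True,
)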

I am using two NVIDIA RTX 5880 GPUs with 48 GB of memory each. When I use only one GPU, its memory is fully utilized. However, when I use both GPUs, only one GPU's memory fills up while the other appears to sit idle. Why is this happening? Additionally, why does the data type conversion consume so much GPU memory?
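
For a rough sense of the numbers, a back-of-the-envelope sketch, assuming the converter materializes both the source Hugging Face weights and the converted NeMo weights before calling model.cuda():

# Rough memory arithmetic for a ~7B-parameter model such as llava-1.5-7b.
n_params = 7e9
gib_fp32 = n_params * 4 / 2**30   # ~26 GiB per full-precision copy
gib_fp16 = n_params * 2 / 2**30   # ~13 GiB per half-precision copy
print(f"fp32 copy: {gib_fp32:.1f} GiB, fp16 copy: {gib_fp16:.1f} GiB")
# Two fp32 copies (~52 GiB) already exceed the 47.29 GiB capacity the
# traceback reports, which would explain the OOM. As for the idle second
# GPU: torch.nn.Module.cuda() moves every parameter to a single device
# and never shards a model across GPUs, so this conversion script uses
# one GPU regardless of how many are visible.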

changg10 commented 1 month ago

I also often get this warning; how do I resolve it? "zarr distributed checkpoint backend is deprecated. Please switch to PyTorch Distributed format (torch_dist)."
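
A hedged sketch of one way to switch formats, assuming the backend is selected by a model-config key named dist_ckpt_format (as described in the NeMo distributed-checkpointing docs); the config file name here is hypothetical:

from omegaconf import OmegaConf

# Load the conversion/training config (hypothetical file name), switch the
# distributed-checkpoint backend from the deprecated zarr format to
# torch_dist, and write the config back.
cfg = OmegaConf.load("megatron_llava_config.yaml")
cfg.model.dist_ckpt_format = "torch_dist"
OmegaConf.save(cfg, "megatron_llava_config.yaml")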

github-actions[bot] commented 5 days ago

This issue is stale because it has been open for 30 days with no activity. Remove the stale label or comment, or this will be closed in 7 days.