OpenBMB / MiniCPM-V

MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
Apache License 2.0

RuntimeError: CUDA error: invalid device ordinal #356

Status: Closed (mwaikul closed this issue 1 month ago)

mwaikul commented 1 month ago

Is there an existing issue / discussion for this?

Is there an existing answer for this in the FAQ?

Current Behavior

I am trying to fine-tune a model using the sample code provided in your GitHub repository: finetune_lora.sh, finetune.py, dataset.py, and trainer.py.

I have set `export CUDA_VISIBLE_DEVICES=0,1`.
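With `CUDA_VISIBLE_DEVICES=0,1`, CUDA exposes only the listed GPUs to the process and renumbers them from 0 in the listed order, so the only valid device ordinals inside the process are 0 and 1. The sketch below models that mapping. It assumes the variable holds a plain comma-separated list of integer indices. UUID and MIG identifiers, which CUDA also accepts, are not handled, and the helper name is hypothetical.

```python
import os
from typing import Dict, Optional


def visible_ordinal_map(env_value: Optional[str]) -> Optional[Dict[int, int]]:
    """Map process-local CUDA ordinals to physical GPU indices.

    Assumes CUDA_VISIBLE_DEVICES is a comma-separated list of integers;
    UUID/MIG identifiers are not handled in this sketch.
    Returns None when the variable is unset (all GPUs visible).
    """
    if env_value is None:
        return None
    # Skip empty tokens so that "" (all GPUs hidden) yields an empty map.
    physical = [int(tok) for tok in env_value.split(",") if tok.strip()]
    # Visible devices are renumbered 0..n-1 in the order they are listed.
    return dict(enumerate(physical))


if __name__ == "__main__":
    print(visible_ordinal_map(os.environ.get("CUDA_VISIBLE_DEVICES")))
```

For example, `"0,1"` maps to `{0: 0, 1: 1}` and `"2,3"` maps to `{0: 2, 1: 3}`. In both cases `torch.cuda.device_count()` in that process should report 2, and any ordinal of 2 or above is invalid.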

```
root@584a1774aaad:/# nvidia-smi
Sat Jul 20 21:07:53 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM4-80GB          On  | 00000000:0A:00.0 Off |                    0 |
| N/A   32C    P0              61W / 400W |      4MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-SXM4-80GB          On  | 00000000:44:00.0 Off |                    0 |
| N/A   33C    P0              61W / 400W |      4MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
root@584a1774aaad:/# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0
root@584a1774aaad:/#
```

```
root@584a1774aaad:/workspace/minicpm/finetune# ./finetune_lora.sh
W0720 21:08:27.097000 139678581792768 torch/distributed/run.py:757]
W0720 21:08:27.097000 139678581792768 torch/distributed/run.py:757]
W0720 21:08:27.097000 139678581792768 torch/distributed/run.py:757] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0720 21:08:27.097000 139678581792768 torch/distributed/run.py:757]
[2024-07-20 21:08:35,215] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-07-20 21:08:35,216] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-07-20 21:08:35,233] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-07-20 21:08:35,233] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-07-20 21:08:35,234] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-07-20 21:08:35,239] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-07-20 21:08:35,270] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-07-20 21:08:35,270] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
df: /root/.triton/autotune: No such file or directory
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] using untested triton version (2.3.1), only 1.0.0 is known to be compatible
/usr/local/lib/python3.10/dist-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '/usr/local/lib/python3.10/dist-packages/torchvision/image.so: undefined symbol: _ZN3c1017RegisterOperatorsD1Ev'If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source?
  warn(
/workspace/pypacks/transformers/training_args.py:1494: FutureWarning: evaluation_strategy is deprecated and will be removed in version 4.46 of 🤗 Transformers. Use eval_strategy instead
  warnings.warn(
[2024-07-20 21:08:49,356] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-07-20 21:08:49,356] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-07-20 21:08:49,357] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-07-20 21:08:49,357] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-07-20 21:08:49,358] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-07-20 21:08:49,358] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-07-20 21:08:49,358] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-07-20 21:08:49,358] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2024-07-20 21:08:49,358] [INFO] [comm.py:637:init_distributed] cdb=None
rank5: Traceback (most recent call last):
rank5:   File "/workspace/minicpm/finetune/finetune.py", line 281, in <module>
rank5:   File "/workspace/minicpm/finetune/finetune.py", line 162, in train
rank5:     ) = parser.parse_args_into_dataclasses()
rank5:   File "/workspace/pypacks/transformers/hf_argparser.py", line 339, in parse_args_into_dataclasses
rank5:     obj = dtype(**inputs)
rank5:   File "<string>", line 136, in __init__
rank5:   File "/workspace/pypacks/transformers/training_args.py", line 1693, in __post_init__
rank5:   File "/workspace/pypacks/transformers/training_args.py", line 2171, in device
rank5:     return self._setup_devices
rank5:   File "/workspace/pypacks/transformers/utils/generic.py", line 60, in __get__
rank5:     cached = self.fget(obj)
rank5:   File "/workspace/pypacks/transformers/training_args.py", line 2108, in _setup_devices
rank5:     self.distributed_state = PartialState(**accelerator_state_kwargs)
rank5:   File "/workspace/pypacks/accelerate/state.py", line 280, in __init__
rank5:   File "/workspace/pypacks/accelerate/state.py", line 790, in set_device
rank5:   File "/workspace/pypacks/torch/cuda/__init__.py", line 399, in set_device
rank5: RuntimeError: CUDA error: invalid device ordinal
rank5: CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
rank5: For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
rank5: Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
```

```
rank3: Traceback (most recent call last):
rank3:   File "/workspace/minicpm/finetune/finetune.py", line 281, in <module>
rank3:   File "/workspace/minicpm/finetune/finetune.py", line 162, in train
rank3:     ) = parser.parse_args_into_dataclasses()
rank3:   File "/workspace/pypacks/transformers/hf_argparser.py", line 339, in parse_args_into_dataclasses
rank3:     obj = dtype(**inputs)
rank3:   File "<string>", line 136, in __init__
rank3:   File "/workspace/pypacks/transformers/training_args.py", line 1693, in __post_init__
rank3:   File "/workspace/pypacks/transformers/training_args.py", line 2171, in device
rank3:     return self._setup_devices
rank3:   File "/workspace/pypacks/transformers/utils/generic.py", line 60, in __get__
rank3:     cached = self.fget(obj)
rank3:   File "/workspace/pypacks/transformers/training_args.py", line 2108, in _setup_devices
rank3:     self.distributed_state = PartialState(**accelerator_state_kwargs)
rank3:   File "/workspace/pypacks/accelerate/state.py", line 280, in __init__
rank3:   File "/workspace/pypacks/accelerate/state.py", line 790, in set_device
rank3:   File "/workspace/pypacks/torch/cuda/__init__.py", line 399, in set_device
rank3: RuntimeError: CUDA error: invalid device ordinal
rank3: CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
rank3: For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
rank3: Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
```

```
rank6: Traceback (most recent call last):
rank6:   File "/workspace/minicpm/finetune/finetune.py", line 281, in <module>
rank6:   File "/workspace/minicpm/finetune/finetune.py", line 162, in train
rank6:     ) = parser.parse_args_into_dataclasses()
rank6:   File "/workspace/pypacks/transformers/hf_argparser.py", line 339, in parse_args_into_dataclasses
rank6:     obj = dtype(**inputs)
rank6:   File "<string>", line 136, in __init__
rank6:   File "/workspace/pypacks/transformers/training_args.py", line 1693, in __post_init__
rank6:   File "/workspace/pypacks/transformers/training_args.py", line 2171, in device
rank6:     return self._setup_devices
rank6:   File "/workspace/pypacks/transformers/utils/generic.py", line 60, in __get__
rank6:     cached = self.fget(obj)
rank6:   File "/workspace/pypacks/transformers/training_args.py", line 2108, in _setup_devices
rank6:     self.distributed_state = PartialState(**accelerator_state_kwargs)
rank6:   File "/workspace/pypacks/accelerate/state.py", line 280, in __init__
rank6:   File "/workspace/pypacks/accelerate/state.py", line 790, in set_device
rank6:   File "/workspace/pypacks/torch/cuda/__init__.py", line 399, in set_device
rank6: RuntimeError: CUDA error: invalid device ordinal
rank6: CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
rank6: For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
rank6: Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
```

```
rank2: Traceback (most recent call last):
rank2:   File "/workspace/minicpm/finetune/finetune.py", line 281, in <module>
rank2:   File "/workspace/minicpm/finetune/finetune.py", line 162, in train
rank2:     ) = parser.parse_args_into_dataclasses()
rank2:   File "/workspace/pypacks/transformers/hf_argparser.py", line 339, in parse_args_into_dataclasses
rank2:     obj = dtype(**inputs)
rank2:   File "<string>", line 136, in __init__
rank2:   File "/workspace/pypacks/transformers/training_args.py", line 1693, in __post_init__
rank2:   File "/workspace/pypacks/transformers/training_args.py", line 2171, in device
rank2:     return self._setup_devices
rank2:   File "/workspace/pypacks/transformers/utils/generic.py", line 60, in __get__
rank2:     cached = self.fget(obj)
rank2:   File "/workspace/pypacks/transformers/training_args.py", line 2108, in _setup_devices
rank2:     self.distributed_state = PartialState(**accelerator_state_kwargs)
rank2:   File "/workspace/pypacks/accelerate/state.py", line 280, in __init__
rank2:   File "/workspace/pypacks/accelerate/state.py", line 790, in set_device
rank2:   File "/workspace/pypacks/torch/cuda/__init__.py", line 399, in set_device
rank2: RuntimeError: CUDA error: invalid device ordinal
rank2: CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
rank2: For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
rank2: Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
```

```
rank4: Traceback (most recent call last):
rank4:   File "/workspace/minicpm/finetune/finetune.py", line 281, in <module>
rank4:   File "/workspace/minicpm/finetune/finetune.py", line 162, in train
rank4:     ) = parser.parse_args_into_dataclasses()
rank4:   File "/workspace/pypacks/transformers/hf_argparser.py", line 339, in parse_args_into_dataclasses
rank4:     obj = dtype(**inputs)
rank4:   File "<string>", line 136, in __init__
rank4:   File "/workspace/pypacks/transformers/training_args.py", line 1693, in __post_init__
rank4:   File "/workspace/pypacks/transformers/training_args.py", line 2171, in device
rank4:     return self._setup_devices
rank4:   File "/workspace/pypacks/transformers/utils/generic.py", line 60, in __get__
rank4:     cached = self.fget(obj)
rank4:   File "/workspace/pypacks/transformers/training_args.py", line 2108, in _setup_devices
rank4:     self.distributed_state = PartialState(**accelerator_state_kwargs)
rank4:   File "/workspace/pypacks/accelerate/state.py", line 280, in __init__
rank4:   File "/workspace/pypacks/accelerate/state.py", line 790, in set_device
rank4:   File "/workspace/pypacks/torch/cuda/__init__.py", line 399, in set_device
rank4: RuntimeError: CUDA error: invalid device ordinal
rank4: CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
rank4: For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
rank4: Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
```

```
rank7: Traceback (most recent call last):
rank7:   File "/workspace/minicpm/finetune/finetune.py", line 281, in <module>
rank7:   File "/workspace/minicpm/finetune/finetune.py", line 162, in train
rank7:     ) = parser.parse_args_into_dataclasses()
rank7:   File "/workspace/pypacks/transformers/hf_argparser.py", line 339, in parse_args_into_dataclasses
rank7:     obj = dtype(**inputs)
rank7:   File "<string>", line 136, in __init__
rank7:   File "/workspace/pypacks/transformers/training_args.py", line 1693, in __post_init__
rank7:   File "/workspace/pypacks/transformers/training_args.py", line 2171, in device
rank7:     return self._setup_devices
rank7:   File "/workspace/pypacks/transformers/utils/generic.py", line 60, in __get__
rank7:     cached = self.fget(obj)
rank7:   File "/workspace/pypacks/transformers/training_args.py", line 2108, in _setup_devices
rank7:     self.distributed_state = PartialState(**accelerator_state_kwargs)
rank7:   File "/workspace/pypacks/accelerate/state.py", line 280, in __init__
rank7:   File "/workspace/pypacks/accelerate/state.py", line 790, in set_device
rank7:   File "/workspace/pypacks/torch/cuda/__init__.py", line 399, in set_device
rank7: RuntimeError: CUDA error: invalid device ordinal
rank7: CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
rank7: For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
rank7: Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
```

```
config.json: 100%|████████████████████████████████████████████████████████| 1.37k/1.37k [00:00<00:00, 6.74MB/s]
configuration_minicpm.py: 100%|███████████████████████████████████████████| 4.06k/4.06k [00:00<00:00, 20.8MB/s]
A new version of the following files was downloaded from https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5:
```
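The failing ranks in this log are 2 through 7, and every one of them fails inside `torch.cuda.set_device`. The log also shows eight launcher processes (eight `Setting ds_accelerator` lines) on a node with two visible GPUs. A plausible explanation, offered as a sketch and not a confirmed diagnosis: each worker pins itself to the GPU whose ordinal equals its local rank, so any local rank at or above the visible device count is an invalid ordinal. The hypothetical helper below only computes which local ranks would fail under that one-process-per-GPU assumption.

```python
from typing import List


def invalid_local_ranks(nproc_per_node: int, visible_gpus: int) -> List[int]:
    """Return local ranks that would map to a nonexistent CUDA ordinal.

    Assumes one process per GPU with local rank N pinned to cuda:N
    (the usual torchrun convention); not taken from the MiniCPM-V scripts.
    """
    if nproc_per_node < 0 or visible_gpus < 0:
        raise ValueError("counts must be non-negative")
    # Valid ordinals are 0..visible_gpus-1; everything above is invalid.
    return [rank for rank in range(nproc_per_node) if rank >= visible_gpus]


if __name__ == "__main__":
    # Eight workers against two visible A100s, as in the log above.
    print(invalid_local_ranks(nproc_per_node=8, visible_gpus=2))  # prints [2, 3, 4, 5, 6, 7]
```

The result matches the set of ranks that printed tracebacks, while ranks 0 and 1, which have valid devices, do not appear in the error output.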

Expected Behavior

Fine-tuning should run without CUDA errors.

Steps To Reproduce

Prepare a few rows and images in the format of vl_finetune_data.json, then run ./finetune_lora.sh on Ubuntu.

Environment

- OS: Ubuntu 22.04
- Python: 3.10.12
- Transformers: 4.42.4
- PyTorch: 2.3.1
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`): 12.1
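The fields above can also be collected with a short script. This sketch is not part of the issue template. It relies only on `platform`, `importlib.metadata`, and, optionally, `torch`, and it also reports how many GPUs PyTorch can see, which is the number that matters for an invalid device ordinal error. Note that `platform.platform()` returns a kernel/libc string rather than a distribution name such as "Ubuntu 22.04".

```python
import platform
from importlib import metadata
from typing import Dict


def package_version(name: str) -> str:
    """Installed distribution version, or a marker if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def environment_report() -> Dict[str, str]:
    """Gather the issue-template fields plus the visible GPU count."""
    report = {
        "OS": platform.platform(),
        "Python": platform.python_version(),
        "Transformers": package_version("transformers"),
        "PyTorch": package_version("torch"),
    }
    try:
        import torch  # optional: only needed for the CUDA-related fields

        report["CUDA (torch.version.cuda)"] = str(torch.version.cuda)
        # Respects CUDA_VISIBLE_DEVICES, so this is the count workers will see.
        report["Visible GPUs"] = str(torch.cuda.device_count())
    except ImportError:
        report["CUDA (torch.version.cuda)"] = "torch not importable"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"- {key}: {value}")
```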

Anything else?

No response

LDLINGLINGLING commented 1 month ago

I see that your machine has only two GPUs. Did you set GPUS_PER_NODE=2 in your fine-tuning bash script?
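Following this suggestion, the number of launched processes should equal the number of visible GPUs. The contents of finetune_lora.sh are not shown in this thread. The sketch below assumes the script passes GPUS_PER_NODE to `torchrun --nproc_per_node`, which would be consistent with the eight processes in the log. The helpers are hypothetical: they derive the count from CUDA_VISIBLE_DEVICES and build a single-node torchrun argument list using only standard torchrun flags. The port and the empty `finetune.py` argument list are placeholders.

```python
import os
import shlex
from typing import List, Optional


def gpus_per_node(env_value: Optional[str], fallback: int) -> int:
    """Count GPUs listed in CUDA_VISIBLE_DEVICES (plain integer list assumed).

    Falls back to `fallback` (e.g. the GPU count shown by nvidia-smi)
    when the variable is unset.
    """
    if env_value is None:
        return fallback
    return len([tok for tok in env_value.split(",") if tok.strip()])


def torchrun_command(nproc: int, script: str, script_args: List[str]) -> List[str]:
    """Build a single-node torchrun invocation with one worker per GPU."""
    return [
        "torchrun",
        "--nproc_per_node", str(nproc),
        "--nnodes", "1",
        "--node_rank", "0",
        "--master_addr", "localhost",
        "--master_port", "6001",  # placeholder: any free port works
        script,
        *script_args,
    ]


if __name__ == "__main__":
    # fallback=2 reflects the two A100s in this report.
    n = gpus_per_node(os.environ.get("CUDA_VISIBLE_DEVICES"), fallback=2)
    # The actual finetune.py arguments from the script are omitted here.
    print(shlex.join(torchrun_command(n, "finetune.py", [])))
```

In practice this amounts to changing GPUS_PER_NODE to 2 in finetune_lora.sh, assuming the script defines it as a variable, so that no worker is assigned a GPU ordinal that does not exist.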