Error after enabling CPU offload with ZeRO-3

Closed: Gary-code closed this issue 1 month ago
是否已有关于该错误的issue或讨论? | Is there an existing issue / discussion for this?
该问题是否在FAQ中有解答? | Is there an existing answer for this in FAQ?
当前行为 | Current Behavior
After enabling CPU offload with ZeRO-3, training errors out:
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/4] /home/gary/miniconda3/bin/nvcc --generate-dependencies-with-compile --dependency-output custom_cuda_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/gary/miniconda3/lib/python3.12/site-packages/deepspeed/ops/csrc/includes -I/home/gary/miniconda3/include -isystem /home/gary/miniconda3/lib/python3.12/site-packages/torch/include -isystem /home/gary/miniconda3/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /home/gary/miniconda3/lib/python3.12/site-packages/torch/include/TH -isystem /home/gary/miniconda3/lib/python3.12/site-packages/torch/include/THC -isystem /home/gary/miniconda3/include -isystem /home/gary/miniconda3/include/python3.12 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ --threads=8 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_89,code=compute_89 -DBF16_AVAILABLE -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -c /home/gary/miniconda3/lib/python3.12/site-packages/deepspeed/ops/csrc/common/custom_cuda_kernel.cu -o custom_cuda_kernel.cuda.o
[2/4] c++ -MMD -MF cpu_adam.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/gary/miniconda3/lib/python3.12/site-packages/deepspeed/ops/csrc/includes -I/home/gary/miniconda3/include -isystem /home/gary/miniconda3/lib/python3.12/site-packages/torch/include -isystem /home/gary/miniconda3/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /home/gary/miniconda3/lib/python3.12/site-packages/torch/include/TH -isystem /home/gary/miniconda3/lib/python3.12/site-packages/torch/include/THC -isystem /home/gary/miniconda3/include -isystem /home/gary/miniconda3/include/python3.12 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -L/home/gary/miniconda3/lib -lcudart -lcublas -g -march=native -fopenmp -D__AVX512__ -D__ENABLE_CUDA__ -DBF16_AVAILABLE -c /home/gary/miniconda3/lib/python3.12/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp -o cpu_adam.o
[3/4] c++ -MMD -MF cpu_adam_impl.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/gary/miniconda3/lib/python3.12/site-packages/deepspeed/ops/csrc/includes -I/home/gary/miniconda3/include -isystem /home/gary/miniconda3/lib/python3.12/site-packages/torch/include -isystem /home/gary/miniconda3/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /home/gary/miniconda3/lib/python3.12/site-packages/torch/include/TH -isystem /home/gary/miniconda3/lib/python3.12/site-packages/torch/include/THC -isystem /home/gary/miniconda3/include -isystem /home/gary/miniconda3/include/python3.12 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -L/home/gary/miniconda3/lib -lcudart -lcublas -g -march=native -fopenmp -D__AVX512__ -D__ENABLE_CUDA__ -DBF16_AVAILABLE -c /home/gary/miniconda3/lib/python3.12/site-packages/deepspeed/ops/csrc/adam/cpu_adam_impl.cpp -o cpu_adam_impl.o
[4/4] c++ cpu_adam.o cpu_adam_impl.o custom_cuda_kernel.cuda.o -shared -lcurand -L/home/gary/miniconda3/lib/python3.12/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -ltorch_python -L/home/gary/miniconda3/lib -lcudart -o cpu_adam.so
Loading extension module cpu_adam...
Time to load cpu_adam op: 18.459045886993408 seconds
Parameter Offload: Total persistent parameters: 2639600 in 486 params
E0828 09:26:42.875000 140062897721728 torch/distributed/elastic/multiprocessing/api.py:826] failed (exitcode: -9) local_rank: 0 (pid: 62999) of binary:
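For reference, ZeRO-3 CPU offload is switched on through the DeepSpeed config, and it is the `offload_optimizer` / `offload_param` settings that pull in the `cpu_adam` JIT build shown in the log above. The reporter's actual config and training script were not posted, so the sketch below only illustrates the general shape of such a setup; every concrete value in it is a placeholder, not taken from this report:

```python
# Sketch only: a typical ZeRO-3 + CPU-offload DeepSpeed config of the kind described
# in this report. All values (batch size, dtype, file name) are placeholder assumptions.
import json

ds_config = {
    "train_micro_batch_size_per_gpu": 1,      # placeholder
    "gradient_accumulation_steps": 1,         # placeholder
    "bf16": {"enabled": True},                # consistent with BF16_AVAILABLE in the build log
    "zero_optimization": {
        "stage": 3,                                                   # ZeRO-3
        "offload_optimizer": {"device": "cpu", "pin_memory": True},   # optimizer states -> host RAM (builds cpu_adam)
        "offload_param": {"device": "cpu", "pin_memory": True},       # parameters -> host RAM
    },
}

# Written out like this, the file can be passed to the Hugging Face Trainer via
# --deepspeed ds_zero3_offload.json, or the dict can be handed directly to
# deepspeed.initialize(config=...).
with open("ds_zero3_offload.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```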
期望行为 | Expected Behavior
No response
复现方法 | Steps To Reproduce
No response
运行环境 | Environment

- OS:
- Python: 3.10
- Transformers:
- PyTorch: 2.3
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`):
- GPU: dual NVIDIA 4090
备注 | Anything else?
No response

There doesn't seem to be an error message here. Was the training interrupted?
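For what it's worth, `exitcode: -9` means the rank was killed with SIGKILL rather than raising a Python exception, and with ZeRO-3 plus CPU offload this is most often the Linux OOM killer reclaiming host RAM, since parameters and optimizer states are pushed into CPU memory. DeepSpeed ships a memory estimator that reports the expected CPU and GPU requirements for each offload combination before training starts. A minimal sketch, using a toy stand-in model rather than the reporter's actual model, is below:

```python
# Minimal sketch, assuming the failure is host-RAM exhaustion: estimate the CPU/GPU
# memory ZeRO-3 needs for a given model on 2 GPUs (the dual-4090 setup above).
# The tiny Sequential model is a placeholder; pass the real model in practice.
import torch.nn as nn
from deepspeed.runtime.zero.stage3 import estimate_zero3_model_states_mem_needs_all_live

model = nn.Sequential(*[nn.Linear(4096, 4096) for _ in range(8)])  # stand-in model

# Prints per-GPU and per-CPU memory estimates for each offload_param/offload_optimizer
# combination, so they can be checked against the machine's available RAM.
estimate_zero3_model_states_mem_needs_all_live(model, num_gpus_per_node=2, num_nodes=1)
```

Checking `dmesg` for oom-killer messages right after the crash, and `free -h` before launching, would confirm or rule out this explanation.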