OpenBMB / MiniCPM-V

MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone
Apache License 2.0

LoRA training error: ImportError: /root/.cache/torch_extensions/py310_cu121/fused_adam/fused_adam.so: cannot open shared object file: No such file or directory. Is this an environment version problem? #326

Closed orderer0001 closed 1 week ago

orderer0001 commented 1 week ago

[2/3] c++ -MMD -MF fused_adam_frontend.o.d -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -I/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/deepspeed/ops/csrc/adam -isystem /root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/torch/include -isystem /root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/torch/include/TH -isystem /root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/torch/include/THC -isystem /root/anaconda3/envs/guihun_doc_aigc/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -DBF16_AVAILABLE -c /root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/deepspeed/ops/csrc/adam/fused_adam_frontend.cpp -o fused_adam_frontend.o
ninja: build stopped: subcommand failed.
rank0: Traceback (most recent call last):
rank0:   File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2107, in _run_ninja_build

rank0:   File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/subprocess.py", line 526, in run
rank0:     raise CalledProcessError(retcode, process.args,
rank0: subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

rank0: The above exception was the direct cause of the following exception:

rank0: Traceback (most recent call last):
rank0:   File "/data/projects/train_GhostVsion/MiniCPM-V/finetune/finetune.py", line 328, in <module>

rank0:   File "/data/projects/train_GhostVsion/MiniCPM-V/finetune/finetune.py", line 318, in train

rank0:   File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/transformers/trainer.py", line 1885, in train
rank0:     return inner_training_loop(
rank0:   File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/transformers/trainer.py", line 2045, in _inner_training_loop
rank0:     model, self.optimizer, self.lr_scheduler = self.accelerator.prepare(
rank0:   File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/accelerate/accelerator.py", line 1284, in prepare
rank0:     result = self._prepare_deepspeed(*args)
rank0:   File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/accelerate/accelerator.py", line 1751, in _prepare_deepspeed
rank0:     engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
rank0:   File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/deepspeed/__init__.py", line 181, in initialize
rank0:     engine = DeepSpeedEngine(args=args,
rank0:   File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 307, in __init__
rank0:     self._configure_optimizer(optimizer, model_parameters)
rank0:   File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1232, in _configure_optimizer
rank0:     basic_optimizer = self._configure_basic_optimizer(model_parameters)
rank0:   File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1309, in _configure_basic_optimizer
rank0:     optimizer = FusedAdam(
rank0:   File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/deepspeed/ops/adam/fused_adam.py", line 94, in __init__
rank0:     fused_adam_cuda = FusedAdamBuilder().load()
rank0:   File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 480, in load
rank0:     return self.jit_load(verbose)
rank0:   File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 524, in jit_load
rank0:     op_module = load(name=self.name,
rank0:   File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1309, in load
rank0:     return _jit_compile(
rank0:   File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1719, in _jit_compile

rank0:   File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1832, in _write_ninja_file_and_build_library

rank0:   File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2123, in _run_ninja_build
rank0:     raise RuntimeError(message) from e
rank0: RuntimeError: Error building extension 'fused_adam'
Loading extension module fused_adam...
Loading extension module fused_adam...
rank2: Traceback (most recent call last):
rank2:   File "/data/projects/train_GhostVsion/MiniCPM-V/finetune/finetune.py", line 328, in <module>

rank2:   File "/data/projects/train_GhostVsion/MiniCPM-V/finetune/finetune.py", line 318, in train

rank2:   File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/transformers/trainer.py", line 1885, in train
rank2:     return inner_training_loop(
rank2:   File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/transformers/trainer.py", line 2045, in _inner_training_loop
rank2:     model, self.optimizer, self.lr_scheduler = self.accelerator.prepare(
rank2:   File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/accelerate/accelerator.py", line 1284, in prepare
rank2:     result = self._prepare_deepspeed(*args)
rank2:   File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/accelerate/accelerator.py", line 1751, in _prepare_deepspeed
rank2:     engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
rank2:   File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/deepspeed/__init__.py", line 181, in initialize
rank2:     engine = DeepSpeedEngine(args=args,
rank2:   File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 307, in __init__
rank2:     self._configure_optimizer(optimizer, model_parameters)
rank2:   File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1232, in _configure_optimizer
rank2:     basic_optimizer = self._configure_basic_optimizer(model_parameters)
rank2:   File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1309, in _configure_basic_optimizer
rank2:     optimizer = FusedAdam(
rank2:   File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/deepspeed/ops/adam/fused_adam.py", line 94, in __init__
rank2:     fused_adam_cuda = FusedAdamBuilder().load()
rank2:   File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 480, in load
rank2:     return self.jit_load(verbose)
rank2:   File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 524, in jit_load
rank2:     op_module = load(name=self.name,
rank2:   File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1309, in load
rank2:     return _jit_compile(
rank2:   File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1745, in _jit_compile
rank2:     return _import_module_from_library(name, build_directory, is_python_module)
rank2:   File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2143, in _import_module_from_library
rank2:     module = importlib.util.module_from_spec(spec)
rank2:   File "<frozen importlib._bootstrap>", line 571, in module_from_spec
rank2:   File "<frozen importlib._bootstrap_external>", line 1176, in create_module
rank2:   File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
rank2: ImportError: /root/.cache/torch_extensions/py310_cu121/fused_adam/fused_adam.so: cannot open shared object file: No such file or directory
rank1: Traceback (most recent call last):
rank1:   File "/data/projects/train_GhostVsion/MiniCPM-V/finetune/finetune.py", line 328, in <module>

rank1:   File "/data/projects/train_GhostVsion/MiniCPM-V/finetune/finetune.py", line 318, in train

rank1:   File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/transformers/trainer.py", line 1885, in train
rank1:     return inner_training_loop(
rank1:   File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/transformers/trainer.py", line 2045, in _inner_training_loop
rank1:     model, self.optimizer, self.lr_scheduler = self.accelerator.prepare(
rank1:   File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/accelerate/accelerator.py", line 1284, in prepare
rank1:     result = self._prepare_deepspeed(*args)
rank1:   File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/accelerate/accelerator.py", line 1751, in _prepare_deepspeed
rank1:     engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
rank1:   File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/deepspeed/__init__.py", line 181, in initialize
rank1:     engine = DeepSpeedEngine(args=args,
rank1:   File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 307, in __init__
rank1:     self._configure_optimizer(optimizer, model_parameters)
rank1:   File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1232, in _configure_optimizer
rank1:     basic_optimizer = self._configure_basic_optimizer(model_parameters)
rank1:   File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1309, in _configure_basic_optimizer
rank1:     optimizer = FusedAdam(
rank1:   File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/deepspeed/ops/adam/fused_adam.py", line 94, in __init__
rank1:     fused_adam_cuda = FusedAdamBuilder().load()
rank1:   File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 480, in load
rank1:     return self.jit_load(verbose)
rank1:   File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 524, in jit_load
rank1:     op_module = load(name=self.name,
rank1:   File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1309, in load
rank1:     return _jit_compile(
rank1:   File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1745, in _jit_compile
rank1:     return _import_module_from_library(name, build_directory, is_python_module)
rank1:   File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2143, in _import_module_from_library
rank1:     module = importlib.util.module_from_spec(spec)
rank1:   File "<frozen importlib._bootstrap>", line 571, in module_from_spec
rank1:   File "<frozen importlib._bootstrap_external>", line 1176, in create_module
rank1:   File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
rank1: ImportError: /root/.cache/torch_extensions/py310_cu121/fused_adam/fused_adam.so: cannot open shared object file: No such file or directory

Current environment: python 3.10, cuda 12.2, torch 2.3.0, torchvision 0.18.0. Inference works without problems in this environment.
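
In the log above, rank0 fails while building `fused_adam`, and rank1 and rank2 then fail to import a `fused_adam.so` that was never written. A minimal sketch for spotting such incomplete build directories follows. It assumes the `<root>/<tag>/<name>` cache layout seen in the reported path (`/root/.cache/torch_extensions/py310_cu121/fused_adam`); the helper names `find_incomplete_builds` and `remove_builds` are hypothetical and are not part of PyTorch or DeepSpeed. Removing a directory only forces a clean rebuild on the next run. It does not address the underlying compile failure, whose real error message would be in the build output before `ninja: build stopped`.

```python
import shutil
from pathlib import Path


def find_incomplete_builds(root: Path, name: str = "fused_adam") -> list[Path]:
    """Return build directories for `name` that contain no `<name>.so`.

    Assumes the layout seen in the log: <root>/<tag>/<name>,
    for example torch_extensions/py310_cu121/fused_adam.
    """
    incomplete = []
    for build_dir in sorted(root.glob(f"*/{name}")):
        if build_dir.is_dir() and not (build_dir / f"{name}.so").exists():
            incomplete.append(build_dir)
    return incomplete


def remove_builds(build_dirs: list[Path]) -> None:
    """Delete the given build directories so the next run rebuilds from scratch."""
    for build_dir in build_dirs:
        shutil.rmtree(build_dir)


if __name__ == "__main__":
    # Assumed default cache location, matching the path in the ImportError.
    cache_root = Path.home() / ".cache" / "torch_extensions"
    for path in find_incomplete_builds(cache_root):
        print(f"incomplete build: {path}")
```

The script only lists candidates by default; calling `remove_builds` on the result is left as an explicit step because it deletes files.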