MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone
LoRA training error: ImportError: /root/.cache/torch_extensions/py310_cu121/fused_adam/fused_adam.so: cannot open shared object file: No such file or directory. Is this an environment or version issue? #326
rank0: File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/subprocess.py", line 526, in run
rank0: raise CalledProcessError(retcode, process.args,
rank0: subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
rank0: The above exception was the direct cause of the following exception:
rank0: Traceback (most recent call last):
rank0: File "/data/projects/train_GhostVsion/MiniCPM-V/finetune/finetune.py", line 328, in <module>
rank0: File "/data/projects/train_GhostVsion/MiniCPM-V/finetune/finetune.py", line 318, in train
rank0: File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/transformers/trainer.py", line 1885, in train
rank0: return inner_training_loop(
rank0: File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/transformers/trainer.py", line 2045, in _inner_training_loop
rank0: model, self.optimizer, self.lr_scheduler = self.accelerator.prepare(
rank0: File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/accelerate/accelerator.py", line 1284, in prepare
rank0: result = self._prepare_deepspeed(*args)
rank0: File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/accelerate/accelerator.py", line 1751, in _prepare_deepspeed
rank0: engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
rank0: File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/deepspeed/__init__.py", line 181, in initialize
rank0: engine = DeepSpeedEngine(args=args,
rank0: File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 307, in __init__
rank0: self._configure_optimizer(optimizer, model_parameters)
rank0: File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1232, in _configure_optimizer
rank0: basic_optimizer = self._configure_basic_optimizer(model_parameters)
rank0: File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1309, in _configure_basic_optimizer
rank0: optimizer = FusedAdam(
rank0: File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/deepspeed/ops/adam/fused_adam.py", line 94, in __init__
rank0: fused_adam_cuda = FusedAdamBuilder().load()
rank0: File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 480, in load
rank0: return self.jit_load(verbose)
rank0: File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 524, in jit_load
rank0: op_module = load(name=self.name,
rank0: File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1309, in load
rank0: return _jit_compile(
rank0: File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1719, in _jit_compile
rank0: File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1832, in _write_ninja_file_and_build_library
rank0: File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2123, in _run_ninja_build
rank0: raise RuntimeError(message) from e
rank0: RuntimeError: Error building extension 'fused_adam'
Loading extension module fused_adam...
Loading extension module fused_adam...
rank2: Traceback (most recent call last):
rank2: File "/data/projects/train_GhostVsion/MiniCPM-V/finetune/finetune.py", line 328, in <module>
rank2: File "/data/projects/train_GhostVsion/MiniCPM-V/finetune/finetune.py", line 318, in train
rank2: File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/transformers/trainer.py", line 1885, in train
rank2: return inner_training_loop(
rank2: File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/transformers/trainer.py", line 2045, in _inner_training_loop
rank2: model, self.optimizer, self.lr_scheduler = self.accelerator.prepare(
rank2: File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/accelerate/accelerator.py", line 1284, in prepare
rank2: result = self._prepare_deepspeed(*args)
rank2: File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/accelerate/accelerator.py", line 1751, in _prepare_deepspeed
rank2: engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
rank2: File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/deepspeed/__init__.py", line 181, in initialize
rank2: engine = DeepSpeedEngine(args=args,
rank2: File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 307, in __init__
rank2: self._configure_optimizer(optimizer, model_parameters)
rank2: File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1232, in _configure_optimizer
rank2: basic_optimizer = self._configure_basic_optimizer(model_parameters)
rank2: File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1309, in _configure_basic_optimizer
rank2: optimizer = FusedAdam(
rank2: File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/deepspeed/ops/adam/fused_adam.py", line 94, in __init__
rank2: fused_adam_cuda = FusedAdamBuilder().load()
rank2: File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 480, in load
rank2: return self.jit_load(verbose)
rank2: File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 524, in jit_load
rank2: op_module = load(name=self.name,
rank2: File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1309, in load
rank2: return _jit_compile(
rank2: File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1745, in _jit_compile
rank2: return _import_module_from_library(name, build_directory, is_python_module)
rank2: File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2143, in _import_module_from_library
rank2: module = importlib.util.module_from_spec(spec)
rank2: File "<frozen importlib._bootstrap>", line 571, in module_from_spec
rank2: File "<frozen importlib._bootstrap_external>", line 1176, in create_module
rank2: File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
rank2: ImportError: /root/.cache/torch_extensions/py310_cu121/fused_adam/fused_adam.so: cannot open shared object file: No such file or directory
rank1: Traceback (most recent call last):
rank1: File "/data/projects/train_GhostVsion/MiniCPM-V/finetune/finetune.py", line 328, in <module>
rank1: File "/data/projects/train_GhostVsion/MiniCPM-V/finetune/finetune.py", line 318, in train
rank1: File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/transformers/trainer.py", line 1885, in train
rank1: return inner_training_loop(
rank1: File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/transformers/trainer.py", line 2045, in _inner_training_loop
rank1: model, self.optimizer, self.lr_scheduler = self.accelerator.prepare(
rank1: File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/accelerate/accelerator.py", line 1284, in prepare
rank1: result = self._prepare_deepspeed(*args)
rank1: File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/accelerate/accelerator.py", line 1751, in _prepare_deepspeed
rank1: engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
rank1: File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/deepspeed/__init__.py", line 181, in initialize
rank1: engine = DeepSpeedEngine(args=args,
rank1: File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 307, in __init__
rank1: self._configure_optimizer(optimizer, model_parameters)
rank1: File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1232, in _configure_optimizer
rank1: basic_optimizer = self._configure_basic_optimizer(model_parameters)
rank1: File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1309, in _configure_basic_optimizer
rank1: optimizer = FusedAdam(
rank1: File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/deepspeed/ops/adam/fused_adam.py", line 94, in __init__
rank1: fused_adam_cuda = FusedAdamBuilder().load()
rank1: File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 480, in load
rank1: return self.jit_load(verbose)
rank1: File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 524, in jit_load
rank1: op_module = load(name=self.name,
rank1: File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1309, in load
rank1: return _jit_compile(
rank1: File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1745, in _jit_compile
rank1: return _import_module_from_library(name, build_directory, is_python_module)
rank1: File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2143, in _import_module_from_library
rank1: module = importlib.util.module_from_spec(spec)
rank1: File "<frozen importlib._bootstrap>", line 571, in module_from_spec
rank1: File "<frozen importlib._bootstrap_external>", line 1176, in create_module
rank1: File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
rank1: ImportError: /root/.cache/torch_extensions/py310_cu121/fused_adam/fused_adam.so: cannot open shared object file: No such file or directory
Current environment:
python 3.10
cuda 12.2
torch 2.3.0
torchvision 0.18.0
Inference works without problems in this environment.
Ninja output immediately before the rank0 traceback:
[2/3] c++ -MMD -MF fused_adam_frontend.o.d -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -I/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/deepspeed/ops/csrc/adam -isystem /root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/torch/include -isystem /root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/torch/include/TH -isystem /root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/torch/include/THC -isystem /root/anaconda3/envs/guihun_doc_aigc/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -DBF16_AVAILABLE -c /root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/deepspeed/ops/csrc/adam/fused_adam_frontend.cpp -o fused_adam_frontend.o
ninja: build stopped: subcommand failed.
rank0: Traceback (most recent call last):
rank0: File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2107, in _run_ninja_build
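Reading the log: rank0 fails while ninja is compiling fused_adam, and rank1 and rank2 then fail to import a fused_adam.so that was never produced. So the ImportError looks like a consequence of the failed build on rank0, not a separate problem. The pasted output does not include the compiler's own error message, so the root cause cannot be read from it. Below is a minimal diagnostic sketch, not a fix. The function names are made up for illustration. The tool list (ninja, c++, nvcc) is an assumption: ninja and c++ appear in the log, and nvcc is a guess that one build step compiles CUDA code. The cache path is copied from the ImportError.

```python
import shutil
from pathlib import Path

# Path copied from the ImportError in the log; adjust it for your machine.
EXT_DIR = Path("/root/.cache/torch_extensions/py310_cu121/fused_adam")


def check_build_tools(tools=("ninja", "c++", "nvcc")):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


def check_extension(ext_dir=EXT_DIR, name="fused_adam"):
    """Report whether the JIT cache directory and the compiled .so exist."""
    ext_dir = Path(ext_dir)
    return {
        "dir_exists": ext_dir.is_dir(),
        "so_exists": (ext_dir / f"{name}.so").is_file(),
    }


if __name__ == "__main__":
    for tool, path in check_build_tools().items():
        print(f"{tool}: {path or 'NOT FOUND'}")
    print(check_extension())
```

If a tool is reported as NOT FOUND, or the directory exists without the .so, that points to the build step and not to the LoRA script. Rerunning on a single GPU should surface the full compiler error that is missing from the output above.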