Open shishijier opened 1 year ago
Make sure you have installed ninja. You can install it with conda install ninja
I have run into the same problem. Have you solved it yet?
I ran ninja --version and the result is 1.11.1.git.kitware.jobserver-1
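The check above can also be run from inside the Python environment that launches training, since the shell and the conda environment may not see the same ninja. This is a minimal sketch using only the standard library; the helper name ninja_version is hypothetical and is not part of DeepSpeed or PyTorch.

```python
import shutil
import subprocess
from typing import Optional


def ninja_version() -> Optional[str]:
    """Return the output of `ninja --version`, or None if ninja is not usable."""
    # Look ninja up on the PATH seen by this interpreter.
    exe = shutil.which("ninja")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "--version"], capture_output=True, text=True, check=False
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    version = ninja_version()
    if version is None:
        print("ninja not found on PATH; try: conda install ninja")
    else:
        print(f"ninja {version}")
```

Running it with the same interpreter that runs main.py (here, the one from the glm environment) shows whether the JIT build step is likely to find ninja at all.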
Go to the /disk1/shisj/cache/torch_extensions/py38_cu113/utils/
directory, then compile utils.so manually with ninja
There is nothing in this folder. PS: I reinstalled ninja and it worked! I still don't know why.
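To tell an empty build directory apart from one where only the final library is missing, the folder can be inspected before retrying. The sketch below is an illustration, not DeepSpeed's own logic: the helper name inspect_extension_dir is hypothetical, and it assumes PyTorch's JIT build leaves a build.ninja file next to the compiled .so when the build was at least generated.

```python
from pathlib import Path
from typing import Dict, List, Union


def inspect_extension_dir(
    build_dir: Union[str, Path], name: str = "utils"
) -> Dict[str, object]:
    """Summarise what a JIT build directory contains for the extension `name`."""
    build_dir = Path(build_dir)
    files: List[str] = []
    if build_dir.is_dir():
        files = sorted(p.name for p in build_dir.iterdir())
    return {
        "exists": build_dir.is_dir(),
        "files": files,
        # The shared object the ImportError complains about.
        "has_library": f"{name}.so" in files,
        # Assumption: a generated build script means ninja was at least invoked.
        "has_build_script": "build.ninja" in files,
    }


if __name__ == "__main__":
    # Path copied from the traceback in this issue.
    report = inspect_extension_dir(
        "/disk1/shisj/cache/torch_extensions/py38_cu113/utils"
    )
    for key, value in report.items():
        print(f"{key}: {value}")
```

An existing but empty directory, as reported here, suggests the build never produced any output, which is consistent with ninja being missing or broken in that environment.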
Is there an existing issue for this?
Current Behavior
Loading extension module utils...
Traceback (most recent call last):
File "main.py", line 431, in <module>
main()
File "main.py", line 370, in main
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/disk1/shisj/project/ChatGLM-6B/ptuning/trainer.py", line 1635, in train
return inner_training_loop(
File "/disk1/shisj/project/ChatGLM-6B/ptuning/trainer.py", line 1704, in _inner_training_loop
deepspeed_engine, optimizer, lr_scheduler = deepspeed_init(
File "/disk1/shisj/anaconda3/envs/glm/lib/python3.8/site-packages/transformers/deepspeed.py", line 378, in deepspeed_init
deepspeed_engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
File "/disk1/shisj/anaconda3/envs/glm/lib/python3.8/site-packages/deepspeed/__init__.py", line 165, in initialize
engine = DeepSpeedEngine(args=args,
File "/disk1/shisj/anaconda3/envs/glm/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 308, in __init__
self._configure_optimizer(optimizer, model_parameters)
File "/disk1/shisj/anaconda3/envs/glm/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1167, in _configure_optimizer
self.optimizer = self._configure_zero_optimizer(basic_optimizer)
File "/disk1/shisj/anaconda3/envs/glm/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1398, in _configure_zero_optimizer
optimizer = DeepSpeedZeroOptimizer(
File "/disk1/shisj/anaconda3/envs/glm/lib/python3.8/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 154, in __init__
util_ops = UtilsBuilder().load()
File "/disk1/shisj/anaconda3/envs/glm/lib/python3.8/site-packages/deepspeed/ops/op_builder/builder.py", line 445, in load
return self.jit_load(verbose)
File "/disk1/shisj/anaconda3/envs/glm/lib/python3.8/site-packages/deepspeed/ops/op_builder/builder.py", line 480, in jit_load
op_module = load(name=self.name,
File "/disk1/shisj/anaconda3/envs/glm/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1202, in load
return _jit_compile(
File "/disk1/shisj/anaconda3/envs/glm/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1450, in _jit_compile
return _import_module_from_library(name, build_directory, is_python_module)
File "/disk1/shisj/anaconda3/envs/glm/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1844, in _import_module_from_library
module = importlib.util.module_from_spec(spec)
File "<frozen importlib._bootstrap>", line 556, in module_from_spec
File "<frozen importlib._bootstrap_external>", line 1101, in create_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
ImportError: /disk1/shisj/cache/torch_extensions/py38_cu113/utils/utils.so: cannot open shared object file: No such file or directory
During multi-GPU training, it reports that the file utils.so cannot be found.
Expected Behavior
No response
Steps To Reproduce
None
Environment
Anything else?
No response