Debouter opened this issue 1 year ago
Hi,
Would you post your full error message? I do not have this problem.
Here is the whole stack trace. By the way, could you please tell me which versions of GCC and Ninja you use?
Using /mnt/petrelfs/klk/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
Using /mnt/petrelfs/klk/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /mnt/petrelfs/klk/.cache/torch_extensions/py310_cu118/transformer_inference/build.ninja...
Building extension module transformer_inference...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
Traceback (most recent call last):
File "/mnt/petrelfs/klk/anaconda3/envs/ds/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1893, in _run_ninja_build
subprocess.run(
File "/mnt/petrelfs/klk/anaconda3/envs/ds/lib/python3.10/subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/mnt/petrelfs/klk/gdGPT/demo.py", line 64, in <module>
res = infer_with_deepspeed(model_name, prompt)
File "/mnt/petrelfs/klk/gdGPT/demo.py", line 40, in infer_with_deepspeed
model.model = deepspeed.init_inference(model.model, config=infer_config)
File "/mnt/petrelfs/klk/anaconda3/envs/ds/lib/python3.10/site-packages/deepspeed/__init__.py", line 342, in init_inference
engine = InferenceEngine(model, config=ds_inference_config)
File "/mnt/petrelfs/klk/anaconda3/envs/ds/lib/python3.10/site-packages/deepspeed/inference/engine.py", line 192, in __init__
self._apply_injection_policy(config)
File "/mnt/petrelfs/klk/anaconda3/envs/ds/lib/python3.10/site-packages/deepspeed/inference/engine.py", line 426, in _apply_injection_policy
replace_transformer_layer(client_module, self.module, checkpoint, config, self.config)
File "/mnt/petrelfs/klk/anaconda3/envs/ds/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 523, in replace_transformer_layer
replaced_module = replace_module(model=model,
File "/mnt/petrelfs/klk/anaconda3/envs/ds/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 766, in replace_module
replaced_module, _ = _replace_module(model, policy, state_dict=sd)
File "/mnt/petrelfs/klk/anaconda3/envs/ds/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 847, in _replace_module
_, layer_id = _replace_module(child,
File "/mnt/petrelfs/klk/anaconda3/envs/ds/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 847, in _replace_module
_, layer_id = _replace_module(child,
File "/mnt/petrelfs/klk/anaconda3/envs/ds/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 823, in _replace_module
replaced_module = policies[child.__class__][0](child,
File "/mnt/petrelfs/klk/anaconda3/envs/ds/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 500, in replace_fn
new_module = replace_with_policy(child,
File "/mnt/petrelfs/klk/anaconda3/envs/ds/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 348, in replace_with_policy
_container.create_module()
File "/mnt/petrelfs/klk/anaconda3/envs/ds/lib/python3.10/site-packages/deepspeed/module_inject/containers/bloom.py", line 30, in create_module
self.module = DeepSpeedBloomInference(_config, mp_group=self.mp_group)
File "/mnt/petrelfs/klk/anaconda3/envs/ds/lib/python3.10/site-packages/deepspeed/model_implementations/transformers/ds_bloom.py", line 20, in __init__
super().__init__(config, mp_group, quantize_scales, quantize_groups, merge_count, mlp_extra_grouping)
File "/mnt/petrelfs/klk/anaconda3/envs/ds/lib/python3.10/site-packages/deepspeed/model_implementations/transformers/ds_transformer.py", line 58, in __init__
inference_module = builder.load()
File "/mnt/petrelfs/klk/anaconda3/envs/ds/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 454, in load
return self.jit_load(verbose)
File "/mnt/petrelfs/klk/anaconda3/envs/ds/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 497, in jit_load
op_module = load(name=self.name,
File "/mnt/petrelfs/klk/anaconda3/envs/ds/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1284, in load
return _jit_compile(
File "/mnt/petrelfs/klk/anaconda3/envs/ds/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1509, in _jit_compile
_write_ninja_file_and_build_library(
File "/mnt/petrelfs/klk/anaconda3/envs/ds/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1624, in _write_ninja_file_and_build_library
_run_ninja_build(
File "/mnt/petrelfs/klk/anaconda3/envs/ds/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1909, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'transformer_inference'
Loading extension module transformer_inference...
Traceback (most recent call last):
File "/mnt/petrelfs/klk/gdGPT/demo.py", line 64, in <module>
res = infer_with_deepspeed(model_name, prompt)
File "/mnt/petrelfs/klk/gdGPT/demo.py", line 40, in infer_with_deepspeed
model.model = deepspeed.init_inference(model.model, config=infer_config)
File "/mnt/petrelfs/klk/anaconda3/envs/ds/lib/python3.10/site-packages/deepspeed/__init__.py", line 342, in init_inference
engine = InferenceEngine(model, config=ds_inference_config)
File "/mnt/petrelfs/klk/anaconda3/envs/ds/lib/python3.10/site-packages/deepspeed/inference/engine.py", line 192, in __init__
self._apply_injection_policy(config)
File "/mnt/petrelfs/klk/anaconda3/envs/ds/lib/python3.10/site-packages/deepspeed/inference/engine.py", line 426, in _apply_injection_policy
replace_transformer_layer(client_module, self.module, checkpoint, config, self.config)
File "/mnt/petrelfs/klk/anaconda3/envs/ds/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 523, in replace_transformer_layer
replaced_module = replace_module(model=model,
File "/mnt/petrelfs/klk/anaconda3/envs/ds/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 766, in replace_module
replaced_module, _ = _replace_module(model, policy, state_dict=sd)
File "/mnt/petrelfs/klk/anaconda3/envs/ds/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 847, in _replace_module
_, layer_id = _replace_module(child,
File "/mnt/petrelfs/klk/anaconda3/envs/ds/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 847, in _replace_module
_, layer_id = _replace_module(child,
File "/mnt/petrelfs/klk/anaconda3/envs/ds/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 823, in _replace_module
replaced_module = policies[child.__class__][0](child,
File "/mnt/petrelfs/klk/anaconda3/envs/ds/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 500, in replace_fn
new_module = replace_with_policy(child,
File "/mnt/petrelfs/klk/anaconda3/envs/ds/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 348, in replace_with_policy
_container.create_module()
File "/mnt/petrelfs/klk/anaconda3/envs/ds/lib/python3.10/site-packages/deepspeed/module_inject/containers/bloom.py", line 30, in create_module
self.module = DeepSpeedBloomInference(_config, mp_group=self.mp_group)
File "/mnt/petrelfs/klk/anaconda3/envs/ds/lib/python3.10/site-packages/deepspeed/model_implementations/transformers/ds_bloom.py", line 20, in __init__
super().__init__(config, mp_group, quantize_scales, quantize_groups, merge_count, mlp_extra_grouping)
File "/mnt/petrelfs/klk/anaconda3/envs/ds/lib/python3.10/site-packages/deepspeed/model_implementations/transformers/ds_transformer.py", line 58, in __init__
inference_module = builder.load()
File "/mnt/petrelfs/klk/anaconda3/envs/ds/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 454, in load
return self.jit_load(verbose)
File "/mnt/petrelfs/klk/anaconda3/envs/ds/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 497, in jit_load
op_module = load(name=self.name,
File "/mnt/petrelfs/klk/anaconda3/envs/ds/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1284, in load
return _jit_compile(
File "/mnt/petrelfs/klk/anaconda3/envs/ds/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1535, in _jit_compile
return _import_module_from_library(name, build_directory, is_python_module)
File "/mnt/petrelfs/klk/anaconda3/envs/ds/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1929, in _import_module_from_library
module = importlib.util.module_from_spec(spec)
File "<frozen importlib._bootstrap>", line 571, in module_from_spec
File "<frozen importlib._bootstrap_external>", line 1176, in create_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
ImportError: /mnt/petrelfs/klk/.cache/torch_extensions/py310_cu118/transformer_inference/transformer_inference.so: cannot open shared object file: No such file or directory
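The log appears to come from two processes sharing one extension cache: one rank's ninja build fails (the RuntimeError), and the other rank then tries to import the transformer_inference.so that was never produced (the ImportError). A first sanity check, a sketch and not DeepSpeed's own logic, is whether ninja runs at all, since PyTorch's JIT builder shells out to ninja -v and turns any non-zero exit into the error above:

```shell
# Sanity-check the build tool PyTorch's JIT compiler shells out to.
# _run_ninja_build runs `ninja -v` in the extension's build directory,
# and any non-zero exit is re-raised as the RuntimeError seen above.
if ninja --version >/dev/null 2>&1; then
  echo "ninja: ok"
else
  echo "ninja: broken or missing"
fi
```

If this prints "broken or missing", fixing the ninja installation should come before anything DeepSpeed-specific.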
Hi,
the output of running ninja --version on my machine is:
1.11.1.git.kitware.jobserver-1
and the output of running gcc -v on my machine is:
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 7.5.0-3ubuntu1~18.04' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-7 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
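When comparing environments like this, a small loop can collect the same toolchain info in one go (a convenience sketch; it just prints "missing" for any tool not on PATH):

```shell
# Print the first version line of each build tool, or "missing" if absent.
for tool in gcc ninja; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: $("$tool" --version | head -n 1)"
  else
    echo "$tool: missing"
  fi
done
```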
Would you rm -rf /mnt/petrelfs/klk/.cache/torch_extensions/py310_cu118 and try again?
Well, I have fixed it by adjusting my gcc version to match yours, removing the directory you mentioned above, and setting export TORCH_EXTENSIONS_DIR=/tmp
according to https://github.com/microsoft/DeepSpeed/issues/3356.
Similar problems still occasionally occur with other installations, but this repo now works fine. Anyway, thanks a lot!
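For anyone landing here later, the fix described above boils down to two shell steps. The cache path below is a stand-in (the logs above use /mnt/petrelfs/klk/.cache/torch_extensions/py310_cu118); adjust it to your own machine:

```shell
# 1. Remove the stale, half-built JIT extension cache so the extension
#    is rebuilt from scratch (substitute your own cache path).
rm -rf "$HOME/.cache/torch_extensions/py310_cu118"

# 2. Point future JIT builds at a writable directory, as suggested in
#    https://github.com/microsoft/DeepSpeed/issues/3356.
export TORCH_EXTENSIONS_DIR=/tmp
echo "TORCH_EXTENSIONS_DIR=$TORCH_EXTENSIONS_DIR"
```

Note the export only affects the current shell; put it in your shell profile or job script if the problem recurs.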
Hi~ When I run demo.py, I get the following Error:
My initial read is that the ninja -v command fails to execute, so the shared object file transformer_inference.so is never generated.
I have already tried the various fixes suggested online for
Command '['ninja', '-v']' returned non-zero exit status 1
such as installing or disabling the ninja package and downgrading the PyTorch version, but none of them solved the problem. The environment I am using is as follows:
Have you run into this problem? If not, could you share your transformer_inference.so file? It should be located roughly at /.cache/torch_extensions/pyXX_cuXX/transformer_inference.
Thanks!