DeepLink-org / deeplink.framework


[DIPU] Setting DIPU_PYTHON_DEVICE_AS_CUDA=false on Huawei raises "module 'torch' has no attribute 'xpu'" #804

Open douzizi opened 6 months ago

douzizi commented 6 months ago

Background: I am running llama2 inference with dipu on a Huawei Ascend 910B. import torch_dipu prints the following message:

dipu device will show as cuda device. if it's not expected behavior, please set env DIPU_PYTHON_DEVICE_AS_CUDA=false

If I set export DIPU_PYTHON_DEVICE_AS_CUDA=false, inference fails with:

AttributeError: module 'torch' has no attribute 'xpu'

Restoring export DIPU_PYTHON_DEVICE_AS_CUDA=true makes the error disappear. The full traceback is:

[W OperatorEntry.cpp:153] Warning: Warning only once for all operators,  other operators may also be overrided.
  Overriding a previously registered kernel for the same operator and the same dispatch key
  operator: aten::cat(Tensor[] tensors, int dim=0) -> Tensor
    registered at /pytorch/build/aten/src/ATen/RegisterSchema.cpp:6
  dispatch key: XLA
  previous kernel: registered at /pytorch/aten/src/ATen/LegacyBatchingRegistrations.cpp:1079
       new kernel: registered at /deeplink_main/deeplink.framework/dipu/third_party/DIOPI/impl/ascend_npu/torch_npu/csrc/DIOPIAdapter.cpp:3364 (function operator())
Thu May  9 09:03:37 2024 dipu | git hash:aea639af-dirty
No NVTX under your environment, ignore related API under this condition.
/root/miniconda3/envs/pt/lib/python3.8/site-packages/torch_npu/dynamo/__init__.py:18: UserWarning: Register eager implementation for the 'npu' backend of dynamo, as torch_npu was not compiled with torchair.
  warnings.warn(
Loading checkpoint shards:   0%|                                                                                                                                                                                                   | 0/3 [00:00<?, ?it/s]/root/miniconda3/envs/pt/lib/python3.8/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:07<00:00,  2.39s/it]
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
/deeplink_main/deeplink.framework/dipu/third_party/DIOPI/build/_deps/op_plugin-src/op_plugin/utils/op_api_common.h:GetOpApiLibHandler:103 [PTA]:"dlopen libcust_opapi.so failed, error:(null)."Traceback (most recent call last):
  File "jiutian-dipu.py", line 26, in <module>
    generate_ids  = model.generate(**generate_input)
  File "/root/miniconda3/envs/pt/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/miniconda3/envs/pt/lib/python3.8/site-packages/transformers/generation/utils.py", line 1575, in generate
    result = self._sample(
  File "/root/miniconda3/envs/pt/lib/python3.8/site-packages/transformers/generation/utils.py", line 2697, in _sample
    outputs = self(
  File "/root/miniconda3/envs/pt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/pt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/pt/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 1196, in forward
    outputs = self.model(
  File "/root/miniconda3/envs/pt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/pt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/pt/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 1016, in forward
    layer_outputs = decoder_layer(
  File "/root/miniconda3/envs/pt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/pt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/pt/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 739, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/root/miniconda3/envs/pt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/pt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/pt/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 360, in forward
    cos, sin = self.rotary_emb(value_states, position_ids)
  File "/root/miniconda3/envs/pt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/pt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/pt/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/miniconda3/envs/pt/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 140, in forward
    with torch.autocast(device_type=device_type, enabled=False):
  File "/root/miniconda3/envs/pt/lib/python3.8/site-packages/torch/amp/autocast_mode.py", line 208, in __init__
    self.fast_dtype = torch.xpu.get_autocast_xpu_dtype()  # type: ignore[attr-defined]
  File "/root/miniconda3/envs/pt/lib/python3.8/site-packages/torch/__init__.py", line 1833, in __getattr__
    raise AttributeError(f"module '{__name__}' has no attribute '{name}'")
AttributeError: module 'torch' has no attribute 'xpu'
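
The last frames suggest the failure does not depend on the model itself: modeling_llama.py line 140 enters torch.autocast with whatever device type the tensor reports, and the autocast constructor then takes the XPU branch and looks up torch.xpu, which this torch build does not provide. A minimal sketch that should reproduce the same AttributeError on the same torch build (an illustration inferred from the traceback, not a snippet from the issue):

import torch

# transformers derives device_type from the tensor's device; with
# DIPU_PYTHON_DEVICE_AS_CUDA=false the dipu device apparently no longer
# reports as "cuda", so autocast ends up on the XPU branch.
try:
    with torch.autocast(device_type="xpu", enabled=False):
        pass
except AttributeError as err:
    # AttributeError: module 'torch' has no attribute 'xpu'
    print(err)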
wiryls commented 6 months ago

"dipu device will show as cuda device" is just an informational message: it means the current device is treated as CUDA at the C++ level, so the existing CUDA-related logic can be reused. In short, there is no need to set the environment variable to change this behavior; changing it only causes mismatches.
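
A quick way to see the intended behavior (a hypothetical check, not taken from the issue): with the variable left at its default, a tensor placed on the dipu device should report its device type as cuda, so library code such as the transformers rotary embedding, which branches on x.device.type, keeps taking the CUDA path.

import torch
import torch_dipu  # DIPU_PYTHON_DEVICE_AS_CUDA left at its default

x = torch.randn(2, device="cuda")  # physically placed on the Ascend device by dipu
print(x.device.type)               # expected to print "cuda"

# transformers builds autocast from this type, so it stays on the CUDA branch:
with torch.autocast(device_type=x.device.type, enabled=False):
    pass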

lljbash commented 6 months ago

> "dipu device will show as cuda device" is just an informational message: it means the current device is treated as CUDA at the C++ level, so the existing CUDA-related logic can be reused. In short, there is no need to set the environment variable to change this behavior; changing it only causes mismatches.

Leave the environment variable unchanged for now and it will run. Do not set this variable unless you actually need to.
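
If the variable has already been exported in the shell, it can presumably also be cleared from Python before importing torch_dipu (a hypothetical sketch; torch_dipu most likely reads the variable at import time):

import os

# Drop the override so the default behavior (devices reported as cuda) applies.
os.environ.pop("DIPU_PYTHON_DEVICE_AS_CUDA", None)

import torch_dipu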

Although a normal run does not require changing this variable, the feature itself is indeed broken right now. @fandaoyi