Ascend / pytorch

Ascend PyTorch adapter (torch_npu). Mirror of https://gitee.com/ascend/pytorch
https://ascend.github.io/docs/

Hello, error when using torch_npu #7

Open HankLiu10 opened 1 year ago

HankLiu10 commented 1 year ago

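The torch version is 2.1.0, the driver version has been double-checked as 7.0.RC1.alpha003, and Python is 3.8. It ran fine at the start of last week, but this week I get the following error: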

$ python
Python 3.8.18 (default, Sep 11 2023, 13:19:25)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> import torch_npu
>>> x = torch.randn(2,2).npu()
>>> y = torch.randn(2,2).npu()
>>> print(x)
EH9999: Inner Error!
EH9999 [Init][Env]init env failed![FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]
TraceBack (most recent call last):
build op model failed, result = 500001[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]

Aborted (core dumped)

How can I resolve this? Thanks!

sunchuhan-930 commented 1 year ago

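This problem is caused by an incomplete dependency installation. Make sure the following dependencies are installed: numpy>=1.19.2, decorator>=4.4.0, sympy>=1.5.1, cffi>=1.12.3, protobuf>=3.13.0, attrs, pyyaml, pathlib2, scipy, requests, psutil, absl-py.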
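
One way to check the list above before importing torch_npu is a small script like the following. It is only a sketch: the names `REQUIRED`, `numeric_prefix`, `is_older`, and `check_dependencies` are made up for illustration, the version comparison looks only at the leading numeric components, and the minimum versions are copied from the comment above rather than from any official requirements file.

```python
from importlib import metadata

# Minimum versions copied from the comment above; None means "any version".
REQUIRED = {
    "numpy": "1.19.2",
    "decorator": "4.4.0",
    "sympy": "1.5.1",
    "cffi": "1.12.3",
    "protobuf": "3.13.0",
    "attrs": None,
    "pyyaml": None,
    "pathlib2": None,
    "scipy": None,
    "requests": None,
    "psutil": None,
    "absl-py": None,
}


def numeric_prefix(version):
    """Return the leading numeric components, e.g. '2.1.0rc1' -> (2, 1, 0)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_older(installed, minimum):
    """Compare numeric prefixes, padding the shorter one with zeros."""
    a, b = numeric_prefix(installed), numeric_prefix(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


def check_dependencies(required=REQUIRED, get_version=metadata.version):
    """Return a list of human-readable problems; empty if everything is fine."""
    problems = []
    for name, minimum in required.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        if minimum and is_older(installed, minimum):
            problems.append(f"{name}: found {installed}, need >= {minimum}")
    return problems


if __name__ == "__main__":
    for line in check_dependencies() or ["all listed dependencies satisfied"]:
        print(line)
```

Run it inside the same environment that will import torch_npu; any line it prints names a package to install or upgrade.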

Yikun commented 1 year ago
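  1. Docs: consider adding the installation dependencies to the README?
  2. Long term, improve the error reporting: if a dependency is mandatory, consider importing it up front and raising an exception when the import fails?
    try:
        import XXX
    except ImportError as error:
        raise ImportError("xxx") from error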
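
The second suggestion could be sketched roughly as below. This is not how torch_npu currently behaves; `require_modules` and the `REQUIRED_MODULES` mapping are hypothetical names, and the module-to-package mapping is an assumption based on the dependency list earlier in this thread.

```python
import importlib.util

# Hypothetical mapping: importable top-level module -> pip package name.
REQUIRED_MODULES = {
    "numpy": "numpy",
    "decorator": "decorator",
    "sympy": "sympy",
    "cffi": "cffi",
    "attr": "attrs",
    "yaml": "pyyaml",
    "scipy": "scipy",
    "requests": "requests",
    "psutil": "psutil",
    "absl": "absl-py",
}


def require_modules(modules=REQUIRED_MODULES):
    """Raise one ImportError naming every missing mandatory dependency."""
    missing = [
        package
        for module, package in modules.items()
        if importlib.util.find_spec(module) is None
    ]
    if missing:
        raise ImportError(
            "Missing required dependencies: "
            + ", ".join(missing)
            + ". Install them with: pip install "
            + " ".join(missing)
        )
```

Calling such a check at the top of a package's `__init__.py` would turn an opaque runtime failure into a message that names the packages to install.
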
liyifango commented 1 year ago

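Um, I need some help. The torch version is 2.1.0, the driver version is 7.0.RC1.alpha003, and Python is 3.10.12. After installing the dependencies listed above, I get this error: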

>>> import torch_npu
Traceback (most recent call last):
  File "/root/workspaces/mamba/envs/cann/lib/python3.10/site-packages/torch_npu/__init__.py", line 14, in <module>
    import torch_npu.npu
  File "/root/workspaces/mamba/envs/cann/lib/python3.10/site-packages/torch_npu/npu/__init__.py", line 106, in <module>
    from .utils import (synchronize, device_count, can_device_access_peer, set_device, current_device, get_device_name,
  File "/root/workspaces/mamba/envs/cann/lib/python3.10/site-packages/torch_npu/npu/utils.py", line 10, in <module>
    import torch_npu._C
ImportError: /root/workspaces/mamba/envs/cann/bin/../lib/libgomp.so.1: cannot allocate memory in static TLS block

What is the cause? Thanks a lot.

-- It seems to be resolved. I set up the Python environment from scratch and installed the dependencies:

mamba create --name cann python=3.10.12
mamba activate cann
mamba install pyyaml setuptools wheel typing_extensions numpy protobuf attrs pathlib2 scipy requests psutil absl-py decorator

pip3 install torch==2.1.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
pip3 install torch-npu==2.1.0rc1 -i https://pypi.tuna.tsinghua.edu.cn/simple

After that, the tensor prints correctly:

tensor([[1.0919, 0.1245],
        [0.4220, 0.1443]], device='npu:0')
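
The manual check above can be wrapped in a small script so it can be rerun after each environment change. This is a sketch only: `npu_smoke_test` is a made-up name, and it assumes the installed torch_npu exposes `torch.npu.is_available()` once `torch_npu` has been imported.

```python
import importlib.util


def npu_smoke_test():
    """Return a short status string instead of crashing on a missing package."""
    for module in ("torch", "torch_npu"):
        if importlib.util.find_spec(module) is None:
            return f"skipped: {module} is not installed"

    import torch
    import torch_npu  # noqa: F401  (importing it registers the NPU backend)

    # Assumption: this attribute is available in the installed version.
    if not torch.npu.is_available():
        return "skipped: no NPU device is available"

    # Same operations as in the interactive session earlier in the thread.
    x = torch.randn(2, 2).npu()
    y = torch.randn(2, 2).npu()
    return f"ok: sum computed on {(x + y).device}"


if __name__ == "__main__":
    print(npu_smoke_test())
```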

However, a large number of warnings are printed along the way:

<frozen importlib._bootstrap>:671: ImportWarning: TBEMetaPathLoader.exec_module() not found; falling back to load_module()
/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/context/op_context.py:38: DeprecationWarning: currentThread() is deprecated, use current_thread() instead
  return _contexts.setdefault(threading.currentThread().ident, [])
<frozen importlib._bootstrap>:914: ImportWarning: TEMetaPathFinder.find_spec() not found; falling back to find_module()
<frozen importlib._bootstrap>:671: ImportWarning: TEMetaPathLoader.exec_module() not found; falling back to load_module()
<frozen importlib._bootstrap>:671: ImportWarning: TBEMetaPathLoader.exec_module() not found; falling back to load_module()
<frozen importlib._bootstrap>:914: ImportWarning: TEMetaPathFinder.find_spec() not found; falling back to find_module()
<frozen importlib._bootstrap>:914: ImportWarning: TEMetaPathFinder.find_spec() not found; falling back to find_module()
<frozen importlib._bootstrap>:914: ImportWarning: TEMetaPathFinder.find_spec() not found; falling back to find_module()
<frozen importlib._bootstrap>:914: ImportWarning: TEMetaPathFinder.find_spec() not found; falling back to find_module()
<frozen importlib._bootstrap>:914: ImportWarning: TEMetaPathFinder.find_spec() not found; falling back to find_module()
<frozen importlib._bootstrap>:914: ImportWarning: TEMetaPathFinder.find_spec() not found; falling back to find_module()
<frozen importlib._bootstrap>:914: ImportWarning: TEMetaPathFinder.find_spec() not found; falling back to find_module()
<frozen importlib._bootstrap>:914: ImportWarning: TEMetaPathFinder.find_spec() not found; falling back to find_module()
<frozen importlib._bootstrap>:914: ImportWarning: TEMetaPathFinder.find_spec() not found; falling back to find_module()
<frozen importlib._bootstrap>:914: ImportWarning: TEMetaPathFinder.find_spec() not found; falling back to find_module()
<frozen importlib._bootstrap>:914: ImportWarning: TEMetaPathFinder.find_spec() not found; falling back to find_module()
<frozen importlib._bootstrap>:914: ImportWarning: TEMetaPathFinder.find_spec() not found; falling back to find_module()
<frozen importlib._bootstrap>:914: ImportWarning: TEMetaPathFinder.find_spec() not found; falling back to find_module()
<frozen importlib._bootstrap>:914: ImportWarning: TEMetaPathFinder.find_spec() not found; falling back to find_module()
<frozen importlib._bootstrap>:914: ImportWarning: TEMetaPathFinder.find_spec() not found; falling back to find_module()
<frozen importlib._bootstrap>:914: ImportWarning: TEMetaPathFinder.find_spec() not found; falling back to find_module()
>>> <frozen importlib._bootstrap>:671: ImportWarning: TBEMetaPathLoader.exec_module() not found; falling back to load_module()
<frozen importlib._bootstrap>:671: ImportWarning: TBEMetaPathLoader.exec_module() not found; falling back to load_module()
<frozen importlib._bootstrap>:671: ImportWarning: TBEMetaPathLoader.exec_module() not found; falling back to load_module()
<frozen importlib._bootstrap>:671: ImportWarning: TBEMetaPathLoader.exec_module() not found; falling back to load_module()
<frozen importlib._bootstrap>:671: ImportWarning: TBEMetaPathLoader.exec_module() not found; falling back to load_module()
<frozen importlib._bootstrap>:671: ImportWarning: TBEMetaPathLoader.exec_module() not found; falling back to load_module()
<frozen importlib._bootstrap>:671: ImportWarning: TBEMetaPathLoader.exec_module() not found; falling back to load_module()
<frozen importlib._bootstrap>:671: ImportWarning: TBEMetaPathLoader.exec_module() not found; falling back to load_module()

Warning: Device do not support double dtype now, dtype cast repalce with float.

I am not yet sure whether this affects subsequent model training or execution.
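
For context, Python 3.10 started emitting ImportWarning when the import system falls back to the legacy find_module()/load_module() hooks, and it deprecated threading.currentThread(), which matches the messages in the log above. If the noise is a problem, the standard `warnings` module can filter it. The sketch below is an assumption-laden workaround, not a fix from the torch_npu maintainers: `silence_legacy_import_warnings` is a made-up name, and it is a guess that some of the messages come from helper processes, which is why the environment variable is also set.

```python
import os
import warnings


def silence_legacy_import_warnings():
    """Hide the two warning kinds shown in the log; call before importing torch_npu."""
    # Filters for the current interpreter.
    warnings.filterwarnings("ignore", category=ImportWarning)
    warnings.filterwarnings(
        "ignore",
        message=r"currentThread\(\) is deprecated",
        category=DeprecationWarning,
    )
    # Child Python processes started later read their filters from this
    # variable; whether the toolkit spawns any is an assumption here.
    os.environ.setdefault("PYTHONWARNINGS", "ignore::ImportWarning")
```

This only hides the messages; it does not change what the toolkit does, and it does not touch the separate "Device do not support double dtype" line, which is not printed through Python's `warnings` machinery as far as the log shows.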

sunchuhan-930 commented 1 year ago

Did it end up affecting subsequent model training?

sunchuhan-930 commented 1 year ago

This page describes the preparation steps required before installation: https://www.hiascend.com/document/detail/zh/canncommercial/70RC1/envdeployment/instg/instg_0026.html

HankLiu10 commented 1 year ago

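This issue is resolved. In the end I fixed it by reinstalling the operating system so that it strictly matches the supported kernel version. Thanks.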

anaivebird commented 1 year ago

Hello, how exactly did you solve it? Could you share the details? Thanks.

HankLiu10 commented 1 year ago

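After opening this issue I reinstalled the operating system. I did not hit this particular error again, but I ran into other problems. Running uname -r in a terminal showed that the system kernel version did not match the supported version, and I finally resolved it by reinstalling a system whose kernel version matches.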
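
The same check can be done from Python, since `platform.release()` reports the same string as `uname -r`. The sketch below is illustrative only: `kernel_matches` is a made-up helper, and the prefix in the example is a placeholder, not a value from Ascend's compatibility list, which should be looked up for the specific driver release.

```python
import platform


def kernel_matches(supported_prefixes, release=None):
    """Return True if the kernel release starts with any supported prefix."""
    if release is None:
        release = platform.release()  # same value as `uname -r`
    return any(release.startswith(prefix) for prefix in supported_prefixes)


if __name__ == "__main__":
    # Placeholder prefix for illustration; replace with the documented versions.
    supported = ["5.10."]
    print(platform.release(), "supported:", kernel_matches(supported))
```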

anaivebird commented 1 year ago

Thanks.

pengchengpi commented 6 months ago

In reply to liyifango's comment above: after I pinned Python to 3.9, these warnings no longer appeared.