DeepLink-org / DeepLinkExt

Cannot find `from torch_dipu.profiler.ascend.ascend_profiler_merger import (`: ModuleNotFoundError: No module named 'torch_dipu.profiler.ascend' #122

Open tangpanyu opened 1 month ago

tangpanyu commented 1 month ago
[W OperatorEntry.cpp:153] Warning: Warning only once for all operators,  other operators may also be overrided.
  Overriding a previously registered kernel for the same operator and the same dispatch key
  operator: aten::cat(Tensor[] tensors, int dim=0) -> Tensor
    registered at /pytorch/build/aten/src/ATen/RegisterSchema.cpp:6
  dispatch key: XLA
  previous kernel: registered at /pytorch/aten/src/ATen/LegacyBatchingRegistrations.cpp:1079
       new kernel: registered at /home/HwHiAiUser/code/deeplink.framework/dipu/third_party/DIOPI/impl/ascend_npu/torch_npu/csrc/DIOPIAdapter.cpp:3427 (function operator())
dipu device will show as cuda device. if it's not expected behavior, please set env DIPU_PYTHON_DEVICE_AS_CUDA=false
Fri Jul 19 10:58:07 2024 dipu | git hash:3f68025a-dirty
No NVTX under your environment, ignore related API under this condition.
test_abs (unittest.loader._FailedTest) ... ERROR
test_adaptive_avg_pool2d (unittest.loader._FailedTest) ... ERROR
test_adaptive_avg_pool2d_backward (unittest.loader._FailedTest) ... ERROR
test_add (unittest.loader._FailedTest) ... ERROR
test_addc (unittest.loader._FailedTest) ... ERROR
test_addmm (unittest.loader._FailedTest) ... ERROR
test_all (unittest.loader._FailedTest) ... ERROR
test_amax (unittest.loader._FailedTest) ... ERROR
test_amp (unittest.loader._FailedTest) ... ERROR
test_amp_init_dtype_multithread (unittest.loader._FailedTest) ... ERROR
test_any (unittest.loader._FailedTest) ... ERROR
test_arange (unittest.loader._FailedTest) ... ERROR
test_argmax (unittest.loader._FailedTest) ... ERROR
test_asstride (unittest.loader._FailedTest) ... ERROR
test_atan (unittest.loader._FailedTest) ... ERROR
test_avg_pool2d (unittest.loader._FailedTest) ... ERROR
test_baddbmm (unittest.loader._FailedTest) ... ERROR
test_batch_norm (unittest.loader._FailedTest) ... ERROR
test_batch_norm_backward (unittest.loader._FailedTest) ... ERROR
test_binary_cross_entropy (unittest.loader._FailedTest) ... ERROR
test_binary_cross_entropy_with_logits (unittest.loader._FailedTest) ... ERROR
test_bitwise_not (unittest.loader._FailedTest) ... ERROR
test_bitwise_or (unittest.loader._FailedTest) ... ERROR
test_bmm (unittest.loader._FailedTest) ... ERROR
test_cast (unittest.loader._FailedTest) ... ERROR
test_cat (unittest.loader._FailedTest) ... ERROR
test_cdist (unittest.loader._FailedTest) ... ERROR
test_ceil (unittest.loader._FailedTest) ... ERROR
test_clamp (unittest.loader._FailedTest) ... ERROR
test_conv2d (unittest.loader._FailedTest) ... ERROR
test_convtranspose2d (unittest.loader._FailedTest) ... ERROR
test_copy (unittest.loader._FailedTest) ... ERROR
test_cos (unittest.loader._FailedTest) ... ERROR
test_cross_entropy_loss (unittest.loader._FailedTest) ... ERROR
test_ctc_loss (unittest.loader._FailedTest) ... ERROR
test_cumsum (unittest.loader._FailedTest) ... ERROR
test_div (unittest.loader._FailedTest) ... ERROR
test_dropout (unittest.loader._FailedTest) ... ERROR
test_embedding_backward (unittest.loader._FailedTest) ... ERROR
test_equal (unittest.loader._FailedTest) ... ERROR
test_erfinv (unittest.loader._FailedTest) ... ERROR
test_exp (unittest.loader._FailedTest) ... ERROR
test_fill (unittest.loader._FailedTest) ... ERROR
test_flip (unittest.loader._FailedTest) ... ERROR
test_floor (unittest.loader._FailedTest) ... ERROR
test_floor_divide (unittest.loader._FailedTest) ... ERROR
test_foreach_op (unittest.loader._FailedTest) ... ERROR
test_format_cast (unittest.loader._FailedTest) ... ERROR
test_gather (unittest.loader._FailedTest) ... ERROR
test_ge (unittest.loader._FailedTest) ... ERROR
test_gelu (unittest.loader._FailedTest) ... ERROR
test_generator (unittest.loader._FailedTest) ... ERROR
test_group_norm (unittest.loader._FailedTest) ... ERROR
test_hardswish (unittest.loader._FailedTest) ... ERROR
test_hardtanh (unittest.loader._FailedTest) ... ERROR
test_im2col (unittest.loader._FailedTest) ... ERROR
test_index (unittest.loader._FailedTest) ... ERROR
test_isnan (unittest.loader._FailedTest) ... ERROR
test_layer_norm (unittest.loader._FailedTest) ... ERROR
test_leaky_relu (unittest.loader._FailedTest) ... ERROR
test_lerp (unittest.loader._FailedTest) ... ERROR
test_linalg_qr (unittest.loader._FailedTest) ... ERROR
test_linalg_vec_norm (unittest.loader._FailedTest) ... ERROR
test_linear (unittest.loader._FailedTest) ... ERROR
test_linspace (unittest.loader._FailedTest) ... ERROR
test_load (unittest.loader._FailedTest) ... ERROR
test_log (unittest.loader._FailedTest) ... ERROR
test_log_softmax (unittest.loader._FailedTest) ... ERROR
test_log_softmax_backward (unittest.loader._FailedTest) ... ERROR
test_logical_and (unittest.loader._FailedTest) ... ERROR
test_logical_not (unittest.loader._FailedTest) ... ERROR
test_logical_or (unittest.loader._FailedTest) ... ERROR
test_masked_fill (unittest.loader._FailedTest) ... ERROR
test_masked_select (unittest.loader._FailedTest) ... ERROR
test_matmul (unittest.loader._FailedTest) ... ERROR
test_max_pool2d (unittest.loader._FailedTest) ... ERROR
test_mean_std (unittest.loader._FailedTest) ... ERROR
test_min_max (unittest.loader._FailedTest) ... ERROR
test_minimum_maximum (unittest.loader._FailedTest) ... ERROR
test_mm (unittest.loader._FailedTest) ... ERROR
test_mock_cudatensor (unittest.loader._FailedTest) ... ERROR
test_mseloss (unittest.loader._FailedTest) ... ERROR
test_mul (unittest.loader._FailedTest) ... ERROR
test_multihead_attention (unittest.loader._FailedTest) ... ERROR
test_multinomial (unittest.loader._FailedTest) ... ERROR
test_neg (unittest.loader._FailedTest) ... ERROR
test_nll_loss (unittest.loader._FailedTest) ... ERROR
test_nonzero (unittest.loader._FailedTest) ... ERROR
test_norm (unittest.loader._FailedTest) ... ERROR
test_normal (unittest.loader._FailedTest) ... ERROR
test_one_hot (unittest.loader._FailedTest) ... ERROR
test_ones (unittest.loader._FailedTest) ... ERROR
test_op_on_different_device (unittest.loader._FailedTest) ... ERROR
test_optimizer (unittest.loader._FailedTest) ... ERROR
test_pin_memory (unittest.loader._FailedTest) ... ERROR
test_polar (unittest.loader._FailedTest) ... ERROR
test_prod (unittest.loader._FailedTest) ... ERROR
test_profiler_vendor (unittest.loader._FailedTest) ... ERROR
test_python_device (unittest.loader._FailedTest) ... ERROR
test_random (unittest.loader._FailedTest) ... ERROR
test_randperm (unittest.loader._FailedTest) ... ERROR
test_reciprocal (unittest.loader._FailedTest) ... ERROR
test_relu (unittest.loader._FailedTest) ... ERROR
test_remainder (unittest.loader._FailedTest) ... ERROR
test_repeat (unittest.loader._FailedTest) ... ERROR
test_roll (unittest.loader._FailedTest) ... ERROR
test_rsqrt (unittest.loader._FailedTest) ... ERROR
test_rsub (unittest.loader._FailedTest) ... ERROR
test_scatter (unittest.loader._FailedTest) ... ERROR
test_sgn (unittest.loader._FailedTest) ... ERROR
test_sigmoid (unittest.loader._FailedTest) ... ERROR
test_sign (unittest.loader._FailedTest) ... ERROR
test_silu (unittest.loader._FailedTest) ... ERROR
test_sin (unittest.loader._FailedTest) ... ERROR
test_softmax (unittest.loader._FailedTest) ... ERROR
test_softmax_backward (unittest.loader._FailedTest) ... ERROR
test_sort (unittest.loader._FailedTest) ... ERROR
test_sqrt (unittest.loader._FailedTest) ... ERROR
test_stack (unittest.loader._FailedTest) ... ERROR
test_storage (unittest.loader._FailedTest) ... ERROR
test_sub (unittest.loader._FailedTest) ... ERROR
test_sum (unittest.loader._FailedTest) ... ERROR
test_tanh (unittest.loader._FailedTest) ... ERROR
test_tensor_new (unittest.loader._FailedTest) ... ERROR
test_topk (unittest.loader._FailedTest) ... ERROR
test_transpose (unittest.loader._FailedTest) ... ERROR
test_tri (unittest.loader._FailedTest) ... ERROR
test_unfold (unittest.loader._FailedTest) ... ERROR
test_uniform (unittest.loader._FailedTest) ... ERROR
test_unique (unittest.loader._FailedTest) ... ERROR
test_upsample (unittest.loader._FailedTest) ... ERROR
test_where (unittest.loader._FailedTest) ... ERROR
test_zeros (unittest.loader._FailedTest) ... ERROR
test_allocator (unittest_autogened_for_individual_scripts.TestIndividualScripts) ... [W OperatorEntry.cpp:153] Warning: Warning only once for all operators,  other operators may also be overrided.
  Overriding a previously registered kernel for the same operator and the same dispatch key
  operator: aten::cat(Tensor[] tensors, int dim=0) -> Tensor
    registered at /pytorch/build/aten/src/ATen/RegisterSchema.cpp:6
  dispatch key: XLA
  previous kernel: registered at /pytorch/aten/src/ATen/LegacyBatchingRegistrations.cpp:1079
       new kernel: registered at /home/HwHiAiUser/code/deeplink.framework/dipu/third_party/DIOPI/impl/ascend_npu/torch_npu/csrc/DIOPIAdapter.cpp:3427 (function operator())
Process Process-1:
Traceback (most recent call last):
  File "/root/miniconda3/envs/torch_2.1/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/root/miniconda3/envs/torch_2.1/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/HwHiAiUser/code/deeplink.framework/dipu/tests/python/individual_scripts/test_allocator.py", line 19, in test_allocator
    import torch_dipu
  File "/root/miniconda3/envs/torch_2.1/lib/python3.10/site-packages/torch_dipu/__init__.py", line 128, in <module>
    apply_patches()
  File "/root/miniconda3/envs/torch_2.1/lib/python3.10/site-packages/torch_dipu/__init__.py", line 120, in apply_patches
    apply_profiler_patch()
  File "/root/miniconda3/envs/torch_2.1/lib/python3.10/site-packages/torch_dipu/profiler/profiler.py", line 509, in apply_profiler_patch
    _apply_ascend_profiler_patch()
  File "/root/miniconda3/envs/torch_2.1/lib/python3.10/site-packages/torch_dipu/profiler/profiler.py", line 513, in _apply_ascend_profiler_patch
    from torch_dipu.profiler.ascend.ascend_profiler_merger import (
ModuleNotFoundError: No module named 'torch_dipu.profiler.ascend'
export DIOPI_ROOT=/home/$USER/code/dipu/third_party/DIOPI/impl/lib
export DIPU_ROOT=/home/$USER/code/dipu/torch_dipu
export LIBRARY_PATH=$DIPU_ROOT:$DIOPI_ROOT:$LIBRARY_PATH;
export LD_LIBRARY_PATH=$DIPU_ROOT:$DIOPI_ROOT:$LD_LIBRARY_PATH

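After changing DIOPI_ROOT and DIPU_ROOT to the values that match my own paths, I ran `zsh ./tests/python/run_tests.sh` (the syntax of run_tests.sh had already been adjusted to work with zsh). Production environment: Docker with a single 910B card mounted, torch 2.1.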
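The exports above only help if both directories exist and actually end up on `LD_LIBRARY_PATH`. A minimal sketch of that sanity check follows. The helper name `check_dipu_env` is hypothetical and not part of DIPU; it assumes only the variable names used in the exports above.

```python
import os

def check_dipu_env(env=None):
    """Return a list of human-readable problems with the DIPU/DIOPI variables."""
    env = os.environ if env is None else env
    problems = []
    # Split LD_LIBRARY_PATH into its entries, dropping empty ones.
    ld_entries = [p for p in env.get("LD_LIBRARY_PATH", "").split(os.pathsep) if p]
    for name in ("DIOPI_ROOT", "DIPU_ROOT"):
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
            continue
        if not os.path.isdir(value):
            problems.append(f"{name}={value} is not an existing directory")
        if value not in ld_entries:
            problems.append(f"{name}={value} is not on LD_LIBRARY_PATH")
    return problems
```

Calling `check_dipu_env()` in the same shell session that runs the tests returns an empty list when both variables point at real directories that are listed on `LD_LIBRARY_PATH`.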

dengyingxu commented 4 weeks ago

Same problem here. Could anyone help?

caikun-pjlab commented 4 weeks ago

What is PYTHONPATH set to in your environment?
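The question matters because `PYTHONPATH` feeds into `sys.path`, which decides which copy of `torch_dipu` gets imported. A small sketch that lists every `sys.path` entry containing a `torch_dipu` directory, without importing the package (the import itself is what fails in the traceback above). The function name `find_package_dirs` is hypothetical.

```python
import os
import sys

def find_package_dirs(package, search_path=None):
    """List directories named `package` found directly under each search-path entry."""
    search_path = sys.path if search_path is None else search_path
    hits = []
    for entry in search_path:
        # An empty sys.path entry means the current working directory.
        candidate = os.path.join(entry or os.getcwd(), package)
        if os.path.isdir(candidate) and candidate not in hits:
            hits.append(candidate)
    return hits

if __name__ == "__main__":
    # The first hit is normally the copy a plain `import torch_dipu` uses.
    for path in find_package_dirs("torch_dipu"):
        print(path)
```

More than one hit (for example a source checkout and a `site-packages` copy) suggests one may be shadowing the other. Editable installs that rely on import hooks rather than path entries may not show up in this listing.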

dengyingxu commented 4 weeks ago

> What is PYTHONPATH set to in your environment?

export DIPU_DEVICE=ascend PYTHON_INCLUDE_DIR=/usr/local/anaconda3/envs/new_env/include/python3.8 PYTORCH_DIR=/usr/local/anaconda3/envs/new_env/lib/python3.8/site-packages/.
PyTorch was installed with conda: `conda install pytorch==2.1.1 torchvision==0.16.1 cpuonly -c pytorch`. The gcc version is gcc (GCC) 9.3.0.

PYTHONPATH includes dipu. Could you take a look at the log below? Is this environment currently usable? The test script reports many errors.

dengyingxu commented 4 weeks ago

output.log: I ran the test script; its output is in the attached file.

tangpanyu commented 4 weeks ago

> output.log: I ran the test script; its output is in the attached file.

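Follow the guide at https://deeplink.readthedocs.io/zh-cn/latest/doc/example/huawei_2024.html. When I went through that process, a package turned out to be missing partway through, and I had to add a manual install step: `pip install -e <your target location>`. If the package is installed into the environment directory instead, one file is missing. Once that was sorted out, every test passed except the convolution operators, which all lack support for one memory format. Large-model inference had a bug, though. I forget the details, but in the end I never got it working.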
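The missing file described here is consistent with the traceback in the first post, where the `site-packages` copy of `torch_dipu` has no `profiler.ascend` subpackage. A minimal sketch for confirming that on disk without triggering the failing import; the helper name `missing_subpackages` is hypothetical, and the `profiler/ascend` path is taken from the module name in the traceback.

```python
import importlib.util
import os

def missing_subpackages(package, relative_dirs):
    """Return the entries of relative_dirs absent from every on-disk location of `package`.

    Only the top-level name is passed to find_spec, so the package's
    __init__.py is not executed.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        raise ModuleNotFoundError(f"{package} is not an importable package")
    roots = list(spec.submodule_search_locations)
    return [
        rel for rel in relative_dirs
        if not any(os.path.isdir(os.path.join(root, *rel.split("/"))) for root in roots)
    ]
```

In the affected environment, `missing_subpackages("torch_dipu", ["profiler/ascend"])` would be expected to return `["profiler/ascend"]` before the editable install and an empty list afterwards, assuming the editable install is what restores the directory.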

caikun-pjlab commented 4 weeks ago

I looked at your log. Running the test cases requires a package that is missing: `pip install expecttest`.
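Lines of the form `test_abs (unittest.loader._FailedTest) ... ERROR`, as in the log in the first post, are what unittest reports when a test module could not be loaded at all, typically because an import inside it raised. A single missing dependency can therefore turn every test file into an ERROR. The sketch below reproduces this with a placeholder module name (`hypothetical_missing_test_module` is an assumption, not a real DIPU test) and prints the underlying import error that the one-line summary hides.

```python
import unittest

def load_with_errors(module_names):
    """Load test modules by name and return (suite, load_error_messages)."""
    loader = unittest.TestLoader()
    suite = loader.loadTestsFromNames(module_names)
    # loader.errors collects one message, with traceback, per module that failed to load.
    return suite, list(loader.errors)

if __name__ == "__main__":
    suite, errors = load_with_errors(["hypothetical_missing_test_module"])
    for message in errors:
        print(message)
```

Reading the printed traceback, rather than the per-test ERROR lines, shows which import is actually failing.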

dengyingxu commented 4 weeks ago

tmp.log: Thanks a lot. In actual testing, though, some operators are unsupported and some cases even core dump. Is this expected? The log is attached above.
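When a test core dumps, it takes the whole test run down with it. One way to narrow this down, sketched here under the assumption that each test file can be run as a standalone script, is to run every file in its own interpreter with Python's built-in `faulthandler` enabled and report which ones were killed by a signal. The helper names `describe_exit` and `run_isolated` are hypothetical.

```python
import signal
import subprocess
import sys

def describe_exit(returncode):
    """Turn a subprocess return code into 'ok', a failure note, or the killing signal."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative code -N means the child was killed by signal N.
        return f"killed by {signal.Signals(-returncode).name}"
    return f"failed (exit code {returncode})"

def run_isolated(test_file):
    """Run one test file in its own interpreter with faulthandler enabled."""
    result = subprocess.run([sys.executable, "-X", "faulthandler", test_file])
    return describe_exit(result.returncode)
```

With `-X faulthandler`, the interpreter dumps the Python traceback to stderr on a fatal signal such as SIGSEGV, which helps tell an unsupported operator (an ordinary Python exception) apart from a crash inside native code.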

dengyingxu commented 4 weeks ago

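Thanks a lot. In actual testing, though, some operators are unsupported and some cases even core dump. Is this expected? The log is attached above.

> I looked at your log. Running the test cases requires a package that is missing: `pip install expecttest`.

Thanks a lot. In actual testing, though, some operators are unsupported and some cases even core dump. Is this expected? The log is shown below.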