InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
https://lmdeploy.readthedocs.io/en/latest/
Apache License 2.0
4.35k stars 390 forks

Test Ascend Atlas 300I Duo NPU device by LMDeploy V0.6.0 #2471

Open zer0py2c opened 3 weeks ago

zer0py2c commented 3 weeks ago

Does it support Ascend Atlas 300I Duo NPU devices?

lvhan028 commented 2 weeks ago

No, it doesn't.

jinminxi104 commented 2 weeks ago

Since we don't have a 300I device, it has not been tested. However, Huawei's documentation shows that almost all operators supported on the 800T are also supported on the 300 inference series. We recommend trying it on the 300I and letting us know if you hit any errors on that device.

zer0py2c commented 2 weeks ago

Thanks, I will try it and give feedback soon.

zer0py2c commented 2 weeks ago

Here are my test results on the Atlas 300I Duo device:

My environment

Python code

# Offline inference test
from lmdeploy import PytorchEngineConfig, pipeline

# Run the PyTorch engine on a single Ascend NPU (tp=1, no tensor parallelism).
pipe = pipeline(
    "/opt/models/Qwen2-7B-Instruct",
    backend_config=PytorchEngineConfig(tp=1, device_type="ascend"),
)
questions = ["Shanghai is", "Please introduce China", "How are you?"]
responses = pipe(questions)
print(responses)

Response

root@c486e2f96ded:/opt/lmdeploy# python3 offline_test.py
[W compiler_depend.ts:623] Warning: expandable_segments currently defaults to false. You can enable this feature by `export PYTORCH_NPU_ALLOC_CONF = expandable_segments:True`. (function operator())
/usr/local/python3.10.5/lib/python3.10/site-packages/torch_npu/contrib/transfer_to_npu.py:301: ImportWarning:
    *************************************************************************************************************
    The torch.Tensor.cuda and torch.nn.Module.cuda are replaced with torch.Tensor.npu and torch.nn.Module.npu now..
    The torch.cuda.DoubleTensor is replaced with torch.npu.FloatTensor cause the double type is not supported now..
    The backend in torch.distributed.init_process_group set to hccl now..
    The torch.cuda.* and torch.cuda.amp.* are replaced with torch.npu.* and torch.npu.amp.* now..
    The device parameters have been replaced with npu in the function below:
    torch.logspace, torch.randint, torch.hann_window, torch.rand, torch.full_like, torch.ones_like, torch.rand_like, torch.randperm, torch.arange, torch.frombuffer, torch.normal, torch._empty_per_channel_affine_quantized, torch.empty_strided, torch.empty_like, torch.scalar_tensor, torch.tril_indices, torch.bartlett_window, torch.ones, torch.sparse_coo_tensor, torch.randn, torch.kaiser_window, torch.tensor, torch.triu_indices, torch.as_tensor, torch.zeros, torch.randint_like, torch.full, torch.eye, torch._sparse_csr_tensor_unsafe, torch.empty, torch._sparse_coo_tensor_unsafe, torch.blackman_window, torch.zeros_like, torch.range, torch.sparse_csr_tensor, torch.randn_like, torch.from_file, torch._cudnn_init_dropout_state, torch._empty_affine_quantized, torch.linspace, torch.hamming_window, torch.empty_quantized, torch._pin_memory, torch.autocast, torch.load, torch.Generator, torch.Tensor.new_empty, torch.Tensor.new_empty_strided, torch.Tensor.new_full, torch.Tensor.new_ones, torch.Tensor.new_tensor, torch.Tensor.new_zeros, torch.Tensor.to, torch.nn.Module.to, torch.nn.Module.to_empty
    *************************************************************************************************************

  warnings.warn(msg, ImportWarning)
/usr/local/python3.10.5/lib/python3.10/site-packages/torch_npu/contrib/transfer_to_npu.py:260: RuntimeWarning: torch.jit.script and torch.jit.script_method will be disabled by transfer_to_npu, which currently does not support them, if you need to enable them, please do not use transfer_to_npu.
  warnings.warn(msg, RuntimeWarning)
<frozen importlib._bootstrap>:914: ImportWarning: TEMetaPathFinder.find_spec() not found; falling back to find_module()
<frozen importlib._bootstrap>:914: ImportWarning: TEMetaPathFinder.find_spec() not found; falling back to find_module()
<frozen importlib._bootstrap>:914: ImportWarning: TEMetaPathFinder.find_spec() not found; falling back to find_module()
/usr/local/python3.10.5/lib/python3.10/site-packages/torch/utils/cpp_extension.py:28: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  from pkg_resources import packaging  # type: ignore[attr-defined]
<frozen importlib._bootstrap>:914: ImportWarning: TEMetaPathFinder.find_spec() not found; falling back to find_module()
/opt/lmdeploy/lmdeploy/serve/utils.py:22: DeprecationWarning: There is no current event loop
  event_loop = asyncio.get_event_loop()
/usr/local/python3.10.5/lib/python3.10/site-packages/torch_npu/utils/storage.py:38: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  if self.device.type != 'cpu':
[E compiler_depend.ts:270] call aclnnFlashAttentionVarLenScore failed, detail:EZ9999: Inner Error!
EZ9999: 2024-09-20-07:49:24.783.260  Op FlashAttentionScore does not has any binary.
        TraceBack (most recent call last):
        Kernel Run failed. opType: 29, FlashAttentionScore
        launch failed for FlashAttentionScore, errno:561000.

[ERROR] 2024-09-20-07:49:24 (PID:759, Device:0, RankID:-1) ERR01005 OPS internal error
Exception raised from operator() at build/CMakeFiles/torch_npu.dir/compiler_depend.ts:452 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x68 (0xfffd264dd898 in /usr/local/python3.10.5/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x6c (0xfffd264962a8 in /usr/local/python3.10.5/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #2: <unknown function> + 0xd4f864 (0xfffce481f864 in /usr/local/python3.10.5/lib/python3.10/site-packages/torch_npu/lib/libtorch_npu.so)
frame #3: <unknown function> + 0xe789a0 (0xfffce49489a0 in /usr/local/python3.10.5/lib/python3.10/site-packages/torch_npu/lib/libtorch_npu.so)
frame #4: <unknown function> + 0x5eab64 (0xfffce40bab64 in /usr/local/python3.10.5/lib/python3.10/site-packages/torch_npu/lib/libtorch_npu.so)
frame #5: <unknown function> + 0x5eb018 (0xfffce40bb018 in /usr/local/python3.10.5/lib/python3.10/site-packages/torch_npu/lib/libtorch_npu.so)
frame #6: <unknown function> + 0x5e8520 (0xfffce40b8520 in /usr/local/python3.10.5/lib/python3.10/site-packages/torch_npu/lib/libtorch_npu.so)
frame #7: <unknown function> + 0x946ec (0xfffd265046ec in /usr/local/python3.10.5/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #8: <unknown function> + 0x7624 (0xfffd6cc07624 in /lib/aarch64-linux-gnu/libpthread.so.0)
frame #9: <unknown function> + 0xd162c (0xfffd6cd3162c in /lib/aarch64-linux-gnu/libc.so.6)

2024-09-20 07:49:24,787 - lmdeploy - ERROR - Engine loop failed with error: The Inner error is reported as above. The process exits for this inner error, and the current working operator name is aclnnFlashAttentionVarLenScore.
Since the operator is called asynchronously, the stacktrace may be inaccurate. If you want to get the accurate stacktrace, pleace set the environment variable ASCEND_LAUNCH_BLOCKING=1.
[ERROR] 2024-09-20-07:49:24 (PID:759, Device:0, RankID:-1) ERR00100 PTA call acl api failed
Traceback (most recent call last):
  File "/opt/lmdeploy/lmdeploy/pytorch/engine/request.py", line 17, in _raise_exception_on_finish
    task.result()
  File "/opt/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 944, in async_loop
    await self._async_loop()
  File "/opt/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 934, in _async_loop
    await __step(True)
  File "/opt/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 920, in __step
    raise e
  File "/opt/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 912, in __step
    raise out
  File "/opt/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 856, in _async_loop_background
    await self._async_step_background(
  File "/opt/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 735, in _async_step_background
    output = await self._async_model_forward(
  File "/opt/lmdeploy/lmdeploy/utils.py", line 237, in __tmp
    return (await func(*args, **kwargs))
  File "/opt/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 633, in _async_model_forward
    ret = await __forward(inputs)
  File "/opt/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 611, in __forward
    return await self.model_agent.async_forward(
  File "/opt/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 332, in async_forward
    output = self._forward_impl(inputs,
  File "/opt/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 299, in _forward_impl
    output = model_forward(
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 154, in model_forward
    output = model(**input_dict)
  File "/opt/lmdeploy/lmdeploy/pytorch/backends/graph_runner.py", line 25, in __call__
    return self.model(**kwargs)
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/lmdeploy/lmdeploy/pytorch/models/qwen2.py", line 340, in forward
    hidden_states = self.model(
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/lmdeploy/lmdeploy/pytorch/models/qwen2.py", line 278, in forward
    hidden_states, residual = decoder_layer(
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/lmdeploy/lmdeploy/pytorch/models/qwen2.py", line 194, in forward
    hidden_states = self.self_attn(
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/lmdeploy/lmdeploy/pytorch/models/qwen2.py", line 101, in forward
    attn_output = self.o_proj(attn_output)
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/lmdeploy/lmdeploy/pytorch/nn/linear.py", line 921, in forward
    return self.impl.forward(x, self.weight, self.bias, all_reduce)
  File "/opt/lmdeploy/lmdeploy/pytorch/backends/default/linear.py", line 20, in forward
    out = F.linear(x, weight, bias)
RuntimeError: The Inner error is reported as above. The process exits for this inner error, and the current working operator name is aclnnFlashAttentionVarLenScore.
Since the operator is called asynchronously, the stacktrace may be inaccurate. If you want to get the accurate stacktrace, pleace set the environment variable ASCEND_LAUNCH_BLOCKING=1.
[ERROR] 2024-09-20-07:49:24 (PID:759, Device:0, RankID:-1) ERR00100 PTA call acl api failed
/usr/local/python3.10.5/lib/python3.10/tempfile.py:837: ResourceWarning: Implicitly cleaning up <TemporaryDirectory '/tmp/tmpe97gupob'>
  _warnings.warn(warn_message, ResourceWarning)
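
The log notes that operators are launched asynchronously, so the Python traceback (which ends in `F.linear`) may not point at the real failure, and it suggests setting `ASCEND_LAUNCH_BLOCKING=1`. A minimal sketch of re-running the script that way is below; the helper names `blocking_env` and `rerun` are made up for illustration, and the default script name is taken from the command line shown above.

```python
import os
import subprocess
import sys


def blocking_env(base_env=None):
    """Return a copy of the environment with synchronous NPU launches enabled."""
    env = dict(os.environ if base_env is None else base_env)
    # Variable name taken from the hint printed in the log above.
    env["ASCEND_LAUNCH_BLOCKING"] = "1"
    return env


def rerun(script="offline_test.py"):
    """Re-run the test script in a child process using the debug environment."""
    # The variable is set for the child only, so it is in place before
    # torch_npu is imported there.
    return subprocess.run([sys.executable, script], env=blocking_env()).returncode
```

Calling `rerun()` from the directory that contains `offline_test.py` should produce a traceback that stops at the operator that actually failed. Setting the variable in the shell before `python3 offline_test.py` has the same effect.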

zer0py2c commented 2 weeks ago

Maybe this operator is not supported :(
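
To confirm which operator is involved, the names can be pulled out of a saved copy of the log. This is only a rough sketch: the two regular expressions are written against the exact wording of the error lines in the log above (including its "does not has" phrasing) and may not match other CANN or torch_npu versions. The function name `failing_operators` is hypothetical.

```python
import re

# Patterns copied from the wording of the two error lines in the log above.
CALL_FAILED = re.compile(r"call (aclnn\w+) failed")
NO_BINARY = re.compile(r"Op (\w+) does not has any binary")


def failing_operators(log_text):
    """Collect operator names from both error patterns, in first-seen order."""
    names = []
    for pattern in (CALL_FAILED, NO_BINARY):
        for name in pattern.findall(log_text):
            if name not in names:
                names.append(name)
    return names


sample = (
    "[E compiler_depend.ts:270] call aclnnFlashAttentionVarLenScore failed, "
    "detail:EZ9999: Inner Error!\n"
    "EZ9999: 2024-09-20-07:49:24.783.260  Op FlashAttentionScore does not has any binary.\n"
)
print(failing_operators(sample))
# prints ['aclnnFlashAttentionVarLenScore', 'FlashAttentionScore']
```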

github-actions[bot] commented 1 week ago

This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 5 days if the stale label is not removed or if there is no further response.

jinminxi104 commented 1 week ago

maybe this operator is not supported

Oh, we use this training operator for more efficient computation. We'll fix this issue next month.