Open zer0py2c opened 3 weeks ago
No, it doesn't.
We don't have a 300I device, so it has not been tested. However, Huawei's documentation indicates that almost all operators supported on the 800T are also supported on the 300 inference series. We recommend trying it on the 300I and letting us know if you hit any errors on that device.
Thanks, I will try it and report back soon.
Here are the results of testing on the Atlas 300I Duo device:
# Offline inference test
from lmdeploy import PytorchEngineConfig, pipeline

# Run Qwen2-7B-Instruct on a single Ascend NPU through the PyTorch engine.
pipe = pipeline(
    "/opt/models/Qwen2-7B-Instruct",
    backend_config=PytorchEngineConfig(tp=1, device_type="ascend"),
)
questions = ["Shanghai is", "Please introduce China", "How are you?"]
responses = pipe(questions)
print(responses)
root@c486e2f96ded:/opt/lmdeploy# python3 offline_test.py
[W compiler_depend.ts:623] Warning: expandable_segments currently defaults to false. You can enable this feature by `export PYTORCH_NPU_ALLOC_CONF = expandable_segments:True`. (function operator())
/usr/local/python3.10.5/lib/python3.10/site-packages/torch_npu/contrib/transfer_to_npu.py:301: ImportWarning:
*************************************************************************************************************
The torch.Tensor.cuda and torch.nn.Module.cuda are replaced with torch.Tensor.npu and torch.nn.Module.npu now..
The torch.cuda.DoubleTensor is replaced with torch.npu.FloatTensor cause the double type is not supported now..
The backend in torch.distributed.init_process_group set to hccl now..
The torch.cuda.* and torch.cuda.amp.* are replaced with torch.npu.* and torch.npu.amp.* now..
The device parameters have been replaced with npu in the function below:
torch.logspace, torch.randint, torch.hann_window, torch.rand, torch.full_like, torch.ones_like, torch.rand_like, torch.randperm, torch.arange, torch.frombuffer, torch.normal, torch._empty_per_channel_affine_quantized, torch.empty_strided, torch.empty_like, torch.scalar_tensor, torch.tril_indices, torch.bartlett_window, torch.ones, torch.sparse_coo_tensor, torch.randn, torch.kaiser_window, torch.tensor, torch.triu_indices, torch.as_tensor, torch.zeros, torch.randint_like, torch.full, torch.eye, torch._sparse_csr_tensor_unsafe, torch.empty, torch._sparse_coo_tensor_unsafe, torch.blackman_window, torch.zeros_like, torch.range, torch.sparse_csr_tensor, torch.randn_like, torch.from_file, torch._cudnn_init_dropout_state, torch._empty_affine_quantized, torch.linspace, torch.hamming_window, torch.empty_quantized, torch._pin_memory, torch.autocast, torch.load, torch.Generator, torch.Tensor.new_empty, torch.Tensor.new_empty_strided, torch.Tensor.new_full, torch.Tensor.new_ones, torch.Tensor.new_tensor, torch.Tensor.new_zeros, torch.Tensor.to, torch.nn.Module.to, torch.nn.Module.to_empty
*************************************************************************************************************
warnings.warn(msg, ImportWarning)
/usr/local/python3.10.5/lib/python3.10/site-packages/torch_npu/contrib/transfer_to_npu.py:260: RuntimeWarning: torch.jit.script and torch.jit.script_method will be disabled by transfer_to_npu, which currently does not support them, if you need to enable them, please do not use transfer_to_npu.
warnings.warn(msg, RuntimeWarning)
<frozen importlib._bootstrap>:914: ImportWarning: TEMetaPathFinder.find_spec() not found; falling back to find_module()
<frozen importlib._bootstrap>:914: ImportWarning: TEMetaPathFinder.find_spec() not found; falling back to find_module()
<frozen importlib._bootstrap>:914: ImportWarning: TEMetaPathFinder.find_spec() not found; falling back to find_module()
/usr/local/python3.10.5/lib/python3.10/site-packages/torch/utils/cpp_extension.py:28: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
from pkg_resources import packaging # type: ignore[attr-defined]
<frozen importlib._bootstrap>:914: ImportWarning: TEMetaPathFinder.find_spec() not found; falling back to find_module()
/opt/lmdeploy/lmdeploy/serve/utils.py:22: DeprecationWarning: There is no current event loop
event_loop = asyncio.get_event_loop()
/usr/local/python3.10.5/lib/python3.10/site-packages/torch_npu/utils/storage.py:38: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
if self.device.type != 'cpu':
[E compiler_depend.ts:270] call aclnnFlashAttentionVarLenScore failed, detail:EZ9999: Inner Error!
EZ9999: 2024-09-20-07:49:24.783.260 Op FlashAttentionScore does not has any binary.
TraceBack (most recent call last):
Kernel Run failed. opType: 29, FlashAttentionScore
launch failed for FlashAttentionScore, errno:561000.
[ERROR] 2024-09-20-07:49:24 (PID:759, Device:0, RankID:-1) ERR01005 OPS internal error
Exception raised from operator() at build/CMakeFiles/torch_npu.dir/compiler_depend.ts:452 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x68 (0xfffd264dd898 in /usr/local/python3.10.5/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x6c (0xfffd264962a8 in /usr/local/python3.10.5/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #2: <unknown function> + 0xd4f864 (0xfffce481f864 in /usr/local/python3.10.5/lib/python3.10/site-packages/torch_npu/lib/libtorch_npu.so)
frame #3: <unknown function> + 0xe789a0 (0xfffce49489a0 in /usr/local/python3.10.5/lib/python3.10/site-packages/torch_npu/lib/libtorch_npu.so)
frame #4: <unknown function> + 0x5eab64 (0xfffce40bab64 in /usr/local/python3.10.5/lib/python3.10/site-packages/torch_npu/lib/libtorch_npu.so)
frame #5: <unknown function> + 0x5eb018 (0xfffce40bb018 in /usr/local/python3.10.5/lib/python3.10/site-packages/torch_npu/lib/libtorch_npu.so)
frame #6: <unknown function> + 0x5e8520 (0xfffce40b8520 in /usr/local/python3.10.5/lib/python3.10/site-packages/torch_npu/lib/libtorch_npu.so)
frame #7: <unknown function> + 0x946ec (0xfffd265046ec in /usr/local/python3.10.5/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #8: <unknown function> + 0x7624 (0xfffd6cc07624 in /lib/aarch64-linux-gnu/libpthread.so.0)
frame #9: <unknown function> + 0xd162c (0xfffd6cd3162c in /lib/aarch64-linux-gnu/libc.so.6)
2024-09-20 07:49:24,787 - lmdeploy - ERROR - Engine loop failed with error: The Inner error is reported as above. The process exits for this inner error, and the current working operator name is aclnnFlashAttentionVarLenScore.
Since the operator is called asynchronously, the stacktrace may be inaccurate. If you want to get the accurate stacktrace, pleace set the environment variable ASCEND_LAUNCH_BLOCKING=1.
[ERROR] 2024-09-20-07:49:24 (PID:759, Device:0, RankID:-1) ERR00100 PTA call acl api failed
Traceback (most recent call last):
File "/opt/lmdeploy/lmdeploy/pytorch/engine/request.py", line 17, in _raise_exception_on_finish
task.result()
File "/opt/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 944, in async_loop
await self._async_loop()
File "/opt/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 934, in _async_loop
await __step(True)
File "/opt/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 920, in __step
raise e
File "/opt/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 912, in __step
raise out
File "/opt/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 856, in _async_loop_background
await self._async_step_background(
File "/opt/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 735, in _async_step_background
output = await self._async_model_forward(
File "/opt/lmdeploy/lmdeploy/utils.py", line 237, in __tmp
return (await func(*args, **kwargs))
File "/opt/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 633, in _async_model_forward
ret = await __forward(inputs)
File "/opt/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 611, in __forward
return await self.model_agent.async_forward(
File "/opt/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 332, in async_forward
output = self._forward_impl(inputs,
File "/opt/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 299, in _forward_impl
output = model_forward(
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/opt/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 154, in model_forward
output = model(**input_dict)
File "/opt/lmdeploy/lmdeploy/pytorch/backends/graph_runner.py", line 25, in __call__
return self.model(**kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/lmdeploy/lmdeploy/pytorch/models/qwen2.py", line 340, in forward
hidden_states = self.model(
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/lmdeploy/lmdeploy/pytorch/models/qwen2.py", line 278, in forward
hidden_states, residual = decoder_layer(
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/lmdeploy/lmdeploy/pytorch/models/qwen2.py", line 194, in forward
hidden_states = self.self_attn(
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/lmdeploy/lmdeploy/pytorch/models/qwen2.py", line 101, in forward
attn_output = self.o_proj(attn_output)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/lmdeploy/lmdeploy/pytorch/nn/linear.py", line 921, in forward
return self.impl.forward(x, self.weight, self.bias, all_reduce)
File "/opt/lmdeploy/lmdeploy/pytorch/backends/default/linear.py", line 20, in forward
out = F.linear(x, weight, bias)
RuntimeError: The Inner error is reported as above. The process exits for this inner error, and the current working operator name is aclnnFlashAttentionVarLenScore.
Since the operator is called asynchronously, the stacktrace may be inaccurate. If you want to get the accurate stacktrace, pleace set the environment variable ASCEND_LAUNCH_BLOCKING=1.
[ERROR] 2024-09-20-07:49:24 (PID:759, Device:0, RankID:-1) ERR00100 PTA call acl api failed
/usr/local/python3.10.5/lib/python3.10/tempfile.py:837: ResourceWarning: Implicitly cleaning up <TemporaryDirectory '/tmp/tmpe97gupob'>
_warnings.warn(warn_message, ResourceWarning)
Maybe this operator is not supported on the 300I Duo :(
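The Python traceback in the log ends at `F.linear`, while the runtime error names `aclnnFlashAttentionVarLenScore`, and the log itself warns that the stack trace may be inaccurate because operators are launched asynchronously. A minimal sketch for narrowing this down follows. It sets `ASCEND_LAUNCH_BLOCKING=1` as the log suggests and pulls the operator names out of the error text. The helper names `enable_blocking_launch` and `failing_operators` are hypothetical and not part of lmdeploy or torch_npu; the regular expressions simply mirror the two error lines printed above. Setting the variable before importing `torch`, `torch_npu`, or `lmdeploy` is assumed to be the safe order; whether a later setting is honored has not been checked here.

```python
import os
import re


def enable_blocking_launch() -> None:
    """Request synchronous operator launches so the stack trace is accurate.

    Assumption: call this before importing torch / torch_npu / lmdeploy so
    the variable is already set when the device runtime initializes.
    """
    os.environ["ASCEND_LAUNCH_BLOCKING"] = "1"


# Patterns copied from the wording of the two error lines in the log
# (including the log's own "does not has" phrasing).
_CALL_FAILED = re.compile(r"call (aclnn\w+) failed")
_NO_BINARY = re.compile(r"Op (\w+) does not has any binary")


def failing_operators(log_text: str) -> dict:
    """Return the operator names mentioned in failure lines of a log."""
    return {
        "failed_calls": sorted(set(_CALL_FAILED.findall(log_text))),
        "missing_binaries": sorted(set(_NO_BINARY.findall(log_text))),
    }


# Two lines taken verbatim from the log in this thread.
sample = (
    "[E compiler_depend.ts:270] call aclnnFlashAttentionVarLenScore failed, "
    "detail:EZ9999: Inner Error!\n"
    "EZ9999: 2024-09-20-07:49:24.783.260 Op FlashAttentionScore does not has "
    "any binary.\n"
)

enable_blocking_launch()
print(failing_operators(sample))
```

With blocking launches enabled, rerunning `offline_test.py` should make the Python traceback point at the attention call rather than at the later `o_proj` linear layer, which would confirm that the missing `FlashAttentionScore` binary is the actual cause.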
This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 5 days if the stale label is not removed or if there is no further response.
Oh. We use this training operator because it is more efficient to compute with. We'll fix this issue next month.
Does it support Ascend Atlas 300I Duo NPU devices?