Ascend / pytorch

Ascend PyTorch adapter (torch_npu). Mirror of https://gitee.com/ascend/pytorch
https://ascend.github.io/docs/

Ascend 310P LLM inference error: RuntimeError: copy_d2d:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:104 NPU function error: #49

Closed Serious-H closed 2 months ago

Serious-H commented 2 months ago

1. Problem description (error log context attached):

```
RuntimeError: copy_d2d:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:104 NPU function error: c10_npu::acl::AclrtSynchronizeStreamWithTimeout(copy_stream), error code is 507013
[ERROR] 2024-09-04-02:09:03 (PID:2526494, Device:1, RankID:-1) ERR00100 PTA call acl api failed
[Error]: System Direct Memory Access (DMA) hardware execution error. Rectify the fault based on the error information in the ascend log.
EI9999: Inner Error!
The error from device(1), serial number is 13. there is a sdma error, sdma channel is 0, the channel exist the following problems: The SMMU returns a Terminate error during page table translation.. the value of CQE status is 2. the description of CQE status: When the SQE translates a page table, the SMMU returns a Terminate error. it's config include: setting1=0xc000080880e0000, setting2=0xff009000ff004c, setting3=0, sq base addr=0x800d00801003d000[FUNC:ProcessSdmaErrorInfo][FILE:device_error_proc.cc][LINE:704]
EI9999: 2024-09-04-02:09:03.196.977 Memory async copy failed, device_id=1, stream_id=3, task_id=703, flip_num=0, copy_type=2, memcpy_type=0, copy_data_type=0, length=40960[FUNC:GetError][FILE:stream.cc][LINE:1082]
TraceBack (most recent call last):
rtStreamSynchronizeWithTimeout execute failed, reason=[sdma copy error][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
synchronize stream failed, runtime result = 507013[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
```

```
DEVICE[1] PID[2526494]: EXCEPTION STREAM: Exception info:TGID=2526494, model id=65535, stream id=3, stream phase=3
Message info[0]:RTS_HWTS: hwts sdma error, slot_id=29, stream_id=3
Other info[0]:time=2024-09-04-02:08:53.201.667, function=int_process_hwts_sdma_error, line=2070, error code=0x20b
[W compiler_depend.ts:409] Warning: NPU warning, error code is 507013[Error]: [Error]: System Direct Memory Access (DMA) hardware execution error. Rectify the fault based on the error information in the ascend log.
EH9999: Inner Error! rtDeviceSynchronize execute failed, reason=[sdma copy error][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
EH9999: 2024-09-04-02:09:03.218.111 wait for compute device to finish failed, runtime result = 507013.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
TraceBack (most recent call last): (function npuSynchronizeUsedDevices)
[W compiler_depend.ts:392] Warning: NPU warning, error code is 507013[Error]: [Error]: System Direct Memory Access (DMA) hardware execution error. Rectify the fault based on the error information in the ascend log.
EH9999: Inner Error! rtDeviceSynchronize execute failed, reason=[sdma copy error][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
EH9999: 2024-09-04-02:09:03.220.229 wait for compute device to finish failed, runtime result = 507013.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
TraceBack (most recent call last): (function npuSynchronizeDevice)
```

(The `npuSynchronizeDevice` warning above repeats seven more times, identical except for the timestamp, through 2024-09-04-02:09:03.235.839.)

2. Software versions:
- CANN: 8.0.RC2
- PyTorch: 2.1.0, torch_npu 2.1.0.post6
- Python: 3.10.14
- OS: Ubuntu 20.04.5 LTS (Focal Fossa)

3. Steps to reproduce:

```python
import torch
import torch_npu  # registers the NPU backend with PyTorch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation.utils import GenerationConfig

tokenizer = AutoTokenizer.from_pretrained("/baichuan-2-chat-pytorch-7b", use_fast=False, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("/baichuan-2-chat-pytorch-7b", device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True)
model.generation_config = GenerationConfig.from_pretrained("/baichuan-2-chat-pytorch-7b")
messages = []
messages.append({"role": "user", "content": "介绍一下自己。"})  # "Introduce yourself."
response = model.chat(tokenizer, messages)
print(response)
```

The model loads successfully, but inference fails with the error above.

4. Log information: see the attached plog file ascend/log/debug/plog/plog-2514126_20240904013504490.log.

Ycpljl commented 2 months ago

Same problem, have you solved it?

Lidarker commented 2 months ago

Are you on an AMD or an Intel platform? I'm hitting the same problem on an AMD platform, and it persists even after disabling the IOMMU.

yunyiyun commented 2 months ago

This chip does not currently support copy_d2d. You could try a training-series chip, or replace device_map="auto" with a specific device, e.g. device_map="npu:0".
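The fallback suggested above can be sketched as a small helper. This is illustrative only: `pick_device_map` and the `supports_d2d` flag are hypothetical names, and whether a given chip supports device-to-device copies must be determined for your hardware (inference-series chips such as the 310P reportedly lack copy_d2d).

```python
def pick_device_map(requested: str, supports_d2d: bool) -> str:
    """Fall back from "auto" sharding to a single explicit NPU when the
    chip cannot perform device-to-device (copy_d2d) transfers.
    Hypothetical helper; supports_d2d must be set per hardware."""
    if requested == "auto" and not supports_d2d:
        return "npu:0"  # keep the whole model on one card
    return requested

print(pick_device_map("auto", supports_d2d=False))  # npu:0
print(pick_device_map("auto", supports_d2d=True))   # auto
```

The returned string would then be passed to `AutoModelForCausalLM.from_pretrained(..., device_map=...)` in place of `"auto"`.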

Serious-H commented 2 months ago

This looks like a communication-operator timeout. The Ascend 310P cannot run transformers inference directly through torch_npu like this; you need to use MindIE. @Lidarker @Ycpljl The error above can be avoided by restricting the run to a single card: export ASCEND_RT_VISIBLE_DEVICES=0. But inference then takes tens of minutes, which is unusable.
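The single-card workaround is an environment setting applied before launching the script (the script name below is a placeholder):

```shell
# Pin the process to NPU 0 only, so torch_npu sees a single device
# and no cross-card (d2d) copies are attempted.
export ASCEND_RT_VISIBLE_DEVICES=0
python your_inference_script.py  # placeholder for the repro script above
```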

jasonXue653 commented 3 weeks ago

Atlas 300I Pro with MindIE has only just added GLM4 support, but inference fails with a RuntimeError, and the weight-quantization code fails as well. The quantization code has no notion of pinning a specific card, either.

Attached weight-quantization code:

```python
# Copyright Huawei Technologies Co., Ltd. 2023-2024. All rights reserved.

from msmodelslim.pytorch.llm_ptq.llm_ptq_tools import Calibrator, QuantConfig
from atb_llm.models.chatglm.config_chatglm import ChatglmConfig
from examples.models.chatglm.v2_6b.quant_utils \
    import get_model_and_tokenizer, get_calib_dataset, read_dataset
from examples.convert.convert_utils import copy_tokenizer_files, modify_config
from examples.convert.model_slim.quantifier import parse_arguments

NPU = "npu"


def main():
    args = parse_arguments()
    fp16_path = args.model_path  # path to the original FP16 model
    model, tokenizer = get_model_and_tokenizer(fp16_path, True)

    quant_config = QuantConfig(
        a_bit=8,
        w_bit=8,
        disable_names=None,
        dev_type=NPU,
        act_method=3,
        pr=1.0,
        w_sym=True,
        mm_tensor=False,
        use_kvcache_quant=args.use_kvcache_quant
    )

    calib_set = read_dataset(args.calib_file)
    dataset_calib = get_calib_dataset(tokenizer, calib_set, NPU)
    calibrator = Calibrator(model, quant_config, calib_data=dataset_calib, disable_level='L25')
    calibrator.run()  # run PTQ quantization calibration
    calibrator.save(args.save_directory, save_type=["safe_tensor"])  # "safe_tensor" = safetensors-format weights
    copy_tokenizer_files(fp16_path, args.save_directory)
    config = ChatglmConfig.from_pretrained(fp16_path)
    modify_config(fp16_path, args.save_directory, config.torch_dtype, 'w8a8', args)


if __name__ == '__main__':
    main()
```

Invocation:

```shell
python quant_glm4_w8a8.py --model_path /root/workspace/GLM-4/basic_demo/THUDM/glm4-9b --save_directory ./glm4-9b_w8a8 --calib_file ./CEval/val/Other/civil_servant.jsonl --device_type npu
```

yunyiyun commented 3 weeks ago

The log here shows an initialization failure. Please check the plog logs and debug based on the error cause they report.