Please ask your question

The NPU build of the Paddle framework compiled and passed the installation check:
FLAGS(name='FLAGS_allocator_strategy', current_value='naive_best_fit', default_value='auto_growth')

I0430 15:37:53.522773 32875 tcp_utils.cc:130] Successfully connected to 127.0.0.1:60423
I0430 15:38:17.834956 32959 tcp_store.cc:293] receive shutdown event and so quit from MasterDaemon run loop
PaddlePaddle works well on 8 npus.
PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.

Code branch: develop

Docker image used to build the Paddle framework: registry.baidubce.com/device/paddle-npu:cann80T2-910B-ubuntu18-aarch64

npu-info:
+------------------------------------------------------------------------------------------------+
| npu-smi 23.0.0                   Version: 23.0.0                                               |
+---------------------------+---------------+----------------------------------------------------+
| NPU   Name                | Health        | Power(W)    Temp(C)           Hugepages-Usage(page)|
| Chip                      | Bus-Id        | AICore(%)   Memory-Usage(MB)  HBM-Usage(MB)        |
+===========================+===============+====================================================+
| 0     910B3               | OK            | 94.4        39                0    / 0             |
| 0                         | 0000:C1:00.0  | 0           0    / 0          3315 / 65536         |
+===========================+===============+====================================================+
| 1     910B3               | OK            | 91.6        37                0    / 0             |
| 0                         | 0000:C2:00.0  | 0           0    / 0          3315 / 65536         |
+===========================+===============+====================================================+
| 2     910B3               | OK            | 92.3        38                0    / 0             |
| 0                         | 0000:81:00.0  | 0           0    / 0          3315 / 65536         |
+===========================+===============+====================================================+
| 3     910B3               | OK            | 92.6        39                0    / 0             |
| 0                         | 0000:82:00.0  | 0           0    / 0          3315 / 65536         |

Model training error log:
Traceback (most recent call last):
  File "/work/PaddleNLP/model_zoo/uie/finetune.py", line 262, in <module>
    main()
  File "/work/PaddleNLP/model_zoo/uie/finetune.py", line 193, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/opt/py39/lib/python3.9/site-packages/paddlenlp/trainer/trainer.py", line 888, in train
    self._maybe_log_save_evaluate(tr_loss, model, epoch, ignore_keys_for_eval, inputs=inputs)
  File "/opt/py39/lib/python3.9/site-packages/paddlenlp/trainer/trainer.py", line 1024, in _maybe_log_save_evaluate
    tr_loss_scalar = self._nested_gather(tr_loss).mean().item()
  File "/opt/py39/lib/python3.9/site-packages/paddlenlp/trainer/trainer.py", line 2544, in _nested_gather
    tensors = distributed_concat(tensors)
  File "/opt/py39/lib/python3.9/site-packages/paddlenlp/trainer/utils/helper.py", line 41, in distributed_concat
    output_tensors = [t if len(t.shape) > 0 else t.reshape_([-1]) for t in output_tensors]
  File "/opt/py39/lib/python3.9/site-packages/paddlenlp/trainer/utils/helper.py", line 41, in <listcomp>
    output_tensors = [t if len(t.shape) > 0 else t.reshape_([-1]) for t in output_tensors]
  File "/opt/py39/lib/python3.9/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/opt/py39/lib/python3.9/site-packages/paddle/base/wrapped_decorator.py", line 26, in __impl__
    return wrapped_func(*args, **kwargs)
  File "/opt/py39/lib/python3.9/site-packages/paddle/utils/inplace_utils.py", line 45, in __impl__
    return func(*args, **kwargs)
  File "/opt/py39/lib/python3.9/site-packages/paddle/tensor/manipulation.py", line 4635, in reshape_
    out = _C_ops.reshape_(x, shape)
OSError: (External) ACL error, the error code is : 100000. (at /work/PaddleCustomDevice/backends/npu/kernels/funcs/npu_op_runner.cc:223)
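
The last Paddle frame in the log is an in-place reshape of a 0-D tensor (the scalar training loss) to 1-D. A minimal sketch for checking that step in isolation, outside the Trainer, could look like the following. The function name `reshape_scalar_inplace` is hypothetical, and the device string `"npu:0"` assumes the PaddleCustomDevice NPU backend registers its device type as `npu`. It has not been confirmed that this reproduces the error.

```python
def reshape_scalar_inplace(device: str = "npu:0") -> list:
    """Reshape a 0-D tensor to 1-D in place on `device` and return its new shape.

    Hypothetical helper for isolating the failing call; not part of PaddleNLP.
    """
    # Imported lazily so this file can be loaded on machines without Paddle.
    import paddle

    # Assumption: "npu:0" is a valid device string for the custom NPU backend.
    paddle.set_device(device)

    # A scalar tensor, standing in for the training loss gathered by the Trainer.
    t = paddle.to_tensor(1.0)

    # The same in-place reshape that distributed_concat applies to 0-D tensors.
    t.reshape_([-1])
    return list(t.shape)
```

Calling `reshape_scalar_inplace("npu:0")` inside the container and then `reshape_scalar_inplace("cpu")` would show whether the single op fails only on the NPU. If it runs cleanly on both, the ACL error may originate in an earlier step and only surface at the reshape.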

PaddlePaddle / PaddleNLP

[Question]: uie-base model training fails on an Ascend server #8354
