Closed: Newcomer-CL closed this issue 1 year ago.
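Hello, I checked with a colleague at Kunlun. The cause is that "the topk size is too large, which the XDNN API does not support yet." You can try exporting the following environment variable: export XPU_BLACK_LIST=topk
This adds the topk operator to the XPU blacklist so that it falls back to running on the CPU, which should work around the problem.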
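If exporting the variable in the shell is inconvenient, the same setting could be applied from inside train.py. This is a minimal sketch that assumes Paddle reads XPU_BLACK_LIST from the process environment when it selects kernels (the thread only shows it being exported in the shell), so it must run before `import paddle`. `add_to_xpu_black_list` is a hypothetical helper, not a Paddle API. It keeps any entries already in the variable and can optionally append a trailing comma.

```python
import os


def add_to_xpu_black_list(*ops, trailing_comma=False):
    """Hypothetical helper: add operator names to XPU_BLACK_LIST.

    Existing entries are kept. Call this before `import paddle`, assuming
    Paddle only looks at the variable when it chooses XPU or CPU kernels.
    """
    current = os.environ.get("XPU_BLACK_LIST", "")
    # Split on commas and drop empty pieces left by stray or trailing commas.
    names = [name for name in current.split(",") if name]
    for op in ops:
        if op not in names:
            names.append(op)
    value = ",".join(names) + ("," if trailing_comma else "")
    os.environ["XPU_BLACK_LIST"] = value
    return value
```

Usage at the top of train.py would be `add_to_xpu_black_list("topk")` followed by `import paddle`.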
Still getting the error.
Same error.
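Could you run with "export GLOG_v=10 && export XPU_BLACK_LIST=topk", redirect all of the output to a log file, and upload it? Thanks.
It may also need a trailing comma, i.e. "export XPU_BLACK_LIST=topk,", so please give that a try as well.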
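The requested debug run can be sketched as a small Python wrapper around the training command from this issue. It copies the current environment, sets GLOG_v=10 for verbose glog output and XPU_BLACK_LIST with the suggested trailing comma, and sends both stdout and stderr to a single log file. `TRAIN_CMD`, `debug_env` and `run_with_log` are hypothetical names for illustration; adjust the command and paths to your checkout.

```python
import os
import subprocess

# Training command from this issue, split into arguments for subprocess.
TRAIN_CMD = [
    "python3", "-m", "paddle.distributed.launch", "train.py",
    "--config",
    "configs/panoptic_deeplab/panoptic_deeplab_resnet50_os32_cityscapes_1025x513_bs8_90k_lr00005.yml",
    "--do_eval", "--use_vdl", "--save_interval", "5000",
    "--save_dir", "output", "--batch_size", "4",
]


def debug_env(black_list="topk,", glog_level=10, base=None):
    """Return a copy of the environment with verbose glog and the XPU blacklist set."""
    env = dict(os.environ if base is None else base)
    env["GLOG_v"] = str(glog_level)
    env["XPU_BLACK_LIST"] = black_list  # trailing comma, as suggested above
    return env


def run_with_log(log_path, cmd=TRAIN_CMD):
    """Run cmd with debug_env(), writing stdout and stderr to one file; return the exit code."""
    with open(log_path, "w") as log:
        result = subprocess.run(cmd, env=debug_env(), stdout=log, stderr=subprocess.STDOUT)
    return result.returncode
```

Calling `run_with_log("train_debug.log")` produces one file that can be attached to the issue.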
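I modified paddleseg/models/losses/cross_entropy_loss.py in the source to compute topk on the CPU and then move the result back to the XPU. Training ran for 1000 iterations without errors.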
wangjn7
In reply to the GitHub notification from PaddlePaddle/Paddle sent Thursday, 24 August 2023, 5:04 PM; subject: Re: [PaddlePaddle/Paddle] Error when training panoptic-deeplab in paddleseg (Issue #56464).
Please ask your question
Environment: CPU: Phytium D2000; accelerator card: Kunlunxin R200; OS: Kylin V10; paddle-xpu 2.5; paddleseg 2.7.0
Command: python3 -m paddle.distributed.launch train.py --config configs/panoptic_deeplab/panoptic_deeplab_resnet50_os32_cityscapes_1025x513_bs8_90k_lr00005.yml --do_eval --use_vdl --save_interval 5000 --save_dir output --batch_size 4
Error:
/home/zkjr/.local/lib/python3.9/site-packages/paddleseg/transforms/functional.py:105: RuntimeWarning: invalid value encountered in cast
  im[:, :, 0] = im[:, :, 0] + hue_delta
/home/zkjr/.local/lib/python3.9/site-packages/paddle/nn/layer/norm.py:777: UserWarning: When training, we now always track global mean and variance.
  warnings.warn(
Traceback (most recent call last):
  File "/home/zkjr/fangtian/PaddleSeg-2.7.0/contrib/PanopticDeepLab/train.py", line 176, in <module>
    main(args)
  File "/home/zkjr/fangtian/PaddleSeg-2.7.0/contrib/PanopticDeepLab/train.py", line 154, in main
    train(
  File "/home/zkjr/fangtian/PaddleSeg-2.7.0/contrib/PanopticDeepLab/core/train.py", line 174, in train
    loss_list = loss_computation(
  File "/home/zkjr/fangtian/PaddleSeg-2.7.0/contrib/PanopticDeepLab/core/train.py", line 39, in loss_computation
    semantic_loss = losses['types'][0](logits_list[0], semantic,
  File "/home/zkjr/.local/lib/python3.9/site-packages/paddle/nn/layer/layers.py", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
  File "/home/zkjr/.local/lib/python3.9/site-packages/paddleseg/models/losses/cross_entropy_loss.py", line 88, in forward
    return self._post_process_loss(logit, label, semantic_weights, loss)
  File "/home/zkjr/.local/lib/python3.9/site-packages/paddleseg/models/losses/cross_entropy_loss.py", line 132, in _post_process_loss
    loss, indices = paddle.topk(loss, top_k_pixels)
  File "/home/zkjr/.local/lib/python3.9/site-packages/paddle/tensor/search.py", line 913, in topk
    out, indices = _C_ops.topk(x, k, axis, largest, sorted)
OSError: (External) sorted_topk XDNN Error, XDNN_INVALID_PARAM (at /workspace/Paddle/paddle/phi/kernels/xpu/top_k_kernel.cc:76)
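To get a feel for why "the topk size is too large", the value of `top_k_pixels` passed to `paddle.topk` can be estimated with a little arithmetic. This sketch assumes the per-pixel semantic loss for the whole per-card batch is flattened before selection and that k is computed as `int(top_k_percent_pixels * numel)`. It takes batch size 4 from the command and a 1025x513 (width x height) crop from the config name. The fraction 0.2 is an assumed example value; check `top_k_percent_pixels` in your own config. `estimate_topk_k` is a hypothetical helper.

```python
def estimate_topk_k(batch_size, height, width, top_k_percent_pixels):
    """Hypothetical helper: (number of pixels, k) for a flattened per-pixel loss."""
    num_pixels = batch_size * height * width
    # Mirrors the int(fraction * numel) style of computing k.
    return num_pixels, int(top_k_percent_pixels * num_pixels)


# Batch 4 per card, 1025x513 crop, assumed top_k_percent_pixels = 0.2.
num_pixels, k = estimate_topk_k(batch_size=4, height=513, width=1025, top_k_percent_pixels=0.2)
print(num_pixels, k)
```

Under these assumptions the loss holds about 2.1 million values and k is about 420 thousand, which is the kind of size the XDNN sorted_topk call rejects with XDNN_INVALID_PARAM.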