PaddlePaddle / Paddle

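PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (core framework of 『飞桨』 PaddlePaddle: high-performance single-machine and distributed training, plus cross-platform deployment, for deep learning & machine learning)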
http://www.paddlepaddle.org/
Apache License 2.0

RuntimeError: (NotFound) No Input(Y) found for ElementwiseOp operator #62249

Open Happy-zyy opened 8 months ago

Happy-zyy commented 8 months ago

Describe the Bug

Error message:

Traceback (most recent call last):
  File "ernie/run_classifier.py", line 651, in <module>
    main(args)
  File "ernie/run_classifier.py", line 133, in main
    train_feed_list2, graph_vars2, ema = create_graph(args, model_fn, True, train_program_2, startup_prog, ernie_config, "step2")
  File "ernie/run_classifier.py", line 246, in create_graph
    step=step_name
  File "/root/paddlejob/workspace/aurora/aurora_finetune.metric_compute_acc.fix_noise/ernie/reranker/aurora_v3.py", line 223, in create_model
    loss += loss_val
  File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/layers/math_op_patch.py", line 423, in __impl__
    attrs={'axis': axis})
  File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/framework.py", line 4046, in append_op
    attrs=kwargs.get("attrs", None),
  File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/framework.py", line 3037, in __init__
    self.desc.infer_shape(self.block.desc)
RuntimeError: (NotFound) No Input(Y) found for ElementwiseOp operator.
  [Hint: Expected ctx->HasInput("Y") == true, but received ctx->HasInput("Y"):0 != true:1.] (at /root/paddlejob/Paddle/paddle/fluid/operators/elementwise/elementwise_op.h:43)
  [operator < elementwise_add > error]

Code location:

for loss_name, loss_val, in Task_fetcher['loss'].items():
    logging.info("optimize loss: {}".format(loss_name))
    logging.info("zyy_debug loss: {}".format(loss_val))
    loss += loss_val

This is just a simple addition op, yet it raises an error. I printed the shape of every loss_val and they are all identical, so why does it fail?

[INFO] 2024-02-29 20:23:57,523 [aurora_v3.py: 221]: optimize loss: loss_semantics
[INFO] 2024-02-29 20:23:57,523 [aurora_v3.py: 222]: zyy_debug loss: var tmp_69 : LOD_TENSOR.shape(1,).dtype(float32).stop_gradient(False)
[INFO] 2024-02-29 20:23:57,523 [aurora_v3.py: 221]: optimize loss: loss_satisfaction_topkloss
[INFO] 2024-02-29 20:23:57,523 [aurora_v3.py: 222]: zyy_debug loss: var tmp_183 : LOD_TENSOR.shape(1,).dtype(float32).stop_gradient(False)

Additional Supplementary Information

No response

jeff41404 commented 8 months ago

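Judging from the log above, the += operation is going through the fluid API of an older release. The fluid API has a number of known problems and was fully retired in Paddle 2.6, so I suggest running the script on Paddle 2.6 and checking whether the error still occurs.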
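
For anyone rewriting the accumulation loop while migrating, here is a minimal sketch of summing a name-to-loss mapping without an in-place `+=` on a separate accumulator. It is not a confirmed fix for this error. The helper name `sum_losses` and the float stand-ins are hypothetical; the only assumption about the real loss objects is that they support the `+` operator. On Paddle 2.x, `paddle.add_n` over the list of loss tensors may be an alternative worth checking against the official API documentation.

```python
from functools import reduce
import operator


def sum_losses(losses):
    """Add up the values of a name -> loss mapping (hypothetical helper)."""
    values = list(losses.values())
    if not values:
        raise ValueError("no losses to sum")
    # Fold with `+` starting from the first loss, so every operand is one of
    # the loss objects themselves and no separate accumulator is needed.
    return reduce(operator.add, values)


# Stand-in floats for illustration only; in the real script these would be
# the loss tensors stored in Task_fetcher['loss'].
task_losses = {"loss_semantics": 0.25, "loss_satisfaction_topkloss": 0.5}
total = sum_losses(task_losses)
print(total)
```

Because the helper relies only on `+`, it behaves the same for plain numbers and for tensor-like objects that overload addition; an empty mapping raises `ValueError` rather than returning an ambiguous zero.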