PaddlePaddle / Paddle

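PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (core framework of 『飞桨』 PaddlePaddle: high-performance single-machine and distributed training, and cross-platform deployment, for deep learning and machine learning)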
http://www.paddlepaddle.org/
Apache License 2.0

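Adversarial training (FGM) in static graph mode: the gradient computation fails with [operator < elementwise_add > error]; swapping the operands of the addition works around it #24915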

Closed xwen99 closed 3 years ago

xwen99 commented 4 years ago

The adversarial loss adv_loss is added to the regular loss in _build_env:

    def _build_env(self):
        """
        building the program and strategy for specific running phase.
        """
        if self.env.is_inititalized:
            return

        self._build_env_start_event()
        self.env.is_inititalized = True
        self.env.main_program = clone_program(
            self._base_main_program, for_test=False)

        self.env.startup_program = fluid.Program()
        with fluid.program_guard(self.env.main_program,
                                 self._base_startup_program):
            with fluid.unique_name.guard(self.env.UNG):
                self.env.outputs = self._build_net()
                if self.is_train_phase or self.is_test_phase:
                    self.env.labels = self._add_label()
                    self.env.loss = self._add_loss()
                    self.env.metrics = self._add_metrics()

        if self.is_predict_phase or self.is_test_phase:
            self.env.main_program = clone_program(
                self.env.main_program, for_test=True)
            hub.common.paddle_helper.set_op_attr(
                self.env.main_program, is_test=True)

        if self.is_train_phase:
            with fluid.program_guard(self.env.main_program,
                                    self._base_startup_program):
                with fluid.unique_name.guard(self.env.UNG):
                    self.env.adv_loss = self.adversarial_loss(self.loss)
                    self.env.loss += self.env.adv_loss  # here
        ...
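
For readers unfamiliar with FGM (Fast Gradient Method): the perturbation that gets added to the embedding (the `self.feature + perturb` step that appears in the traceback) is usually the loss gradient rescaled to a fixed L2 norm. The following is only a minimal pure-Python sketch of that rescaling, not the code from this issue. The function name `fgm_perturbation`, the parameter `epsilon`, and the toy values are all hypothetical, and a real implementation would operate on Paddle tensors rather than lists.

```python
import math


def fgm_perturbation(grad, epsilon=1.0):
    """Rescale a flat gradient vector to L2 norm `epsilon` (hypothetical helper)."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        # A zero gradient gives no direction to perturb in.
        return [0.0 for _ in grad]
    return [epsilon * g / norm for g in grad]


# Toy values, chosen only for illustration.
embedding = [0.5, -1.0, 2.0]
grad = [3.0, 0.0, 4.0]  # stands in for d(loss)/d(embedding)

perturb = fgm_perturbation(grad, epsilon=0.5)
# The adversarial input is the original embedding plus the perturbation.
adv_embedding = [e + p for e, p in zip(embedding, perturb)]
```

The adversarial loss is then the ordinary loss evaluated on `adv_embedding`, and it is added to the clean loss as in the `_build_env` snippet above.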

C++ Call Stacks (More useful to developers):

    0 std::string paddle::platform::GetTraceBackString<std::string const&>(std::string const&, char const*, int)
    1 paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int)
    2 paddle::operators::ElementwiseOp::InferShape(paddle::framework::InferShapeContext*) const
    3 paddle::framework::OpDesc::InferShape(paddle::framework::BlockDesc const&) const


Python Call Stacks (More useful to users):

    File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/framework.py", line 2610, in append_op
      attrs=kwargs.get("attrs", None))
    File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/math_op_patch.py", line 242, in __impl__
      attrs={'axis': axis})
    File "/home/aistudio/work/DuReader-Robust-With-Paddlehub/my_reading_comprehension_task.py", line 520, in adversarial_loss
      adv_loss = self.cl_loss_from_embedding(self.feature + perturb)
    File "/home/aistudio/work/DuReader-Robust-With-Paddlehub/my_reading_comprehension_task.py", line 563, in _build_env
      self.env.adv_loss = self.adversarial_loss(self.loss)
    File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlehub/finetune/task/base_task.py", line 508, in main_program
      self._build_env()
    File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlehub/finetune/task/base_task.py", line 834, in load_checkpoint
      main_program=self.main_program)
    File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlehub/finetune/task/base_task.py", line 367, in init_if_necessary
      if not self.load_checkpoint():
    File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlehub/finetune/task/base_task.py", line 883, in finetune
      self.init_if_necessary()
    File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlehub/finetune/task/base_task.py", line 868, in finetune_and_eval
      return self.finetune(do_eval=True)
    File "reading_comprehension.py", line 104, in <module>
      reading_comprehension_task.finetune_and_eval()


Error Message Summary:

NotFoundError: No Input(Y) found for ElementwiseOp operator. [Hint: Expected ctx->HasInput("Y") == true, but received ctx->HasInput("Y"):0 != true:1.] at (/paddle/paddle/fluid/operators/elementwise/elementwise_op.h:42) [operator < elementwise_add > error]

xwen99 commented 4 years ago

Reproduction environment: https://aistudio.baidu.com/aistudio/projectdetail/445781

jerrywgz commented 4 years ago

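You could first try upgrading to Paddle 2.0.0.a0. Elementwise operations are known to sometimes fail in the current 1.8 release, and this has been fixed in the new version.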

paddle-bot-old[bot] commented 3 years ago

Since you haven't replied for more than a year, we have closed this issue/pr. If the problem is not solved or there is a follow-up one, please reopen it at any time and we will continue to follow up.