cindyxinyiwang / multiDDS

Code for the paper "Balancing Training for Multilingual Neural Machine Translation" (ACL 2020)
MIT License

Trainer Gradient Update For Scorer in both train_step and update_language_sampler #3

Closed · steventan0110 closed this issue 3 years ago

steventan0110 commented 3 years ago

Hi Cindy,

I was studying your code in trainer.py, and it seems that you update the RL scorer (the data actor) in both the update_language_sampler function and the train_step function. Initially I thought the RL scorer was only updated in update_language_sampler(), where you compute the cosine similarity of two gradients, but then I saw the block of code below (which seems to update only the ave_emb actor), so I wonder whether this block is actually used:

# optimize data actor
for k in cached_loss.keys():
    # REINFORCE-style reward: scaled change in loss relative to the cached value
    reward = 1. / eta * (cur_loss[k] - cached_loss[k])
    if self.args.out_score_type == 'sigmoid':
        #loss = -(torch.log(1e-20 + data_actor_out[k]) * reward.data)
        loss = -(data_actor_out[k] * reward.data)
    elif self.args.out_score_type == 'exp':
        loss = -(torch.log(1e-20 + data_actor_out[k]) * reward.data)
    # normalize by batch size before accumulating gradients
    if cur_loss[k].size(0) > 0:
        loss.div_(cur_loss[k].size(0))
    loss.sum().backward()
# the optimizer only steps for the 'ave_emb' data actor
if self.args.data_actor == 'ave_emb':
    self.data_optimizer.step()
    self.data_optimizer.zero_grad()

Thank you for your help and clarification!

cindyxinyiwang commented 3 years ago

Hi,

Sorry I didn't see the question earlier. No, I think the data sampling distribution is only updated in update_language_sampler in the trainer. I'm not sure where the code you saw came from, but it should probably not be used. Sorry if it's some code I didn't clean up!
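
For readers who find this thread later: the update that is actually used, per the reply above, is the one in update_language_sampler, which follows the MultiDDS recipe of rewarding each training language by how well its gradient aligns with the dev-set gradient. Below is a minimal PyTorch sketch of that idea, not the repo's actual code; the names train_grads, dev_grad, and psi are hypothetical, and the shapes are toy-sized.

import torch
import torch.nn.functional as F

# Hypothetical flattened gradients: one per training language, plus a dev-set gradient.
num_langs, dim = 4, 1000
train_grads = [torch.randn(dim) for _ in range(num_langs)]
dev_grad = torch.randn(dim)

# Sampler logits over the training languages (illustrative stand-in for the scorer).
psi = torch.zeros(num_langs, requires_grad=True)

# Reward each language by the cosine similarity of its gradient with the dev gradient.
rewards = torch.stack([F.cosine_similarity(g, dev_grad, dim=0) for g in train_grads])

# REINFORCE-style objective: raise the sampling probability of well-aligned languages.
probs = torch.softmax(psi, dim=0)
loss = -(torch.log(probs + 1e-20) * rewards).sum()
loss.backward()

with torch.no_grad():
    psi -= 0.1 * psi.grad  # plain gradient-descent step on the logits
    psi.grad.zero_()

print(torch.softmax(psi, dim=0))  # aligned languages get upweighted

The block quoted in the question is a different update, a REINFORCE step with a loss-difference reward, which per the reply above is leftover code rather than part of the method.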