xljhtq opened this issue 6 years ago
total_loss = task_loss + adv_loss + diff_loss + l2_loss
@FrankWork In your code, `total_loss = task_loss + adv_loss + diff_loss + l2_loss`. Minimizing the total_loss will also drive adv_loss down. But in principle adv_loss should increase so that the shared encoder learns task-invariant features. So should I maximize the adv_loss or minimize it?
There is a function `flip_gradient`; it is what allows the adv_loss to be maximized with respect to the shared encoder even though the total loss is minimized.
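For anyone else landing here: `flip_gradient` acts as the identity in the forward pass and negates the gradient in the backward pass, so minimizing total_loss effectively maximizes adv_loss for the shared encoder while the discriminator still minimizes it. The snippet below is only a minimal sketch of that idea using `tf.custom_gradient`; the actual implementation in this repo may differ (older TF code often uses a `gradient_override_map` trick instead).

```python
import tensorflow as tf

@tf.custom_gradient
def flip_gradient(x):
    # Forward pass: identity, the shared features pass through unchanged.
    def grad(dy):
        # Backward pass: negate the gradient flowing back into the shared encoder.
        return -dy
    return tf.identity(x), grad
```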
Hi, do you know the equivalent of `flip_gradient` in PyTorch?
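Not an official equivalent, but the usual way to get the same behavior in PyTorch is a custom `autograd.Function` that is the identity forward and negates (optionally scales) the gradient backward. A rough sketch (the names `GradReverse` / `grad_reverse` and the `lambd` scaling factor are my own choices, not from this repo):

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the gradient by -lambd in the backward pass."""

    @staticmethod
    def forward(ctx, x, lambd=1.0):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # One gradient per forward input: reversed gradient for x, None for lambd.
        return -ctx.lambd * grad_output, None


def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)

# Usage: reverse gradients just before the task discriminator, e.g.
#   shared = encoder(inputs)
#   adv_logits = discriminator(grad_reverse(shared, lambd=0.05))
```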
Hi, I want to know how the adv loss differs from the domain loss. In other words, the adv loss in the paper "Adversarial Multi-task Learning for Text Classification" is not described clearly, so I would like to know what the equation is.
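As far as I understand it (writing from memory, please verify against the paper), the adversarial loss there is a GAN-style min-max over the shared encoder E and a task discriminator D that tries to predict which task an instance came from, roughly:

```latex
% Sketch of the adversarial loss from Liu et al., "Adversarial Multi-task
% Learning for Text Classification"; notation may differ slightly from the paper.
% d_i^k : one-hot task label of instance i from task k
% E : shared feature extractor, D : task discriminator, K : number of tasks
L_{adv} = \min_{\theta_s} \left( \lambda \max_{\theta_D}
          \sum_{k=1}^{K} \sum_{i=1}^{N_k} d_i^k \log \big[ D\big(E(x_i^k)\big) \big] \right)
```

So the difference from a plain domain/task classification loss is the outer min over the shared parameters: the discriminator tries to identify the task from the shared features, while the shared encoder is trained (via the gradient reversal above) to make that prediction fail, which is what pushes the shared space toward task-invariant features.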