Hi, thanks for sharing the code. However, I noticed that the loss function in the code is missing the aspect discriminator loss (Equation (8) in the paper), and that it includes an additional context prediction loss that comes from the PETER model. Could you explain the reason for these differences?