Hi torchdrug team, thank you for the awesome project! I am interested in trying to reproduce GCPN by following your tutorial. However, I found that the reward design in your code is different from that of GCPN in the original paper. The reward in the original paper includes both an adversarial loss and a property score, but I only see the property score used as the reward in your code:
```python
if task == "plogp":
    plogp = metrics.penalized_logP(graph)
    metric["Penalized logP"] = plogp.mean()
    metric["Penalized logP (max)"] = plogp.max()
    self.update_best_result(graph, plogp, "Penalized logP")
    reward += (plogp / self.reward_temperature).exp()
```
Did I miss something? Looking forward to your reply!
Hi, basically you didn't miss anything; you are right. We removed the adversarial loss in our implementation to simplify the model, and you are encouraged to customize your own reward.
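For reference, here is a minimal sketch of what a custom reward could look like if you keep the penalized logP term and add a GAN-style adversarial term, similar in spirit to the original GCPN paper. The `GraphDiscriminator` class, `combined_reward` function, and `adv_weight` below are illustrative and not part of torchdrug; it also assumes the encoder follows the usual torchdrug convention of exposing `output_dim` and returning a dict with a `"graph_feature"` entry, and that the graphs carry node features.

```python
import torch
from torch import nn
from torchdrug import metrics


class GraphDiscriminator(nn.Module):
    """Hypothetical GAN-style discriminator that scores how "real" a molecule looks.

    `encoder` is assumed to be a torchdrug graph representation model (e.g. an RGCN)
    whose forward(graph, input) returns a dict containing "graph_feature".
    """

    def __init__(self, encoder):
        super().__init__()
        self.encoder = encoder
        self.classifier = nn.Linear(encoder.output_dim, 1)

    def forward(self, graph):
        # Encode each graph into a single feature vector, then predict a real/fake logit
        feature = self.encoder(graph, graph.node_feature.float())["graph_feature"]
        return self.classifier(feature).squeeze(-1)


def combined_reward(graph, discriminator, reward_temperature=1.0, adv_weight=0.5):
    """Property reward as in the tutorial, plus an adversarial term (a sketch)."""
    # Property term, with the same exponential scaling as the existing code
    plogp = metrics.penalized_logP(graph)
    reward = (plogp / reward_temperature).exp()

    # Adversarial term: higher reward when the discriminator believes the
    # generated molecule comes from the real data distribution
    with torch.no_grad():
        real_prob = torch.sigmoid(discriminator(graph))
    return reward + adv_weight * real_prob
```

To recover the full adversarial setup from the paper, you would also train the discriminator on real molecules from your dataset in alternation with the policy update; that training loop is omitted here.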