Closed: Xingtao closed this issue 4 years ago
"When doing classifier training, it seems the gradient will be applied to the policy network. Is this the case?" This is not the case. The `_get_classifier_training_op` function (inherited from `SACClassifier`) ensures that the classifier training op only updates the classifier variables. See this line: https://github.com/avisingh599/reward-learning-rl/blob/8070d93e9379204f153e9044e03079bd9a354183/softlearning/algorithms/sac_classifier.py#L91
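To illustrate the idea, here is a minimal sketch in plain Python. It is not the repository's TensorFlow code. It assumes a toy VICE-style logit of the form f(s) - log pi(a|s), and every name in it (`classifier_w`, `policy_w`, `training_step`, `var_list`) is hypothetical. It shows that the classifier loss can have a nonzero gradient with respect to the policy weights, yet the policy stays untouched as long as the update step only iterates over the classifier variables. In TensorFlow 1.x the same restriction is commonly expressed by passing a `var_list` to `Optimizer.minimize`; the linked line shows what the repository actually does.

```python
import math


def classifier_loss(params, state, label):
    """Binary cross-entropy on a toy VICE-style logit f(s) - log pi(a|s)."""
    reward_logit = params["classifier_w"] * state  # f(s): hypothetical linear classifier
    log_pi = -params["policy_w"] ** 2              # hypothetical log pi(a|s), depends on the policy
    logit = reward_logit - log_pi
    prob = 1.0 / (1.0 + math.exp(-logit))
    return -(label * math.log(prob) + (1.0 - label) * math.log(1.0 - prob))


def numerical_grad(loss_fn, params, name, eps=1e-6):
    """Central finite-difference gradient of loss_fn w.r.t. params[name]."""
    up = dict(params)
    up[name] += eps
    down = dict(params)
    down[name] -= eps
    return (loss_fn(up) - loss_fn(down)) / (2.0 * eps)


def training_step(params, var_list, state, label, lr=0.1):
    """One gradient-descent step that updates ONLY the names in var_list."""
    loss_fn = lambda p: classifier_loss(p, state, label)
    new_params = dict(params)
    for name in var_list:
        new_params[name] -= lr * numerical_grad(loss_fn, params, name)
    return new_params


params = {"classifier_w": 0.5, "policy_w": 1.0}

# The loss does depend on the policy weights, so this gradient is nonzero.
policy_grad = numerical_grad(
    lambda p: classifier_loss(p, 1.0, 1.0), params, "policy_w"
)

# Restricting the update to the classifier variables leaves the policy as-is.
updated = training_step(params, ["classifier_w"], state=1.0, label=1.0)
```

After the step, `updated["policy_w"]` is still equal to `params["policy_w"]`, while `updated["classifier_w"]` has moved, even though `policy_grad` is not zero.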
Ok, thanks
Hi,
In vice.py, the classifier loss is defined as
When doing classifier training, it seems the gradient will be applied to the policy network. Is this the case?
The paper says the policy network is trained with the classifier's output as the reward, but is not updated during classifier training. What am I missing?
Thanks