Closed: ZRZ-Unknow closed this issue 3 years ago
About the network architecture: you can use the code base as it is and it should work fine. I may have updated the network architecture to make it work for retraining (I will check and update the paper), but both architectures should work for imitation. As for the discriminator formula: the output of torch.exp is often not numerically stable in the gradient update, so the later equation you mentioned did not work for me either, and I decided to avoid it.
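To illustrate the stability point, here is a minimal pure-Python sketch (function names are mine, not from this repo) of why an exp-based term blows up while the log-sigmoid form, as used inside a binary-cross-entropy discriminator loss, stays finite for any logit:

```python
import math

def exp_reward(logit):
    # Naive reward r = exp(logit): overflows once the discriminator
    # logit grows moderately large.
    return math.exp(logit)

def stable_log_d(logit):
    # log D for D = sigmoid(logit), i.e. -softplus(-logit), computed
    # with the standard stable rewrite: min(x, 0) - log1p(exp(-|x|)).
    return min(logit, 0.0) - math.log1p(math.exp(-abs(logit)))

# exp_reward(1000.0) raises OverflowError in pure Python (and returns
# inf in float32 frameworks like PyTorch), while the log-sigmoid form
# stays finite at both extremes:
print(stable_log_d(1000.0))   # finite, close to 0
print(stable_log_d(-1000.0))  # finite, close to -1000
```

This is the same reason libraries expose fused ops like `torch.nn.functional.logsigmoid` and `BCEWithLogitsLoss` instead of having users exponentiate logits themselves.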
In the paper: "discriminator is 2 layer MLP of 100 hidden units with tanh activation. Our generator consists of separate Actor and Critic neural networks and follows the architecture used in [5, 8], where both of these networks have 2 layer MLP of 400 and 300 hidden units with ReLU activation." But in your implementation the hidden units and activations are not the ones described in the paper. Why? And when computing the discriminator's loss, you use:
Why does this work? D is the output of the discriminator; according to the formula in the paper, I think this should make sense:
But this does not seem to work well.
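For reference, the discriminator objective used in most GAIL implementations (following Ho and Ermon, 2016) is the binary cross-entropy between expert and policy samples. A minimal pure-Python sketch (function names are mine, not this repo's code):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gail_disc_loss(expert_logits, policy_logits):
    # Standard GAIL discriminator loss (binary cross-entropy):
    #   L = -mean(log D(expert)) - mean(log(1 - D(policy)))
    # The discriminator is trained to output ~1 on expert data
    # and ~0 on policy rollouts.
    expert_term = -sum(math.log(sigmoid(x)) for x in expert_logits) / len(expert_logits)
    policy_term = -sum(math.log(1.0 - sigmoid(x)) for x in policy_logits) / len(policy_logits)
    return expert_term + policy_term

# A discriminator that separates the two sources well has low loss;
# one that confuses them has high loss:
print(gail_disc_loss([2.0, 3.0], [-2.0, -3.0]))  # small
print(gail_disc_loss([-2.0, -3.0], [2.0, 3.0]))  # large
```

In a real implementation one would use `BCEWithLogitsLoss` rather than composing `log` and `sigmoid` by hand, precisely because of the numerical-stability issue discussed above.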