Status: Open. ethancaballero opened this issue 7 years ago.
The unified version is already supported by the current implementation (at least in theory). The idea is to use a SharedModel for the Q-values and two heads (without trainable parameters) that compute pi and V from Q using the formula in the paper.
Here is a quick implementation: https://github.com/lyx-x/chainerrl/blob/ab6cb4f9ff1dd419573d8fa3fc8c05840548d74d/examples/gym/train_pcl_gym.py#L155
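For readers who do not want to open the link, here is a minimal sketch of what the two parameter-free heads compute, assuming the Unified PCL relations from Section 5.1 of the paper: V(s) = tau * log sum_a exp(Q(s, a) / tau) and pi(a|s) = exp((Q(s, a) - V(s)) / tau). It uses plain Python rather than Chainer, and the function name `pi_and_v_from_q` and the sample numbers are hypothetical; it is not the code from the linked example.

```python
import math


def pi_and_v_from_q(q_values, tau=1.0):
    """Derive (pi, V) for one state from its Q-values, with no trainable parameters.

    Assumed relations (Unified PCL):
        V  = tau * log(sum_a exp(Q_a / tau))
        pi = exp((Q_a - V) / tau)
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    # Subtract the max before exponentiating for numerical stability.
    q_max = max(q_values)
    log_sum_exp = q_max / tau + math.log(
        sum(math.exp((q - q_max) / tau) for q in q_values)
    )
    v = tau * log_sum_exp
    # pi is a softmax over Q / tau, so it sums to 1 by construction.
    pi = [math.exp((q - v) / tau) for q in q_values]
    return pi, v


# Hypothetical Q-values for a state with three actions.
pi, v = pi_and_v_from_q([1.0, 2.0, 0.5], tau=0.5)
```

Because both outputs are deterministic functions of Q, only the shared Q model would carry gradients, which matches the "heads without trainable parameters" description above.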
I believe we can close this issue.
See Section 5.1 for a new, more performant update to PCL: https://arxiv.org/pdf/1702.08892.pdf