Kaixhin / ACER

Actor-critic with experience replay
MIT License

Doubts #11

Closed: random-user-x closed this issue 6 years ago

random-user-x commented 6 years ago

Could you please let me know why there is a negative sign here? Since we have already defined the KL divergence in the step before, I think we do not need a negative sign. Please let me know how you see this.

https://github.com/Kaixhin/ACER/blob/f22b07cebd9ec278c5b604b2652e6657df4b61ab/train.py#L81
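For context, the line in question implements the trust-region projection from the ACER paper (Wang et al., 2017): z* = g − max{0, (kᵀg − δ) / ‖k‖²} k, where g is the policy gradient and k is the gradient of the KL divergence between the average policy and the current policy. Whether k is taken as the gradient of +KL or −KL flips the sign convention, which is what this question is about. A minimal sketch of the projection (the names `g`, `k`, and `delta` are illustrative, not the repo's actual variables):

```python
import torch

def trust_region_project(g, k, delta=1.0, eps=1e-10):
    """Project the policy gradient g onto the trust region defined by the
    KL-gradient k: z* = g - max(0, (k^T g - delta) / ||k||^2) * k."""
    kg = (k * g).sum()               # k^T g
    k_norm_sq = (k * k).sum() + eps  # ||k||^2, guarded against division by zero
    scale = torch.clamp((kg - delta) / k_norm_sq, min=0)
    return g - scale * k
```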

Kaixhin commented 6 years ago

The trust region function should be a translation of the one from ChainerRL, apart from the fact that this trust region involves some parameters. If you think the sign should be the other way around, you should raise an issue there to let them know too.

random-user-x commented 6 years ago

I think that since we have already defined the KL divergence before, we do not really need the negative sign. I am not sure why it is there.

Please let me know your views. Do you think it should be the other way round?

Kaixhin commented 6 years ago

I'm not sure either; I just ported this part of the code from ChainerRL.

random-user-x commented 6 years ago

Could you let me know the differences between your implementation and the OpenAI Baselines one?

Kaixhin commented 6 years ago

I've not compared the two at all, and this is very low priority for me at the moment.

random-user-x commented 6 years ago

@Kaixhin, thank you for your input. I will look into ChainerRL and OpenAI Baselines to get more insight into the implementation, and will send a PR if needed.

Thanks

random-user-x commented 6 years ago

I am closing this issue because the current implementation seems to be correct. However, I think we need to detach z_star_p for better results. Please refer to #13.
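For reference, detaching would look roughly like the sketch below. Here `z_star_p` stands for the projected gradient from the trust-region step, and the surrounding names are hypothetical, not the repo's actual code:

```python
# Treat the projected gradient as a constant during backpropagation, so no
# gradients flow back through the trust-region projection itself.
z_star_p = z_star_p.detach()
# Hypothetical use: feed it back as the upstream gradient of the policy output.
policy_output.backward(gradient=-z_star_p, retain_graph=True)
```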