tjnkyqcy closed 3 months ago
No, the actor-critic part is an enhancement. The original paper uses the REINFORCE policy-gradient algorithm, but results were better with the actor-critic algorithm.
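To make the difference concrete, here is a minimal sketch in plain Python. It is not the code from this repository or from the paper; the function names (discounted_returns, reinforce_loss, actor_critic_loss) and the default gamma are assumptions chosen only for illustration. REINFORCE weights each action's log-probability by the full discounted return, while an A2C-style update subtracts a learned value estimate first and trains that estimate with its own loss.

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def reinforce_loss(log_probs, rewards, gamma=0.99):
    """REINFORCE: weight each log-probability by the full return G_t."""
    returns = discounted_returns(rewards, gamma)
    return -sum(lp * g for lp, g in zip(log_probs, returns))


def actor_critic_loss(log_probs, values, rewards, gamma=0.99):
    """A2C-style: weight by the advantage G_t - V(s_t) and add a critic loss.

    In an autograd framework the advantage would be treated as a constant
    in the actor term; with plain floats that distinction does not arise.
    """
    returns = discounted_returns(rewards, gamma)
    advantages = [g - v for g, v in zip(returns, values)]
    actor = -sum(lp * a for lp, a in zip(log_probs, advantages))
    critic = sum(a * a for a in advantages)  # squared error of the value estimates
    return actor, critic
```

Subtracting the value estimate leaves the expected gradient unchanged but usually lowers its variance, which is one plausible reason an actor-critic variant trains better than plain REINFORCE.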
If you want the original version, you can look at the original-paper branch (though I don't remember whether it was working). With no update on your side, can I consider this resolved and close the issue?
Where is the original-paper branch you mentioned? Sorry, I couldn't find it. Could you provide a link? Thanks.
Also, is your code based on the paper with new modifications? It involves A2C-like strategies that don't seem to be presented in the paper, which is a bit unclear to me. I hope you can help.