gianlucadest opened 3 years ago
Hello,
I tried to test the algorithm on CartPole-v1 and, depending on the update_timestep parameter, it crashes due to NaNs in the KL divergence. Is this a bug or is it just a sensitive parameter?
update_timestep = 100  # leads to occasional crashes
update_timestep = 200  # fewer crashes than with 100
update_timestep = 500  # stable
Hi Gianluca, I'm currently very busy with my PhD application, and I have almost forgotten this project. Your issue seems to be a general one; you could try to debug it yourself or with your friends.
Hello YYCAAA,
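Thank you for your answer. There seems to be an issue with your reward estimation. Currently, your code only works with complete episodes, because the critic network is never called on the final state of a rollout. When an update happens in the middle of an episode, the return of the unfinished part is therefore not bootstrapped from the critic's value estimate. This needs to be fixed for the algorithm to work properly, and it will probably also fix the NaN issue.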
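A minimal sketch of what such a fix could look like, written in plain Python for illustration. It assumes the rollout buffer stores per-step `rewards` and `dones` lists and that the critic's value for the state reached after the last collected step is passed in as `last_value`; the function name `discounted_returns` and all parameter names are hypothetical and not taken from the repository.

```python
from typing import List


def discounted_returns(
    rewards: List[float],
    dones: List[bool],
    last_value: float,
    gamma: float = 0.99,
) -> List[float]:
    """Compute discounted returns for a fixed-length rollout.

    rewards[t] and dones[t] belong to step t of the rollout. last_value is
    the critic's estimate V(s_T) for the state reached after the final step
    (hypothetical input); it only contributes if that step did not end the
    episode, i.e. the rollout was cut off by update_timestep.
    """
    returns = [0.0] * len(rewards)
    # Bootstrap from the critic instead of assuming the remaining return is 0.
    running = last_value
    for t in reversed(range(len(rewards))):
        if dones[t]:
            # The episode ended at step t, so nothing beyond it is added.
            running = 0.0
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```

For example, with `gamma=0.5`, rewards `[1, 1, 1]` and a critic estimate of 10 for the state after a truncated rollout, the returns are `[3.0, 4.0, 6.0]`. If the last step had ended the episode, the critic value would be ignored and the returns would be `[1.75, 1.5, 1.0]`.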