Open JankowskiChristopher opened 7 months ago
@adityab @danielpalen to help reproduce this error, I provide more information below. Tested tasks and seeds that crashed:
Wandb charts for dog-stand seed 0 (training crashed after 400k steps):
Action values were `nan`, and `nan` values were present mostly in BatchRenorm layers, but also in some dense layers, similar to the log above with pendulum-swingup.
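To make the failure surface at the step where the action first goes bad, rather than later inside the environment, a guard along these lines could be used. This is only a minimal sketch: `assert_finite_action` and `_flatten` are hypothetical helpers, not part of the repository, and the action is assumed to be a plain sequence or an array-like that exposes `.tolist()`.

```python
import math


def _flatten(value):
    """Yield scalar entries from nested lists/tuples or array-likes."""
    if hasattr(value, "tolist"):  # NumPy/JAX-style arrays, via duck typing
        value = value.tolist()
    if isinstance(value, (list, tuple)):
        for item in value:
            yield from _flatten(item)
    else:
        yield value


def assert_finite_action(action, step):
    """Raise FloatingPointError if any entry of `action` is nan or inf."""
    flat = list(_flatten(action))
    # Collect the flat positions of all non-finite entries for the message.
    bad = [i for i, v in enumerate(flat) if not math.isfinite(v)]
    if bad:
        raise FloatingPointError(
            f"non-finite action at step {step}: flat indices {bad}"
        )
```

Calling it right after the actor's forward pass and before the environment step should point at the first offending step.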
Hello,
When running the code on deepmind/pendulum-swingup, training crashes because the action becomes `nan`. I attach the stack trace below (I added some more logging to catch exactly which part of the agent produces the `nan` action; the original error was raised later, when interacting with the environment, but the cause is here). I believe more envs share this problem, as I also experienced it in my previous runs. It happened mostly for `dog` tasks, but as I was using my custom wrapper instead of shimmy, I thought it might have been a problem with my wrapper. Now it happens with `shimmy`, so the wrapper is not the cause; it is probably some instability (maybe with BatchNorm?).
When the error happens, I print the state of the actor and the observation. `nan` values are mostly present in `BatchRenorm`:
The log is not complete as it is more than 100KB in size, so I attach just the beginning.
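Since the full dump is too large to attach, a compact summary of which parameters contain `nan` may be easier to share. The following is a rough sketch, assuming the actor state can be viewed as a nested mapping whose leaves are lists or array-likes with `.tolist()`. `nan_report` and the layer names in the usage example are hypothetical and not taken from the actual log.

```python
import math
from collections.abc import Mapping


def _flatten(value):
    """Yield scalar entries from nested lists/tuples or array-likes."""
    if hasattr(value, "tolist"):  # NumPy/JAX-style arrays, via duck typing
        value = value.tolist()
    if isinstance(value, (list, tuple)):
        for item in value:
            yield from _flatten(item)
    else:
        yield value


def nan_report(params, prefix=""):
    """Return {path: (nan_count, total_count)} for each leaf containing nan."""
    report = {}
    for name, value in params.items():
        path = f"{prefix}/{name}" if prefix else str(name)
        if isinstance(value, Mapping):
            # Recurse into nested modules/collections.
            report.update(nan_report(value, path))
        else:
            flat = list(_flatten(value))
            n_nan = sum(1 for v in flat if isinstance(v, float) and math.isnan(v))
            if n_nan:
                report[path] = (n_nan, len(flat))
    return report


# Usage with made-up parameter names and values:
example = {
    "BatchRenorm_0": {"mean": [float("nan"), 0.1], "var": [1.0, 1.0]},
    "Dense_0": {"kernel": [[0.5, float("nan")], [0.1, 0.2]]},
}
for path, (n_nan, total) in nan_report(example).items():
    print(f"{path}: {n_nan}/{total} nan")
```

This keeps the output to one line per affected parameter instead of printing every value.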