Dastyn closed this issue 3 years ago.
Hey @Dastyn, ugh, NaNs can be pretty crappy to debug. First thing I'd check is whether there are any logs on the C# side saying that there are NaN observations. Also check that the observations and rewards are all of reasonable size (around -1 to 1, with no huge positive or negative values).
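As an illustration of that kind of check (this is not ML-Agents code; the function name and thresholds are hypothetical), a small Python sketch that validates an observation vector before it goes to the trainer:

```python
import numpy as np

def validate_observations(obs, low=-1.0, high=1.0):
    """Return a list of problems found in an observation array.

    Flags NaN/Inf entries and finite values outside [low, high].
    An empty list means the observations look sane.
    """
    obs = np.asarray(obs, dtype=np.float64)
    problems = []
    finite_mask = np.isfinite(obs)
    if not np.all(finite_mask):
        problems.append("non-finite values at indices %s"
                        % np.flatnonzero(~finite_mask).tolist())
    finite = obs[finite_mask]
    if finite.size and (finite.min() < low or finite.max() > high):
        problems.append("values outside [%g, %g]" % (low, high))
    return problems

print(validate_observations([0.2, float("nan"), 5.0]))
```

Logging the result of such a check every step (rather than only at checkpoints) makes it easy to see whether the NaNs originate in the environment or inside the trainer.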
Also, from your plots, is the Curiosity or GAIL reward hitting NaN first? I suspect it might be coming from one of those modules. Since Curiosity doesn't work too well with SAC anyway, turning it off might help.
Hi @ervteng, thanks for your reply.
Regarding NaN values in observations: I've already verified that no error-prone values are generated by behaviors, observations, or rewards. All state observations are properly Mathf.Clamp'ed to [-1f, 1f] (though, by design, they already lie in that interval).
As for the exact step where NaN first appears: impossible to say at this point; the NaNs show up somewhere between two checkpoints (1000 steps apart) for both measurements.
I'll take your advice and turn Curiosity off in my next runs. Thanks again.
Definitely difficult to investigate. Closing the issue for now.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Hi,
I'm seeing NaN values received by OnActionReceived() during both training and inference. After a certain number of steps, for instance during training, the log displays:
To clarify a bit: in my code, AddReward is called inside the OnActionReceived function body. The value added to the reward there is partly based on the values of act[]. Since act[] is apparently full of NaNs after 771k steps, an error is raised when AddReward is called. This is the corresponding call stack:
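To show how a single NaN in act[] propagates into the reward, here is a minimal Python sketch (the reward formula and function names are made up for illustration; the real code is C# in OnActionReceived). A guard like this fails fast with a clear message instead of letting the NaN reach AddReward:

```python
import math

def shaped_reward(act):
    # Stand-in for a reward that is partly based on the action values.
    return 0.1 * sum(act) / len(act)

def add_reward_guarded(act):
    """Raise immediately if the action contains NaN/Inf.

    Without this guard, shaped_reward() silently returns NaN and the
    failure only surfaces later, inside AddReward.
    """
    if any(not math.isfinite(a) for a in act):
        raise ValueError("non-finite value in act[]: %r" % (act,))
    return shaped_reward(act)
```

The point is that the NaN originates upstream (in the policy output), so guarding here only localizes the symptom; it does not fix the cause.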
These are the observation_space and action_space:
Inputs (2):
- visual_observation_0 shape: (-1, 50, 50, 12)
- vector_observation shape: (-1, 1, 1, 9)

Outputs (3):
- policy/concat/concat shape: (-1, -1, -1, -1)
- action shape: (-1, -1, -1, -1)
- action_probs shape: (-1, -1, -1, -1)
The YAML configuration file is as follows:
TensorBoard shows the following graphs:
Question: something happens around 771k steps and I would like to understand where it comes from (my guess is that something goes wrong inside the SAC processing, but where should I look?). Could you please give some hints, especially on how to instrument the SAC code to investigate further?
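One generic way to instrument trainer code for this (this is a general pattern, not ML-Agents' actual API; the decorator and function names are hypothetical) is to wrap the policy/value forward passes so the run aborts at the first step that produces a non-finite value, instead of the NaN only being noticed thousands of steps later:

```python
import functools
import numpy as np

def assert_finite(name):
    """Decorator: raise as soon as the wrapped function returns NaN/Inf.

    Wrapping suspect computations (Q-values, policy outputs, entropy
    terms) with this pinpoints the first step at which a NaN appears.
    """
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            out = fn(*args, **kwargs)
            arr = np.asarray(out, dtype=np.float64)
            if not np.all(np.isfinite(arr)):
                raise FloatingPointError(
                    "%s produced non-finite output: %r" % (name, out))
            return out
        return inner
    return wrap

@assert_finite("q_value")
def q_value(obs):
    return obs * 2.0  # stand-in for a real network forward pass
```

The same idea applies at the tensor-framework level (e.g. TensorFlow's debugging/numeric-check utilities, or PyTorch's anomaly detection), but a plain wrapper like this works regardless of backend.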
Thanks in advance!
Environment: