adityab / CrossQ

Official code release for "CrossQ: Batch Normalization in Deep Reinforcement Learning for Greater Sample Efficiency and Simplicity"
http://aditya.bhatts.org/CrossQ

nan values in networks #8

Open · JankowskiChristopher opened 7 months ago

JankowskiChristopher commented 7 months ago

Hello! When running the code on deepmind/pendulum-swingup, training crashes because the action becomes nan. I attach the stack trace below (I added extra logging to catch exactly which part of the agent produces the nan action; the original error surfaced later, when interacting with the environment, but the cause is here). I believe more environments share this problem: in my previous runs I saw the same crash, mostly on the dog tasks. Since I was using a custom wrapper instead of shimmy at the time, I assumed the wrapper was at fault, but it now happens with shimmy as well, so the cause is probably some numerical instability (maybe in BatchNorm?).

Traceback (most recent call last):
  File "/home/src/crossq/train.py", line 264, in <module>
    model.learn(total_timesteps=total_timesteps, progress_bar=True, callback=callback_list)
  File "/home/src/crossq/sbx/sac/sac.py", line 187, in learn
    return super().learn(
           ^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/crossq/lib/python3.11/site-packages/stable_baselines3/common/off_policy_algorithm.py", line 312, in learn
    rollout = self.collect_rollouts(
              ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/crossq/lib/python3.11/site-packages/stable_baselines3/common/off_policy_algorithm.py", line 541, in collect_rollouts
    actions, buffer_actions = self._sample_action(learning_starts, action_noise, env.num_envs)
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/crossq/lib/python3.11/site-packages/stable_baselines3/common/off_policy_algorithm.py", line 373, in _sample_action
    unscaled_action, _ = self.predict(self._last_obs, deterministic=False)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/crossq/lib/python3.11/site-packages/stable_baselines3/common/base_class.py", line 555, in predict
    return self.policy.predict(observation, state, episode_start, deterministic)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/src/crossq/sbx/common/policies.py", line 64, in predict
    actions = self._predict(observation, deterministic=deterministic)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/src/crossq/sbx/sac/policies.py", line 482, in _predict
    self.debug_log_action(observation, action, "_predict")
  File "/home/src/crossq/sbx/sac/policies.py", line 531, in debug_log_action
    raise ValueError("Action is None")
ValueError: Action is None
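For anyone trying to localize a failure like this, JAX can be asked to fail fast at the first nan-producing operation instead of much later during predict. A minimal sketch of generic JAX debugging (not code from this repo):

import jax

# Debug-only: re-runs jitted functions de-optimized and raises
# FloatingPointError at the first op that produces a nan. Slow,
# so enable it only while hunting the source of the nans.
jax.config.update("jax_debug_nans", True)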

When the error happens, I also print the state of the actor and the observation (more logging I added). The nan values are mostly present in the BatchRenorm layers:

Observations:  [[-0.98452299 -0.17525546 -0.13700339]]
Actor state:  ActorTrainState(step=Array(72082, dtype=int32, weak_type=True), apply_fn=<bound method Module.apply of Actor(
    # attributes
    net_arch = [256, 256]
    action_dim = 1
    batch_norm_momentum = 0.99
    log_std_min = -20
    log_std_max = 2
    use_batch_norm = True
    bn_mode = 'brn_actor'
)>, params={
    'BatchRenorm_0': {'bias':  Array([nan, nan, nan], dtype=float32),
                      'scale': Array([nan, nan, nan], dtype=float32)},
    'BatchRenorm_1': {'bias':  Array([nan, nan, nan, ..., nan, nan, nan], dtype=float32),   # all 256 entries nan
                      'scale': Array([nan, nan, nan, ..., nan, nan, nan], dtype=float32)},  # all 256 entries nan
    'BatchRenorm_2': {'bias':  Array([nan, nan, nan, ..., nan, nan, nan], dtype=float32),   # all 256 entries nan
                      'scale': Array([nan, nan, nan, ..., nan, nan, nan], dtype=float32)},  # all 256 entries nan
    'Dense_0': {'bias':   Array([nan, nan, -0.72661287, nan, ..., nan], dtype=float32),     # nan except a handful of entries
                'kernel': Array([[nan, nan, -0.14071447, nan, ..., nan],
                                 [nan, nan,  0.07693207, nan, ..., nan],
                                 [nan, nan, -0.03049578, nan, ..., nan]], dtype=float32)},  # nan except a handful of columns
    'Dense_1': {'bias':   Array([-3.55079502e-01, -2.61681288e-01, -3.62839073e-01, ...,
                                 -4.72936690e-01, -1.81261748e-01,  1.46970540e-01], dtype=float32),  # mixed finite and nan entries
                'kernel': Array([[ 4.8442278e-02, -2.5370871e-04,  1.9979328e-01, ...,
                                   1.5068804e-01,  3.6767788e-02,  1.3192339e-01],
                                 ...,
                                 [-1.5573186e-01, -1.6367893e-01, -1.5592015e-01, ...,
                                  -1.1927266e-01, -2.0962511e-01, -9.1291368e-02]], dtype=float32)}, [skipped many lines]

        Action:  [[nan]]

The log is not complete, as the full version is over 100 KB; I attach only the beginning.
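The parameter check behind the dump above amounts to scanning the Flax params pytree for non-finite leaves. A minimal sketch of that kind of check (a hypothetical helper, not the actual debug_log_action in sbx):

import jax
import jax.numpy as jnp

def find_nan_leaves(params):
    """Return the key paths of parameter leaves that contain any nan."""
    leaves, _ = jax.tree_util.tree_flatten_with_path(params)
    return [jax.tree_util.keystr(path)
            for path, leaf in leaves
            if jnp.isnan(leaf).any()]

# Usage (attribute name assumed): find_nan_leaves(model.policy.actor_state.params)
# -> ["['BatchRenorm_0']['bias']", "['BatchRenorm_0']['scale']", ...]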

JankowskiChristopher commented 7 months ago

@adityab @danielpalen To help reproduce this error, I provide more information below. Tested tasks and seeds that crashed:

Wandb charts for dog-stand, seed 0 (training crashed after 400k steps): [two W&B charts attached]

The action values were nan, and the nan values were present mostly in the BatchRenorm layers but also in some dense layers, similar to the pendulum-swingup log above.
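For context on why BatchRenorm is a plausible culprit: Batch Renormalization (Ioffe, 2017) corrects the batch statistics towards the running statistics via r = sigma_batch / sigma_running and d = (mu_batch - mu_running) / sigma_running, so if the running variance collapses towards zero and the division lacks a sufficient epsilon, r and d blow up and nans can then propagate into the scale/bias parameters through the gradients. A minimal sketch of that correction with an explicit epsilon guard (illustrative only, not the BatchRenorm implementation in this repo):

import jax
import jax.numpy as jnp

def renorm_correction(batch_mean, batch_var, ra_mean, ra_var,
                      r_max=3.0, d_max=5.0, eps=1e-5):
    """Batch Renormalization correction factors r and d (Ioffe, 2017).

    The eps under the square roots is what keeps r and d finite when
    the running (or batch) variance collapses towards zero.
    """
    batch_std = jnp.sqrt(batch_var + eps)
    ra_std = jnp.sqrt(ra_var + eps)
    # Clip to [1/r_max, r_max] and [-d_max, d_max] as in the paper;
    # r and d are treated as constants w.r.t. the gradient.
    r = jax.lax.stop_gradient(jnp.clip(batch_std / ra_std, 1.0 / r_max, r_max))
    d = jax.lax.stop_gradient(jnp.clip((batch_mean - ra_mean) / ra_std, -d_max, d_max))
    return r, d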