frt03 / mxt_bench

A System for Morphology-Task Generalization via Unified Representation and Behavior Distillation (ICLR2023)
https://arxiv.org/abs/2211.14296

AssertionError: losses/entropy_loss 0 NaN #3

Open Caiyishuai opened 1 year ago

Caiyishuai commented 1 year ago

The error:

I1014 21:54:03.384178 140228452884864 ppo_mlp.py:499] starting iteration 0 14.847318649291992
I1014 21:54:40.566177 140228452884864 ppo_mlp.py:541] {'eval/episode_distance': DeviceArray(8957.285, dtype=float32), 'eval/episode_reward': DeviceArray(-8957.285, dtype=float32), 'eval/episode_reward_agent1_distance': DeviceArray(-8957.285, dtype=float32), 'eval/episode_score': DeviceArray(8957.285, dtype=float32), 'eval/episode_score_agent1___distance': DeviceArray(8957.285, dtype=float32), 'eval/completed_episodes': DeviceArray(128., dtype=float32), 'eval/avg_episode_length': DeviceArray(1000., dtype=float32), 'speed/sps': 0, 'speed/eval_sps': 3443.126173938599, 'speed/training_walltime': 0, 'speed/eval_walltime': 37.17551374435425, 'speed/timestamp': 0, 'num_timesteps': 0, 'eval/avg_final_distance': DeviceArray(9.663254, dtype=float32), 'eval/success_rate': DeviceArray(0., dtype=float32)}
I1014 21:54:44.653600 140228452884864 ppo_mlp.py:499] starting iteration 1 56.11674904823303
I1014 21:54:48.082901 140228452884864 ppo_mlp.py:541] {'eval/episode_distance': DeviceArray(8651.73, dtype=float32), 'eval/episode_reward': DeviceArray(-8651.73, dtype=float32), 'eval/episode_rewardagent1distance': DeviceArray(-8651.73, dtype=float32), 'eval/episode_score': DeviceArray(8651.73, dtype=float32), 'eval/episode_score_agent1_distance': DeviceArray(8651.73, dtype=float32), 'losses/entropy_loss': DeviceArray(nan, dtype=float32), 'losses/policy_loss': DeviceArray(nan, dtype=float32), 'losses/total_loss': DeviceArray(nan, dtype=float32), 'losses/v_loss': DeviceArray(nan, dtype=float32), 'eval/completed_episodes': DeviceArray(128., dtype=float32), 'eval/avg_episode_length': DeviceArray(1000., dtype=float32), 'speed/sps': DeviceArray(0., dtype=float32), 'speed/eval_sps': 37404.16184777793, 'speed/training_walltime': 3.1930909156799316, 'speed/eval_walltime': 40.59758901596069, 'speed/timestamp': 3.1930909156799316, 'num_timesteps': 0, 'eval/avg_final_distance': DeviceArray(9.345238, dtype=float32), 'eval/success_rate': DeviceArray(0., dtype=float32)}
Traceback (most recent call last):
  File "/home/caiyishuai/workTable/mxt_bench/mxt_bench/train_ppo_mlp.py", line 172, in <module>
    app.run(main)
  File "/home/caiyishuai/systems/anaconda3/envs/brax_env/lib/python3.8/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/caiyishuai/systems/anaconda3/envs/brax_env/lib/python3.8/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/home/caiyishuai/workTable/mxt_bench/mxt_bench/train_ppo_mlp.py", line 136, in main
    inferencefn, params, = ppo_mlp.train(
  File "/home/caiyishuai/workTable/mxt_bench/mxt_bench/algo/ppo_mlp.py", line 543, in train
    progress_fn(num_timesteps, metrics, None)
  File "/home/caiyishuai/workTable/mxt_bench/mxt_bench/brax/brax/experimental/braxlines/experiments/__init__.py", line 270, in progress
    assert not np.isnan(v), f'{key} {num_steps} NaN'
AssertionError: losses/entropy_loss 0 NaN

I don't know how to deal with this. Could you please help me?
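For anyone trying to narrow this down: the assertion stops at the first NaN metric it meets, but the iteration 1 log shows all four loss terms as NaN while the eval metrics are still finite. Below is a minimal sketch of that check, written to list every NaN key instead of stopping at the first one. It is not the repository's code: `find_nan_metrics` and the sample values are hypothetical, and `math.isnan` stands in for `np.isnan` so the snippet runs without NumPy or JAX.

```python
import math


def find_nan_metrics(metrics):
    """Return every key whose value is NaN (hypothetical helper)."""
    return [key for key, value in metrics.items() if math.isnan(value)]


def progress(num_steps, metrics):
    # Same shape as the assertion in the last traceback frame:
    # it raises on the first NaN value encountered.
    for key, value in metrics.items():
        assert not math.isnan(value), f'{key} {num_steps} NaN'


# Illustrative values copied from the shape of the iteration 1 log above.
metrics = {
    'eval/episode_reward': -8651.73,
    'losses/entropy_loss': float('nan'),
    'losses/policy_loss': float('nan'),
}

nan_keys = find_nan_metrics(metrics)
print(nan_keys)
```

Since only the `losses/*` entries are NaN, the problem most likely starts inside the training update rather than in evaluation. JAX's `jax_debug_nans` option (enabled with `jax.config.update("jax_debug_nans", True)`) may help locate the first operation that produces a NaN, although I have not verified it against this repository.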