@ZhengyiLuo Hello, after training the G1 with the command:
python phc/run_hydra.py project_name=Robot_IM robot=unitree_g1 env=env_im_g1_phc env.motion_file=sample_data/0-DanceDB_20120807_CliodelaVara_Clio_Haniotikos_C3D_poses.pkl learning=im_pnn_big exp_name=unitree_g1_DanceDB_20120807_CliodelaVara_Clio_Haniotikos_C3D_poses sim=robot_sim control=robot_control learning.params.network.space.continuous.sigma_init.val=-1.7
we get wandb training curves like the following.
My question is: why does disc_reward_mean converge to near zero while the mean reward goes high after convergence? What exactly does disc_reward_mean represent? Does it indicate that the AMP style learning is not going well?
What does disc_reward_mean = 1 mean, and what does disc_reward_mean = 0 mean? Does a value closer to 0 indicate that the learned style is more similar to the reference motion? Why does the rendered GIF look good after 24,000 iterations in Isaac Gym even though disc_reward_mean has already approached 0? Is disc_reward_mean better when it is closer to 0, or when it is closer to 1?
Here is the related code in amp_agent.py:
and the related code in amp_players.py:
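For context (this is not the actual code from this repo, just a minimal sketch of how the discriminator reward is typically computed in AMP-style implementations, following the standard -log(1 - D) formulation; the function name, clamp value, and reward scale below are my assumptions and amp_agent.py may differ):

```python
import torch

# Hypothetical sketch of the standard AMP discriminator reward.
# The function name, clamp constant, and reward_scale are assumptions;
# the real amp_agent.py in this repo may differ in detail.
def calc_disc_reward(disc_logits: torch.Tensor, reward_scale: float = 2.0) -> torch.Tensor:
    # prob -> 1: the discriminator believes the transition comes from the
    # reference motion data; prob -> 0: it looks like policy-generated data.
    prob = torch.sigmoid(disc_logits)
    # -log(1 - prob) is large when the policy fools the discriminator
    # (prob near 1) and close to 0 when the discriminator confidently
    # classifies the transition as policy-generated (prob near 0).
    disc_reward = -torch.log(torch.clamp(1.0 - prob, min=1e-4))
    return reward_scale * disc_reward
```

I am asking about the semantics of this quantity as it is logged in disc_reward_mean during training.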