ZhengyiLuo / PHC

Official Implementation of the ICCV 2023 paper: Perpetual Humanoid Control for Real-time Simulated Avatars
https://zhengyiluo.github.io/PHC/

Why does disc_reward_mean converge to zero after convergence? #102

Open dbdxnuliba opened 6 days ago

dbdxnuliba commented 6 days ago

@ZhengyiLuo Hello, when training the G1 with the command:

python phc/run_hydra.py project_name=Robot_IM robot=unitree_g1 env=env_im_g1_phc env.motion_file=sample_data/0-DanceDB_20120807_CliodelaVara_Clio_Haniotikos_C3D_poses.pkl learning=im_pnn_big exp_name=unitree_g1_DanceDB_20120807_CliodelaVara_Clio_Haniotikos_C3D_poses sim=robot_sim control=robot_control learning.params.network.space.continuous.sigma_init.val=-1.7

we get wandb training curves like the following (screenshots attached).


My question is: why does disc_reward_mean converge to near zero after convergence while the mean reward goes high? What does disc_reward_mean represent? Does it mean the AMP-style learning is not going well?

What does disc_reward_mean=1 mean, and what does disc_reward_mean=0 mean? Does a value closer to 0 indicate that the learned motion style is more similar to the reference? Why does the GIF rendered in Isaac Gym look good after 24,000 iterations even though disc_reward_mean has already approached 0? Should disc_reward_mean be better the closer it is to 0, or the closer it is to 1? The related code is in amp_agent.py and amp_players.py (see attached screenshots).
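For context on what this metric typically measures: in AMP-style codebases (the ASE/rl_games lineage that PHC builds on), the style reward is usually computed from the sigmoid of the discriminator logits, roughly as in the sketch below. The function and argument names here (`calc_disc_rewards`, `disc_reward_scale`) are illustrative assumptions, not necessarily the exact code in amp_agent.py.

```python
import torch

def calc_disc_rewards(disc_logits, disc_reward_scale=2.0, eps=1e-4):
    """Sketch of a typical AMP style reward (illustrative, not PHC's exact code).

    disc_logits: raw discriminator outputs D(s, s') on the policy's transitions.
    """
    with torch.no_grad():
        # Probability the discriminator assigns to "this transition comes from reference motion".
        prob = torch.sigmoid(disc_logits)
        # Reward is large when the policy's motion looks like the data (prob -> 1)
        # and goes to ~0 when the discriminator can tell it apart (prob -> 0).
        disc_r = -torch.log(torch.clamp(1.0 - prob, min=eps))
        disc_r = disc_r * disc_reward_scale
    return disc_r
```

Under this kind of formulation, disc_reward_mean near 0 would mean the discriminator assigns near-zero probability that the simulated transitions came from the reference clip, while a large value would mean the motion is hard to distinguish from the data; the overall mean reward can still be high if the tracking/task terms dominate the weighted reward mix.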

luoye2333 commented 1 day ago

That is the loss of the AMP discriminator. Going to zero means good. See this paper: https://bit.ly/3hpvbD6
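For reference, a minimal sketch of the standard AMP discriminator objective (binary cross-entropy over agent vs. reference samples, plus the gradient penalty from the AMP paper). Names such as `agent_logits`, `demo_logits`, and `grad_penalty_coef` are illustrative assumptions, not the exact symbols in amp_agent.py:

```python
import torch
import torch.nn.functional as F

def disc_loss(agent_logits, demo_logits, demo_obs, disc, grad_penalty_coef=5.0):
    """Sketch of a typical AMP discriminator loss (illustrative, not PHC's exact code)."""
    # Push reference-motion samples toward label 1 and policy samples toward label 0.
    loss_agent = F.binary_cross_entropy_with_logits(
        agent_logits, torch.zeros_like(agent_logits))
    loss_demo = F.binary_cross_entropy_with_logits(
        demo_logits, torch.ones_like(demo_logits))
    loss = 0.5 * (loss_agent + loss_demo)

    # Gradient penalty on the reference samples, as proposed in the AMP paper (Peng et al., 2021).
    demo_obs = demo_obs.detach().requires_grad_(True)
    demo_grad = torch.autograd.grad(disc(demo_obs).sum(), demo_obs, create_graph=True)[0]
    loss = loss + grad_penalty_coef * demo_grad.square().sum(dim=-1).mean()
    return loss
```

Note that this loss approaching zero corresponds to the discriminator separating the two distributions cleanly, which is consistent with disc_reward_mean dropping toward 0 under the reward sketch above.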