H1 does not dance well when playing the policy after 4500 training steps

dbdxnuliba commented 2 weeks ago

Question: As the following GIF shows, H1 falls down when playing the dance policy after 4500 training steps. Why doesn't H1 dance well, and what can I do to make it dance well? Thanks.

I trained H1 for 4500 steps with the following command:

python phc/run_hydra.py  project_name=Robot_IM   robot=unitree_h1      env=env_im_h1_phc env.motion_file=sample_data/sample_dance_h1.pkl learning=im_pnn_big   exp_name=unitree_h1_pnn_realsim_092924 sim=robot_sim control=robot_control

Then I played back the trained policy:

python phc/run_hydra.py project_name=Robot_IM learning=im_pnn_big exp_name=unitree_h1_pnn_realsim_092924 epoch=-1 test=True env=env_im_h1_phc robot=unitree_h1 env.motion_file=sample_data/dance_sample_h1.pkl   env.num_envs=1  headless=False sim=robot_sim control=robot_control

[GIF: phc_1_failed_4500]

Here is the corresponding wandb curve for the H1 dance training run.

And this is the terminal output during playback:

(isaac) rob@rob:~/rl/PHC$ python phc/run_hydra.py project_name=Robot_IM learning=im_pnn_big exp_name=unitree_h1_pnn_realsim_092924 epoch=-1 test=True env=env_im_h1_phc robot=unitree_h1 env.motion_file=sample_data/dance_sample_h1.pkl   env.num_envs=1  headless=False sim=robot_sim control=robot_control 
Importing module 'gym_38' (/home/rob/rl/IsaacGym_Preview_4_Package/isaacgym/python/isaacgym/_bindings/linux-x86_64/gym_38.so)
Setting GYM_USD_PLUG_INFO_PATH to /home/rob/rl/IsaacGym_Preview_4_Package/isaacgym/python/isaacgym/_bindings/linux-x86_64/usd/plugInfo.json
PyTorch version 1.13.1+cu117
Device count 1
/home/rob/rl/IsaacGym_Preview_4_Package/isaacgym/python/isaacgym/_bindings/src/gymtorch
Using /home/rob/.cache/torch_extensions/py38_cu117 as PyTorch extensions root...
Emitting ninja build file /home/rob/.cache/torch_extensions/py38_cu117/gymtorch/build.ninja...
Building extension module gymtorch...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module gymtorch...
2024-11-11 14:08:05,702 - INFO - logger - logger initialized
MOVING MOTION DATA TO GPU, USING CACHE: False
MOVING MOTION DATA TO GPU, USING CACHE: False
MOVING MOTION DATA TO GPU, USING CACHE: False
Importing module 'rlgpu_38' (/home/rob/rl/IsaacGym_Preview_4_Package/isaacgym/python/isaacgym/_bindings/linux-x86_64/rlgpu_38.so)
2024-11-11 14:08:06,888 - DEBUG - Popen(['git', 'version'], cwd=/home/rob/rl/PHC, stdin=None, shell=False, universal_newlines=False)
2024-11-11 14:08:06,900 - DEBUG - Popen(['git', 'version'], cwd=/home/rob/rl/PHC, stdin=None, shell=False, universal_newlines=False)
2024-11-11 14:08:06,913 - DEBUG - Trying paths: ['/home/rob/.docker/config.json', '/home/rob/.dockercfg']
2024-11-11 14:08:06,913 - DEBUG - No config file found
2024-11-11 14:08:07,062 - DEBUG - Setting JobRuntime:name=UNKNOWN_NAME
2024-11-11 14:08:07,062 - DEBUG - Setting JobRuntime:name=run_hydra
torch_deterministic: False
torch_deterministic: False
torch_deterministic: False
Setting seed: 0
Found checkpoint
output/HumanoidIm/unitree_h1_pnn_realsim_092924/Humanoid.pth
Started to play
/home/rob/anaconda3/envs/isaac/lib/python3.8/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3190.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Not connected to PVD
+++ Using GPU PhysX
Physics Engine: PhysX
Physics Device: cuda:0
GPU Pipeline: enabled
/home/rob/rl/IsaacGym_Preview_4_Package/isaacgym/python/isaacgym/torch_utils.py:16: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  return torch.tensor(x, dtype=dtype, device=device, requires_grad=requires_grad)
Humanoid Weights [58.43699926137924]
No torque limit, set to 350
No torque limit, set to 350
No torque limit, set to 350
No torque limit, set to 350
No torque limit, set to 350
No torque limit, set to 350
No torque limit, set to 350
No torque limit, set to 350
No torque limit, set to 350
No torque limit, set to 350
No torque limit, set to 350
No torque limit, set to 350
No torque limit, set to 350
No torque limit, set to 350
No torque limit, set to 350
No torque limit, set to 350
No torque limit, set to 350
No torque limit, set to 350
No torque limit, set to 350
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 21/21 [00:00<00:00, 96.00it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 42/42 [00:00<00:00, 197047.84it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 21/21 [00:00<00:00, 96.07it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 42/42 [00:00<00:00, 176868.24it/s]

****************************** Current motion keys ******************************
Sampling motion: tensor([0], device='cuda:0')
0-Transitions_mocap_mazen_c3d_dance_stand_poses
*********************************************************************************

100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 12.64it/s]
0it [00:00, ?it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 31536.12it/s]
Loaded 1 motions with a total length of 8.433s and 254 frames.
/home/rob/anaconda3/envs/isaac/lib/python3.8/site-packages/gym/spaces/box.py:127: UserWarning: WARN: Box bound precision lowered by casting to float32
  logger.warn(f"Box bound precision lowered by casting to {self.dtype}")
RL device:  cuda:0
1
19
778
0
{'observation_space': Box(-inf, inf, (778,), float32), 'action_space': Box(-1.0, 1.0, (19,), float32), 'agents': 1, 'value_size': 1}
RunningMeanStd:  (778,)
build mlp: 778
build mlp: 778
build mlp: 630
build mlp: 778
build mlp: 778
build mlp: 778
sigma
critic_mlp.0.weight
critic_mlp.0.bias
critic_mlp.2.weight
critic_mlp.2.bias
critic_mlp.4.weight
critic_mlp.4.bias
critic_mlp.6.weight
critic_mlp.6.bias
critic_mlp.8.weight
critic_mlp.8.bias
critic_mlp.10.weight
critic_mlp.10.bias
value.weight
value.bias
mu.weight
mu.bias
_disc_mlp.0.weight
_disc_mlp.0.bias
_disc_mlp.2.weight
_disc_mlp.2.bias
_disc_logits.weight
_disc_logits.bias
pnn.actors.0.0.weight
pnn.actors.0.0.bias
pnn.actors.0.2.weight
pnn.actors.0.2.bias
pnn.actors.0.4.weight
pnn.actors.0.4.bias
pnn.actors.0.6.weight
pnn.actors.0.6.bias
pnn.actors.0.8.weight
pnn.actors.0.8.bias
pnn.actors.0.10.weight
pnn.actors.0.10.bias
pnn.actors.0.12.weight
pnn.actors.0.12.bias
pnn.actors.1.0.weight
pnn.actors.1.0.bias
pnn.actors.1.2.weight
pnn.actors.1.2.bias
pnn.actors.1.4.weight
pnn.actors.1.4.bias
pnn.actors.1.6.weight
pnn.actors.1.6.bias
pnn.actors.1.8.weight
pnn.actors.1.8.bias
pnn.actors.1.10.weight
pnn.actors.1.10.bias
pnn.actors.1.12.weight
pnn.actors.1.12.bias
pnn.actors.2.0.weight
pnn.actors.2.0.bias
pnn.actors.2.2.weight
pnn.actors.2.2.bias
pnn.actors.2.4.weight
pnn.actors.2.4.bias
pnn.actors.2.6.weight
pnn.actors.2.6.bias
pnn.actors.2.8.weight
pnn.actors.2.8.bias
pnn.actors.2.10.weight
pnn.actors.2.10.bias
pnn.actors.2.12.weight
pnn.actors.2.12.bias
RunningMeanStd:  (630,)
=> loading checkpoint 'output/HumanoidIm/unitree_h1_pnn_realsim_092924/Humanoid.pth'
=> loading checkpoint 'output/HumanoidIm/unitree_h1_pnn_realsim_092924/Humanoid.pth'
reward: 52.98905944824219 steps: 83.0
reward: 53.031768798828125 steps: 83.0
reward: 53.031768798828125 steps: 83.0
reward: 53.031768798828125 steps: 83.0
reward: 53.031768798828125 steps: 83.0
reward: 53.031768798828125 steps: 83.0
reward: 53.031768798828125 steps: 83.0
reward: 53.031768798828125 steps: 83.0
reward: 53.031768798828125 steps: 83.0
reward: 53.031768798828125 steps: 83.0
reward: 53.031768798828125 steps: 83.0
reward: 53.031768798828125 steps: 83.0
reward: 53.031768798828125 steps: 83.0
reward: 53.031768798828125 steps: 83.0
reward: 53.031768798828125 steps: 83.0
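
The playback log above already hints at the failure: every episode ends after 83 steps, while the loaded motion has 254 frames. A minimal Python sketch for summarizing such lines follows. The helper name `summarize_episodes` is hypothetical and not part of PHC, and the ratio it reports assumes that one policy step corresponds to one motion frame, which the log does not confirm.

```python
import re

# Matches lines such as "reward: 53.03 steps: 83.0" from the playback output.
LINE_RE = re.compile(r"reward:\s*([-\d.eE+]+)\s+steps:\s*([-\d.eE+]+)")


def summarize_episodes(log_text, motion_frames):
    """Return (num_episodes, mean_reward, mean_steps, fraction_of_motion).

    Hypothetical helper: fraction_of_motion assumes one policy step per
    motion frame, which is an assumption and not confirmed by the log.
    """
    pairs = [(float(r), float(s)) for r, s in LINE_RE.findall(log_text)]
    if not pairs:
        raise ValueError("no 'reward: ... steps: ...' lines found")
    n = len(pairs)
    mean_reward = sum(r for r, _ in pairs) / n
    mean_steps = sum(s for _, s in pairs) / n
    return n, mean_reward, mean_steps, mean_steps / motion_frames


# A few lines copied from the terminal output above.
log = """\
reward: 52.98905944824219 steps: 83.0
reward: 53.031768798828125 steps: 83.0
reward: 53.031768798828125 steps: 83.0
"""

# 254 is the frame count reported by "Loaded 1 motions ... 254 frames."
n, mean_reward, mean_steps, frac = summarize_episodes(log, motion_frames=254)
print(f"{n} episodes, mean reward {mean_reward:.2f}, "
      f"mean steps {mean_steps:.1f}, {frac:.0%} of the motion")
```

If the mean step count stays well below the motion length, as it does here, the policy is most likely terminating early in each episode rather than completing the clip.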

luoye2333 commented 2 weeks ago

Seems good in your GIF.

dbdxnuliba commented 1 week ago

"Seems good in your GIF" @luoye2333 @ZhengyiLuo @kexul Not really. The GIF plays in a loop, and at the end of each loop you can see the robot standing on its toes and then quickly falling over. I would like to know how to solve this problem.

smilefish110 commented 3 days ago

Hello, I have the same problem. In addition, my 4090 runs into a CUDA out-of-memory error after about 9000 training iterations, and the evaluation program still reports a success rate of 0 on the dance dataset. Is there a solution? Thank you for your help! @dbdxnuliba @ZhengyiLuo

ZhengyiLuo commented 3 days ago

Hi, I just realized I missed the part about learning.params.network.space.continuous.sigma_init.val=-1.7 for the H1 policy. My bad! In my training run, it learned the policy after 7500 epochs.
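
For context on the override above: in rl_games-style continuous policies, the `sigma` parameter is commonly stored as a log standard deviation and exponentiated in the forward pass. Assuming PHC follows that convention (this thread does not confirm it), `sigma_init.val=-1.7` would correspond to an initial action standard deviation of roughly 0.18. The Python sketch below only illustrates that arithmetic and shows how the override could be appended to the training command from this issue; `with_override` is a hypothetical helper, not part of PHC or Hydra.

```python
import math
import shlex

SIGMA_INIT_VAL = -1.7  # value quoted in the comment above

# Assumption: the value is a log standard deviation, so std = exp(val).
initial_std = math.exp(SIGMA_INIT_VAL)


def with_override(command, key, value):
    """Append a key=value override to a command line (hypothetical helper)."""
    args = shlex.split(command)
    args.append(f"{key}={value}")
    return shlex.join(args)


# Training command as posted at the top of this issue.
train_cmd = (
    "python phc/run_hydra.py project_name=Robot_IM robot=unitree_h1 "
    "env=env_im_h1_phc env.motion_file=sample_data/sample_dance_h1.pkl "
    "learning=im_pnn_big exp_name=unitree_h1_pnn_realsim_092924 "
    "sim=robot_sim control=robot_control"
)

new_cmd = with_override(
    train_cmd,
    "learning.params.network.space.continuous.sigma_init.val",
    SIGMA_INIT_VAL,
)

print(f"initial std ~ {initial_std:.3f}")
print(new_cmd)
```

Under that assumption, a smaller initial standard deviation means less exploration noise on the 19 joint actions early in training, which may be why the H1 policy benefits from it.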

ZhengyiLuo / PHC

H1 does not dance well when playing the policy after 4500 training steps #94