NVlabs / CALM


Using CALM with a different humanoid.xml #16

Open VineetTambe opened 1 month ago

VineetTambe commented 1 month ago

I am trying to set up training with a different MJCF humanoid.xml file; however, I am facing a lot of dimension and observation-space issues if I just replace the asset_file.

What files do I have to change in order for the repo to work on a custom humanoid.xml with retargeted motions?

Edit 1: I have a humanoid with 33 nodes in the SkeletonTree and matching retargeted mocap, but I don't want to actuate all of the joints - only a subset, similar to the AMP humanoid. Is there a way I can extend the current repo to support this?

Edit 2: Can you also elaborate on how the following vars in the observation space are constructed?

self._dof_obs_size = 72
self._num_obs = 1 + 15 * (3 + 6 + 3 + 3) - 3
tesslerc commented 1 month ago

num_obs = height + num_bodies * (pos + rot + vel + ang_vel) - root_pos. Height is a single dim. num_bodies is 15 for the AMP humanoid. For each body part, the position is 3-dim, the rotation is 6-dim, and the velocity and angular velocity are 3-dim each. Finally, the root position is removed, so that's 3 dims.

For dof_obs_size you can see the dof_to_obs function. Or alternatively run the function and see the expected dimensions.
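
As a quick sanity check, the arithmetic works out like this (a minimal sketch; the 12-joint / 6-dim figure for dof_obs is from memory, so double-check it against dof_to_obs):

    # Observation size of the default AMP humanoid, following the breakdown above.
    num_bodies = 15                        # bodies of the AMP humanoid
    height = 1                             # root height is a single dim
    pos, rot, vel, ang_vel = 3, 6, 3, 3    # per-body: position, 6d rotation, velocity, angular velocity
    root_pos = 3                           # the root position is removed at the end

    num_obs = height + num_bodies * (pos + rot + vel + ang_vel) - root_pos
    print(num_obs)                         # 1 + 15 * 15 - 3 = 223

    # dof_obs_size = 72 comes from dof_to_obs: each non-fixed joint is encoded
    # as a 6-dim rotation, and the AMP humanoid has 12 such joints (12 * 6 = 72).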

tesslerc commented 1 month ago

If you don't want to actuate a joint, I would try to set the corresponding entry in the action vector to 0.
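
Something along these lines, as a rough sketch - the index list and the exact place you hook it in (e.g. right before actions are turned into PD targets) depend on your asset:

    import torch

    # Hypothetical indices of the action entries you want to leave unactuated.
    UNACTUATED_ACTION_IDS = [10, 11, 12]

    def mask_unactuated(actions: torch.Tensor) -> torch.Tensor:
        # Zero the entries for joints that should stay passive before the
        # actions are applied to the simulation.
        actions = actions.clone()
        actions[:, UNACTUATED_ACTION_IDS] = 0.0
        return actions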

VineetTambe commented 1 month ago

A follow-up to the above - it turns out I might have been running it with the wrong config.

After running it with

python calm/run.py --task HumanoidAMPGetup --cfg_env calm/data/cfg/humanoid_calm_sword_shield_getup.yaml --cfg_train calm/data/cfg/train/rlg/custom_calm_beta.yaml --motion_file calm/data/motions/beta_npy/beta_07_01_cmu4.npy --headless 

In the above command I have replaced the AMP humanoid .xml with my custom humanoid and replaced the motion with my custom retargeted data.

But I end up getting this error:

Traceback (most recent call last):
  File "calm/run.py", line 274, in <module>
    main()
  File "calm/run.py", line 268, in main
    runner.run(vargs)
  File "/home/vineet/miniconda3/envs/CALM/lib/python3.8/site-packages/rl_games/torch_runner.py", line 139, in run
    self.run_train()
  File "/home/vineet/miniconda3/envs/CALM/lib/python3.8/site-packages/rl_games/torch_runner.py", line 125, in run_train
    agent.train()
  File "/home/vineet/1x/CALM/calm/learning/common_agent.py", line 120, in train
    train_info = self.train_epoch()
  File "/home/vineet/1x/CALM/calm/learning/calm_agent.py", line 200, in train_epoch
    batch_dict = self.play_steps()
  File "/home/vineet/1x/CALM/calm/learning/calm_agent.py", line 85, in play_steps
    res_dict = self.get_action_values(self.obs, self._calm_latents, self._rand_action_probs)
  File "/home/vineet/1x/CALM/calm/learning/calm_agent.py", line 164, in get_action_values
    res_dict = self.model(input_dict)
  File "/home/vineet/miniconda3/envs/CALM/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/vineet/1x/CALM/calm/learning/calm_models.py", line 50, in forward
    result = super().forward(input_dict)
  File "/home/vineet/1x/CALM/calm/learning/amp_models.py", line 51, in forward
    result = super().forward(input_dict)
  File "/home/vineet/miniconda3/envs/CALM/lib/python3.8/site-packages/rl_games/algos_torch/models.py", line 229, in forward
    distr = torch.distributions.Normal(mu, sigma)
  File "/home/vineet/miniconda3/envs/CALM/lib/python3.8/site-packages/torch/distributions/normal.py", line 56, in __init__
    super(Normal, self).__init__(batch_shape, validate_args=validate_args)
  File "/home/vineet/miniconda3/envs/CALM/lib/python3.8/site-packages/torch/distributions/distribution.py", line 56, in __init__
    raise ValueError(
ValueError: Expected parameter loc (Tensor of shape (1024, 33)) of distribution Normal(loc: torch.Size([1024, 33]), scale: torch.Size([1024, 33])) to satisfy the constraint Real(), but found invalid values:
tensor([[-0.0145,  0.1211,  0.0497,  ..., -0.0845,  0.0583, -0.0713],
        [ 0.0640,  0.0314,  0.0291,  ...,  0.0107, -0.0104, -0.0126],
        [ 0.0036,  0.0790,  0.0056,  ...,  0.0328,  0.0317, -0.0004],
        ...,
        [ 0.0726,  0.0994,  0.0919,  ..., -0.0465,  0.0093, -0.0204],
        [-0.0061,  0.1919,  0.0032,  ..., -0.0424, -0.0283, -0.0463],
        [-0.0562,  0.0312,  0.0701,  ...,  0.0411, -0.0324, -0.0607]],
       device='cuda:0')

Any clue as to what might be the issue here? My sigmas are set to non-trainable and constant.


Edit 1: It seems that the way HumanoidAMPGetup computes the "fall state" might be inducing the instability. If I understand the implementation correctly, the fall state is obtained by randomly initializing the humanoid and simulating 150 sim steps to obtain a final "fall configuration", to which actors are reset at random during training. Might this be the reason why Isaac Gym gives NaNs? [ref]

tesslerc commented 1 month ago

From my experience, that usually happens when you have NaNs either in your model weights or in your inputs.

Does it work correctly without the added changes and with the default AMP humanoid?

VineetTambe commented 1 month ago

So here's what's weird to me:

The training runs without any issues if I use the following configs:

  1. default CALM configs with human sword shield with HumanoidAMPGetup env
  2. CALM configs with AMP humanoid and humanoid.yaml for environment and calm_humanoid.yaml as training config with HumanoidAMP
  3. CALM configs with my custom humanoid.xml and humanoid.yaml for environment and calm_humanoid.yaml as training config with HumanoidAMP

It crashes when I use: CALM configs with my custom humanoid.xml and humanoid_calm_sword_shield_getup.yaml for environment and calm_humanoid.yaml as training config with HumanoidAMPGetup.

I have ensured, by explicitly checking, that the inputs (i.e. the observations) are always non-NaN before they are returned.

Edit 1: After experimenting for a while, it seems that the random fall initialization does not play well with Isaac Gym: some joints may be initialized to a completely invalid state, and it is very finicky, failing and crashing after a random number of iterations. Specifically, the var self._rigid_body_pos ends up with one of its rows as NaNs.
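
For reference, the kind of check I added looks roughly like this (a minimal sketch; where exactly it sits in the task code is up to you):

    import torch

    def assert_finite(name: str, t: torch.Tensor) -> None:
        # Fail early with a useful message instead of letting NaNs reach the policy.
        if not torch.isfinite(t).all():
            bad = (~torch.isfinite(t)).nonzero(as_tuple=False)
            raise ValueError(f"{name} has non-finite entries at indices {bad[:5].tolist()}")

    # Example usage inside the task, before observations are returned:
    # assert_finite("obs", self.obs_buf)
    # assert_finite("rigid_body_pos", self._rigid_body_pos)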

Thanks again for all the help!

tesslerc commented 1 month ago

Are the tensors generated in https://github.com/NVlabs/CALM/blob/4f6bdb9d6536d3075b83bcfc8e066c331700faa0/calm/env/tasks/humanoid_amp_getup.py#L65 OK? Are self._fall_root_states and self._fall_dof_pos free of any NaNs?

VineetTambe commented 1 month ago

I managed to figure out the issue! It was caused by incorrect stiffness and damping params - the CALM codebase expects them to be specified in the .xml itself, and my xml did not specify them. Adding them seems to have solved the NaN issue.
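
In case it helps anyone else, here is a quick, rough check that would have caught this - it just parses the MJCF and flags joints that don't declare stiffness or damping (it ignores default classes, so treat it as a heuristic; the file path is specific to your setup):

    import xml.etree.ElementTree as ET

    # Path to your custom asset - adjust as needed.
    MJCF_PATH = "calm/data/assets/mjcf/my_custom_humanoid.xml"

    tree = ET.parse(MJCF_PATH)
    for joint in tree.iter("joint"):
        if joint.get("type") == "free":
            continue  # the floating root joint has no PD gains
        name = joint.get("name", "<unnamed>")
        missing = [attr for attr in ("stiffness", "damping") if joint.get(attr) is None]
        if missing:
            print(f"joint '{name}' is missing: {', '.join(missing)}")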

VineetTambe commented 1 month ago

A follow up question on this -

  1. Is there any signal (training curves / reward curves) that I should look for in order to get some early signs of life during training? It is a bit too inefficient to wait 13-14 hours to see if the training is succeeding.
  2. I was monitoring the mean episode length per iteration, but given that I am using only a single .npy file from the CMU open-source dataset, how long should I expect my episode length to be? The .npy file is a retargeted speed walk from 07_01.fbx, which is about 1.5 seconds long.

Currently the mean episode length I get is around 15-20 by iteration 5k. I get a similar range if I run the default command:

python calm/run.py --task HumanoidAMP --cfg_env calm/data/cfg/humanoid_calm_sword_shield.yaml --cfg_train calm/data/cfg/train/rlg/amp_humanoid.yaml --motion_file calm/data/motions/amp_humanoid_walk.npy --headless  --track

The command I am testing on is:

python calm/run.py --task HumanoidAMPGetup --cfg_env calm/data/cfg/humanoid_calm_sword_shield.yaml --cfg_train calm/data/cfg/train/rlg/calm_custom_humanoid.yaml --motion_file calm/data/motions/custom_humanoid_walk.npy --headless  --track

However when I run the default llc training for sword shield humanoid:

python calm/run.py --task HumanoidAMPGetup --cfg_env calm/data/cfg/humanoid_calm_sword_shield_getup.yaml --cfg_train calm/data/cfg/train/rlg/calm_humanoid.yaml --motion_file calm/data/motions/reallusion_sword_shield/dataset_reallusion_sword_shield.yaml --headless  --track

I get about 200 mean episode length after 2k iterations.

tesslerc commented 1 month ago

Episodes should be up to 300 frames and terminate early if the agent falls down. It should relatively quickly learn to get up, and the episode length should spike upwards. Over time it reaches ~290, which means it learns to execute commands relatively stably without falling very often.

Another metric to track is the discriminator reward. However, since we are training with a discriminative objective, it is a bit tricky (as with the entire GAN line of work) to track performance. What I find works well with these types of (discriminator-based) models is to periodically visualize the saved model. With the sword and shield agent, you see a point where it stands up. Then it typically learns to turn and walk around. Then, over more time, it starts to learn the more complex skills such as sword attacks.

I am not sure what to expect with your model, as it is both a differently structured humanoid and you seem to have changed some control parameters.

VineetTambe commented 4 weeks ago

Okay - after fixing the XML and motion_lib, I am getting a mean episode length of about 250, which is still less than the expected 290. There still seems to be something I might be doing incorrectly. Is there anything else I could do, like varying training params such as task_reward_w and disc_reward_w? [ref] Currently I have set the following values for my training (referring to amp_humanoid_task.yaml in the training configs):

    task_reward_w: 0.5
    disc_reward_w: 0.1
    conditional_disc_reward_w: 1.0

Could you shed light on the intuition behind tuning these values?

tesslerc commented 4 weeks ago

It depends on what you're trying to solve. The LLC policy in CALM (and also in ASE) is typically trained without a task reward. The task is an environment-given reward, for example a reward for following a provided path.

As you can see here: https://github.com/NVlabs/CALM/blob/4f6bdb9d6536d3075b83bcfc8e066c331700faa0/calm/data/cfg/train/rlg/calm_humanoid.yaml#L115 the default parameters for CALM only use the conditional discriminator reward.

From my experience, combining the conditional and unconditional discriminators doesn't always work well. The unconditional discriminator attempts to push the controller towards the "average" data distribution, whereas the conditional one pushes it to match the state distribution of the currently conditioned motions. The two may be combined in a smart way, by providing discriminative rewards in the transition periods between motions, but we have not attempted this and it is mostly speculation.
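
Conceptually, those weights just scale the different reward streams before they are summed - something like the simplified sketch below (not the exact code):

    import torch

    # Per-step reward streams, each of shape (num_envs,) - placeholder values here.
    r_task = torch.zeros(1024)        # environment/task reward (e.g. path following)
    r_disc = torch.zeros(1024)        # unconditional discriminator reward
    r_cond_disc = torch.zeros(1024)   # conditional (motion-conditioned) discriminator reward

    # Weights from the training yaml.
    task_reward_w = 0.0
    disc_reward_w = 0.0
    conditional_disc_reward_w = 1.0   # default CALM LLC: only the conditional term

    total_reward = (task_reward_w * r_task
                    + disc_reward_w * r_disc
                    + conditional_disc_reward_w * r_cond_disc)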

VineetTambe commented 3 weeks ago

Thanks for the insight. I am now able to get an episode length of about 250-280 when training with my custom xml. But weirdly enough, the policy learns to walk on its toes instead of keeping its feet flat. At first I thought this might be due to some inaccuracies in the reference data, so I regenerated the reference trajectories by updating the root_height_offset in the retargeting script. However, changing that does not seem to have any effect on the behavior.

Do you have any clue as to why this might be happening? Is there a hard-coded reference height somewhere in the codebase?

Edit: I played around with env params like rootHeightObs and refined the data to ensure that the reference has flat feet - however, I still see that training results in the robot learning to walk on its toes.

tesslerc commented 1 week ago

I don't recall any offsets for the reference motions. For example, when training on the sword and shield dataset the character does not learn to tip-toe.

VineetTambe commented 6 days ago

Yes, maybe it's an issue with the way I have configured my xml. Could you point me to where the actual "discriminator loss" is calculated? A workaround I would like to try is to remove the motion-matching loss with respect to the ankle and foot joints - I hope doing so might mitigate the peculiar "tip-toe" behaviour.
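
Concretely, what I have in mind is something like this rough sketch - ANKLE_FOOT_AMP_OBS_IDS is a hypothetical list of whichever columns of the AMP observation encode the ankle/foot joints in my layout, and the masking would have to be applied to both the reference-motion and policy observations fed to the discriminator:

    import torch

    # Hypothetical column indices of the AMP observation for the ankle/foot joints.
    ANKLE_FOOT_AMP_OBS_IDS = [48, 49, 50, 51, 52, 53]

    def mask_foot_features(amp_obs: torch.Tensor) -> torch.Tensor:
        # Zero these columns so the discriminator cannot penalize
        # ankle/foot pose mismatches between the policy and the reference data.
        amp_obs = amp_obs.clone()
        amp_obs[..., ANKLE_FOOT_AMP_OBS_IDS] = 0.0
        return amp_obs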