LeCAR-Lab / human2humanoid

[IROS 2024] Learning Human-to-Humanoid Real-Time Whole-Body Teleoperation. [CoRL 2024] OmniH2O: Universal and Dexterous Human-to-Humanoid Whole-Body Teleoperation and Learning
https://omni.human2humanoid.com/

Question about Training Details and Efficiency #10

Closed: xuai05 closed this issue 3 days ago

xuai05 commented 4 days ago

I have successfully replicated the training setup using your code, but I would like to ask for some clarification on specific training details, particularly related to training efficiency. I am running the teacher model on an A5000 server with the following training command:

```
python legged_gym/scripts/train_hydra.py --config-name=config_teleop \
  task=h1:teleop run_name=OmniH2O_TEACHER \
  env.num_observations=913 env.num_privileged_obs=990 \
  motion.teleop_obs_version=v-teleop-extend-max-full motion=motion_full motion.extend_head=True \
  num_envs=4096 asset.zero_out_far=False asset.termination_scales.max_ref_motion_distance=1.5 \
  sim_device=cuda:0 motion.motion_file=resources/motions/h1/amass_phc_filtered.pkl \
  rewards=rewards_teleop_omnih2o_teacher rewards.penalty_curriculum=True rewards.penalty_scale=0.5
```

The training has taken 3 days to reach iteration 83,000, but in your config the maximum iteration count is set to 1,000,000. Does this imply that a complete training run would take roughly a month to finish?
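Just to spell out the extrapolation behind my estimate, here is a rough sketch (my own back-of-envelope math, assuming the throughput stays roughly constant over the whole run):

```python
# Back-of-envelope extrapolation from my run: 83,000 iterations in 3 days.
iters_done = 83_000
days_elapsed = 3.0
max_iterations = 1_000_000  # iteration cap from the config

iters_per_day = iters_done / days_elapsed          # ~27,667 iterations/day
days_to_finish = max_iterations / iters_per_day    # ~36 days
print(f"{iters_per_day:.0f} it/day -> ~{days_to_finish:.0f} days to reach the cap")
```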

I would really appreciate it if you could provide some insights into the expected time required for full training and whether there are any recommended adjustments for improving training efficiency, especially when running on an A5000 GPU.

TairanHe commented 3 days ago

Hi, iteration 83,000 should already give you reasonable results. The reason I set the maximum iteration count to 1,000,000 is that I don't want the script to auto-stop.

xuai05 commented 3 days ago

Is it correct to set the reward parameter to rewards_teleop_omnih2o_teacher during the training of the H2O and OmniH2O student models? After evaluating the H2O teacher model at 83,000 iterations, I obtained the following results:

[screenshot: evaluation results at 83,000 iterations]

After evaluating the H2O model at 125,000 iterations, I obtained the following results:

[screenshot: evaluation results at 125,000 iterations]

The results did not meet those reported in the original paper. Is this outcome normal?

TairanHe commented 3 days ago

This is weird. What num_envs are you using?

xuai05 commented 3 days ago

num_envs is 4096. My training command is:

```
python legged_gym/scripts/train_hydra.py --config-name=config_teleop \
  task=h1:teleop run_name=OmniH2O_TEACHER \
  env.num_observations=913 env.num_privileged_obs=990 \
  motion.teleop_obs_version=v-teleop-extend-max-full motion=motion_full motion.extend_head=True \
  num_envs=4096 asset.zero_out_far=False asset.termination_scales.max_ref_motion_distance=1.5 \
  sim_device=cuda:0 motion.motion_file=resources/motions/h1/amass_phc_filtered.pkl \
  rewards=rewards_teleop_omnih2o_teacher rewards.penalty_curriculum=True rewards.penalty_scale=0.5
```

My env_cfg.json and train_cfg.json are attached: env_cfg.json, train_cfg.json

My training environment is:

```
torch          1.11.0+cu113
torchaudio     0.11.0+cu113
torchgeometry  0.1.2
torchmetrics   1.4.3
torchvision    0.12.0+cu113
rsl_rl         1.0.2
poselib        0.0.42
phc            1.0.0
legged_gym     1.0.0
isaacgym       1.0rc4
```

We train on Ubuntu 20.04.1. The CPU is a Hygon C86 7151 16-core processor.
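In case it helps with comparing environments, here is a minimal sketch (my own, not from the repo) that prints the installed versions of the packages listed above; note that packages installed from source may not be visible to importlib.metadata:

```python
# Minimal sketch: print installed versions of the relevant packages.
# Names are pip distribution names; adjust if a package was installed from source.
from importlib.metadata import version, PackageNotFoundError

packages = [
    "torch", "torchaudio", "torchgeometry", "torchmetrics", "torchvision",
    "rsl_rl", "poselib", "phc", "legged_gym", "isaacgym",
]
for name in packages:
    try:
        print(f"{name}=={version(name)}")
    except PackageNotFoundError:
        print(f"{name}: not found via importlib.metadata (likely installed from source)")
```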

TairanHe commented 2 days ago

The command looks correct. Could you share your eval script?

xuai05 commented 2 days ago

I modified the training file so that it calls runner.load() and runner.eval(), as follows:

```python
@hydra.main(
    version_base=None,
    config_path="../cfg",
    config_name="config",
)
def train(cfg_hydra: DictConfig) -> None:
    cfg_hydra = EasyDict(OmegaConf.to_container(cfg_hydra, resolve=True))
    cfg_hydra.physics_engine = gymapi.SIM_PHYSX
    env, env_cfg = task_registry.make_env_hydra(name=cfg_hydra.task, hydra_cfg=cfg_hydra, env_cfg=cfg_hydra)
    ppo_runner, train_cfg = task_registry.make_alg_runner(env=env, name=cfg_hydra.task, args=cfg_hydra, train_cfg=cfg_hydra.train)
    log_dir = ppo_runner.log_dir

    env_cfg_dict = helpers.class_to_dict(env_cfg)
    train_cfg_dict = helpers.class_to_dict(train_cfg)
    del env_cfg_dict['physics_engine']
    # Save cfgs
    os.makedirs(log_dir, exist_ok=True)
    import json
    with open(os.path.join(log_dir, 'env_cfg.json'), 'w') as f:
        json.dump(env_cfg_dict, f, indent=4)
    with open(os.path.join(log_dir, 'train_cfg.json'), 'w') as f:
        json.dump(train_cfg_dict, f, indent=4)
    if cfg_hydra.use_wandb:
        run_id = wandb.util.generate_id()
        run = wandb.init(name=cfg_hydra.task, config=cfg_hydra, id=run_id, dir=log_dir, sync_tensorboard=True)
        wandb.run.name = cfg_hydra.run_name
    ppo_runner.load("/data/wangjinwen/huxiaobo/humanoid/human2humanoid/legged_gym/logs/h1:teleop/24_10_11_19-46-05_H2O_Policy/model_125000.pt")
    ppo_runner.eval()


if __name__ == '__main__':
    train()
```

I did not make any modifications to the eval() and load() functions:

```python

def load(self, path, load_optimizer=True):
    loaded_dict = torch.load(path, map_location=self.device)
    self.alg.actor_critic.load_state_dict(loaded_dict['model_state_dict'])
    if load_optimizer:
        self.alg.optimizer.load_state_dict(loaded_dict['optimizer_state_dict'])
    self.current_learning_iteration = loaded_dict['iter']
    return loaded_dict['infos']

def eval(self):
    info = self.run_eval_loop()
    if self.cfg.auto_negative_samping:
        self.update_training_data(info['failed_keys'])
    del self.terminate_state, self.terminate_memory, self.mpjpe, self.mpjpe_all
    return info["eval_info"]        

```
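As a sanity check on my side, here is a small sketch (my own, not part of the repo) that inspects the checkpoint with plain torch.load, using only the keys that load() above reads; the path is the one from my run:

```python
# Sanity check (not part of the repo): inspect the saved checkpoint directly,
# using only the keys that the runner's load() above reads.
import torch

ckpt_path = "/data/wangjinwen/huxiaobo/humanoid/human2humanoid/legged_gym/logs/h1:teleop/24_10_11_19-46-05_H2O_Policy/model_125000.pt"
ckpt = torch.load(ckpt_path, map_location="cpu")

print("keys:", list(ckpt.keys()))              # expect model_state_dict, optimizer_state_dict, iter, infos
print("saved at iteration:", ckpt.get("iter"))  # should match the filename (125000)
print("model tensors:", len(ckpt["model_state_dict"]))
```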