MarkFzp / act-plus-plus

Imitation learning algorithms with Co-training for Mobile ALOHA: ACT, Diffusion Policy, VINN
https://mobile-aloha.github.io/

Diffusion Policy parameters #21

Open yxKryptonite opened 8 months ago

yxKryptonite commented 8 months ago

Hi authors,

I used your 50 demo episodes to train ACT and it worked very well, achieving a success rate of up to 90% on the cube-transfer task. However, after I switched the algorithm to Diffusion Policy, the success rate was very low. I tried multiple hyperparameter settings from your commands.txt, but none of them worked. The results are below:

[Screenshot: evaluation curves, 2024-01-12]

(The green curve is ACT; the others are Diffusion Policy with different hyperparameter sets.)

So I wonder why this happens. Could you share the best-performing Diffusion Policy parameters? Thank you very much!

yxKryptonite commented 8 months ago

Hi authors, thanks for your work! I have tried multiple sets of Diffusion Policy parameters, but none of them worked. Could you kindly guide me on how to train Diffusion Policy on the cube-transfer task, or provide any trained checkpoints? Thank you very much!

Wallong commented 6 months ago

@yxKryptonite Same question here. Have you tried a command like this?

```
conda activate mobile
export MUJOCO_GL=egl
cd /home/tonyzhao/Research/act-plus-plus
CUDA_VISIBLE_DEVICES=0 python3 imitate_episodes.py \
    --task_name sim_transfer_cube_scripted \
    --ckpt_dir /scr/tonyzhao/train_logs/cube_scripted_diffusion_sweep_4_regressionTest \
    --policy_class Diffusion --chunk_size 32 \
    --batch_size 32 --lr 1e-4 --seed 0 \
    --num_steps 200000 --eval_every 6000 --validate_every 6000 --save_every 6000
```

LanrenzzzZ commented 5 months ago

@yxKryptonite I ran into the same issue. Have you solved it?

barsm42 commented 3 months ago

Hello,

Same question here. I trained on a Ziploc Slide task (we created our own dataset, but it is the same kind of task), and ACT worked well. Then we tried the Diffusion Policy class. Its validation losses during training were much better than ACT's, but during inference Diffusion Policy did not work: the robots could not even start the task. I will share the error later; it was an error about `self.ema.averaged_model`.

```python
# Assumed imports, following the repo's policy.py (omitted in the original comment):
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from robomimic.models.base_nets import ResNet18Conv, SpatialSoftmax
from robomimic.algo.diffusion_policy import replace_bn_with_gn, ConditionalUnet1D
from diffusers.schedulers.scheduling_ddim import DDIMScheduler
from diffusers.training_utils import EMAModel


class DiffusionPolicy(nn.Module):
    def __init__(self, args_override):
        super().__init__()

        self.camera_names = args_override['camera_names']
        self.observation_horizon = args_override['observation_horizon'] ### TODO TODO TODO DO THIS
        self.action_horizon = args_override['action_horizon'] # apply chunk size
        self.prediction_horizon = args_override['prediction_horizon'] # chunk size
        self.num_inference_timesteps = args_override['num_inference_timesteps']
        self.ema_power = args_override['ema_power']
        self.lr = args_override['lr']
        self.weight_decay = 0

        self.num_kp = 32
        self.feature_dimension = 64
        self.ac_dim = args_override['action_dim'] # 14 + 2
        self.obs_dim = self.feature_dimension * len(self.camera_names) + 14 # camera features and proprio

        backbones = []
        pools = []
        linears = []
        for _ in self.camera_names:
            backbones.append(ResNet18Conv(**{'input_channel': 3, 'pretrained': False, 'input_coord_conv': False}))
            pools.append(SpatialSoftmax(**{'input_shape': [512, 15, 20], 'num_kp': self.num_kp, 'temperature': 1.0, 'learnable_temperature': False, 'noise_std': 0.0}))
            linears.append(torch.nn.Linear(int(np.prod([self.num_kp, 2])), self.feature_dimension))
        backbones = nn.ModuleList(backbones)
        pools = nn.ModuleList(pools)
        linears = nn.ModuleList(linears)

        backbones = replace_bn_with_gn(backbones) # TODO

        noise_pred_net = ConditionalUnet1D(
            input_dim=self.ac_dim,
            global_cond_dim=self.obs_dim*self.observation_horizon
        )

        nets = nn.ModuleDict({
            'policy': nn.ModuleDict({
                'backbones': backbones,
                'pools': pools,
                'linears': linears,
                'noise_pred_net': noise_pred_net
            })
        })

        nets = nets.float().cuda()
        ENABLE_EMA = True
        if ENABLE_EMA:
            ema = EMAModel(parameters=nets, power=self.ema_power)
        else:
            ema = None
        self.nets = nets
        self.ema = ema

        # setup noise scheduler
        self.noise_scheduler = DDIMScheduler(
            num_train_timesteps=50,
            beta_schedule='squaredcos_cap_v2',
            clip_sample=True,
            set_alpha_to_one=True,
            steps_offset=0,
            prediction_type='epsilon'
        )

        n_parameters = sum(p.numel() for p in self.parameters())
        print("number of parameters: %.2fM" % (n_parameters/1e6,))

    def configure_optimizers(self):
        optimizer = torch.optim.AdamW(self.nets.parameters(), lr=self.lr, weight_decay=self.weight_decay)
        return optimizer

    def __call__(self, qpos, image, actions=None, is_pad=None):
        B = qpos.shape[0]
        if actions is not None: # training time
            nets = self.nets
            all_features = []
            for cam_id in range(len(self.camera_names)):
                cam_image = image[:, cam_id]
                cam_features = nets['policy']['backbones'][cam_id](cam_image)
                pool_features = nets['policy']['pools'][cam_id](cam_features)
                pool_features = torch.flatten(pool_features, start_dim=1)
                out_features = nets['policy']['linears'][cam_id](pool_features)
                all_features.append(out_features)

            obs_cond = torch.cat(all_features + [qpos], dim=1)

            # sample noise to add to actions
            noise = torch.randn(actions.shape, device=obs_cond.device)

            # sample a diffusion iteration for each data point
            timesteps = torch.randint(
                0, self.noise_scheduler.config.num_train_timesteps,
                (B,), device=obs_cond.device
            ).long()

            # add noise to the clean actions according to the noise magnitude at each diffusion iteration
            # (this is the forward diffusion process)
            noisy_actions = self.noise_scheduler.add_noise(
                actions, noise, timesteps)

            # predict the noise residual
            noise_pred = nets['policy']['noise_pred_net'](noisy_actions, timesteps, global_cond=obs_cond)

            # L2 loss
            all_l2 = F.mse_loss(noise_pred, noise, reduction='none')
            loss = (all_l2 * ~is_pad.unsqueeze(-1)).mean()

            loss_dict = {}
            loss_dict['l2_loss'] = loss
            loss_dict['loss'] = loss

            if self.training and self.ema is not None:
                self.ema.step(nets)
            return loss_dict
        else: # inference time
            To = self.observation_horizon
            Ta = self.action_horizon
            Tp = self.prediction_horizon
            action_dim = self.ac_dim

            nets = self.nets
            if self.ema is not None:
                nets = self.ema.averaged_model

            all_features = []
            for cam_id in range(len(self.camera_names)):
                cam_image = image[:, cam_id]
                cam_features = nets['policy']['backbones'][cam_id](cam_image)
                pool_features = nets['policy']['pools'][cam_id](cam_features)
                pool_features = torch.flatten(pool_features, start_dim=1)
                out_features = nets['policy']['linears'][cam_id](pool_features)
                all_features.append(out_features)

            obs_cond = torch.cat(all_features + [qpos], dim=1)

            # initialize action from Gaussian noise
            noisy_action = torch.randn(
                (B, Tp, action_dim), device=obs_cond.device)
            naction = noisy_action

            # init scheduler
            self.noise_scheduler.set_timesteps(self.num_inference_timesteps)

            for k in self.noise_scheduler.timesteps:
                # predict noise
                noise_pred = nets['policy']['noise_pred_net'](
                    sample=naction,
                    timestep=k,
                    global_cond=obs_cond
                )

                # inverse diffusion step (remove noise)
                naction = self.noise_scheduler.step(
                    model_output=noise_pred,
                    timestep=k,
                    sample=naction
                ).prev_sample

            return naction

    def serialize(self):
        return {
            "nets": self.nets.state_dict(),
            "ema": self.ema.averaged_model.state_dict() if self.ema is not None else None,
        }

    def deserialize(self, model_dict):
        status = self.nets.load_state_dict(model_dict["nets"])
        print('Loaded model')
        if model_dict.get("ema", None) is not None:
            print('Loaded EMA')
            status_ema = self.ema.averaged_model.load_state_dict(model_dict["ema"])
            status = [status, status_ema]
        return status
```

Training command:

```
python3 imitate_episodes.py --task_name aloha_slide_exp1 --ckpt_dir C:/Users/aa/Desktop/act-main/ckpt --policy_class DiffusionPolicy --kl_weight 10 --chunk_size 100 --hidden_dim 512 --batch_size 4 --dim_feedforward 3200 --num_epochs 200 --lr 1e-4 --seed 0
```
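For reference, a usage sketch of the two call modes of the class above. The `args_override` values and tensor shapes here are illustrative assumptions, not the repo's exact config, and it assumes `ENABLE_EMA = False` in the constructor to sidestep the EMA error discussed below:

```python
import torch

# hypothetical config for illustration; 480x640 input matches the
# SpatialSoftmax input_shape [512, 15, 20] used above
args_override = {
    'camera_names': ['top'],
    'observation_horizon': 1,
    'action_horizon': 8,
    'prediction_horizon': 16,
    'num_inference_timesteps': 10,
    'ema_power': 0.75,
    'lr': 1e-4,
    'action_dim': 16,
}
policy = DiffusionPolicy(args_override)  # assumes ENABLE_EMA = False

B = 4
qpos = torch.zeros(B, 14).cuda()                  # proprioception
image = torch.zeros(B, 1, 3, 480, 640).cuda()     # (B, num_cams, C, H, W)
actions = torch.zeros(B, 16, 16).cuda()           # (B, prediction_horizon, action_dim)
is_pad = torch.zeros(B, 16, dtype=torch.bool).cuda()

loss_dict = policy(qpos, image, actions, is_pad)  # training mode: returns loss dict
naction = policy(qpos, image)                     # inference mode: returns denoised actions
```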

woltium commented 2 months ago

@barsm42 Same question here. Have you solved this problem?

barsm42 commented 2 months ago

> Same question here. Have you solved this problem?

@woltium We are trying to solve it. I trained two policies with the same parameters; the only difference is the `ENABLE_EMA = True` / `ENABLE_EMA = False` line. We will evaluate them and check whether the EMAModel error appears during inference.

Inference worked with `ENABLE_EMA = False`, but the results were not good. It seems more detailed investigation is needed on our side.

With `ENABLE_EMA = True`, it raises an error on the `nets = self.ema.averaged_model` line, and the robots don't move.

lanlankilkil commented 1 month ago

@yxKryptonite Hello, I am also reproducing the ALOHA diffusion policy. I got an error saying a parameter is required, so I changed `model` to `parameters` in `ema = EMAModel(model=nets, power=self.ema_power)`. That cleared the first error, but then I got `'EMAModel' object has no attribute 'averaged_model'`. Could you share contact information so I can ask for help?
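If it helps, a sketch of consistent new-API usage (assuming a recent `diffusers` release, where `EMAModel` takes `parameters` and no longer exposes `averaged_model`):

```python
import copy

from diffusers.training_utils import EMAModel

# construction: pass an iterable of parameters, not the nn.Module itself
ema = EMAModel(parameters=nets.parameters(), power=self.ema_power)

# training: update the shadow parameters after each optimizer step
ema.step(nets.parameters())

# inference / checkpointing: there is no `averaged_model` attribute;
# copy the shadow parameters into a clone of the live model instead
ema_nets = copy.deepcopy(nets)
ema.copy_to(ema_nets.parameters())
```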

lanlankilkil commented 1 month ago

> *(quotes @barsm42's comment above in full, including the `DiffusionPolicy` code and training command)*

@barsm42 Hello, I am also reproducing the ALOHA diffusion policy and hit the same errors: first that a parameter is required, then, after changing `model` to `parameters` in the `EMAModel` call, that `'EMAModel' object has no attribute 'averaged_model'`. Could you share contact information for help?

brisyramshere commented 1 month ago

> *(quotes @lanlankilkil's comment above)*

Have you addressed this problem?