yxKryptonite opened 8 months ago
Hi authors, thanks for your work! I tried multiple sets of diffusion policy hyperparameters, but none of them worked. Could you please guide me on how to train a diffusion policy on the cube-transfer task, or provide any trained checkpoints? Thank you very much!
@yxKryptonite, same question here. Have you tried a command like this?
```
conda activate mobile
export MUJOCO_GL=egl
cd /home/tonyzhao/Research/act-plus-plus
CUDA_VISIBLE_DEVICES=0 python3 imitate_episodes.py \
    --task_name sim_transfer_cube_scripted \
    --ckpt_dir /scr/tonyzhao/train_logs/cube_scripted_diffusion_sweep_4_regressionTest \
    --policy_class Diffusion --chunk_size 32 \
    --batch_size 32 --lr 1e-4 --seed 0 \
    --num_steps 200000 --eval_every 6000 --validate_every 6000 --save_every 6000
```
@yxKryptonite, I am also running into this issue. Have you solved it?
Hello,
Same question here. I trained on a Ziploc-slide task (we created our own dataset, but it is the same task), and ACT worked well. Then we tried the Diffusion Policy class. Its validation losses during training were much better than ACT's, but at inference time the Diffusion Policy did not work: the robots could not even start the task. The error was about `self.ema.averaged_model`; I will share it below.
```python
class DiffusionPolicy(nn.Module):
    def __init__(self, args_override):
        super().__init__()

        self.camera_names = args_override['camera_names']
        self.observation_horizon = args_override['observation_horizon'] ### TODO TODO TODO DO THIS
        self.action_horizon = args_override['action_horizon'] # apply chunk size
        self.prediction_horizon = args_override['prediction_horizon'] # chunk size
        self.num_inference_timesteps = args_override['num_inference_timesteps']
        self.ema_power = args_override['ema_power']
        self.lr = args_override['lr']
        self.weight_decay = 0

        self.num_kp = 32
        self.feature_dimension = 64
        self.ac_dim = args_override['action_dim'] # 14 + 2
        self.obs_dim = self.feature_dimension * len(self.camera_names) + 14 # camera features and proprio

        backbones = []
        pools = []
        linears = []
        for _ in self.camera_names:
            backbones.append(ResNet18Conv(**{'input_channel': 3, 'pretrained': False, 'input_coord_conv': False}))
            pools.append(SpatialSoftmax(**{'input_shape': [512, 15, 20], 'num_kp': self.num_kp, 'temperature': 1.0, 'learnable_temperature': False, 'noise_std': 0.0}))
            linears.append(torch.nn.Linear(int(np.prod([self.num_kp, 2])), self.feature_dimension))
        backbones = nn.ModuleList(backbones)
        pools = nn.ModuleList(pools)
        linears = nn.ModuleList(linears)

        backbones = replace_bn_with_gn(backbones) # TODO

        noise_pred_net = ConditionalUnet1D(
            input_dim=self.ac_dim,
            global_cond_dim=self.obs_dim*self.observation_horizon
        )

        nets = nn.ModuleDict({
            'policy': nn.ModuleDict({
                'backbones': backbones,
                'pools': pools,
                'linears': linears,
                'noise_pred_net': noise_pred_net
            })
        })

        nets = nets.float().cuda()
        ENABLE_EMA = True
        if ENABLE_EMA:
            ema = EMAModel(parameters=nets, power=self.ema_power) # power=self.ema_power
        else:
            ema = None
        self.nets = nets
        self.ema = ema

        # setup noise scheduler
        self.noise_scheduler = DDIMScheduler(
            num_train_timesteps=50,
            beta_schedule='squaredcos_cap_v2',
            clip_sample=True,
            set_alpha_to_one=True,
            steps_offset=0,
            prediction_type='epsilon'
        )

        n_parameters = sum(p.numel() for p in self.parameters())
        print("number of parameters: %.2fM" % (n_parameters/1e6,))

    def configure_optimizers(self):
        optimizer = torch.optim.AdamW(self.nets.parameters(), lr=self.lr, weight_decay=self.weight_decay)
        return optimizer

    def __call__(self, qpos, image, actions=None, is_pad=None):
        B = qpos.shape[0]
        if actions is not None: # training time
            nets = self.nets
            all_features = []
            for cam_id in range(len(self.camera_names)):
                cam_image = image[:, cam_id]
                cam_features = nets['policy']['backbones'][cam_id](cam_image)
                pool_features = nets['policy']['pools'][cam_id](cam_features)
                pool_features = torch.flatten(pool_features, start_dim=1)
                out_features = nets['policy']['linears'][cam_id](pool_features)
                all_features.append(out_features)
            obs_cond = torch.cat(all_features + [qpos], dim=1)

            # sample noise to add to actions
            noise = torch.randn(actions.shape, device=obs_cond.device)

            # sample a diffusion iteration for each data point
            timesteps = torch.randint(
                0, self.noise_scheduler.config.num_train_timesteps,
                (B,), device=obs_cond.device
            ).long()

            # add noise to the clean actions according to the noise magnitude at each diffusion iteration
            # (this is the forward diffusion process)
            noisy_actions = self.noise_scheduler.add_noise(
                actions, noise, timesteps)

            # predict the noise residual
            noise_pred = nets['policy']['noise_pred_net'](noisy_actions, timesteps, global_cond=obs_cond)

            # L2 loss
            all_l2 = F.mse_loss(noise_pred, noise, reduction='none')
            loss = (all_l2 * ~is_pad.unsqueeze(-1)).mean()

            loss_dict = {}
            loss_dict['l2_loss'] = loss
            loss_dict['loss'] = loss

            if self.training and self.ema is not None:
                self.ema.step(nets)
            return loss_dict
        else: # inference time
            To = self.observation_horizon
            Ta = self.action_horizon
            Tp = self.prediction_horizon
            action_dim = self.ac_dim

            nets = self.nets
            if self.ema is not None:
                nets = self.ema.averaged_model

            all_features = []
            for cam_id in range(len(self.camera_names)):
                cam_image = image[:, cam_id]
                cam_features = nets['policy']['backbones'][cam_id](cam_image)
                pool_features = nets['policy']['pools'][cam_id](cam_features)
                pool_features = torch.flatten(pool_features, start_dim=1)
                out_features = nets['policy']['linears'][cam_id](pool_features)
                all_features.append(out_features)
            obs_cond = torch.cat(all_features + [qpos], dim=1)

            # initialize action from Gaussian noise
            noisy_action = torch.randn(
                (B, Tp, action_dim), device=obs_cond.device)
            naction = noisy_action

            # init scheduler
            self.noise_scheduler.set_timesteps(self.num_inference_timesteps)

            for k in self.noise_scheduler.timesteps:
                # predict noise
                noise_pred = nets['policy']['noise_pred_net'](
                    sample=naction,
                    timestep=k,
                    global_cond=obs_cond
                )
                # inverse diffusion step (remove noise)
                naction = self.noise_scheduler.step(
                    model_output=noise_pred,
                    timestep=k,
                    sample=naction
                ).prev_sample

            return naction

    def serialize(self):
        return {
            "nets": self.nets.state_dict(),
            "ema": self.ema.averaged_model.state_dict() if self.ema is not None else None,
        }

    def deserialize(self, model_dict):
        status = self.nets.load_state_dict(model_dict["nets"])
        print('Loaded model')
        if model_dict.get("ema", None) is not None:
            print('Loaded EMA')
            status_ema = self.ema.averaged_model.load_state_dict(model_dict["ema"])
            status = [status, status_ema]
        return status
```
Training command:

```
python3 imitate_episodes.py --task_name aloha_slide_exp1 --ckpt_dir C:/Users/aa/Desktop/act-main/ckpt --policy_class DiffusionPolicy --kl_weight 10 --chunk_size 100 --hidden_dim 512 --batch_size 4 --dim_feedforward 3200 --num_epochs 200 --lr 1e-4 --seed 0
```
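Editor's note on the `self.ema.averaged_model` error discussed below: this `DiffusionPolicy` code appears to target an older `diffusers` API in which `EMAModel` wrapped a full module (`EMAModel(model=nets, power=...)`) and exposed an `averaged_model` attribute. Recent `diffusers` releases instead construct `EMAModel` from parameter tensors and have no `averaged_model`, which is consistent with the errors reported in this thread. A minimal sketch of the newer, parameter-based API (the exact version boundary is an assumption; check your installed `diffusers`):

```python
# Minimal sketch of the newer diffusers EMAModel parameter-based API.
# Assumes a recent diffusers release; the `power` value mirrors the
# thread's ema_power argument and is illustrative only.
import torch.nn as nn
from diffusers.training_utils import EMAModel

nets = nn.Linear(4, 4)  # stand-in for the policy's nn.ModuleDict
ema = EMAModel(parameters=nets.parameters(), power=0.75)

# training step: update the EMA shadow parameters after each optimizer step
ema.step(nets.parameters())

# inference: copy the EMA weights into the live module; no `averaged_model`
# attribute exists on this API, hence the AttributeError in the thread
ema.copy_to(nets.parameters())
```

Alternatively, pinning an older `diffusers` release should keep `EMAModel(model=...)` and `averaged_model` working as written; I believe the original diffusion_policy codebase pins `diffusers==0.11.1`, but verify this before relying on it.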
@barsm42 Same question. Have you solved this problem?
@woltium We are trying to solve it. I trained two policies with the same parameters; the only difference is the `ENABLE_EMA = True` vs. `ENABLE_EMA = False` line. We will evaluate both and check whether the `EMAModel` error appears during inference.
Inference worked with `ENABLE_EMA = False`, but the results were not good. It seems more detailed investigation is needed on our side.
With `ENABLE_EMA = True`, it errors on the `nets = self.ema.averaged_model` line and the robots don't move.
@yxKryptonite Hello, I am also reproducing the ALOHA diffusion policy. It first complained that a `parameters` argument is required, so I changed `model=nets` to `parameters=nets` in `ema = EMAModel(model=nets, power=self.ema_power)`. That fixed the first error, but then it reported that the `EMAModel` object has no attribute `averaged_model`. Could you share some contact information so I can ask for help?
@barsm42 Hello, I am also reproducing the ALOHA diffusion policy and hit the same errors: first that a `parameters` argument is required, and after changing `model=nets` to `parameters=nets`, that the `EMAModel` object has no attribute `averaged_model`. Could you share some contact information so I can ask for help?
Have you addressed this problem?
Hi authors,
I used your 50 demo episodes to train ACT and it worked very well, achieving a success rate of up to 90% on the cube-transfer task. However, after I switched the algorithm to Diffusion Policy, it turned out to have a very low success rate. I tried multiple hyperparameter settings from your
commands.txt
but none of them worked. The results are below (green is ACT; the others are Diffusion Policy with different hyperparameter sets):
So I wonder why this happens. Could you share the best-working diffusion policy parameters? Thank you very much!