Hi, @chernyadev! I follow [https://github.com/robobase-org/robobase.git]'s implementation and run the Diffusion Policy / ACT training code on BiGym [https://github.com/chernyadev/bigym/tree/fix_original_demos], but despite extensive debugging I still can't make things work. Here's what I've tried:

I chose the dishwasher_close and drawer_top_close tasks for training; in the original BiGym paper both tasks reach nearly 100% success rate with Diffusion Policy and ACT. The robobase repo doesn't explain how robobase_config.yaml should be changed to fit the imitation learning pipeline, so I modified it according to the BiGym paper. Here's my robobase_config.yaml:
```yaml
defaults:
  - _self_
  - env: null
  - method: null
  - intrinsic_reward_module: null
  - launch: null
  - override hydra/launcher: joblib

# Universal settings
create_train_env: true
num_train_envs: 1
replay_size_before_train: 3000
num_pretrain_steps: 1100000
num_train_frames: 1100000
eval_every_steps: 10000
num_eval_episodes: 5
update_every_steps: 2
num_explore_steps: 2000
save_snapshot: false
snapshot_every_n: 10000
batch_size: 256

# Demonstration settings
demos: 200
demo_batch_size: 256 # null # If set to > 0, introduce a separate buffer for demos
use_self_imitation: true # false # When using a separate buffer for demos, if set to true, save successful (online) trajectories into the separate demo buffer

# Observation settings
pixels: false
visual_observation_shape: [84, 84]
frame_stack: 2
frame_stack_on_channel: true
use_onehot_time_and_no_bootstrap: false

# Action settings
action_repeat: 1
action_sequence: 16 # ActionSequenceWrapper
execution_length: 8 # If execution_length < action_sequence, we use receding horizon control
temporal_ensemble: true # Temporal ensembling only applicable to action sequence > 1
temporal_ensemble_gain: 0.01
use_standardization: true # Demo-based standardization for action space
use_min_max_normalization: true # Demo-based min-max normalization for action space
min_max_margin: 0.0 # If set to > 0, introduce margin for demo-driven min-max normalization
norm_obs: true

# Replay buffer settings
replay:
  prioritization: false
  size: 1000000
  gamma: 0.99
  demo_size: 1000000
  save_dir: null
  nstep: 3
  num_workers: 4
  pin_memory: true
  alpha: 0.7 # prioritization
  beta: 0.5 # prioritization
  sequential: false
  transition_seq_len: 1 # The length of transition sequence returned from sample() call. Only applicable if sequential is true

# Logging settings
wandb: # Weights & Biases
  use: true
  project: ${oc.env:USER}RoboBase
  name: null
tb: # TensorBoard
  use: false
  log_dir: /tmp/robobase_tb_logs
  name: null

# Misc
experiment_name: exp
seed: 1
num_gpus: 1
log_every: 1000
log_train_video: false
log_eval_video: true
log_pretrain_every: 100
save_csv: false

hydra:
  run:
    dir: ./exp_local/${now:%Y.%m.%d}/${now:%H%M%S}_${hydra.job.override_dirname}
  sweep:
    dir: ./exp_local/${now:%Y.%m.%d}/${now:%H%M}_${hydra.job.override_dirname}
    subdir: ${hydra.job.num}
```
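To spell out how I read the action settings above (a standalone sketch of my understanding, not robobase code): each policy query predicts action_sequence actions, the wrapper executes execution_length of them before re-planning (receding horizon control), and with temporal_ensemble: true overlapping predictions for the same step are supposed to be blended instead.

```python
# My reading of the receding-horizon settings (standalone sketch, not robobase code):
# one policy query predicts `action_sequence` actions, but only the first
# `execution_length` of them are executed before the policy is queried again.
action_sequence = 16   # from my config above
execution_length = 8   # from my config above


def receding_horizon_schedule(total_env_steps: int):
    """Yield (policy_query_index, env_step) pairs under receding-horizon control."""
    env_step, query = 0, 0
    while env_step < total_env_steps:
        for _ in range(min(execution_length, total_env_steps - env_step)):
            yield query, env_step
            env_step += 1
        query += 1


for query, step in receding_horizon_schedule(24):
    print(f"env step {step:2d} executed from policy query {query}")
# steps 0-7 come from query 0, steps 8-15 from query 1, steps 16-23 from query 2
```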
Then I use the following command to start Diffusion Policy training:
But nothing works: the success rate is always zero. Besides, it seems there's a bug in your temporal ensemble code. In robobase/robobase/envs/wrappers/action_sequence.py, in step_sequence(), the temporal ensemble code is:
```python
self._action_history[
    self._cur_step, self._cur_step : self._cur_step + self._sequence_length
] = action
for i, sub_action in enumerate(action):
    if self._temporal_ensemble and self._sequence_length > 1:
        # Select all predicted actions for self._cur_step. This will cover the
        # actions from [cur_step - sequence_length + 1, cur_step)
        # Note that not all actions in this range will be valid as we might have
        # execution_length > 1, which skips some of the intermediate steps.
        cur_actions = self._action_history[:, self._cur_step]
        indices = np.all(cur_actions != 0, axis=1)
        cur_actions = cur_actions[indices]
        # earlier predicted actions will have smaller weights.
        exp_weights = np.exp(-self._gain * np.arange(len(cur_actions)))
        exp_weights = (exp_weights / exp_weights.sum())[:, None]
        sub_action = (cur_actions * exp_weights).sum(axis=0)
    observation, reward, termination, truncation, info = self.env.step(
        sub_action
    )
    self._cur_step += 1
    if self.is_demo_env:
        demo_actions[i] = info.pop("demo_action")
    total_reward += reward
    action_idx_reached += 1
    if termination or truncation:
        break
    if not self.is_demo_env:
        if action_idx_reached == self._execution_length:
            break
```
It seems that even with temporal ensembling enabled, the function still runs through the whole for loop and steps the environment with sub_action multiple times, and each time the sub_action is the same. So I changed the code as follows to avoid this:
```python
self._action_history[
    self._cur_step, self._cur_step : self._cur_step + self._sequence_length
] = action
for i, sub_action in enumerate(action):
    if self._temporal_ensemble and self._sequence_length > 1:
        # Select all predicted actions for self._cur_step. This will cover the
        # actions from [cur_step - sequence_length + 1, cur_step)
        # Note that not all actions in this range will be valid as we might have
        # execution_length > 1, which skips some of the intermediate steps.
        cur_actions = self._action_history[:, self._cur_step]
        indices = np.all(cur_actions != 0, axis=1)
        cur_actions = cur_actions[indices]
        # earlier predicted actions will have smaller weights.
        exp_weights = np.exp(-self._gain * np.arange(len(cur_actions)))
        exp_weights = (exp_weights / exp_weights.sum())[:, None]
        sub_action = (cur_actions * exp_weights).sum(axis=0)
    observation, reward, termination, truncation, info = self.env.step(
        sub_action
    )
    self._cur_step += 1
    if self.is_demo_env:
        demo_actions[i] = info.pop("demo_action")
    total_reward += reward
    action_idx_reached += 1
    if termination or truncation:
        break
    # --- my change: with temporal ensembling, execute only one env step per policy query ---
    if self._temporal_ensemble and self._sequence_length > 1:
        break
    if not self.is_demo_env:
        if action_idx_reached == self._execution_length:
            break
```
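As a sanity check on the ensembling itself, here is the weight computation in isolation (a standalone numpy sketch mirroring the lines above, not robobase code); with my temporal_ensemble_gain of 0.01 the weights over the overlapping predictions come out almost uniform:

```python
import numpy as np

# Standalone sketch of the exponential weighting used above. `preds` stands in
# for the overlapping predictions that different policy queries made for the
# same environment step (toy 3-dimensional actions).
gain = 0.01  # temporal_ensemble_gain from my config
preds = np.stack([np.full(3, v) for v in (0.10, 0.12, 0.14, 0.16)])

weights = np.exp(-gain * np.arange(len(preds)))   # same form as in step_sequence()
weights = (weights / weights.sum())[:, None]

ensembled = (preds * weights).sum(axis=0)
print(weights.ravel())  # ~[0.254, 0.251, 0.249, 0.246] -> nearly uniform
print(ensembled)        # a near-plain average of the overlapping predictions
```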
Then I launched another training run, but still nothing works.
Then I took a look at the eval videos:
Drawer_top_close:
It seems that the robot completes the task to a certain degree, but the overall success rate is still zero and the task reward is zero as well. This is weird.
Dishwasher_close:
The robot fails to complete the task at all.
Besides, I found that you use DDIM for training. As far as I know, DDIM is only meant to accelerate sampling from a DDPM, while training itself still relies on the standard DDPM objective. I'm not a diffusion expert, so I may well be wrong about this.
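To make my question concrete, here is my (possibly mistaken) mental model as a sketch with the HuggingFace diffusers schedulers: the network is trained with the usual DDPM noise-prediction objective, and DDIM only changes how that same trained network is sampled at inference time. The network, dimensions, and data below are made-up placeholders, not robobase's model.

```python
import torch
from diffusers import DDPMScheduler, DDIMScheduler

act_dim, horizon = 16, 16                      # placeholder action dim / chunk length
net = torch.nn.Linear(act_dim * horizon, act_dim * horizon)  # stand-in noise-prediction net

# Training: standard DDPM epsilon-prediction objective.
train_sched = DDPMScheduler(num_train_timesteps=100)
actions = torch.randn(8, act_dim * horizon)    # a toy batch of normalized action chunks
noise = torch.randn_like(actions)
t = torch.randint(0, train_sched.config.num_train_timesteps, (actions.shape[0],))
noisy = train_sched.add_noise(actions, noise, t)
loss = torch.nn.functional.mse_loss(net(noisy), noise)  # predict the injected noise
loss.backward()

# Inference: DDIM is just a different (faster) sampler over the same trained net.
sample_sched = DDIMScheduler(num_train_timesteps=100)
sample_sched.set_timesteps(10)                 # e.g. 10 DDIM steps instead of 100 DDPM steps
sample = torch.randn(1, act_dim * horizon)
for step_t in sample_sched.timesteps:
    eps = net(sample)                          # real models also condition on step_t and obs
    sample = sample_sched.step(eps, step_t, sample).prev_sample
```

So my question is whether using the DDIM scheduler in the training path changes the objective, or whether I'm simply misreading the code.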
So I tried launching training with the ACT code, and it turned out:
```
Error executing job with overrides: ['method=act', 'env=bigym/dishwasher_close', 'replay.nstep=1']
Error in call to target 'robobase.method.act.ActBCAgent':
AttributeError("'NoneType' object has no attribute 'output_shape'")
full_key: method
```
I figured out that this is because the ACT code only supports image observations, whereas in the experiments above I used state observations. So I changed the pixels setting in robobase_config.yaml (from pixels: false to pixels: true), and ended up hitting the problem already reported in [Inquiry Regarding Using Repo with BiGym · Issue #3 · robobase-org/robobase](https://github.com/robobase-org/robobase/issues/3), which remains unanswered.
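For what it's worth, my guess at the concrete failure mode (a generic sketch of the pattern I suspect, not the actual ActBCAgent code): with pixels: false no visual encoder gets built, so a later access to encoder.output_shape hits None.

```python
# Generic sketch of the failure pattern I suspect; all names here are made up.
class DummyEncoder:
    output_shape = (512,)


class FakeAgent:
    def __init__(self, pixels: bool):
        # With state-only observations no visual encoder is built.
        self.encoder = DummyEncoder() if pixels else None

    def head_input_shape(self):
        # Raises AttributeError: 'NoneType' object has no attribute 'output_shape'
        # when pixels is False, matching the trace above.
        return self.encoder.output_shape


FakeAgent(pixels=True).head_input_shape()    # fine
FakeAgent(pixels=False).head_input_shape()   # AttributeError, as in my run
```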
Then I wanted to check whether there's anything wrong with the demos, but I can't run your demo replay code since my server has no monitor or GUI.
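If the demo replayer can render off-screen, a pointer would be great; the only rendering path available on my machine is MuJoCo's headless backends (this is an assumption on my side that bigym renders through MuJoCo), e.g.:

```python
# Headless rendering setup I can use (assumption: the replayer goes through MuJoCo).
# MUJOCO_GL must be set before mujoco is imported; "osmesa" is a CPU-only fallback.
import os
os.environ["MUJOCO_GL"] = "egl"
```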
So I'm stuck here and would really appreciate your help. To be honest, I think BiGym is an amazing benchmark; it's the only one I know of that supports long-horizon mobile manipulation and comes with plenty of human demos and tasks. I believe better maintenance of this repository would greatly enlarge the impact of this work in the community, and I hope my feedback helps you develop it further.