Train difusion policy using lerobot with custom dataset

HiroIshida commented 4 months ago

Resources

The lerobot repository does not provide detained explanation to how to create custom dataset, but @ojh6404 succeeded (tried?) to created by himself
https://github.com/ojh6404/imitator/tree/lerobot

Also, I'm posting issue here https://github.com/huggingface/lerobot/issues/304

Creation of dummy lerobot dataset (just a random dataset)

Referencing his source code, I'm trying to create custom dataset. The following script is the minimum working example to at least run the training pipeline (but on random data, so meaning less though). https://github.com/HiroIshida/snippets/blob/master/python/ext_examples/lerobot/build_data.py

Creation of more realistic lerobot datase (IK-reaching task using pybullet)

Also, plug the data created by https://github.com/HiroIshida/mohou/blob/master/example/kuka_reaching.py into lerobot in the same manner (see https://github.com/HiroIshida/snippets/blob/master/python/ext_examples/lerobot/from_mohou.py ), and confirmed that at least the loss is decreasing

h-ishida@azarashi:~/misc/snippets/python/ext_examples/lerobot$ python3 from_mohou.py 
INFO:datasets:PyTorch version 2.3.1 available.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 95/95 [00:00<00:00, 719.08it/s]
compute stats
Compute mean, min, max: 100%|██████████████████████████████████████████████████████████████████████▊| 395/396 [00:02<00:00, 133.22it/s]
Compute std: 100%|█████████████████████████████████████████████████████████████████████████████████▊| 395/396 [00:02<00:00, 139.20it/s]
finsh creating lerobot_dataset
step: 0 loss: 1.107
step: 250 loss: 0.023
step: 500 loss: 0.016
step: 750 loss: 0.012
step: 1000 loss: 0.008
step: 1250 loss: 0.017
step: 1500 loss: 0.008
step: 1750 loss: 0.005

Of course after some tweak of config

diff --git a/lerobot/common/policies/diffusion/configuration_diffusion.py b/lerobot/common/policies/diffusion/configuration_diffusion.py
index 2b7923a..2b71566 100644
--- a/lerobot/common/policies/diffusion/configuration_diffusion.py
+++ b/lerobot/common/policies/diffusion/configuration_diffusion.py
@@ -100,20 +100,20 @@ class DiffusionConfig:

     input_shapes: dict[str, list[int]] = field(
         default_factory=lambda: {
-            "observation.image": [3, 96, 96],
-            "observation.state": [2],
+            "observation.images": [3, 56, 56],
+            "observation.state": [7],
         }
     )
     output_shapes: dict[str, list[int]] = field(
         default_factory=lambda: {
-            "action": [2],
+            "action": [7],
         }
     )

     # Normalization / Unnormalization
     input_normalization_modes: dict[str, str] = field(
         default_factory=lambda: {
-            "observation.image": "mean_std",
+            "observation.images": "mean_std",
             "observation.state": "min_max",
         }
     )
@@ -122,7 +122,7 @@ class DiffusionConfig:
     # Architecture / modeling.
     # Vision backbone.
     vision_backbone: str = "resnet18"
-    crop_shape: tuple[int, int] | None = (84, 84)
+    crop_shape: tuple[int, int] | None = (28, 28)
     crop_is_random: bool = True
     pretrained_backbone_weights: str | None = None
     use_group_norm: bool = True

Demonstration example sample Execution example feedback_simulation

HiroIshida commented 4 months ago

(this is personal note, not actual question to anybody)

training phase

Q: However, when we use custom dataset, we need to specifcy? dataset.stats by myself otherwise dataset.stats is just None. A: Yes, call compute_stats to compute stats and set it to the LerobotDataset Q: error assert set(batch).issuperset({"observation.state", "observation.images", "action", "action_is_pad"}) A: writing... Q: Do we need both observation.images and observation.image?? A: observations.image is created from observations.images Q: where is "action_is_pad" supposed to be created? A: here. Not only for action but also for {key}_is_pad https://github.com/huggingface/lerobot/blob/461d5472d329a6a461f56a85175d4744b8e78e61/lerobot/common/datasets/utils.py#L288 This function is actually called from here. According to this, _is_pad is created only when delta_timestep is provided. Q: image size in input_shape in config (56 x 56) and actual image size (112 x 112) mismatch, but training seems to work fine, why? A: it seems that input_shape config param is used only in the initialization of normalization layer. For custom dataset case, anyway, we need to provide stats parameter. And, if that parameter is provided, the initialization of normalizer solely use the stats parameter. Thus, input_shape in config is not used.

deployment phase

Q: in training phase batch image shape is like [64, 2, 3, 112, 112] but in the execution phase [1, 3, 112, 112]. Due to the lack of dimension of "2", error occured =>
A: the extra dimension attributes to the observation step which is specified as delta_timestamp in a training phase. In the deployment phase, we must concat observations so that the sequence length matches to the one in delta_timestamp

ojh6404 commented 4 months ago

@HiroIshida I succeeded in building dataset and training lerobot diffusion policy using my script with latest lerobot version. I used raw dataset as hdf5 of robomimic dataset. Though I modified some codes for my custom library, you can easily modify my script to use with out it. The commands what I used to build and train are below.

conda create -n lerobot-env python=3.11 -y
conda activate lerobot-env
git clone https://github.com/ojh6404/imitator.git -b lerobot
cd /path/to/imitator && pip install -e ".[torch]"
mkdir ~/.imitator
cd ~/.imitator && gdown 1Sg8dyipeV5GpT0F_ZYlAnyzHGTRmlgLG # This is test dataset and config (robomimic Lift)
unzip lerobot_test.zip
cd /path/to/imitator/imitator
python scripts/lerobot_dataset_builder.py -pn lerobot_test
python scripts/visualize_lerobot_dataset.py --episode-index 0 --root /home/leus/.imitator/lerobot_test/data # to visualize
python scripts/lerobot_trainer.py hydra.job.name=lerobot_test    device=cuda    env=robomimic    dataset_repo_id=data/lerobot_dataset   policy=diffusion    training.eval_freq=-1  training.log_freq=250 training.offline_steps=1000 wandb.enable=true root=/home/leus/.imitator/lerobot_test

if you are not using latest ffmpeg, you need to modify lerobot's this line to use libx264 or libx265 for video codec, not libsvtav1 and pip install -e lerobot.

HiroIshida commented 4 months ago

@ojh6404 Oh, sorry, this is just my personal note. I'm leaving it here in case other people who encounter similar issues might find this page helpful. I didn't intend to mention you specifically (just wanted to reference the original creator of the code). Sorry for any disturbance.

I forgot to compute the stats value myself, so I just had to call compute_stats, and it started to work.

Anyway, thank you for your instructions! Having the actual data you provided definitely helps a lot.

HiroIshida commented 4 months ago

pitfalls

Crop size must be set to slightly bit smaller than that original size. I originally set it to the half of the original (just keep it default) and didnt work well. The below is image size vs crop size in the tasks of diffusion policy paper, and you can confirm that.
Must to set num_inference_steps (of diffusion) to some smaller value (e.g. 10). In the original implementation this value in the config file is None and then set to 100, which is the one in the training time. With 100, it takes 1 second even using decent GPU (e.g. 2080ti).

dennisushi commented 3 months ago

Further to this topic, I was able to create a custom dataset fairly easily by following the following steps. Unfortunately, it feels like a workaround rather than a proper method:

1) Save custom dataset as hdf5 format

import h5py
lr_ds = {}
for episode_id, t in enumerate(trajectories):
    lr_ds["/action"] = t["action"]
    lr_ds["/observations/qpos"] = t["obs"]["agent_pos"]
    # Note LeRobot Alpha format expects uint8 images with channels in the last dimension. 
    lr_ds["/observations/images/cam1"] =(t["obs"]["rgb"].transpose(0,2,3,1) * 255).astype(np.uint8) 
    # Aloha LeRobot also accepts /observations/qvel and /observations/effort
    # It does not support point clouds or depth images yet.

    # Note sim has to be the name of the last folder because currently 
    # from_raw_to_lerobot_format treats image arrays as 4 dimensional only 
    # if sim is in the name...
    savepath = f"/tmp/<ENV NAME>/sim/episode_{episode_id}.hdf5"
    os.makedirs(os.path.dirname(savepath), exist_ok=True)

    with h5py.File(savepath, "w") as f:
        for key, value in lr_ds.items():
            f.create_dataset(key, data=value)

2) Then load the dataset as:

from lerobot.common.datasets.lerobot_dataset import LeRobotDataset
from lerobot.common.datasets.push_dataset_to_hub.aloha_hdf5_format import from_raw_to_lerobot_format
from pathlib import Path
from lerobot.common.datasets.compute_stats import compute_stats

raw_dir = Path("/tmp/<ENV NAME>/sim/")
dataset, episode_data_index, info = from_raw_to_lerobot_format(raw_dir, videos_dir=None, video=False)
stats = compute_stats(dataset, batch_size=32, num_workers=16, max_num_samples=None)
dataset = LeRobotDataset.from_preloaded(
        root = raw_dir,
        hf_dataset = dataset,
        episode_data_index = episode_data_index,
        stats = stats,
        delta_timestamps = None,
        info = info,
    )

Then I can safely visualize it as normal. Ideally, in the future, the stats will be automatically parsed from the data. I have not tried training with it yet.

Edit 1: Training works errorless. Loss is going down. Whether or not it is learning anything useful, don't know yet.

python lerobot/scripts/train.py policy=act_custom env=real_world env.task=StackCube-v1

All I had to do is modify the dataset factory to load real_world datasets from raw as step 2) above, and change act_custom policy configuration to my config.

HiroIshida commented 3 months ago

@dennisushi thanks for the info!

HiroIshida / snippets