This is the official implementation of our ICRA'24 paper Crossway Diffusion: Improving Diffusion-based Visuomotor Policy via Self-supervised Learning.
The code is adapted from Diffusion Policy.
Click the GIF below to watch the full video!
We propose Crossway Diffusion, a simple yet effective method to enhance diffusion-based visuomotor policy learning.
By introducing a carefully designed state decoder and a simple reconstruction objective, we explicitly regularize the intermediate representation of the diffusion model to capture the information of the input states, leading to enhanced performance across all datasets.
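For intuition, below is a minimal, hypothetical sketch of what such a joint objective could look like in PyTorch. The module interfaces (`policy.obs_encoder`, `policy.noise_pred_net`, `state_decoder`) and the loss weighting `lambda_rec` are illustrative assumptions for this sketch, not the exact API of this repository.

```python
import torch
import torch.nn.functional as F

def training_step(policy, state_decoder, obs, actions, lambda_rec=1.0):
    """Illustrative joint objective: diffusion loss + state reconstruction loss.

    `policy` is assumed to expose the usual diffusion-policy pieces
    (observation encoder, noise predictor, noise scheduler); `state_decoder`
    reconstructs the input observation from an intermediate representation of
    the diffusion model. All names here are hypothetical sketches.
    """
    # Encode observations into conditioning features.
    cond = policy.obs_encoder(obs)

    # Standard diffusion objective: predict the noise added to the actions.
    noise = torch.randn_like(actions)
    t = torch.randint(0, policy.num_train_timesteps,
                      (actions.shape[0],), device=actions.device)
    noisy_actions = policy.noise_scheduler.add_noise(actions, noise, t)
    pred_noise, intermediate = policy.noise_pred_net(
        noisy_actions, t, cond, return_intermediate=True)
    diffusion_loss = F.mse_loss(pred_noise, noise)

    # Auxiliary objective: reconstruct the input state from the intermediate
    # representation, explicitly regularizing it to capture state information.
    recon = state_decoder(intermediate)
    recon_loss = F.mse_loss(recon, obs["image"])

    return diffusion_loss + lambda_rec * recon_loss
```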
Our major contribution is included in:
The Python environment used in this project is identical to Diffusion Policy. Please refer to this link for detailed installation instructions.
(Optional) To manually control the image rendering device through the environment variable `EGL_DEVICE_ID`, replace the original `robomimic/envs/env_robosuite.py` in `robomimic` with this modified file.
Please follow the guide at this link to download the simulated datasets.
Our real-world datasets are available at Hugging Face Dataset. The dataset files follow a structure similar to robomimic's. Please check dataset_readme.md for instructions on training with our datasets or your own.
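As a quick sanity check before training, the sketch below lists the demonstrations and observation keys of a robomimic-style HDF5 file. The group layout and key names (`data`, `actions`, `obs`) are assumptions based on the robomimic convention; adjust them to what dataset_readme.md describes for our files.

```python
import h5py

# Placeholder path: point it at a downloaded dataset file.
DATASET_PATH = "path/to/dataset.hdf5"

with h5py.File(DATASET_PATH, "r") as f:
    # robomimic-style datasets store demonstrations under the "data" group.
    demos = list(f["data"].keys())
    print(f"{len(demos)} demonstrations, e.g. {demos[:3]}")

    # Inspect the first demo: actions and per-step observations.
    demo = f["data"][demos[0]]
    print("actions shape:", demo["actions"].shape)
    for key, value in demo["obs"].items():
        print(f"obs/{key}: {value.shape}")
```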
To train a model on simulated datasets with a specific random seed:
EGL_DEVICE_ID=0 python train.py --config-dir=config/${task}/ --config-name=type[a-d].yaml training.seed=4[2-4]
where `${EGL_DEVICE_ID}` defines which GPU is used for rendering simulated images, and `${task}` can be one of `can_ph`, `can_mh`, `lift_ph`, `lift_mh`, `square_ph`, `square_mh`, `transport_ph`, `transport_mh`, `tool_hang_ph`, and `pusht`.
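For example, assuming the per-variant config files follow the `type[a-d]` naming (e.g. `typea.yaml`), `EGL_DEVICE_ID=0 python train.py --config-dir=config/pusht/ --config-name=typea.yaml training.seed=42` trains a single model of variant (a) on the Push-T dataset with random seed 42.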
The results will be stored in `outputs/` and `wandb/`. In our experiments, we use 42, 43, and 44 as the random seeds.
To evaluate a checkpoint:
EGL_DEVICE_ID=0 python eval.py --checkpoint <path to checkpoint.ckpt> --output_dir <path for output> --device cuda:0
By default, the code will evaluate the model for 50 episodes, and the results will be available at `<path for output>/eval_log.json`.
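If you want to aggregate results across checkpoints programmatically, a minimal sketch for loading the log is below. The exact fields inside eval_log.json are not documented here, so the snippet only loads the file and prints its top-level keys (assuming the top level is a JSON object).

```python
import json

# Placeholder path: use the --output_dir you passed to eval.py.
with open("path/to/output/eval_log.json") as f:
    eval_log = json.load(f)

# The exact schema depends on the evaluation code; inspect it first.
print(sorted(eval_log.keys()))
```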
Our pretrained models and evaluation results are now available at Hugging Face.
This repository is released under the MIT license. See LICENSE for additional details.