# Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning
[![python](https://img.shields.io/badge/-Python_3.8_%7C_3.9_%7C_3.10_%7C_3.11-blue?logo=python&logoColor=white)](https://github.com/pre-commit/pre-commit)
[![pytorch](https://img.shields.io/badge/PyTorch_2.0+-ee4c2c?logo=pytorch&logoColor=white)](https://pytorch.org/get-started/locally/)
[![lightning](https://img.shields.io/badge/-Lightning_2.0+-792ee5?logo=pytorchlightning&logoColor=white)](https://pytorchlightning.ai/)
[![hydra](https://img.shields.io/badge/Config-Hydra_1.3-89b8cd)](https://hydra.cc/)
[![black](https://img.shields.io/badge/Code%20Style-Black-black.svg?labelColor=gray)](https://black.readthedocs.io/en/stable/)
[![isort](https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat&labelColor=ef8336)](https://pycqa.github.io/isort/)
[![license](https://img.shields.io/badge/License-MIT-green.svg?labelColor=gray)](https://github.com/ashleve/lightning-hydra-template#license)
[**Project Page**](https://haoyizhu.github.io/pcm/) | [**arXiv**](https://arxiv.org/abs/2402.02500)
[Haoyi Zhu](https://www.haoyizhu.site/), [Yating Wang](https://scholar.google.com/citations?hl=zh-CN&user=5SuBWh0AAAAJ), [Di Huang](https://dihuang.me/), [Weicai Ye](https://ywcmaike.github.io/), [Wanli Ouyang](https://wlouyang.github.io/), [Tong He](http://tonghe90.github.io/)
This is the official implementation of the NeurIPS 2024 Datasets and Benchmarks (D&B) track paper "Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning". Code for real-world experiments can be found in RealRobot.
In robot learning, the observation space is crucial due to the distinct characteristics of different modalities, which can potentially become a bottleneck alongside policy design. In this study, we explore the influence of various observation spaces on robot learning, focusing on three predominant modalities: RGB, RGB-D, and point cloud. We introduce OBSBench, a benchmark comprising two simulators and 125 tasks, along with standardized pipelines for various encoders and policy baselines. Extensive experiments on diverse contact-rich manipulation tasks reveal a notable trend: point cloud-based methods, even those with the simplest designs, frequently outperform their RGB and RGB-D counterparts. This trend persists in both scenarios: training from scratch and utilizing pre-training. Furthermore, our findings demonstrate that point cloud observations often yield better policy performance and significantly stronger generalization capabilities across various geometric and visual conditions. These outcomes suggest that the 3D point cloud is a valuable observation modality for intricate robotic tasks. We also suggest that incorporating both appearance and coordinate information can enhance the performance of point cloud methods. We hope our work provides valuable insights and guidance for designing more generalizable and robust robotic models.
## :telescope: Project Structure
Our codebase draws significant inspiration from the excellent [Lightning Hydra Template](https://github.com/ashleve/lightning-hydra-template). The directory structure of this project is organized as follows:
```
├── .github                  <- Github Actions workflows
│
├── configs                  <- Hydra configs
│   ├── callbacks                        <- Callbacks configs
│   ├── data                             <- Data configs
│   ├── debug                            <- Debugging configs
│   ├── exp_maniskill2_act_policy        <- ManiSkill2 w. ACT policy experiment configs
│   ├── exp_maniskill2_diffusion_policy  <- ManiSkill2 w. diffusion policy experiment configs
│   ├── extras                           <- Extra utilities configs
│   ├── hydra                            <- Hydra configs
│   ├── local                            <- Local configs
│   ├── logger                           <- Logger configs
│   ├── model                            <- Model configs
│   ├── paths                            <- Project paths configs
│   ├── trainer                          <- Trainer configs
│   │
│   └── train.yaml                       <- Main config for training
│
├── data                     <- Project data, e.g. ManiSkill2 replayed trajectories
│
├── logs                     <- Logs generated by hydra and lightning loggers
│
├── scripts                  <- Shell scripts
│
├── src                      <- Source code
│   ├── data                 <- Data scripts
│   ├── models               <- Model scripts
│   ├── utils                <- Utility scripts
│   │
│   ├── validate.py          <- Run evaluation
│   └── train.py             <- Run training
│
├── .gitignore               <- List of files ignored by git
├── .project-root            <- File for inferring the position of project root directory
├── requirements.txt         <- File for installing python dependencies
├── setup.py                 <- File for installing project as a package
└── README.md
```
## :hammer: Installation
### Basics
```bash
# clone project
git clone https://github.com/HaoyiZhu/PointCloudMatters.git
cd PointCloudMatters
# create conda environment
conda create -n pcm python=3.11 -y
conda activate pcm
# install PyTorch, please refer to https://pytorch.org/ for other CUDA versions
# e.g. cuda 11.8:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# install basic packages
pip3 install -r requirements.txt
```
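Before moving on, it may be worth a quick check that PyTorch was installed with CUDA support:
```bash
# optional sanity check: should print the torch version and `True`
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```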
### Point cloud related
```bash
# please install with your PyTorch and CUDA version
# e.g. torch 2.3.0 + CUDA 11.8:
pip install torch-scatter torch-sparse torch-cluster -f https://data.pyg.org/whl/torch-2.3.0+cu118
```
> **Note**: `spconv` must match your CUDA version; see the [official GitHub](https://github.com/traveller59/spconv) for more information.
```bash
# e.g. for CUDA 11.8:
pip3 install spconv-cu118
```
```bash
# build FPS sampling operations (CUDA required)
cd libs/pointops
# docker & multi GPU arch
TORCH_CUDA_ARCH_LIST="ARCH LIST" python setup.py install
# e.g. 7.5: RTX 2000 series (Turing); 8.0: A100. More available at: https://developer.nvidia.com/cuda-gpus
TORCH_CUDA_ARCH_LIST="7.5 8.0" python setup.py install
cd ../..
```
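As a quick sanity check (assuming the extension installs under the `pointops` module name, as in Pointcept), you can verify that both packages import cleanly:
```bash
# optional: verify that spconv and the compiled pointops extension are importable
python -c "import spconv, pointops; print('point cloud ops OK')"
```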
### ManiSkill2
```bash
pip install mani-skill2==0.5.3 && pip cache purge
```
You can test whether your `ManiSkill2` is installed successfully by running:
```bash
python -m mani_skill2.examples.demo_random_action
```
### RLBench
> **Note**: Installing RLBench can be challenging. We recommend referring to [PerAct's installation guide](https://github.com/peract/peract?tab=readme-ov-file#installation) for more assistance.
#### 1. PyRep and Coppelia Simulator
Follow instructions from the official [PyRep](https://github.com/stepjam/PyRep) repo; reproduced here for convenience:
PyRep requires version **4.1** of CoppeliaSim. Download:
- [Ubuntu 16.04](https://downloads.coppeliarobotics.com/V4_1_0/CoppeliaSim_Player_V4_1_0_Ubuntu16_04.tar.xz)
- [Ubuntu 18.04](https://downloads.coppeliarobotics.com/V4_1_0/CoppeliaSim_Player_V4_1_0_Ubuntu18_04.tar.xz)
- [Ubuntu 20.04](https://downloads.coppeliarobotics.com/V4_1_0/CoppeliaSim_Edu_V4_1_0_Ubuntu20_04.tar.xz)
Once you have downloaded CoppeliaSim, you can pull PyRep from git:
```bash
cd
git clone https://github.com/stepjam/PyRep.git
cd PyRep
```
Add the following to your *~/.bashrc* file (__NOTE__: edit the path in the first line to point to your CoppeliaSim install directory):
```bash
export COPPELIASIM_ROOT=/PATH/TO/COPPELIASIM/INSTALL/DIR
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$COPPELIASIM_ROOT
export QT_QPA_PLATFORM_PLUGIN_PATH=$COPPELIASIM_ROOT
```
Remember to source your bashrc (`source ~/.bashrc`) or
zshrc (`source ~/.zshrc`) after this.
**Warning**: CoppeliaSim might cause conflicts with ROS workspaces.
Finally, install the Python library:
```bash
pip install -r requirements.txt
pip install .
```
You should be good to go!
You could try running one of the examples in the *examples/* folder.
If you encounter errors, please use the [PyRep issue tracker](https://github.com/stepjam/PyRep/issues).
#### 2. RLBench
We use [PerAct's RLBench fork](https://github.com/MohitShridhar/RLBench/tree/peract).
```bash
cd
git clone -b peract https://github.com/MohitShridhar/RLBench.git # note: 'peract' branch
cd RLBench
pip install -r requirements.txt
python setup.py develop
```
For [running in headless mode](https://github.com/MohitShridhar/RLBench/tree/peract#running-headless), tasks setups, and other issues, please refer to the [official repo](https://github.com/stepjam/RLBench).
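As a quick smoke test (assuming `COPPELIASIM_ROOT` is set as described above), you can check that both libraries import without errors:
```bash
# optional: verify that PyRep and RLBench can locate CoppeliaSim
python -c "import pyrep, rlbench; print('PyRep and RLBench OK')"
```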
## :mag: Data Preparation
### ManiSkill2
You can simply run the following to download and replay demonstrations:
```bash
bash scripts/download_and_replay_maniskill2.sh
```
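The script wraps ManiSkill2's standard download and replay utilities. If you prefer to run the steps manually, a rough sketch looks like the following (the task, output paths, and flag values are illustrative; see the script and the ManiSkill2 documentation for the exact settings this repo uses):
```bash
# 1) download the official demonstrations for a task, e.g. PickCube-v0
python -m mani_skill2.utils.download_demo "PickCube-v0" -o demos
# 2) replay them to re-record observations in the desired control / observation mode
python -m mani_skill2.trajectory.replay_trajectory \
  --traj-path demos/rigid_body/PickCube-v0/trajectory.h5 \
  --save-traj --target-control-mode pd_ee_delta_pose \
  --obs-mode rgbd --num-procs 8
```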
### RLBench
#### 1. Quick Start with PerAct's Pre-generated Datasets
[PerAct](https://github.com/peract/peract?tab=readme-ov-file#pre-generated-datasets) provides [pre-generated RLBench demonstrations](https://drive.google.com/drive/folders/0B2LlLwoO3nfZfkFqMEhXWkxBdjJNNndGYl9uUDQwS1pfNkNHSzFDNGwzd1NnTmlpZXR1bVE?resourcekey=0-jRw5RaXEYRLe2W6aNrNFEQ&usp=share_link) for the 18 tasks it uses. Each task contains 100 episodes for training and 25 each for validation and testing. Please download and extract them into `./data/rlbench/raw`. Your data directory structure should look like the following:
```
├── data
│   ├── ...
│   ├── rlbench
│   │   ├── raw
│   │   │   ├── train
│   │   │   │   ├── close_jar
│   │   │   │   │   ├── all_variations
│   │   │   │   │   │   ├── episodes
│   │   │   │   │   │   │   ├── episode0
│   │   │   │   │   │   │   ├── episode1
│   │   │   │   │   │   │   ├── ...
│   │   │   │   ├── open_drawer
│   │   │   │   ├── ...
│   │   │   ├── val
│   │   │   │   ├── ...
│   │   │   ├── test
│   │   │   │   ├── ...
│   └── ...
```
To facilitate data loading during training, we provide a script to pre-process the raw data. Running the following example command will generate processed data under `./data/rlbench/processed`:
```bash
# e.g. to pre-process task turn_tap with front camera:
python scripts/preprocess_rlbench.py --task_names turn_tap --camera_views front
```
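The plural flag names suggest that multiple tasks and camera views can be processed in one invocation; assuming the argument parser accepts space-separated lists, a batched call might look like this (check `scripts/preprocess_rlbench.py` for the exact interface):
```bash
# hypothetical multi-task, multi-view example
python scripts/preprocess_rlbench.py \
  --task_names close_jar open_drawer \
  --camera_views front wrist
```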
#### 2. Generating Your Own Data
You can also generate your own data for [all tasks supported by RLBench](https://github.com/stepjam/RLBench/tree/master/rlbench/tasks).
Coming soon.
## :rocket: Training and Evaluation
### ManiSkill2
- Train with RGB(-D) image observation:
```bash
# ACT policy example:
python src/train.py exp_maniskill2_act_policy=base exp_maniskill2_act_policy/maniskill2_task@maniskill2_task=${task} exp_maniskill2_act_policy/maniskill2_model@maniskill2_model=${model} seed=${seed}
# Diffusion policy example:
python src/train.py exp_maniskill2_diffusion_policy=base exp_maniskill2_diffusion_policy/maniskill2_task@maniskill2_task=${task} exp_maniskill2_diffusion_policy/maniskill2_model@maniskill2_model=${model} seed=${seed}
```
- Train with point cloud observation:
```bash
# ACT policy example:
python src/train.py exp_maniskill2_act_policy=base exp_maniskill2_act_policy/maniskill2_pcd_task@maniskill2_pcd_task=${task} exp_maniskill2_act_policy/maniskill2_model@maniskill2_model=${model} seed=${seed}
# Diffusion policy example:
python src/train.py exp_maniskill2_diffusion_policy=base exp_maniskill2_diffusion_policy/maniskill2_pcd_task@maniskill2_pcd_task=${task} exp_maniskill2_diffusion_policy/maniskill2_model@maniskill2_model=${model} seed=${seed}
```
- Evaluate a checkpoint:
```bash
python src/validate.py exp_maniskill2_act_policy=base exp_maniskill2_act_policy/maniskill2_pcd_task@maniskill2_pcd_task=${task} exp_maniskill2_act_policy/maniskill2_model@maniskill2_model=${model} ckpt_path=${path/to/checkpoint} seed=${seed}
```
- Zero-shot generalization evaluation:
- To evaluate camera view generalization experiments, run [scripts/run_maniskill2_camera_view.sh](scripts/run_maniskill2_camera_view.sh). The script evaluates the given `checkpoint` of the given `model` on the given `task` with four different camera views, using the specified `seed`. See the script for more details. For example:
```bash
bash scripts/run_maniskill2_camera_view.sh ${path/to/checkpoint} ${task} ${model} ${seed}
```
- To evaluate visual changes generalization experiments, run [scripts/run_maniskill2_visual_changes.sh](scripts/run_maniskill2_visual_changes.sh). The script evaluates the given `checkpoint` of the given `model` with different lighting conditions, noise levels and background colors, using the specified `seed`. See the script for more details. Note that currently only `StackCube` task is supported. For example:
```bash
bash scripts/run_maniskill2_visual_changes.sh ${path/to/checkpoint} ${model} ${seed}
```
Detailed configurations can be found in [configs/exp_maniskill2_act_policy](configs/exp_maniskill2_act_policy) and [configs/exp_maniskill2_diffusion_policy](configs/exp_maniskill2_diffusion_policy).
Currently supported tasks can be found in [configs/exp_maniskill2_act_policy/maniskill2_task](configs/exp_maniskill2_act_policy/maniskill2_task), [configs/exp_maniskill2_act_policy/maniskill2_pcd_task](configs/exp_maniskill2_act_policy/maniskill2_pcd_task), [configs/exp_maniskill2_diffusion_policy/maniskill2_task](configs/exp_maniskill2_diffusion_policy/maniskill2_task) and [configs/exp_maniskill2_diffusion_policy/maniskill2_pcd_task](configs/exp_maniskill2_diffusion_policy/maniskill2_pcd_task).
Currently supported models can be found in [configs/exp_maniskill2_act_policy/maniskill2_model](configs/exp_maniskill2_act_policy/maniskill2_model) and [configs/exp_maniskill2_diffusion_policy/maniskill2_model](configs/exp_maniskill2_diffusion_policy/maniskill2_model).
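Putting it together, a full ManiSkill2 train-then-evaluate cycle with concrete values might look like the sketch below. The task and model names here (`pick_cube`, `spunet`) are hypothetical placeholders; substitute names from the config folders listed above:
```bash
# hypothetical example: train ACT with a point cloud encoder, then evaluate the checkpoint
python src/train.py exp_maniskill2_act_policy=base \
  exp_maniskill2_act_policy/maniskill2_pcd_task@maniskill2_pcd_task=pick_cube \
  exp_maniskill2_act_policy/maniskill2_model@maniskill2_model=spunet seed=42

python src/validate.py exp_maniskill2_act_policy=base \
  exp_maniskill2_act_policy/maniskill2_pcd_task@maniskill2_pcd_task=pick_cube \
  exp_maniskill2_act_policy/maniskill2_model@maniskill2_model=spunet \
  ckpt_path=/path/to/checkpoint.ckpt seed=42
```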
### RLBench
- Train with RGB(-D) image observation:
```bash
# ACT policy example:
python src/train.py exp_rlbench_act_policy=base rlbench_task=${task} exp_rlbench_act_policy/rlbench_model@rlbench_model=${model} seed=${seed}
# Diffusion policy example:
python src/train.py exp_rlbench_diffusion_policy=base rlbench_task=${task} exp_rlbench_diffusion_policy/rlbench_model@rlbench_model=${model} seed=${seed}
```
- Train with point cloud observation:
```bash
# ACT policy example:
python src/train.py exp_rlbench_act_policy=base rlbench_task=${task} exp_rlbench_act_policy/rlbench_model@rlbench_model=${model} seed=${seed}
# Diffusion policy example:
python src/train.py exp_rlbench_diffusion_policy=base rlbench_task=${task} exp_rlbench_diffusion_policy/rlbench_model@rlbench_model=${model} seed=${seed}
```
- Evaluate a checkpoint:
```bash
# ACT policy example:
python src/test_rlbench_act.py exp_rlbench_act_policy=base rlbench_task=${task} exp_rlbench_act_policy/rlbench_model@rlbench_model=${model} seed=${seed} ckpt_path=${path/to/checkpoint}
```
- Zero-shot camera-view generalization evaluation:
To evaluate camera view generalization experiments, run [scripts/run_rlbench_camera_view.sh](scripts/run_rlbench_camera_view.sh). The script evaluates the given `checkpoint` of the given `policy` and `model` on the given `task` with four different camera views, using the specified `seed`. See the script for more details. For example:
```bash
# policy: either diffusion or act
bash scripts/run_rlbench_camera_view.sh ${policy} ${path/to/checkpoint} ${task} ${model} ${seed}
```
Detailed configurations can be found in [configs/exp_rlbench_act_policy](configs/exp_rlbench_act_policy) and [configs/exp_rlbench_diffusion_policy](configs/exp_rlbench_diffusion_policy).
Currently supported models can be found in [configs/exp_rlbench_act_policy/rlbench_model](configs/exp_rlbench_act_policy/rlbench_model) and [configs/exp_rlbench_diffusion_policy/rlbench_model](configs/exp_rlbench_diffusion_policy/rlbench_model).
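Analogously, a complete RLBench cycle with concrete values might look like this (`turn_tap` is one of the tasks used above; the `spunet` model name is again a hypothetical placeholder):
```bash
# hypothetical example: train and evaluate ACT on turn_tap
python src/train.py exp_rlbench_act_policy=base rlbench_task=turn_tap \
  exp_rlbench_act_policy/rlbench_model@rlbench_model=spunet seed=42

python src/test_rlbench_act.py exp_rlbench_act_policy=base rlbench_task=turn_tap \
  exp_rlbench_act_policy/rlbench_model@rlbench_model=spunet seed=42 \
  ckpt_path=/path/to/checkpoint.ckpt
```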
## :tada: Gotchas
### Override any config parameter from command line
This codebase is based on [Hydra](https://github.com/facebookresearch/hydra), which allows for convenient configuration overriding:
```bash
python src/train.py trainer.max_epochs=20 seed=300
```
> **Note**: You can also add new parameters with `+` sign.
```bash
python src/train.py +some_new_param=some_new_value
```
### Train on CPU, GPU, multi-GPU and TPU
```bash
# train on CPU
python src/train.py trainer=cpu
# train on 1 GPU
python src/train.py trainer=gpu
# train on TPU
python src/train.py +trainer.tpu_cores=8
# train with DDP (Distributed Data Parallel) (4 GPUs)
python src/train.py trainer=ddp trainer.devices=4
# train with DDP (Distributed Data Parallel) (8 GPUs, 2 nodes)
python src/train.py trainer=ddp trainer.devices=4 trainer.num_nodes=2
# simulate DDP on CPU processes
python src/train.py trainer=ddp_sim trainer.devices=2
# accelerate training on mac
python src/train.py trainer=mps
```
### Train with mixed precision
```bash
# train with pytorch native automatic mixed precision (AMP)
python src/train.py trainer=gpu +trainer.precision=16
```
### Use different tricks available in PyTorch Lightning
```bash
# gradient clipping may be enabled to avoid exploding gradients
python src/train.py trainer.gradient_clip_val=0.5
# run validation loop 4 times during a training epoch
python src/train.py +trainer.val_check_interval=0.25
# accumulate gradients
python src/train.py trainer.accumulate_grad_batches=10
# terminate training after 12 hours
python src/train.py +trainer.max_time="00:12:00:00"
```
> **Note**: PyTorch Lightning provides about [40+ useful trainer flags](https://pytorch-lightning.readthedocs.io/en/latest/common/trainer.html#trainer-flags).
### Easily debug
```bash
# runs 1 epoch in default debugging mode
# changes logging directory to `logs/debugs/...`
# sets level of all command line loggers to 'DEBUG'
# enforces debug-friendly configuration
python src/train.py debug=default
# run 1 train, val and test loop, using only 1 batch
python src/train.py debug=fdr
# print execution time profiling
python src/train.py debug=profiler
# try overfitting to 1 batch
python src/train.py debug=overfit
# raise exception if there are any numerical anomalies in tensors, like NaN or +/-inf
python src/train.py +trainer.detect_anomaly=true
# use only 20% of the data
python src/train.py +trainer.limit_train_batches=0.2 \
+trainer.limit_val_batches=0.2 +trainer.limit_test_batches=0.2
```
> **Note**: Visit [configs/debug/](configs/debug/) for different debugging configs.
### Resume training from checkpoint
```bash
python src/train.py ckpt_path="/path/to/ckpt/name.ckpt"
```
> **Note**: Checkpoint can be either path or URL.
> **Note**: Currently, loading a checkpoint doesn't resume the logger experiment, but this will be supported in a future Lightning release.
### Create a sweep over hyperparameters
```bash
# this will run 9 experiments one after the other,
# each with different combination of seed and learning rate
python src/train.py -m seed=100,200,300 model.optimizer.lr=0.0001,0.00005,0.00001
```
> **Note**: Hydra composes configs lazily at job launch time. If you change code or configs after launching a job/sweep, the final composed configs might be impacted.
### Execute all experiments from a folder
```bash
python src/train.py -m 'exp_maniskill2_act_policy/maniskill2_task@maniskill2_task=glob(*)'
```
> **Note**: Hydra provides special syntax for controlling behavior of multiruns. Learn more [here](https://hydra.cc/docs/next/tutorials/basic/running_your_app/multi-run). The command above executes all task experiments from [configs/exp_maniskill2_act_policy/maniskill2_task](configs/exp_maniskill2_act_policy/maniskill2_task).
### Execute runs with multiple different seeds
```bash
python src/train.py -m seed=100,200,300 trainer.deterministic=True
```
> **Note**: `trainer.deterministic=True` makes PyTorch more deterministic but can impact performance.
For more instructions, refer to the official documentation for [PyTorch Lightning](https://pytorchlightning.ai/), [Hydra](https://hydra.cc/), and the [Lightning Hydra Template](https://github.com/ashleve/lightning-hydra-template).
## :bulb: Troubleshooting
See TroubleShooting.md.
## :books: License
This repository is released under the MIT license.
## :sparkles: Acknowledgement
Our code is primarily built upon PyTorch Lightning, Hydra, Lightning Hydra Template, ManiSkill2, RLBench, PerAct, ACT, Diffusion Policy, TIMM, PonderV2, MultiMAE, Pointcept, VC1, and R3M. We extend our gratitude to all these authors for generously open-sourcing their code and for their significant contributions to the community.
## :pencil: Citation
```bibtex
@article{zhu2024point,
  title={Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning},
  author={Zhu, Haoyi and Wang, Yating and Huang, Di and Ye, Weicai and Ouyang, Wanli and He, Tong},
  journal={arXiv preprint arXiv:2402.02500},
  year={2024}
}
```