D3Fields: Dynamic 3D Descriptor Fields for Zero-Shot Generalizable Robotic Manipulation (CoRL 2024)


Website: https://robopil.github.io/d3fields/ | Paper: https://arxiv.org/abs/2309.16118 | Colab | Doc


Yixuan Wang¹, Zhuoran Li²·³, Mingtong Zhang¹, Katherine Driggs-Campbell¹, Jiajun Wu², Li Fei-Fei², Yunzhu Li¹·²

¹University of Illinois Urbana-Champaign, ²Stanford University, ³National University of Singapore

https://github.com/WangYixuan12/d3fields/assets/32333199/a3fced3d-e827-4e7e-ad6a-e80889809fca

Try it in Colab!

In this notebook, we show how to build D3Fields and visualize the reconstructed mesh, mask fields, and descriptor fields. We also demonstrate how to track keypoints in a video.

Installation

We recommend Mambaforge over the standard Anaconda distribution for faster installation:

# create conda environment
mamba env create -f env.yaml
conda activate d3fields

# download pretrained models and data
bash scripts/download_ckpts.sh
bash scripts/download_data.sh

Visualization

python vis_repr.py # visualize the representation
python vis_tracking.py # visualize the tracking

Code Explanation

Fusion is the core class of D3Fields. Given multi-view RGB-D observations, it projects arbitrary 3D points into each view and fuses features from 2D foundation models into a unified 3D descriptor field that can be queried at those points.
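
The snippet below is a hypothetical usage sketch rather than the repo's exact API: the constructor arguments, observation keys, and method names (update, eval) are assumptions for illustration. See vis_repr.py for the real entry point.

import numpy as np

from fusion import Fusion  # core D3Fields class (module path assumed)

# Hypothetical workflow sketch; exact signatures may differ.
num_cam, H, W = 4, 480, 640
fusion = Fusion(num_cam=num_cam, device="cuda")  # arguments are assumptions

# Pack per-camera inputs matching the dataset layout described below:
# color/depth images, intrinsics, and world-to-camera extrinsics.
obs = {
    "color": np.zeros((num_cam, H, W, 3), dtype=np.uint8),
    "depth": np.zeros((num_cam, H, W), dtype=np.float32),  # meters
    "K": np.tile(np.eye(3, dtype=np.float32), (num_cam, 1, 1)),
    "pose": np.tile(np.eye(4, dtype=np.float32), (num_cam, 1, 1)),
}
fusion.update(obs)  # fuse 2D foundation-model features into 3D (method name assumed)

# Query the field at arbitrary 3D points for fused descriptors.
pts = np.random.uniform(-0.4, 0.4, size=(1024, 3)).astype(np.float32)
feats = fusion.eval(pts)  # per-point descriptor values (return format assumed)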

Customized Dataset

To run D3Fields on your own dataset, follow these steps:

  1. Prepare your dataset in the following structure (a sketch for writing the two camera files appears after this list):
    dataset_name
    ├── camera_0
    │   ├── color
    │   │   ├── 0.png
    │   │   ├── 1.png
    │   │   ├── ...
    │   ├── depth
    │   │   ├── 0.png
    │   │   ├── 1.png
    │   │   ├── ...
    │   ├── camera_extrinsics.npy
    │   ├── camera_params.npy
    ├── camera_1
    ├── ...

    camera_extrinsics.npy and camera_params.npy are defined as follows:

    camera_extrinsics.npy: (4, 4) numpy array, the camera extrinsics, which transform a point from world coordinates to camera coordinates
    camera_params.npy: (4,) numpy array, the camera parameters in the following order: fx, fy, cx, cy
  2. Prepare the PCA pickle file for the query texts. Find four images of the query texts (e.g. mug) with clean backgrounds and centered objects. Change obj_type within scripts/prepare_pca.py and run it (a rough sketch of this recipe appears after this list).
  3. Specify the workspace boundary as x_lower, x_upper, y_lower, y_upper, z_lower, z_upper.
  4. Run python vis_repr_custom.py, for example: python vis_repr_custom.py --data_path data/2023-09-15-13-21-56-171587 --pca_path pca_model/mug.pkl --query_texts mug --query_thresholds 0.3 --x_lower -0.4 --x_upper 0.4 --y_upper 0.3 --y_lower -0.4 --z_upper 0.02 --z_lower -0.2
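
For step 1, the two camera files can be written with plain NumPy. The shapes and conventions follow the definitions above; the directory path below is only a placeholder.

import os
import numpy as np

cam_dir = "dataset_name/camera_0"  # placeholder path
os.makedirs(cam_dir, exist_ok=True)

# (4, 4) world-to-camera transform: p_cam = R @ p_world + t.
extrinsics = np.eye(4, dtype=np.float32)
np.save(os.path.join(cam_dir, "camera_extrinsics.npy"), extrinsics)

# (4,) intrinsics in the order fx, fy, cx, cy (example values for a 640x480 image).
params = np.array([600.0, 600.0, 320.0, 240.0], dtype=np.float32)
np.save(os.path.join(cam_dir, "camera_params.npy"), params)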
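
For step 2, scripts/prepare_pca.py is the canonical implementation; the sketch below only illustrates the general recipe under stated assumptions (DINOv2 ViT-S/14 as the feature backbone, a 3-component scikit-learn PCA for visualization; the image paths are placeholders, and the pickle format may differ from what the repo expects).

import os
import pickle

import torch
from PIL import Image
from sklearn.decomposition import PCA
from torchvision import transforms

# Assumption: DINOv2 ViT-S/14 patch features as the descriptor backbone.
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14").eval()
preprocess = transforms.Compose([
    transforms.Resize((448, 448)),  # multiple of the 14-pixel patch size
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

feats = []
for path in ["mug_0.png", "mug_1.png", "mug_2.png", "mug_3.png"]:  # placeholder images
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        out = model.forward_features(img)["x_norm_patchtokens"]  # (1, N, C)
    feats.append(out.squeeze(0))

# Fit a 3-component PCA over all patch descriptors so features can be rendered as RGB.
pca = PCA(n_components=3).fit(torch.cat(feats).numpy())
os.makedirs("pca_model", exist_ok=True)
with open("pca_model/mug.pkl", "wb") as f:
    pickle.dump(pca, f)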


Citation

If you find this repo useful for your research, please consider citing the paper:

@article{wang2023d3fields,
    title={D$^3$Fields: Dynamic 3D Descriptor Fields for Zero-Shot Generalizable Robotic Manipulation},
    author={Wang, Yixuan and Li, Zhuoran and Zhang, Mingtong and Driggs-Campbell, Katherine and Wu, Jiajun and Fei-Fei, Li and Li, Yunzhu},
    journal={arXiv preprint arXiv:2309.16118},
    year={2023}
}