ECCV 2024
Qiao Gu, Zhaoyang Lv, Duncan Frost, Simon Green, Julian Straub, Chris Sweeney
This repo contains the implementation of our recent work, EgoLifter: Open-world 3D Segmentation for Egocentric Perception.
git clone git@github.com:facebookresearch/egolifter.git
First download and install Anaconda or Miniconda. Then create the environment and install the packages with the following commands:
# This implementation is tested on Python 3.10
conda create -n egolifter python=3.10 pip
conda activate egolifter
# Install PyTorch (this implementation was tested with the following version)
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
pip install plyfile tqdm hdbscan open3d fsspec opencv-python imageio distinctipy natsort wandb 'imageio[ffmpeg]' moviepy tyro lightning pretrained-backbones-unet hydra-core 'projectaria-tools[all]' vrs open_clip_torch git+https://github.com/openai/CLIP.git viser splines 'lightning[pytorch-extra]'
This codebase re-implements the original Gaussian Splatting training loop using PyTorch Lightning and wandb. To enable wandb logging during training, log in to your wandb account with the following command.
wandb login
This implementation uses gsplat as its 3DGS backend. Set up gsplat as follows. We tested with gsplat at commit 5fc940b648e32218ba0979355d7e4d7910f54476, but newer versions should also work.
# Install gsplat using pip
pip install gsplat
# Or install from source if you run into issues with pip
git clone --recursive git@github.com:nerfstudio-project/gsplat.git
cd gsplat; pip install -e .; cd ..
We use the Grounded-SAM repository to compute segmentation results on 2D images. All experiments in EgoLifter use the dense segmentation results from the original SAM, but you can also try 2D segmentation results of different granularity from Grounded-SAM or Semantic-SAM.
Follow the instructions in the Grounded-SAM repository to set it up. For reference, these are the commands we used to set up the Grounded-SAM codebase.
# conda activate egolifter # Run this line if needed
git clone git@github.com:IDEA-Research/Grounded-Segment-Anything.git
cd Grounded-Segment-Anything/
export AM_I_DOCKER=False
export BUILD_WITH_CUDA=True
# export CUDA_HOME=/usr/local/cuda-12.3/ # Run this line if needed
python -m pip install -e segment_anything
python -m pip install -e GroundingDINO
pip install --upgrade diffusers[torch]
# Download these to where you want to store them
wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
Make a copy of the setup_env.bash.template file and change the paths to the actual paths you used.
cp setup_env.bash.template setup_env.bash
# TODO: Then change the path in the setup_env.bash file
# Then run the following line to set the environment variables
source setup_env.bash
The environment variables set in setup_env.bash are needed by some scripts in EgoLifter. Run source setup_env.bash to set them, and rerun it in every new shell.
First access ADT (Aria Digital Twin) through this link and download the ADT_download_urls.json file, which contains the download links for the dataset.
Then prepare the directories where you want to save the downloaded and processed dataset as follows, and put ADT_download_urls.json in the $ADT_DATA_ROOT directory.
# TODO: Change the following to directories where you want to save the dataset
export ADT_DATA_ROOT=/path/to/adt
export ADT_PROCESSED_ROOT=/path/to/adt_processed
mkdir -p $ADT_DATA_ROOT
mkdir -p $ADT_PROCESSED_ROOT
cp /path/to/ADT_download_urls.json $ADT_DATA_ROOT
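Before starting the download, a quick preflight check can catch an unset variable or a misplaced ADT_download_urls.json. The function below is a minimal sketch and is not part of EgoLifter (check_adt_setup is a hypothetical name); it only checks the two variables, the two directories, and the file location described above.

```shell
# Hypothetical helper (not part of EgoLifter): verify the ADT setup described above.
check_adt_setup() {
    local _adt_dir
    # Both roots must be set and non-empty.
    if [ -z "${ADT_DATA_ROOT:-}" ] || [ -z "${ADT_PROCESSED_ROOT:-}" ]; then
        echo "error: ADT_DATA_ROOT and ADT_PROCESSED_ROOT must both be set" >&2
        return 1
    fi
    # Both directories should already exist (created with mkdir -p above).
    for _adt_dir in "$ADT_DATA_ROOT" "$ADT_PROCESSED_ROOT"; do
        if [ ! -d "$_adt_dir" ]; then
            echo "error: $_adt_dir is not a directory" >&2
            return 1
        fi
    done
    # The URL file must sit directly inside $ADT_DATA_ROOT.
    if [ ! -f "$ADT_DATA_ROOT/ADT_download_urls.json" ]; then
        echo "error: ADT_download_urls.json not found in $ADT_DATA_ROOT" >&2
        return 1
    fi
    echo "ADT setup looks OK"
}
```

For example, check_adt_setup && bash scripts/download_process_adt.bash only starts the download once the check passes.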
Then run the following script to download and process the dataset.
# source setup_env.bash # Run this if you haven't
bash scripts/download_process_adt.bash
After downloading and processing the ADT dataset, you can train EgoLifter and its variants with the following commands.
# Take the following scene as an example. Change SCENE_NAME if needed.
SCENE_NAME=Apartment_release_multiskeleton_party_seq121
# EgoLifter (full method)
python train_lightning.py \
scene.scene_name=${SCENE_NAME} \
scene.data_root=${ADT_PROCESSED_ROOT} \
model=unc_2d_unet \
model.unet_acti=sigmoid \
model.dim_extra=16 \
lift.use_contr=True \
exp_name=egolifter \
output_root=./output/adt \
wandb.project=egolifter_adt
# EgoLifter-Static (baseline, without transient prediction)
python train_lightning.py \
scene.scene_name=${SCENE_NAME} \
scene.data_root=${ADT_PROCESSED_ROOT} \
model=unc_2d_unet \
model.unet_acti=baseline \
model.dim_extra=16 \
lift.use_contr=True \
exp_name=egolifter_static \
output_root=./output/adt \
wandb.project=egolifter_adt
# EgoLifter-Deform (baseline, using a deformation network)
python train_lightning.py \
scene.scene_name=${SCENE_NAME} \
scene.data_root=${ADT_PROCESSED_ROOT} \
model=deform \
model.weight_l1_reg_xyz=1e-1 \
model.weight_l1_reg_rot=1e-1 \
model.dim_extra=16 \
lift.use_contr=True \
exp_name=egolifter_deform \
output_root=./output/adt \
wandb.project=egolifter_adt
# The original 3DGS (without instance feature learning)
python train_lightning.py \
scene.scene_name=${SCENE_NAME} \
scene.data_root=${ADT_PROCESSED_ROOT} \
exp_name=3dgs \
output_root=./output/adt \
wandb.project=egolifter_adt
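To train the full method on more than one scene, the EgoLifter (full method) command can be wrapped in a loop. This is a minimal sketch, not part of EgoLifter: train_egolifter_scenes and the DRY_RUN switch are hypothetical names, and the Hydra overrides are copied unchanged from the command above. It assumes ADT_PROCESSED_ROOT is set and that it runs from the repository root.

```shell
# Hypothetical wrapper (not part of EgoLifter): train the full EgoLifter model
# on each scene name passed as an argument.
# If DRY_RUN is set to any non-empty value, the commands are printed instead of run.
train_egolifter_scenes() {
    local scene
    for scene in "$@"; do
        ${DRY_RUN:+echo} python train_lightning.py \
            "scene.scene_name=${scene}" \
            "scene.data_root=${ADT_PROCESSED_ROOT}" \
            model=unc_2d_unet \
            model.unet_acti=sigmoid \
            model.dim_extra=16 \
            lift.use_contr=True \
            exp_name=egolifter \
            output_root=./output/adt \
            wandb.project=egolifter_adt || return 1  # stop at the first failed run
    done
}
```

For example, DRY_RUN=1 train_egolifter_scenes Apartment_release_multiskeleton_party_seq121 prints the command without running it; pass more scene names from your processed ADT directory as extra arguments. Stopping at the first failure keeps a broken configuration from being repeated across every scene.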
We adapt the viser-based web visualizer from gaussian-splatting-lightning for EgoLifter. You can visualize the results of a trained model with the following command.
# Select one of the output folders below
FOLDER_NAME=unc_2d_unet_egolifter
# FOLDER_NAME=unc_2d_unet_egolifter_static
# FOLDER_NAME=deform_egolifter_deform
# This starts a local server;
# open its link in your browser for visualization
python viewer.py \
./output/adt/${SCENE_NAME}/${FOLDER_NAME} \
--data_root ${ADT_PROCESSED_ROOT} \
--reorient disable \
--feat_pca
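The FOLDER_NAME values above appear to follow the pattern <model>_<exp_name>: model=unc_2d_unet with exp_name=egolifter gives unc_2d_unet_egolifter, and model=deform with exp_name=egolifter_deform gives deform_egolifter_deform. This convention, and the guess that runs without a model= override use the model name vanilla (from the vanilla_3dgs folder used for the SAM baseline), are inferred from this README rather than from the code. The helper name egolifter_ckpt_folder is hypothetical; check ./output/adt/${SCENE_NAME} for the actual folder names.

```shell
# Hypothetical helper (not part of EgoLifter): guess the checkpoint folder of a run.
# $1: scene name, $2: exp_name, $3: model name (defaults to "vanilla", an assumption)
egolifter_ckpt_folder() {
    echo "./output/adt/$1/${3:-vanilla}_$2"
}
```

For example, FOLDER=$(egolifter_ckpt_folder ${SCENE_NAME} egolifter unc_2d_unet) yields ./output/adt/${SCENE_NAME}/unc_2d_unet_egolifter, the same path passed to viewer.py above.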
To render out the images and video using the trained models, run the following command:
# Select which subset of images to render
# SUBSET=trainvalid # seen subset
SUBSET=novel # novel subset
python render_lightning.py \
model_path=./output/adt/${SCENE_NAME}/${FOLDER_NAME} \
render_subset=${SUBSET} \
source_path=${ADT_PROCESSED_ROOT}/${SCENE_NAME}
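To render both the seen (trainvalid) and novel subsets for several trained models at once, the render command can be nested in two loops. This is a minimal sketch, not part of EgoLifter: render_all_subsets and DRY_RUN are hypothetical names, and it assumes SCENE_NAME and ADT_PROCESSED_ROOT are set and that it runs from the repository root.

```shell
# Hypothetical wrapper (not part of EgoLifter): render both subsets for each
# output folder name passed as an argument.
# If DRY_RUN is set to any non-empty value, the commands are printed instead of run.
render_all_subsets() {
    local folder subset
    for folder in "$@"; do
        for subset in trainvalid novel; do
            ${DRY_RUN:+echo} python render_lightning.py \
                "model_path=./output/adt/${SCENE_NAME}/${folder}" \
                "render_subset=${subset}" \
                "source_path=${ADT_PROCESSED_ROOT}/${SCENE_NAME}" || return 1
        done
    done
}
```

For example, render_all_subsets unc_2d_unet_egolifter unc_2d_unet_egolifter_static deform_egolifter_deform renders six image sets, one per model and subset.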
Query-based 2D segmentation evaluation:
# Run in-view evaluation
for FOLDER_NAME in unc_2d_unet_egolifter unc_2d_unet_egolifter_static deform_egolifter_deform; do
CKPT_FOLDER=./output/adt/${SCENE_NAME}/${FOLDER_NAME}
python eval_query_2dseg.py \
--ckpt_folder ${CKPT_FOLDER} \
--source_path ${ADT_PROCESSED_ROOT}/${SCENE_NAME} \
--threshold_mode gt
done
# Run cross-view evaluation
for FOLDER_NAME in unc_2d_unet_egolifter unc_2d_unet_egolifter_static deform_egolifter_deform; do
CKPT_FOLDER=./output/adt/${SCENE_NAME}/${FOLDER_NAME}
python eval_query_2dseg.py \
--ckpt_folder ${CKPT_FOLDER} \
--source_path ${ADT_PROCESSED_ROOT}/${SCENE_NAME} \
--threshold_mode gt \
--query_type crossview \
--n_query_samples 5
done
Query-based 3D segmentation evaluation:
for FOLDER_NAME in unc_2d_unet_egolifter unc_2d_unet_egolifter_static deform_egolifter_deform; do
CKPT_FOLDER=./output/adt/${SCENE_NAME}/${FOLDER_NAME}
python eval_query_3dbox.py \
--ckpt_folder ${CKPT_FOLDER} \
--source_path ${ADT_PROCESSED_ROOT}/${SCENE_NAME} \
--query_type 2davg \
--threshold_mode gt
done
Compute the PSNR metrics:
for FOLDER_NAME in unc_2d_unet_egolifter unc_2d_unet_egolifter_static deform_egolifter_deform; do
CKPT_FOLDER=./output/adt/${SCENE_NAME}/${FOLDER_NAME}
python eval_lightning.py \
model_path=${CKPT_FOLDER} \
source_path=${ADT_PROCESSED_ROOT}/${SCENE_NAME}
done
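If only some of the variants have been trained, the evaluation loops above still launch each script on the missing folders. The sketch below shows one way to skip folders that do not exist, using the PSNR evaluation as the example; it is not part of EgoLifter (eval_psnr_existing and DRY_RUN are hypothetical names) and assumes SCENE_NAME and ADT_PROCESSED_ROOT are set and that it runs from the repository root. The same existence check can be added to the 2D and 3D query evaluation loops.

```shell
# Hypothetical helper (not part of EgoLifter): run eval_lightning.py only for
# checkpoint folders that exist, reporting the skipped ones on stderr.
# If DRY_RUN is set to any non-empty value, the commands are printed instead of run.
eval_psnr_existing() {
    local folder ckpt
    for folder in "$@"; do
        ckpt="./output/adt/${SCENE_NAME}/${folder}"
        if [ ! -d "$ckpt" ]; then
            echo "skipping ${folder}: ${ckpt} not found" >&2
            continue
        fi
        ${DRY_RUN:+echo} python eval_lightning.py \
            "model_path=${ckpt}" \
            "source_path=${ADT_PROCESSED_ROOT}/${SCENE_NAME}" || return 1
    done
}
```

For example, eval_psnr_existing unc_2d_unet_egolifter unc_2d_unet_egolifter_static deform_egolifter_deform evaluates whichever of the three runs exist and lists the others as skipped.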
Evaluate the SAM baseline:
# The trained model here is used to render the images at unseen views,
# which will be used as input to the SAM model
FOLDER_NAME=vanilla_3dgs
python eval_query_2dseg_sam.py \
--source_path ${ADT_PROCESSED_ROOT}/${SCENE_NAME} \
--ckpt_folder output/adt/${SCENE_NAME}/${FOLDER_NAME}
See notebooks/aggregate_adt_logs.ipynb for how to aggregate the evaluation results obtained from EgoLifter and reproduce the tables reported in the paper.
If you find this software useful in your research, please consider citing:
@article{gu2024egolifter,
author = {Gu, Qiao and Lv, Zhaoyang and Frost, Duncan and Green, Simon and Straub, Julian and Sweeney, Chris},
title = {EgoLifter: Open-world 3D Segmentation for Egocentric Perception},
journal = {arXiv preprint arXiv:2403.18118},
year = {2024},
}
This implementation has been inspired by the following repositories:
See the License file.
This project also includes adapted code from the following open-source projects:
Gaussian-Splatting
nerfstudio