
# InFusion: Inpainting 3D Gaussians via Learning Depth Completion from Diffusion Prior

Zhiheng Liu* · Hao Ouyang* · Qiuyu Wang · Ka Leong Cheng · Jie Xiao · Kai Zhu · Nan Xue · Yu Liu · Yujun Shen · Yang Cao

[Paper PDF](https://arxiv.org/pdf/2404.11613) | Project Page
USTC | HKUST | Ant Group | Alibaba Group

This repository is the official implementation of the paper and is released under the MIT License.

## News

* **[2024.4.18]** 🔥 Release the paper, inference code, and pretrained checkpoint.
* **[On-going]** Clean and organize the masks corresponding to the datasets used in the experiments.
* **[On-going]** Scale up the model with more training data and release stronger models as a foundation for downstream tasks.
* **[To-do]** Release the training code.

## Installation

Install with `conda`:

```bash
conda env create -f environment.yaml
conda activate infusion
```

* 🛠️ For rendering depth, we use *diff-gaussian-rasterization-confidence* from [FSGS](https://github.com/VITA-Group/FSGS/tree/main/submodules/diff-gaussian-rasterization-confidence). Thanks for their work! :)

## Download Checkpoints

Download the InFusion checkpoint and put it in the `checkpoints` folder:

* [HuggingFace](https://huggingface.co/Johanan0528/Infusion/tree/main)

## Data Preparation

Our experiments are conducted on the datasets provided by [Mip-NeRF](https://jonbarron.info/mipnerf360/), [Instruct-NeRF2NeRF](https://drive.google.com/drive/folders/1v4MLNoSwxvSlWb26xvjxeoHpgjhi_s-s?usp=share_link), and [SPIn-NeRF](https://drive.google.com/drive/folders/1N7D4-6IutYD40v9lfXGSVbWrd47UdJEC?usp=share_link). We will upload the masks used in the experiments, together with a challenging scene we captured ourselves, in a few days.

Taking "garden" from [Mip-NeRF](https://jonbarron.info/mipnerf360/) as an example, each scene folder should be organized as follows:

```
garden
├── images              # RGB data
│   ├── DSC07956.JPG
│   ├── DSC07957.JPG
│   └── ...
├── seg                 # Masks (the part that needs to be inpainted is white)
│   ├── DSC07956.JPG
│   ├── DSC07957.JPG
│   └── ...
└── sparse              # COLMAP output
    └── 0
        └── ...
```

* 🛠️ You can prepare your own data following this structure. Accurate masks are very important; we recommend two image segmentation tools: [Segment and Track Anything](https://github.com/z-x-yang/Segment-and-Track-Anything) and [Grounded SAM](https://github.com/IDEA-Research/Grounded-Segment-Anything).
* 🛠️ To obtain the camera parameters and the initial point cloud, please refer to `convert.py` in [Gaussian-Splatting](https://github.com/graphdeco-inria/gaussian-splatting/tree/main). :)

## Instructions

The entire pipeline is divided into three stages:

* Train the incomplete Gaussians.
* Inpaint the Gaussians via the diffusion prior.
* Combine the inpainted Gaussians and the incomplete Gaussians.

### 🌺 Stage 1

Use the pre-annotated masks to train incomplete Gaussians. (The `<...>` arguments below are placeholders to fill in.)

```bash
cd gaussian_splatting

# Train incomplete Gaussians
python train.py -s <path to scene folder> -m <path to output folder> -u nothing --mask_training  # optionally: --color_aug

# Obtain the c2w matrices, intrinsic matrix, incomplete depth, and RGB renderings
python render.py -s <path to scene folder> -m <path to output folder> -u nothing
```

* 🛠️ Tip: Sometimes the rendered depth has too many empty points. In that case, try `--color_aug` during training: it randomly selects the background color when rendering depth, which can make the depth map more reliable. (A quick way to measure how many empty points a depth map has is sketched below.)
* 🛠️ Several recent works focus on how to segment Gaussians. Since segmentation is not the focus of this work, we chose a relatively simple method. :)
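To decide whether retraining with `--color_aug` is worth it, you can count the empty (zero-depth) pixels in a rendered depth map. A minimal sketch, assuming the depth is exported as a NumPy `.npy` file (the path, file name, and threshold below are illustrative; adjust the loading line to whatever `render.py` actually writes):

```python
import numpy as np

# Hypothetical path: point this at a depth map produced by render.py.
depth = np.load("output/garden/train/ours_30000/depth/DSC07956.npy")

# Pixels with (near-)zero depth received no Gaussian coverage.
empty = depth <= 1e-6
ratio = empty.mean()
print(f"{empty.sum()} empty pixels ({ratio:.1%} of the image)")

if ratio > 0.05:  # rough heuristic, not from the paper
    print("Depth looks sparse; consider retraining with --color_aug.")
```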
### 🌺 Stage 2

Inpaint the Gaussians using the depth inpainting model.

* 📢 Select a **single image** in *'path to output folder/train/ours_30000/renders'*, mark the area that needs to be inpainted, and save the mask as **'mask.png'**. (*It doesn't have to be precise, but it must cover all the missing parts; one way to grow a rough mask is sketched after this list.*)
* 📢 Next, inpaint that single image. Here are some great tools for inpainting a single image: [Stable Diffusion XL Inpainting](https://huggingface.co/spaces/diffusers/stable-diffusion-xl-inpainting) and [Photoroom](https://app.photoroom.com/create).
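Because the mask only needs to *cover* the missing region, a common trick is to dilate a rough mask by a few pixels. This is not part of the repository, just a convenience sketch using OpenCV:

```python
import cv2
import numpy as np

# Load a rough, hand-drawn mask (white = region to inpaint).
mask = cv2.imread("mask_rough.png", cv2.IMREAD_GRAYSCALE)

# Binarize, then dilate so the mask safely covers all missing parts.
_, mask = cv2.threshold(mask, 127, 255, cv2.THRESH_BINARY)
kernel = np.ones((15, 15), np.uint8)  # dilation size: tune per scene
mask = cv2.dilate(mask, kernel, iterations=1)

cv2.imwrite("mask.png", mask)
```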
With the inpainted image and `mask.png` in hand, here is an example:

```bash
# Assume that the selected single image is named "DSC07956.JPG".
cd depth_inpainting/run

input_rgb_path=    # fill in
input_mask_path=   # fill in
input_depth_path=  # fill in
c2w=               # fill in
intri=             # fill in
model_path=        # absolute path
output_dir=        # fill in

CUDA_VISIBLE_DEVICES=0 python run_inference_inpainting.py \
    --input_rgb_path $input_rgb_path \
    --input_mask $input_mask_path \
    --input_depth_path $input_depth_path \
    --model_path $model_path \
    --output_dir $output_dir \
    --denoise_steps 20 \
    --intri $intri \
    --c2w $c2w \
    --use_mask \
    --blend  # whether to use 'Blended Diffusion' (https://arxiv.org/abs/2111.14818) during inference
```

* 🛠️ Tip: If the depth map obtained from a single inference pass is not satisfactory, you can use the newly produced `output_dir/_depth_dis.npy` as the new `$input_depth_path` and loop two or three times to get better results.

### 🌺 Stage 3

Combine the inpainted Gaussians with the incomplete Gaussians, then quickly fine-tune on the inpainted single image.

```bash
# Assume that the selected single image is named "DSC07956.JPG".
origin_ply="path to output folder/point_cloud/iteration_30000/point_cloud.ply"
supp_ply="path to output folder/DSC07956_mask.ply"
save_ply="path to output folder/point_cloud/iteration_30001/point_cloud.ply"

# Combine inpainted Gaussians and incomplete Gaussians.
python compose.py --original_ply $origin_ply --supp_ply $supp_ply --save_ply $save_ply --nb_points 100 --threshold 1.0

# Fine-tune on the inpainted single image for 150 iterations.
python train.py -s <path to scene folder> -m <path to output folder> -u DSC07956.JPG -n <path to inpainted image> --load_iteration 30001 --iteration 150

# Render
python render.py -s <path to scene folder> -m <path to output folder> -u nothing --iteration 150
```

* 🛠️ The two parameters `--nb_points` and `--threshold` are used to remove floaters; increasing their values removes more surrounding points. Removing floaters is **very important** for the final rendering results, so search for the parameter values that best suit your scene. (A sketch of this kind of outlier filtering is given at the end of this README.)
* 🛠️ Since Gaussians are explicit points, they can be directly edited and cropped in downstream applications such as [KIRI Engine](https://www.kiriengine.com/).

## Acknowledgements

This project is developed on the codebases of [Gaussian-Splatting](https://github.com/graphdeco-inria/gaussian-splatting/tree/main), [Marigold](https://github.com/prs-eth/marigold), and [Magicboomliu](https://github.com/Magicboomliu). We appreciate their great work!

## Citation

If you find this repository useful in your work, please consider citing the following paper and giving a ⭐ to the public repository so that more people can discover it:

```BibTeX
@article{liu2024infusion,
  title={InFusion: Inpainting 3D Gaussians via Learning Depth Completion from Diffusion Prior},
  author={Liu, Zhiheng and Ouyang, Hao and Wang, Qiuyu and Cheng, Ka Leong and Xiao, Jie and Zhu, Kai and Xue, Nan and Liu, Yu and Shen, Yujun and Cao, Yang},
  journal={arXiv preprint arXiv:2404.11613},
  year={2024}
}
```
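As referenced in the Stage 3 tips, `--nb_points` and `--threshold` act like radius-based outlier removal on the combined point cloud. A minimal sketch of that idea using Open3D (an illustrative assumption; `compose.py` is the authoritative implementation and may differ):

```python
import open3d as o3d

# Load the combined point cloud (positions only; Gaussian attributes are ignored here).
pcd = o3d.io.read_point_cloud("point_cloud.ply")

# Keep a point only if at least nb_points neighbors lie within the given radius;
# isolated floaters fail this test and are dropped.
filtered, kept = pcd.remove_radius_outlier(nb_points=100, radius=1.0)

print(f"Kept {len(kept)} of {len(pcd.points)} points")
o3d.io.write_point_cloud("point_cloud_filtered.ply", filtered)
```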