# 🎥 [ECCV 2024] GaussCtrl: Multi-View Consistent Text-Driven 3D Gaussian Splatting Editing

Jing Wu\*<sup>1</sup>, Jia-Wang Bian\*<sup>2</sup>, Xinghui Li<sup>1</sup>, Guangrun Wang<sup>1</sup>, Ian Reid<sup>2</sup>, Philip Torr<sup>1</sup>, Victor Adrian Prisacariu<sup>1</sup>

\* denotes equal contribution

<sup>1</sup>University of Oxford, <sup>2</sup>Mohamed bin Zayed University of Artificial Intelligence

[![Badge with Logo](https://img.shields.io/badge/arXiv-2403.08733-red?logo=arxiv)](https://arxiv.org/abs/2403.08733) [![Badge with Logo](https://img.shields.io/badge/Project-Page-blue?logo=homepage)](https://gaussctrl.active.vision/) [![Badge with Logo](https://img.shields.io/badge/Download-Data-cyan)](https://github.com/jingwu2121/gaussctrl/tree/main/data) [![Badge with Logo](https://img.shields.io/badge/BSD-License-green)](LICENSE.txt)
![teaser](./assets/teaser.png)

## ✨ News

- [9.4.2024] Our original results utilise stable-diffusion-v1-5 from runwayml for editing, which is now unavailable. Please change the diffusion checkpoint to another available model, e.g. `CompVis/stable-diffusion-v1-4`, by using `--pipeline.diffusion_ckpt "CompVis/stable-diffusion-v1-4"`. To reproduce our original results, use the checkpoint `--pipeline.diffusion_ckpt "jinggogogo/gaussctrl-sd15"`.

## ⚙️ Installation

- Tested on CUDA 11.8 + Ubuntu 22.04 + NeRFStudio 1.0.0 (NVIDIA RTX A5000 24G)

Clone the repo.

```bash
git clone https://github.com/ActiveVisionLab/gaussctrl.git
cd gaussctrl
```

### 1. NeRFStudio and Lang-SAM

```bash
conda create -n gaussctrl python=3.8
conda activate gaussctrl
conda install cuda -c nvidia/label/cuda-11.8.0
```

GaussCtrl is built upon NeRFStudio, so follow [this link](https://docs.nerf.studio/quickstart/installation.html) to install NeRFStudio first. If tiny-cuda-nn fails to build, try building it from scratch; see [here](https://github.com/NVlabs/tiny-cuda-nn/?tab=readme-ov-file#compilation-windows--linux). We recommend using NeRFStudio v1.0.0 with gsplat v0.1.3.

```bash
pip install nerfstudio==1.0.0
pip install gsplat==0.1.3
```

Install Lang-SAM for mask extraction.

```bash
pip install -U git+https://github.com/luca-medeiros/lang-segment-anything.git
pip install -r requirements.txt
```

### 2. Install GaussCtrl

```bash
pip install -e .
```

### 3. Verify the install

```bash
ns-train -h
```

## 🗄️ Data

### Use Our Preprocessed Data

Our preprocessed data are under the `data` folder, where

- `fangzhou` is from [NeRF-Art](https://github.com/cassiePython/NeRF-Art/tree/main/data/fangzhou_nature)
- `bear` and `face` are from [Instruct-NeRF2NeRF](https://drive.google.com/drive/folders/1v4MLNoSwxvSlWb26xvjxeoHpgjhi_s-s?usp=share_link)
- `garden` is from [Mip-NeRF 360](http://storage.googleapis.com/gresearch/refraw360/360_v2.zip)
- `stone horse` and `dinosaur` are from [BlendedMVS](https://github.com/YoYo000/BlendedMVS)

We thank these authors for their great work!

### Customize Your Data

We recommend pre-processing your data to 512x512 and following [this page](https://docs.nerf.studio/quickstart/custom_dataset.html) to process your data.

## :arrow_forward: Get Started

![Method](./assets/method.png)

### 1. Train a 3DGS

To get started, you first need to train your 3DGS model. We use `splatfacto` from NeRFStudio.

```bash
ns-train splatfacto --output-dir {output/folder} --experiment-name EXPERIMENT_NAME nerfstudio-data --data {path/to/your/data}
```

### 2. Edit your model

Once you finish training the `splatfacto` model, the checkpoints are saved to the `output/folder/EXPERIMENT_NAME` folder. Start editing your model by running:

```bash
ns-train gaussctrl --load-checkpoint {output/folder/.../nerfstudio_models/step-000029999.ckpt} --experiment-name EXPERIMENT_NAME --output-dir {output/folder} --pipeline.datamanager.data {path/to/your/data} --pipeline.prompt "YOUR PROMPT" --pipeline.guidance_scale 5 --pipeline.chunk_size {batch size of images during editing} --pipeline.langsam_obj 'OBJECT TO BE EDITED'
```

Please note that Lang-SAM is optional here: it masks the object named by `--pipeline.langsam_obj`. If you are editing the environment rather than a single object, remove this argument, as in the variant shown after the sketch below.
To edit the environment without a Lang-SAM mask, drop the `--pipeline.langsam_obj` argument:

```bash
ns-train gaussctrl --load-checkpoint {output/folder/.../nerfstudio_models/step-000029999.ckpt} --experiment-name EXPERIMENT_NAME --output-dir {output/folder} --pipeline.datamanager.data {path/to/your/data} --pipeline.prompt "YOUR PROMPT" --pipeline.guidance_scale 5 --pipeline.chunk_size {batch size of images during editing}
```

Here, `--pipeline.guidance_scale` sets the classifier-free guidance scale used when editing the images, and `--pipeline.chunk_size` sets the number of images edited together in one batch. On our **NVIDIA RTX A5000** GPU (24G), the maximum chunk size is 3 (~22G). Control the number of reference views using `--pipeline.ref_view_num`; by default, it is set to 4.

### Small Tips

- If your edits are not as expected, check the images edited by ControlNet.
- Conditioning your edit on good ControlNet-edited views is usually very helpful, i.e. choose those good ControlNet-edited views as the reference views.

## :wrench: Reproduce Our Results

Experiments in the main paper are included in the `scripts` folder. To reproduce the results, first train the `splatfacto` model. We take the `bear` case as an example here.

```bash
ns-train splatfacto --output-dir unedited_models --experiment-name bear nerfstudio-data --data data/bear
```

Then edit the 3DGS by running:

```bash
ns-train gaussctrl --load-checkpoint {unedited_models/bear/splatfacto/.../nerfstudio_models/step-000029999.ckpt} --experiment-name bear --output-dir outputs --pipeline.datamanager.data data/bear --pipeline.prompt "a photo of a polar bear in the forest" --pipeline.guidance_scale 5 --pipeline.chunk_size 3 --pipeline.langsam_obj 'bear'
```

In our experiments, we randomly sampled 40 views from the entire dataset to accelerate the method; this is the default set in `gc_datamanager.py`. We split the entire set into 4 subsets and randomly sampled 10 images from each subset. Feel free to decrease/increase these numbers via `--pipeline.datamanager.subset-num` and `--pipeline.datamanager.sampled-views-every-subset`. Set `--pipeline.datamanager.load-all` to `True` if you want to edit all the images in the dataset.

## :camera: View Results Using NeRFStudio Viewer

```bash
ns-viewer --load-config {outputs/.../config.yml}
```

## :movie_camera: Render Your Results

- Render all the dataset views:

```bash
ns-gaussctrl-render dataset --load-config {outputs/.../config.yml} --output_path {render/EXPERIMENT_NAME}
```

- Render an mp4 along a camera path:

```bash
ns-gaussctrl-render camera-path --load-config {outputs/.../config.yml} --camera-path-filename data/EXPERIMENT_NAME/camera_paths/render-path.json --output_path render/EXPERIMENT_NAME.mp4
```

## Evaluation

We use [this code](https://github.com/ayaanzhaque/instruct-nerf2nerf/tree/main/metrics) to evaluate our method.

## Citation

If you find this code or the paper useful for your research, please consider citing:

```
@article{gaussctrl2024,
    author = {Wu, Jing and Bian, Jia-Wang and Li, Xinghui and Wang, Guangrun and Reid, Ian and Torr, Philip and Prisacariu, Victor},
    title = {{GaussCtrl: Multi-View Consistent Text-Driven 3D Gaussian Splatting Editing}},
    journal = {ECCV},
    year = {2024},
}
```