### Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors
[![arXiv](https://img.shields.io/badge/arxiv-2312.04963-b31b1b?style=plastic&color=b31b1b&link=https%3A%2F%2Farxiv.org%2Fabs%2F2312.04963)](https://arxiv.org/abs/2312.04963)
[![website](https://img.shields.io/badge/Project-Website-brightgreen)](https://bidiff.github.io/)
- [x] Implement BiDiff on [diffusers](https://github.com/huggingface/diffusers) (training and inference).
- [x] Replace NeuS with FlexiCubes.
- [ ] Release the weights trained on Objaverse-LVIS.
- [ ] Release the processed training data.
- [ ] Release the data processing scripts.
- [ ] Re-train our model on Objaverse-XL.
- [ ] Hugging Face live demo.
- [x] Support fully decoupled texture and geometry control (below are results from BiDiff sampling).
### NEWS
- BiDiff now supports fully decoupled texture and geometry control.
- We have implemented an initial version of BiDiff on diffusers and upgraded the 3D representation from **NeuS** to [**FlexiCubes**](https://research.nvidia.com/labs/toronto-ai/flexicubes/).
- Data, weights, and more detailed documentation will be released.
### 1. High-quality 3D Object Generation
Click the GIF to access the high-resolution video.
Prompts: "An eagle head.", "A GUNDAM robot.", "A Nike sport shoes.", "A house in Van Gogh style."
### 2. Meshes with Authentic Textures
Click the GIF to access the high-resolution video.
### 3. Bidirectional Diffusion (BiDiff) Framework
Most 3D generation research focuses on up-projecting 2D foundation models into 3D space, either by minimizing a 2D Score Distillation Sampling (SDS) loss or by fine-tuning on multi-view datasets. Without explicit 3D priors, these methods often produce geometric anomalies and multi-view inconsistency. Recently, researchers have attempted to make generated 3D objects more geometrically faithful by training directly on 3D datasets, albeit at the cost of low-quality textures, because 3D datasets offer limited texture diversity.
To harness the advantages of both approaches, we propose Bidirectional Diffusion (BiDiff), a unified framework that **incorporates both a 3D and a 2D diffusion process to preserve 3D fidelity and 2D texture richness, respectively**. Moreover, since a simple combination of the two may yield inconsistent results, we further bridge them with novel bidirectional guidance. In addition, our method can serve as an initialization for optimization-based models, improving both the quality of the 3D model and the efficiency of optimization and reducing the process from 3.4 hours to 20 minutes. Experimental results show that our model achieves high-quality, diverse, and scalable 3D generation.
The BiDiff framework operates as follows: (a) At each step of diffusion, we render the 3D diffusion's intermediate outputs into 2D images, which then guide the denoising of the 2D diffusion model. Simultaneously, the intermediate multi-view outputs from the 2D diffusion are re-projected to assist the denoising of the 3D diffusion model. Red arrows show the bidirectional guidance, which ensures that both diffusion processes evolve coherently. (b) We use the outcomes of the 2D-3D diffusion as a strong initialization for optimization methods, allowing for further refinement with fewer optimization steps.
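
To make the data flow of step (a) concrete, here is a deliberately tiny NumPy toy. It is not the authors' implementation and contains no learned models: `render` (a mean over depth), `reproject` (copying an image into every depth slice), the "denoising" updates (which simply pull each state toward a fixed random target), and the weights `prior_w` and `guide_w` are all hypothetical stand-ins. What it does show is the coupling pattern: at every step each branch reads the other branch's intermediate output before either one is updated.

```python
import numpy as np

D, H, W = 4, 8, 8  # toy depth and image resolution (arbitrary choices)


def render(volume):
    # Hypothetical "renderer": orthographic projection = mean over the depth axis.
    return volume.mean(axis=0)


def reproject(image, depth=D):
    # Hypothetical "re-projection": copy the 2D image into every depth slice.
    return np.repeat(image[None], depth, axis=0)


def run(steps=50, prior_w=0.1, guide_w=0.0, seed=0):
    """Run both toy branches; return the mean |render(3D) - 2D| gap at the end."""
    rng = np.random.default_rng(seed)
    target3d = rng.normal(size=(D, H, W))  # what the toy 3D "denoiser" pulls toward
    target2d = rng.normal(size=(H, W))     # what the toy 2D "denoiser" pulls toward
    x3d = rng.normal(size=(D, H, W))       # noisy initial 3D state
    x2d = rng.normal(size=(H, W))          # noisy initial 2D state
    for _ in range(steps):
        # Read both intermediate states before updating either branch.
        guide_for_2d = render(x3d)         # 3D -> 2D guidance
        guide_for_3d = reproject(x2d)      # 2D -> 3D guidance
        x3d = x3d + prior_w * (target3d - x3d) + guide_w * (guide_for_3d - x3d)
        x2d = x2d + prior_w * (target2d - x2d) + guide_w * (guide_for_2d - x2d)
    return float(np.abs(render(x3d) - x2d).mean())


if __name__ == "__main__":
    print("uncoupled gap:", run(guide_w=0.0))
    print("coupled gap:  ", run(guide_w=0.2))
```

With `guide_w=0` each branch drifts to its own target and the rendered 3D state disagrees with the 2D state. With coupling, the gap shrinks; for these linear toy updates it settles at `prior_w / (prior_w + 2 * guide_w)` of the uncoupled gap (0.2 with the defaults above). The actual framework couples learned 2D and 3D diffusion models, which this toy does not attempt to reproduce.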
### 4. More Results
### Getting Started
The code has been tested with torch 2.0.1 and CUDA 11.7. Data and weights will be uploaded [here](https://drive.google.com/drive/folders/1qoHrHVcadVt9Dp7tbubkVtFwsND_L0WR?usp=sharing).
```sh
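# Tested with CUDA 11.7 and torch 2.0.1; the codebase is based on diffusers 0.18.0.dev0
# Install this repository in editable mode (run from the repository root)
pip install -e ".[torch]"
# Differentiable rasterizer
pip install git+https://github.com/NVlabs/nvdiffrast/
# Kaolin wheel built for torch 2.0.1 + CUDA 11.7
pip install kaolin==0.15.0 -f https://nvidia-kaolin.s3.us-east-2.amazonaws.com/torch-2.0.1_cu117.html
# Google sparsehash headers, required to build torchsparse
sudo apt-get install libsparsehash-dev
pip install --upgrade git+https://github.com/mit-han-lab/torchsparse.git@v1.4.0
# Remaining Python dependencies
pip install imageio trimesh tqdm matplotlib torch_scatter ninja einops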
```
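
One way to confirm the environment after running the commands above is to try importing each dependency and to check which CUDA version torch was built with. This script is not part of the repository; it is a hedged sketch, and the mapping from pip package names to import names (for example `nvdiffrast.torch` and `torch_scatter`) is an assumption based on how these packages are usually imported.

```python
import importlib

# Assumed import names for the packages installed above.
REQUIRED_MODULES = (
    "torch", "diffusers", "nvdiffrast.torch", "kaolin", "torchsparse",
    "torch_scatter", "imageio", "trimesh", "tqdm", "matplotlib", "einops",
)


def check_modules(names=REQUIRED_MODULES):
    """Return {module: version string, "installed", or None if the import fails}."""
    status = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception:  # ImportError, or e.g. OSError from a CUDA/ABI mismatch
            status[name] = None
            continue
        status[name] = getattr(module, "__version__", "installed")
    return status


def check_torch_cuda():
    """Report torch's version, the CUDA version it was built with, and GPU availability."""
    try:
        import torch
    except Exception:
        return None
    return {
        "torch": torch.__version__,
        "cuda": torch.version.cuda,
        "gpu_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for name, version in check_modules().items():
        print(f"{name:18s} {version or 'MISSING'}")
    print(check_torch_cuda())
```

Any entry printed as `MISSING` failed to import. A `cuda` value other than `11.7` means torch was built against a different CUDA version than the one this README was tested with.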
### Train
We provide a shell script for training. Please adjust the parameters and GPU settings in it before running.
```bash
cd ./examples/bidiff
bash ./scripts/train_bidiff.sh
```
### Inference
We provide a shell script for inference.
```bash
cd ./examples/bidiff
bash ./scripts/sample_bidiff.sh
```
You can also specify a batch inference configuration file with `--sample_config_file`. In this JSON file you can list multiple prompts and parameter values; all parameter lists must have the same length. Inference is then executed once for every combination, i.e., prompts × negative_prompts × PARAMETERS times.
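
The expansion rule described above can be sketched in Python. This is a hypothetical illustration, not the repository's config parser: the JSON keys `prompts`, `negative_prompts`, and `params`, and the example parameters `guidance_scale` and `seed`, are assumptions, so check the actual sample config file for the real schema. The sketch reads the rule as follows: parameter lists are paired index by index (which is why their lengths must match), and each resulting setting is crossed with every prompt and negative prompt.

```python
import itertools
import json


def expand_sample_config(config):
    """Yield one run per (prompt, negative prompt, parameter setting) combination."""
    prompts = config["prompts"]
    negative_prompts = config.get("negative_prompts", [""])
    params = config.get("params", {})
    lengths = {len(values) for values in params.values()}
    if len(lengths) > 1:
        raise ValueError(f"all parameter lists must have the same length, got {sorted(lengths)}")
    n_settings = lengths.pop() if lengths else 1
    # Pair the i-th value of every parameter list into one setting.
    settings = [{key: values[i] for key, values in params.items()} for i in range(n_settings)]
    for prompt, negative, setting in itertools.product(prompts, negative_prompts, settings):
        yield {"prompt": prompt, "negative_prompt": negative, **setting}


example = json.loads("""
{
  "prompts": ["An eagle head.", "A GUNDAM robot."],
  "negative_prompts": ["blurry"],
  "params": {"guidance_scale": [5.0, 7.5, 10.0], "seed": [0, 1, 2]}
}
""")

runs = list(expand_sample_config(example))
print(len(runs))  # 6
```

With 2 prompts, 1 negative prompt, and 3 parameter settings, this yields 2 × 1 × 3 = 6 runs.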
### Citation
If you find our paper or code helpful for your research, please cite:
```
@article{ding2023text,
title={Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors},
author={Ding, Lihe and Dong, Shaocong and Huang, Zhanpeng and Wang, Zibin and Zhang, Yiyuan and Gong, Kaixiong and Xu, Dan and Xue, Tianfan},
journal={arXiv preprint arXiv:2312.04963},
year={2023},
}
```