CompVis / stable-diffusion

A latent text-to-image diffusion model
https://ommer-lab.com/research/latent-diffusion-models/

Text to 3d #344

Closed TheProtaganist closed 1 year ago

TheProtaganist commented 1 year ago

Great job with all your hard work! A new model known as DreamFusion has been released, but it doesn't seem to be open source. I was wondering if you will ever try to make a free and open-source model that generates 3D objects someday?

Flova commented 1 year ago

It is based on Google's Imagen (closed source) and does not require retraining the 2D diffusion model, so I think it could work with Stable Diffusion instead of Imagen. The 2D diffusion model is used as a kind of loss to optimize a NeRF for a given caption, producing a queryable MLP for that caption.
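Roughly, I imagine each optimization step would look something like this (just a sketch of the idea, not working code: `nerf`, `diffusion`, and `random_camera_pose` are hypothetical wrappers I'm assuming, not real APIs):

```python
import torch

# Hypothetical wrappers (assumptions, not real APIs):
#   nerf.render(pose)                  -> differentiable image, shape (1, 3, H, W)
#   diffusion.add_noise(x, eps, t)     -> forward diffusion q(x_t | x_0)
#   diffusion.predict_noise(x_t, t, c) -> frozen U-Net noise prediction
#   random_camera_pose()               -> random camera around the scene

def sds_step(nerf, diffusion, text_embedding, optimizer):
    # Render the NeRF from a random camera pose (differentiable).
    image = nerf.render(random_camera_pose())

    # Noise the rendering at a random timestep, as in diffusion training.
    t = torch.randint(20, 980, (1,))
    noise = torch.randn_like(image)
    noisy_image = diffusion.add_noise(image, noise, t)

    # The frozen diffusion model predicts the noise, conditioned on the caption.
    with torch.no_grad():
        pred_noise = diffusion.predict_noise(noisy_image, t, text_embedding)

    # Score-distillation trick: treat (pred_noise - noise) directly as the
    # per-pixel gradient of the rendering, skipping the U-Net Jacobian, and
    # backpropagate it through the renderer into the NeRF weights only.
    image.backward(gradient=pred_noise - noise)

    optimizer.step()       # the diffusion model itself stays frozen
    optimizer.zero_grad()
```

The gradient trick is the key point: the frozen 2D model acts as a loss without ever backpropagating through the U-Net.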

TheProtaganist commented 1 year ago

That's incredible, I can't wait to see an open-source model released 😁

Flova commented 1 year ago

I thought about doing one, but even though I have trained a few diffusion models and know how a NeRF works (I've never implemented one), I am not confident enough to build it myself. My compute (4x 2080 Ti) might also not be enough for the experiments if we want to make quick progress (I know that we only need to run inference on the Stable Diffusion model, not train it).

enes3774 commented 1 year ago

@Flova You can use Stable Diffusion as a discriminator instead of Google's Imagen. You would have to build a 3D generator and train only that model. I think you could use Nvidia's GET3D (https://nv-tlabs.github.io/GET3D/, the code is open source) as the generator; you would just need to change the input dimensions. I think your hardware might handle these workloads. I want to work on this project if you're interested.

Flova commented 1 year ago

I don't think they use the diffusion model as a discriminator in the usual sense (as in, e.g., GET3D). They use it to slightly refine a random rendering of the NeRF scene, which is then used as the new ground truth for the NeRF training. This is repeated until the whole thing converges to a stable scene. GET3D therefore doesn't seem like the best NeRF basis. I would suggest something like HashNeRF, where we could integrate Stable Diffusion into the training loop, as sketched below. I am pretty busy, but we could spin up a repo if you want.
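Something like this loop is what I have in mind (a minimal sketch under big assumptions: `nerf` would be a HashNeRF-style model with a differentiable `render()`, and `sd_refine` and `sample_random_pose` are hypothetical helpers, not existing APIs):

```python
import torch
import torch.nn.functional as F

# Hypothetical pieces (assumptions for illustration):
#   nerf                      - HashNeRF-style model with a differentiable render(pose)
#   sd_refine(x, p, strength) - img2img with frozen Stable Diffusion: lightly
#                               denoise the rendering x toward the prompt p
#   sample_random_pose()      - random camera on a sphere around the scene

prompt = "a photo of a pineapple"   # placeholder caption
optimizer = torch.optim.Adam(nerf.parameters(), lr=1e-2)

for step in range(10_000):
    pose = sample_random_pose()
    rendering = nerf.render(pose)   # current NeRF view of the scene

    # Slightly refine the view with Stable Diffusion; low strength keeps
    # the result close to the rendering so training stays stable.
    with torch.no_grad():
        target = sd_refine(rendering.detach(), prompt, strength=0.1)

    # Use the refined image as the new ground truth for this view.
    loss = F.mse_loss(rendering, target)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

Since the refinement is detached, the only trainable parameters are the NeRF's, and Stable Diffusion just runs inference, which should fit our compute.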

enes3774 commented 1 year ago

That is amazing. Stable Diffusion can be used to generate the scenery images. Yeah, I'm in; let's spin up a repo.

TheProtaganist commented 1 year ago

Guys, it's here! No Colab notebook yet, but the code is available: https://github.com/ashawkey/stable-dreamfusion

TheProtaganist commented 1 year ago

They have a Colab notebook now.