ashawkey / stable-dreamfusion

Text-to-3D & Image-to-3D & Mesh Exportation with NeRF + Diffusion.
Apache License 2.0
8.15k stars 719 forks

training in latent space #4

Open guyko81 opened 1 year ago

guyko81 commented 1 year ago

Have you noticed that the 64x64x4 latent is actually just a 4x4 grid of small copies of the same image, each 16x16 pixels in size, and that this is layered into 4 channels? In that sense we actually have 64 channels of 16x16 images, so we can think of the latent space as a 16x16x64 representation of the original 512x512x3 image.
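The shape arithmetic in this comment can be checked with a short sketch. It assumes a Stable Diffusion style latent of shape (4, 64, 64) for a 512x512 image. `tile_latent` is a hypothetical helper written for illustration, not part of stable-dreamfusion, and NumPy stands in for PyTorch here. It only shows that the element counts match. It does not show that each tile is a small copy of the image.

```python
import numpy as np

def tile_latent(latent: np.ndarray, tile: int = 16) -> np.ndarray:
    """Split a (C, H, W) array into (C * H//tile * W//tile, tile, tile) spatial tiles.

    Hypothetical helper for illustration only.
    """
    c, h, w = latent.shape
    assert h % tile == 0 and w % tile == 0, "H and W must be multiples of tile"
    gh, gw = h // tile, w // tile
    # Separate grid axes from in-tile axes: (C, gh, tile, gw, tile),
    # then reorder to (C, gh, gw, tile, tile) so each tile stays contiguous.
    tiles = latent.reshape(c, gh, tile, gw, tile).transpose(0, 1, 3, 2, 4)
    return tiles.reshape(c * gh * gw, tile, tile)

# Assumed latent shape for a 512x512 input: 4 channels of 64x64.
latent = np.arange(4 * 64 * 64, dtype=np.float32).reshape(4, 64, 64)
tiles = tile_latent(latent)

print(tiles.shape)              # (64, 16, 16)
print(tiles.size, latent.size)  # 16384 16384
```

Both views hold 16,384 values, compared with 786,432 in the 512x512x3 image, so the "16x16x64" reading is just a rearrangement of the same latent tensor.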

If that's the case, then the original optimisation should work for this 3D training.

I hope I'm right and you can use this idea.

guyko81 commented 1 year ago

image latent

guyko81 commented 1 year ago

Well, the idea is even simpler: the reshape added some randomness. Here is the correct representation: (image: latent2)
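The thread does not say which reshape was used, so the following is only a guess at how a reshape could scramble the picture. Assuming a plain row-major reshape, each "tile" ends up as a run of consecutive values from one row rather than a spatial block. The toy 4x4 array and the variable names are illustrative assumptions.

```python
import numpy as np

# Toy 1-channel 4x4 "latent"; each value encodes its position (row * 4 + col).
latent = np.arange(16).reshape(1, 4, 4)

# Plain reshape: every 2x2 "tile" is 4 consecutive values from a single row,
# so spatial neighbours from different rows are never grouped together.
naive = latent.reshape(4, 2, 2)

# Block-preserving split: separate grid and in-tile axes, then reorder them
# so each output tile is a true 2x2 spatial patch of the input.
blocks = latent.reshape(1, 2, 2, 2, 2).transpose(0, 1, 3, 2, 4).reshape(4, 2, 2)

print(naive[0].tolist())   # [[0, 1], [2, 3]]  (all from row 0)
print(blocks[0].tolist())  # [[0, 1], [4, 5]]  (top-left 2x2 patch)
```

Viewing the latent channels directly, without any reshape, avoids the problem entirely.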

thuanz123 commented 1 year ago

The thing is that NeRF's target is to reconstruct a 3D object in RGB space, so making NeRF reconstruct from latent space is a whole different story and could be a challenging task. I don't think there is any existing work on this?