dariopavllo / convmesh

Code for "Convolutional Generation of Textured 3D Meshes", NeurIPS 2020
MIT License

Custom images #2

Open whatdhack opened 3 years ago

whatdhack commented 3 years ago

@dariopavllo, congratulations on your presentation at NeurIPS 2020. Interesting work. I have a few quick questions.

  1. What exactly would be involved in using custom images to generate 3D meshes and textures? Would fine-tuning work?
  2. I'm looking to generate 3D meshes from custom images and then render views from different directions (i.e., keeping geometry and texture constant). Would that be doable? If so, what would the pipeline look like for training and inference?
dariopavllo commented 3 years ago

Hi,

  1. You just need to retrain the model on your dataset. While fine-tuning is in principle possible with GANs, it's better to retrain from scratch for best results. The dataset must contain segmentation masks (you can use Mask R-CNN to infer them; see the sketch after this list) as well as 3D poses (or, alternatively, keypoints or something else from which you can estimate poses).
  2. Yes, that's actually guaranteed by the model. The generator produces a full 3D mesh and a full texture, so you're free to render it from any view.
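For the segmentation masks in point 1, here is a minimal sketch of how they could be inferred with torchvision's pretrained Mask R-CNN. The file path, score threshold, and how the mask gets saved for your dataset are placeholders, not part of this repo:

```python
# Sketch: infer a foreground mask for one image with torchvision's pretrained
# Mask R-CNN. How the mask is stored/consumed downstream is up to your own
# dataset-preparation code (assumption, not part of convmesh).
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()

image = Image.open('my_image.jpg').convert('RGB')  # hypothetical path
with torch.no_grad():
    outputs = model([to_tensor(image)])[0]

# Detections are sorted by score; keep the top one if it is confident enough,
# then binarize its soft mask. Threshold values are guesses.
if len(outputs['scores']) > 0 and outputs['scores'][0] > 0.9:
    mask = (outputs['masks'][0, 0] > 0.5).float()  # (H, W) binary mask
```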
whatdhack commented 3 years ago

@dariopavllo thanks. So, as I understand it, there are 3 steps in training from scratch: convmesh, inverse rendering, and GAN training, in that order. For my custom image sets, all 3 steps have to be repeated, right? And as far as I can see, the keypoints are used only in the convmesh step, right?

Expanding on my second question: suppose I have an image and I want to generate a different view of it. The GAN output depends on the z input, so how should I select z so that the GAN generates a mesh and texture that are exactly the same as (or very similar to) the image?

dariopavllo commented 3 years ago

Yes, you would need to repeat the 3 steps. Poses/keypoints are only used for the first step.

Regarding your second question, I think what you want to do is more related to a reconstruction approach (as opposed to generation). For instance, you could use CMR to obtain the 3D mesh from an image, and then render it from a different view. If you still want to do it using a GAN, one idea is to pass both the rendered image and the "target" image through a VGG network, and minimize their difference in feature space (w.r.t. the latent code z). This is what people usually do for models like StyleGAN.
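To make the latent-optimization idea concrete, here is a rough sketch under stated assumptions: `generator`, `render`, `pose`, and `target_image` are placeholders for the convmesh generator, a differentiable renderer, the target camera pose, and the VGG-preprocessed target image. None of this is the repo's actual interface:

```python
# Sketch: optimize the latent code z so that the rendered output matches a
# target image in VGG feature space. Placeholders are marked in comments.
import torch
import torchvision

latent_dim = 64  # assumption: dimensionality of the generator's latent code

# Frozen VGG16 up to relu3_3 as the perceptual feature extractor.
vgg = torchvision.models.vgg16(pretrained=True).features[:16].eval()
for p in vgg.parameters():
    p.requires_grad_(False)

z = torch.randn(1, latent_dim, requires_grad=True)
optimizer = torch.optim.Adam([z], lr=0.05)

# target_image: (1, 3, H, W) tensor of the image to match, normalized for VGG.
target_feat = vgg(target_image)

for step in range(500):
    mesh, texture = generator(z)            # placeholder: convmesh generator
    rendered = render(mesh, texture, pose)  # placeholder: differentiable renderer at the target pose
    loss = torch.nn.functional.mse_loss(vgg(rendered), target_feat)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Once z converges, the same mesh and texture can be re-rendered from any other pose.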

whatdhack commented 3 years ago

@dariopavllo, thanks. The first step of convmesh is a stripped-down version of CMR, as you pointed out in the paper, hence that step alone can theoretically generate the geometry and texture for a different pose. It would be great to read your thoughts on that.

dariopavllo commented 3 years ago

Yes, you are correct! The only thing is that we don't use any perceptual losses for that step (since we don't care about texture quality -- it is thrown away anyway). If you care about texture quality, you should use CMR (or equivalent) or improve texture supervision for the first step of convmesh.
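As a rough illustration of what "improving texture supervision" could mean, one option is to add a VGG-feature (perceptual) term next to the pixel-level loss of the first step. The function below is only a sketch; the layer choice, weighting, and interface are assumptions:

```python
# Sketch: pixel L1 loss plus a VGG perceptual term for texture supervision.
# `rendered` and `target` are (N, 3, H, W) tensors: the rendered reconstruction
# and the (masked) input image. The weight and layer cutoff are guesses.
import torch
import torchvision

vgg_features = torchvision.models.vgg16(pretrained=True).features[:16].eval()
for p in vgg_features.parameters():
    p.requires_grad_(False)

def texture_loss(rendered, target, perceptual_weight=0.1):
    pixel_term = torch.nn.functional.l1_loss(rendered, target)
    perceptual_term = torch.nn.functional.mse_loss(
        vgg_features(rendered), vgg_features(target))
    return pixel_term + perceptual_weight * perceptual_term
```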