NVlabs / nvdiffrec

Official code for the CVPR 2022 (oral) paper "Extracting Triangular 3D Models, Materials, and Lighting From Images".

MLP to parameterize SDF #81

Closed JiyouSeo closed 2 years ago

JiyouSeo commented 2 years ago

Hello, thank you for your great work. In #9 and in Section 8.5 of the paper, you mentioned that you used a DVR-based MLP to optimize the SDF. I have several questions about it.

  1. How many iterations did you run to pre-train the sphere?
  2. Did you set the MLP output shape to (size, sdf), meaning (size, 1), or to (size, sdf+deform), meaning (size, 4)?
  3. Did you also use the ModuleList in the DVR decoder to process the latent condition code c?

jmunkberg commented 2 years ago

Thanks @JiyouSeo

  1. For sphere initialization, we didn't pre-train, but instead initialized the SDF directly as something like
    # Sphere init
    sdf = (util.length(self.verts) / scale) - 0.45

    around https://github.com/NVlabs/nvdiffrec/blob/main/geometry/dmtet.py#L177

For 2-3, maybe @frankshen07 can comment further.
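
As a rough sketch of the direct initialization described above: the SDF at each grid vertex is its distance from the origin, divided by a scale, minus a radius, so vertices inside the sphere start negative and vertices outside start positive. The function name `sphere_sdf_init` and the three sample vertices are hypothetical, and `torch.linalg.norm` stands in for the repository's `util.length` helper, which is an assumption about its behavior.

```python
import torch


def sphere_sdf_init(verts: torch.Tensor, scale: float, radius: float = 0.45) -> torch.Tensor:
    """Signed distance to an origin-centered sphere, evaluated at each vertex.

    Hypothetical helper: torch.linalg.norm is used here in place of the
    repository's util.length (assumed to compute a per-vertex Euclidean length).
    """
    # Negative inside the sphere, positive outside.
    return torch.linalg.norm(verts, dim=-1) / scale - radius


# Hypothetical stand-in for the tetrahedral grid vertices.
verts = torch.tensor([
    [0.0, 0.0, 0.0],   # sphere center
    [1.0, 0.0, 0.0],   # well outside the sphere
    [0.0, 0.3, 0.0],   # inside the sphere
])
sdf = sphere_sdf_init(verts, scale=1.0)
```

With this kind of initialization the field already contains both signs, so a surface can be extracted from the first iteration.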

JiyouSeo commented 2 years ago

Thanks for responding, @jmunkberg! I had mistakenly assumed that you pre-trained for the sphere initialization. Now I see. But I have a few more questions.

  1. Is there a reason the sphere initialization is handled differently for the DTU dataset?
  2. How did you pre-train the MLP, then? Does that refer to the DMTet optimization around https://github.com/NVlabs/nvdiffrec/blob/main/train.py#L594?
  3. What did you use as the input of the MLP? Please let me know the input and output of the MLP.

    self.sdf    = torch.nn.Parameter(sdf.clone().detach(), requires_grad=True)
    self.register_parameter('sdf', self.sdf)
    self.deform = torch.nn.Parameter(deform, requires_grad=True)
    self.register_parameter('deform', self.deform)

frankshen07 commented 2 years ago

Hello @JiyouSeo ,

  1. The pretraining is straightforward – in each iteration, we randomly sample 1000 points inside a unit cube and compute MSE loss between the predicted SDF at these points and SDF of a sphere (with radius of 0.4). We train for 1000 iterations with lr of 1e-3.
  2. We use the network from https://github.com/autonomousvision/differentiable_volumetric_rendering/blob/5a190104b9f8143125beed714d33f265f5006f30/im2mesh/dvr/models/decoder.py#L7. The input is 3D points, without the condition code c. The output stays the same, and the last 3 channels are interpreted as the deformation vector.

Please let me know if anything is unclear.
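
The pre-training recipe above could be sketched roughly as follows. This is not the authors' code: `TinySDFMLP` is a hypothetical small stand-in for the DVR decoder, the Adam optimizer is an assumption (only the learning rate is stated), the unit cube is assumed to be centered at the origin, and only the SDF channel is supervised because the thread does not say how the deformation channels are handled during pre-training.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)


class TinySDFMLP(nn.Module):
    """Hypothetical stand-in for the DVR decoder: 3D point in, 4 channels out."""

    def __init__(self, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # channel 0: SDF, channels 1..3: deformation
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def pretrain_sphere(mlp, iters=1000, n_points=1000, radius=0.4, lr=1e-3):
    """Fit channel 0 of the MLP to the SDF of an origin-centered sphere."""
    opt = torch.optim.Adam(mlp.parameters(), lr=lr)  # optimizer choice is an assumption
    losses = []
    for _ in range(iters):
        # Random points in a unit cube; centering at the origin is an assumption.
        pts = torch.rand(n_points, 3) - 0.5
        target = torch.linalg.norm(pts, dim=-1) - radius
        pred_sdf = mlp(pts)[:, 0]
        loss = F.mse_loss(pred_sdf, target)
        opt.zero_grad()
        loss.backward()
        opt.step()
        losses.append(loss.item())
    return losses


mlp = TinySDFMLP()
# Fewer iterations than the 1000 mentioned above, to keep the demo quick.
losses = pretrain_sphere(mlp, iters=200)
out = mlp(torch.zeros(5, 3))  # shape (5, 4): SDF plus a 3D deformation vector
```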

JiyouSeo commented 2 years ago

Thank you for replying, @frankshen07! I have two questions.

  1. So the DVR decoder takes self.verts as input?
  2. I changed self.sdf and self.deform from nn.Parameter to outputs of the MLP (torch.Tensor), but I got a type error when calling loss.backward(), around https://github.com/NVlabs/nvdiffrec/blob/main/render/mlptexture.py#L73.
    Traceback (most recent call last):
      File "train_ori.py", line 729, in <module>
        geometry, mat = optimize_mesh(glctx, geometry, mat, lgt, dataset_train, dataset_validate,
      File "train_ori.py", line 471, in optimize_mesh
        total_loss.backward()
      File "/opt/conda/lib/python3.8/site-packages/torch/_tensor.py", line 352, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
      File "/opt/conda/lib/python3.8/site-packages/torch/autograd/__init__.py", line 173, in backward
        Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
      File "/opt/conda/lib/python3.8/site-packages/torch/utils/hooks.py", line 175, in hook
        res = user_hook(self.module, grad_inputs, self.grad_outputs)
      File "/workspace/render/mlptexture.py", line 73, in <lambda>
        self.encoder.register_full_backward_hook(lambda module, grad_i, grad_o: (grad_i[0] / gradient_scaling, ))

    I guess the error occurs because grad_i[0] is None.
    Here is my code, which declares the MLP network and pre-trains it to a sphere in the DMTetGeometry init:

    self.mlp = Decoder(FLAGS=FLAGS, multires=6).to('cuda')
    self.mlp.pre_train_sphere(1000)

    and this is the forward pass of the MLP in the getMesh function, around https://github.com/NVlabs/nvdiffrec/blob/main/geometry/dmtet.py#L196

    self.mlp.train()
    pred = self.mlp(self.points)
    self.sdf, self.deform = pred[:,0].detach() - 0.1 , pred[:,1:].detach()

    If this is not correct, how do you change self.sdf and self.deform from nn.Parameter to outputs of the MLP (torch.Tensor)?
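
For reference, a small sketch of how `detach()` affects gradient flow in a setup like the one above. A detached tensor is cut off from the autograd graph, so a loss computed from it cannot send gradients back to the network that produced it. The `nn.Linear` layer is a hypothetical stand-in for the MLP, and the squared-mean loss is only for illustration; this does not reproduce the exact hook error in the traceback.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

mlp = nn.Linear(3, 4)        # hypothetical stand-in for the MLP
points = torch.rand(8, 3)    # hypothetical stand-in for the grid vertices

# Detached output: the link back to the MLP weights is cut.
pred = mlp(points)
sdf_detached = pred[:, 0].detach()
detached_requires_grad = sdf_detached.requires_grad  # False

# Attached output: keep the tensors in the graph so backward() reaches the MLP.
pred = mlp(points)
sdf, deform = pred[:, 0], pred[:, 1:]
loss = sdf.pow(2).mean() + deform.pow(2).mean()  # illustrative loss only
loss.backward()
has_grad = mlp.weight.grad is not None
```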

frankshen07 commented 2 years ago

  1. Yes
  2. Did you remove those two lines (also for sdf)?

    self.deform = torch.nn.Parameter(deform, requires_grad=True)
    self.register_parameter('deform', self.deform)

JiyouSeo commented 2 years ago

@frankshen07 Yes, I removed these lines:

self.sdf    = torch.nn.Parameter(sdf.clone().detach(), requires_grad=True)
self.register_parameter('sdf', self.sdf)

self.deform = torch.nn.Parameter(deform, requires_grad=True)
self.register_parameter('deform', self.deform)

Do I have to use them too?

JiyouSeo commented 2 years ago

I solved the above issue by replacing pred[:,0].detach() with pred[:,0]. The cause was that detach() removes pred[:,0] from the autograd graph, so it no longer requires grad. But now I get a new error:

Traceback (most recent call last):
  File "train_ori.py", line 736, in <module>
    geometry, mat = optimize_mesh(glctx, geometry, mat, lgt, dataset_train, dataset_validate, 
  File "train_ori.py", line 439, in optimize_mesh
    result_image, result_dict = validate_itr(glctx, prepare_batch(next(v_it), FLAGS.background), geometry, opt_material, lgt, FLAGS)
  File "train_ori.py", line 199, in validate_itr
    buffers = geometry.render(glctx, target, lgt, opt_material)
  File "/workspace/geometry/dmtet.py", line 253, in render
    return render.render_mesh(glctx, opt_mesh, target['mvp'], target['campos'], lgt, target['resolution'], spp=target['spp'], 
  File "/workspace/render/render.py", line 214, in render_mesh
    assert mesh.t_pos_idx.shape[0] > 0, "Got empty training triangle mesh (unrecoverable discontinuity)"
AssertionError: Got empty training triangle mesh (unrecoverable discontinuity)

I guess the SDF cannot be optimized over the range [-0.1, 0.9]. Below is an example of the max and min SDF values when I got the error:

tensor(0.1004, device='cuda:0') tensor(0.0342, device='cuda:0')

Did you apply any normalization, or clamp the SDF values (the output of the MLP)?
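
Note that the two values printed above are both positive. Marching tetrahedra only emits triangles on edges whose endpoints have SDF values of opposite sign, so a field that is positive (or negative) everywhere yields an empty mesh. A minimal sketch of a sanity check that could be run before mesh extraction follows; `count_crossing_edges` and the tiny edge list are hypothetical and not part of the repository.

```python
import torch


def count_crossing_edges(sdf: torch.Tensor, edges: torch.Tensor) -> int:
    """Count edges whose two endpoints have SDF values of opposite sign."""
    occ = sdf > 0
    return int((occ[edges[:, 0]] != occ[edges[:, 1]]).sum())


# Hypothetical edge list over four vertices.
edges = torch.tensor([[0, 1], [1, 2], [2, 3]])

# All-positive field, like the min/max values reported above: no surface.
all_positive = torch.tensor([0.1004, 0.0342, 0.08, 0.05])
n_empty = count_crossing_edges(all_positive, edges)

# Mixed-sign field: some edges cross the zero level set.
mixed = torch.tensor([-0.1, 0.05, 0.2, -0.3])
n_mixed = count_crossing_edges(mixed, edges)
```

If such a count drops to zero during training, the SDF has collapsed to a single sign and no surface can be extracted.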

JiyouSeo commented 2 years ago

I solved it! Thank you, @frankshen07.

bigfudge2123 commented 2 years ago

Hi, would you mind sharing your code for dmtet.py? I'm stuck in the same situation. @JiyouSeo Thanks.

Shubhendu-Jena commented 9 months ago

Hi, how did you solve the error @JiyouSeo ? Did you clamp the sdf or the deformation?

santisy commented 2 months ago

@JiyouSeo Hi, I also got the "Got empty training triangle mesh (unrecoverable discontinuity)" error. How did you solve this?