a1600012888 / PhysDreamer

Code for PhysDreamer

Train the model on commonly-used datasets #3

Closed HyoKong closed 2 months ago

HyoKong commented 2 months ago

Hi, thank you so much for releasing the excellent work!

Can the proposed pipeline work for vanilla deformable 3D GS tasks? For example, applying your pipeline to the Neu3D, HyperNeRF, or Nerfies datasets?

Thanks for the help in advance!

a1600012888 commented 2 months ago

I tried the ficus scene in the NeRF-synthetic dataset. It's a green plant with a white background, and it does not work well. The major reason I found is that the video diffusion model cannot generate realistic motions for such synthetic data with a white background.

I am not sure about trying it on HyperNeRF, Nerfies, and Neu3D. If you plan to try it, I have two suggestions:

  1. Get a good 3D reconstruction first: no floaters and no broken branches. (If the scene has lots of occlusions, there might be broken branches, and if you run simulations with those, the broken branches might just fly away.)
  2. Try different video generation models to see if they can generate good videos for this scene. I think as open-source video models become stronger and stronger, we will be able to handle more complex scenes easily with this pipeline.

If you look at the examples I showed on the website, I mostly focus on objects with simple geometries and clean backgrounds.

HyoKong commented 2 months ago

Thanks for the reply. I'd be grateful if you could help answer the following questions:

  1. How do you train the parameters? Since the MPM method is similar to a Markov process and we can only obtain the velocity, stress, or other quantities at $t$ based on the state at $t-1$, how do you train the parameters exactly? Let's say we have 100 frames in the training videos; do you run your pipeline forward 100 times and do one back-propagation?

  2. How do you deal with in-place operations? Since all the particle states need to be updated after p2g and g2p, how do you handle the in-place operations? For training videos with 100 frames, do you maintain and optimize one state for each frame?

Thanks!

a1600012888 commented 2 months ago

Hi, I think you asked two different questions.

  1. What the gradient flow looks like in training. I mentioned this in Section 4.2 of the paper. A brief summary: when optimizing the initial velocity, the gradient needs to flow from the final frames back to the first frame (since the initial velocity is only applied at the first frame). When optimizing the material parameters, the gradient only propagates from frame $i+1$ back to frame $i$. Note that when the gradient propagates through too many frames, there can be issues of gradient vanishing/explosion.

  2. Dealing with in-place operations. This is more of an implementation problem. Short answer: yes, I need to remove all in-place operations in warp and preserve all intermediate states so that gradients from warp (the differentiable simulation engine) can be computed correctly. However, to avoid huge memory consumption, I added gradient checkpointing. Details: suppose you need to simulate 1000 substeps to compute the final loss. I only save the states at substeps 100, 200, 300, ..., 900 during forward propagation (the forward simulation and gradient backward conceptually behave like the forward and backward passes of a DNN). When computing the gradients of the final state w.r.t. the input, I first use the stored state at substep 900 to recompute all the states from step 900 to step 1000, then back-propagate the gradients within that segment, and then continue with the segment from step 800 to step 900. This technique is well explained here: https://arxiv.org/abs/1604.06174
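The segment-wise checkpointing described above can be sketched with a toy 1-D "simulation" `x_{t+1} = a * x_t` and a hand-written backward pass (this is a minimal illustration of the recompute-then-backprop idea, not the actual warp/PhysDreamer code; all names here are hypothetical):

```python
def step(x, a):
    # one simulation substep: x_{t+1} = a * x_t
    return a * x

def loss_and_grad(x0, a, n_steps, seg=100):
    """Forward-simulate n_steps, then compute dL/da with
    gradient checkpointing: only states at segment boundaries
    are stored; states inside a segment are recomputed."""
    # forward pass: keep only every seg-th state
    checkpoints = {0: x0}
    x = x0
    for t in range(n_steps):
        x = step(x, a)
        if (t + 1) % seg == 0:
            checkpoints[t + 1] = x
    loss = 0.5 * x * x  # simple quadratic loss on the final state

    # backward pass, latest segment first
    g_x = x    # dL/dx_T for L = 0.5 * x_T^2
    g_a = 0.0
    for start in range(((n_steps - 1) // seg) * seg, -1, -seg):
        end = min(start + seg, n_steps)
        # recompute the intermediate states inside this segment
        xs = [checkpoints[start]]
        for t in range(start, end):
            xs.append(step(xs[-1], a))
        # back-propagate through the segment: x_{t+1} = a * x_t
        for t in range(end - 1, start - 1, -1):
            g_a += g_x * xs[t - start]  # d(a*x_t)/da = x_t
            g_x = g_x * a               # d(a*x_t)/dx_t = a
    return loss, g_a
```

Peak memory is O(n_steps / seg) stored states instead of O(n_steps), at the cost of one extra forward pass per segment, exactly the trade-off from the checkpointing paper linked above.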

HyoKong commented 2 months ago

Thanks for the detailed reply!

I'm still a little confused about optimizing the material parameters. If I've misunderstood anything, please feel free to correct me. Let's assume we want to optimize $E$ using the $t$-th frame $I_t$. Since we only have the initial point cloud and its velocity, we need to run p2g and g2p many times to obtain the position of the point cloud at timestep $t-1$. During this procedure, no gradients are recorded. After that, p2g and g2p are applied with gradients, and we calculate the loss between $I_t$ and the rendered $t$-th frame. $E$ is optimized based on this loss. Is this the correct training procedure?

My concern is that since MPM is a recursive procedure and we only have the initial point cloud, it is challenging to optimize parameters recursively, and the errors will be gradually amplified. Though $E$ doesn't change across time, we still cannot guarantee the movement of the point cloud based on the initial point cloud and corresponding velocities.

Thank you so much for explaining the training process!

a1600012888 commented 2 months ago

Your description of the method is correct and your concern is also correct.

If your material is far off, then in later frames ($I_t$ with large $t$) the motion will also be far off, and the gradient provided by the rendering loss would be kind of meaningless. To avoid this, you might want to start with small $t$ and gradually increase $t$.
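The curriculum idea above, combined with the per-frame gradient scheme (detached rollout to frame $t-1$, one differentiable step to frame $t$), can be sketched with the same toy 1-D stand-in for the simulation (hypothetical names; a scalar `a` plays the role of the material parameter $E$):

```python
def rollout(x0, a, t):
    # "no-grad" forward simulation to frame t (plain recomputation)
    x = x0
    for _ in range(t):
        x = a * x
    return x

def fit_material(x0, a_true, a_init, lr=0.05, max_t=8, iters_per_t=200):
    """Fit the material parameter with a growing horizon:
    small t first, so early gradients are meaningful, then
    gradually larger t as the estimate improves."""
    a = a_init
    for t in range(1, max_t + 1):           # curriculum over the frame index
        target = rollout(x0, a_true, t)     # stands in for observed frame I_t
        for _ in range(iters_per_t):
            x_prev = rollout(x0, a, t - 1)  # detached rollout to frame t-1
            x_t = a * x_prev                # only this step carries the gradient
            grad = (x_t - target) * x_prev  # dL/da for L = 0.5*(x_t - target)^2
            a -= lr * grad                  # gradient-descent update
    return a
```

Note how the gradient only sees the last substep, matching the frame-$i{+}1$-to-frame-$i$ propagation described earlier; the curriculum keeps `x_prev` close to the true trajectory while the estimate is still rough.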

HyoKong commented 2 months ago

Thank you so much for the help!