gengshan-y / viser

ViSER: Video-Specific Surface Embeddings for Articulated 3D Shape Reconstruction. NeurIPS 2021.
https://viser-shape.github.io/
Apache License 2.0

Problems with the matching loss on another dataset #4

Closed Iven-Wu closed 2 years ago

Iven-Wu commented 2 years ago

I'm currently training ViSER on our synthetic dataset, but when the matching loss is computed, it always drops into pdb in the first iteration. Our input video size is 1024x1024; could this be the cause? Also, our dataset has the camera moving between frames. The main difference between our videos and the demo video seems to be the resolution.

The pdb statement is shown below. I tested many videos from our dataset with various initial camera locations and animal actions. Most of them stopped at this line within one or two iterations; others stopped several iterations later. https://github.com/gengshan-y/viser-release/blob/a3943ad80d391f1b60379524de3c5d07f924c6bd/nnutils/mesh_net.py#L806-L808

lingtengqiu commented 2 years ago

I also have this problem. Our input video size is 512x512, and my camera is fixed.

lingtengqiu commented 2 years ago

> I'm currently training ViSER on our synthetic dataset, but when the matching loss is computed, it always drops into pdb in the first iteration. Our input video size is 1024x1024; could this be the cause? Also, our dataset has the camera moving between frames. The main difference between our videos and the demo video seems to be the resolution.
>
> The pdb statement is shown below. I tested many videos from our dataset with various initial camera locations and animal actions. Most of them stopped at this line within one or two iterations; others stopped several iterations later.
>
> https://github.com/gengshan-y/viser-release/blob/a3943ad80d391f1b60379524de3c5d07f924c6bd/nnutils/mesh_net.py#L806-L808

The paper says: "We find ViSER to be sensitive to the random initialization of network parameters. We run optimization with different random seeds for initializing the network parameters and find some perform considerably worse than the others, due to the convergence to bad local optima." I guess this is why our data does not work.

gengshan-y commented 2 years ago

@Iven-Wu The script stops because the rendered silhouette does not overlap with the observed silhouette. My guess is that the principal points are not initialized properly.

Can you share the bash script (including the args passed to optimize.py) and the .config file?
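For reference, the check behind the linked pdb line is roughly of this form. This is a simplified sketch, not the exact repo code; the function name and threshold are illustrative:

```python
import pdb
import torch

def check_silhouette_overlap(rendered_sil: torch.Tensor,
                             observed_sil: torch.Tensor) -> None:
    """Hypothetical sketch of the guard: halt in pdb when the rendered
    silhouette shares no pixels with the observed one, since the matching
    loss is then computed over an empty set of correspondences."""
    overlap = (rendered_sil > 0.5) & (observed_sil > 0.5)
    if overlap.sum() == 0:
        # No shared foreground pixels: likely bad principal points
        # or camera initialization, so stop for inspection.
        pdb.set_trace()
```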

Iven-Wu commented 2 years ago

@gengshan-y I only slightly changed the bash script for dance-twirl (just the data path). As for the .config file, I also reused most of the dance-twirl config; the data path, init_frame, and end_frame are changed for our dataset, which has 60 frames in total.

gengshan-y commented 2 years ago

For videos of 1024x1024, you need to set ppx=512 and ppy=512 in the config file here.

This will initialize the principal points of the renderings to the image center.
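Assuming the flag-style .config format used by the demo sequences, the change amounts to two extra lines (a sketch; keep the rest of your dance-twirl config unchanged):

```
--ppx=512
--ppy=512
```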

Iven-Wu commented 2 years ago

Thanks a lot! I missed that part previously.

gengshan-y commented 2 years ago

If the object is not centered, you want to set ppx and ppy roughly to the object center location (in pixels) to make sure the rendered and observed silhouette images have overlapping pixels.
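One quick way to estimate that center is the centroid of the object's silhouette mask in the first frame. A hypothetical helper (the mask path is a placeholder for wherever your masks live):

```python
import cv2
import numpy as np

# Placeholder path: point this at the silhouette mask of your first frame.
mask = cv2.imread("path/to/first_frame_mask.png", 0) > 0
ys, xs = np.nonzero(mask)  # row (y) and column (x) indices of foreground pixels
print(f"--ppx={xs.mean():.0f}")  # object center x, in pixels
print(f"--ppy={ys.mean():.0f}")  # object center y, in pixels
```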

Besides passing principal points through the config file, another option is to pass --cnnpp to optimize.py, which optimizes an image CNN to predict the principal points. In this case, we have some mechanism here to ensure the silhouette rendering and the ground truth overlap.
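For example (a sketch; substitute the arguments from your existing dance-twirl bash script, only --cnnpp is the addition):

```
python optimize.py <your existing dance-twirl args> --cnnpp
```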