Closed Iven-Wu closed 2 years ago
I also have this problem. Our input video size is 512x512, and my camera is fixed.
I'm currently training ViSER on our synthetic dataset, but when we compute the matching loss it always drops into pdb in the first iteration. Our input video size is 1024x1024; could this cause the problem? Also, our dataset has the camera moving between frames. The main apparent difference between our videos and the demo video is the resolution.
The pdb statement is shown below. I tested many videos from our dataset with various initial camera locations and animal actions. Most of them stopped at this statement within 1 or 2 iterations; the others stopped several iterations later.
The paper says: "We find ViSER to be sensitive to the random initialization of network parameters. We run optimization with different random seeds for initializing the network parameters and find some perform considerably worse than the others, due to the convergence to bad local optima." I guess this is why our data does not work.
@Iven-Wu The script stops because the rendered silhouette does not overlap with the observed silhouette. My guess is that the principal points are not initialized properly.
Can you share the bash script (including the args passed to optimize.py) and the .config file?
@gengshan-y I slightly changed the bash script for dance-twirl, only the datapath. As for the .config file, I also reuse most of the dance-twirl config; the datapath, init_frame, and end_frame are changed for our dataset, which has 60 frames in total.
For videos of 1024x1024, you need to set ppx=512 and ppy=512 in the config file here. This will initialize the principal points of the renderings to the image center.
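As an illustration only (the exact config syntax is an assumption, not confirmed in this thread), if the .config file is a flag-file with one `--flag=value` entry per line, the change for a 1024x1024 video might look like:

```
# hypothetical excerpt of the .config file; ppx/ppy names are from this thread
--ppx=512
--ppy=512
```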
Thanks a lot! I missed that part previously.
If the object is not centered, you want to set ppx and ppy roughly to the object center location (in pixels) to make sure the rendered and observed silhouette images have overlapping pixels.
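A quick way to get that rough object center is the pixel centroid of the ground-truth silhouette mask. A minimal sketch (the helper name is illustrative, not from the repo):

```python
import numpy as np

def estimate_principal_point(sil):
    """Rough ppx/ppy guess from a binary silhouette mask of shape (H, W).

    Returns the foreground pixel centroid, i.e. roughly the object
    center, which the thread suggests using for ppx/ppy.
    """
    ys, xs = np.nonzero(sil)
    if len(xs) == 0:  # empty mask: fall back to the image center
        h, w = sil.shape
        return w / 2.0, h / 2.0
    return float(xs.mean()), float(ys.mean())

# Example: a 1024x1024 mask with the object in the upper-left quadrant
mask = np.zeros((1024, 1024), dtype=np.uint8)
mask[100:300, 200:400] = 1
ppx, ppy = estimate_principal_point(mask)  # ~ (299.5, 199.5)
```

Averaging the centroid over a few frames may be safer when the camera or object moves, since ppx/ppy are set once in the config.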
Besides passing principal points through the config file, another option is to pass --cnnpp to optimize.py, which optimizes an image CNN to predict principal points. In this case, we have some mechanism here to ensure the silhouette rendering and the ground truth overlap.
The pdb breakpoint in question: https://github.com/gengshan-y/viser-release/blob/a3943ad80d391f1b60379524de3c5d07f924c6bd/nnutils/mesh_net.py#L806-L808
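For intuition, the guard at those lines presumably fires when the rendered and observed silhouettes share no foreground pixels. A minimal sketch of such a check (illustrative names, not the repo's actual code):

```python
import numpy as np

def silhouettes_overlap(rendered, observed):
    """True if two binary masks share at least one foreground pixel."""
    return bool(np.logical_and(rendered > 0, observed > 0).any())

# Disjoint masks reproduce the failure mode described in this thread:
a = np.zeros((8, 8)); a[:2, :2] = 1
b = np.zeros((8, 8)); b[6:, 6:] = 1
# silhouettes_overlap(a, b) is False -> matching loss is undefined,
# which is when a script like this would drop into pdb
```

This is why initializing ppx/ppy near the object center helps: it shifts the rendering so the two masks intersect from the first iteration.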