gengshan-y / viser

ViSER: Video-Specific Surface Embeddings for Articulated 3D Shape Reconstruction. NeurIPS 2021.
https://viser-shape.github.io/
Apache License 2.0

Cannot train on DAVIS car-turn #1

Open VisonSpace opened 2 years ago

VisonSpace commented 2 years ago

I cannot train on car-turn: training stops at a pdb trace. How can I solve this?

VisonSpace commented 2 years ago

Why train on a subset of the video (id 22 to id 42 in breakdance-flare)? Why not train on the whole video?

gengshan-y commented 2 years ago

How is the data pre-processed? The pre-processing script hasn't been added yet.

Why train on a subset of the video (id 22 to id 42 in breakdance-flare)?

We start from random root poses (equivalent to the camera pose). Training on the whole video does not always produce the correct root pose of the dancer, possibly due to the limited batch size.

VisonSpace commented 2 years ago

Does the model only handle the case where the camera is not moving?

gengshan-y commented 2 years ago

In this work, a moving camera is treated as a static camera plus a moving root body.
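To illustrate why the two views are interchangeable, here is a minimal sketch, not taken from the ViSER code. It assumes a simple pinhole camera and a rigid root transform; all function names are hypothetical. A camera with world-frame orientation `cam_rot` and position `cam_pos` looking at a static point gives the same image as a camera fixed at the origin looking at the point after it is moved by the root pose `R = cam_rot^T`, `t = -cam_rot^T @ cam_pos`.

```python
import math

def rot_y(theta):
    """3x3 rotation about the y-axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def matvec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

def transpose(m):
    """Transpose a 3x3 matrix (the inverse of a rotation)."""
    return [[m[j][i] for j in range(3)] for i in range(3)]

def project(p, focal=1.0):
    """Pinhole projection of a camera-frame point (assumes p[2] > 0)."""
    return [focal * p[0] / p[2], focal * p[1] / p[2]]

def view_from_moving_camera(point, cam_rot, cam_pos):
    """Static object; the camera has orientation cam_rot and position cam_pos."""
    rel = [point[i] - cam_pos[i] for i in range(3)]
    return project(matvec(transpose(cam_rot), rel))

def view_from_static_camera(point, root_rot, root_trans):
    """Camera fixed at the origin; the object's root body is moved instead."""
    moved = matvec(root_rot, point)
    return project([moved[i] + root_trans[i] for i in range(3)])

def equivalent_root_pose(cam_rot, cam_pos):
    """Root pose reproducing the moving-camera view: R = Rc^T, t = -Rc^T c."""
    root_rot = transpose(cam_rot)
    return root_rot, [-x for x in matvec(root_rot, cam_pos)]
```

Both view functions return the same pixel coordinates when the root pose comes from `equivalent_root_pose`, which is why an image-based method cannot tell camera motion and root-body motion apart and can fold them into a single transform.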

VisonSpace commented 2 years ago

A batch size of 4 only uses 4 GB of GPU memory, so you could increase the batch size. Are you saying that a larger batch size gives better performance?

VisonSpace commented 2 years ago

I used your preprocessing code (https://github.com/gengshan-y/viser-release/blob/main/preprocess/README.md), but training got stuck at the pdb breakpoint you left in. With the breakpoint commented out, training runs, but the program seems to have bugs. You can try it with 'car-turn'.

VisonSpace commented 2 years ago

Isn't this the data preprocessing code? (https://github.com/gengshan-y/viser-release/blob/main/preprocess/README.md)

gengshan-y commented 2 years ago

Can you point me to where the break happens?
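One way to answer this is to scan the checkout for debugger calls. The following is a minimal sketch using only the Python standard library; `find_breakpoints` is a hypothetical helper, not part of the ViSER codebase, and its comment handling is deliberately naive.

```python
import os
import re

# Matches pdb.set_trace() and the built-in breakpoint() call.
BREAK_RE = re.compile(r"\bpdb\.set_trace\(\)|\bbreakpoint\(\)")

def find_breakpoints(root):
    """Return (path, line_number, stripped_line) for each debugger call under root."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                for lineno, line in enumerate(fh, start=1):
                    # Drop trailing comments (naive: ignores '#' inside strings).
                    code = line.split("#", 1)[0]
                    if BREAK_RE.search(code):
                        hits.append((path, lineno, line.strip()))
    return hits
```

Calling `find_breakpoints(".")` from the repository root and printing each `path:line_number` gives the exact locations to report in the issue.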

Unfortunately, I have limited capacity to do further experiments. In my experience, a larger batch size stabilizes training and improves performance. But note that solving for dynamic shapes with large deformation from 2D is very under-constrained. I cannot guarantee it works even if a 2x batch size is used.

You are welcome to test it and let me know if that works.

VisonSpace commented 2 years ago

Re

Isn't this place data preprocessing code? (https://github.com/gengshan-y/viser-release/blob/main/preprocess/README.md)

I need to make sure that the codebase's preprocessing code is complete.

VisonSpace commented 2 years ago

The result on car-turn is far from useful.

VisonSpace commented 2 years ago

At the beginning of training, the train/flowobs map is black (in TensorBoard).

gengshan-y commented 2 years ago

I remembered it wrong: the pre-processing code was tested. I would suggest doing the following.

  1. Check whether the flow is correctly computed in the FlowFW/Full-Resolution/$sequence-name/ folder.
  2. Check whether the code also loads "black" flow images for the breakdance sequence.
  3. Find the differences in data format between your sequence and the breakdance sequence.
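For the third step, a rough first pass is to compare the two flow folders at the file level. The sketch below is an assumption-laden helper, not part of the repository: it only looks at file extensions, counts, and sizes, and does not decode the flow files themselves, since their on-disk format is not stated in this thread.

```python
import os
from collections import Counter

def summarize_folder(folder):
    """Per extension: file count plus smallest and largest file size in bytes."""
    counts = Counter()
    sizes = {}
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue
        ext = os.path.splitext(name)[1].lower() or "<none>"
        size = os.path.getsize(path)
        counts[ext] += 1
        lo, hi = sizes.get(ext, (size, size))
        sizes[ext] = (min(lo, size), max(hi, size))
    return {
        ext: {"count": counts[ext], "min_bytes": sizes[ext][0], "max_bytes": sizes[ext][1]}
        for ext in counts
    }

def diff_summaries(ours, reference):
    """Return (extensions only in ours, extensions only in reference)."""
    return sorted(set(ours) - set(reference)), sorted(set(reference) - set(ours))
```

Running `summarize_folder` on your own sequence's folder under FlowFW/Full-Resolution/ and on the breakdance sequence's folder, then passing both results to `diff_summaries`, highlights missing or unexpected file types; very different byte sizes for the same extension are also worth a closer look.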