MoyGcc / vid2avatar

Vid2Avatar: 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition (CVPR2023)
https://moygcc.github.io/vid2avatar/

SynWild dataset inaccurate body poses #50

Closed alvaro-budria closed 9 months ago

alvaro-budria commented 9 months ago

Hi, I was visualizing the body poses provided as 'ground truth' in the SynWild dataset, and came across some very noisy ones, for example:

[Screenshot from 2023-09-19 09-58-33]

However, these poses are labeled as ground truth by the filename, which seems not to be the case. In the paper, you mention that the ground-truth geometry is obtained from a dense MVS system on the dynamic 4D scene, so there shouldn't be any ground-truth poses at all.

So I guess these poses were estimated with some other method, like ROMP. Is this correct?

MoyGcc commented 9 months ago

Hi, they are indeed quasi "ground-truth" poses: the SMPL is fitted using dense camera rigs and 3D meshes. The only problem is that we don't provide ground-truth camera parameters, which explains why the 2D overlay isn't ideal. However, if you align the SMPL with the human meshes in 3D, they align well. We use the SMPL meshes to align the reconstructed human surfaces across different baselines, which doesn't require camera information. We lost the real camera parameters while rendering the images in UE5 without saving them... That causes some problems but is still fine for quantitative evaluation.

alvaro-budria commented 9 months ago

I see, thanks for the clarification.

However, now I don't understand why you refine all SMPL parameters, including the body poses. https://github.com/MoyGcc/vid2avatar/blob/a1ab86a1cafc5a6e6be61bd8ef16c9c19711a415/code/v2a_model.py#L55 The camera parameters (extrinsics and intrinsics) and the global rotation and translation estimated by ROMP are inaccurate, whereas the body poses are accurate. Could you please elaborate on this a bit more?
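For readers following along: registering per-frame SMPL parameters as optimizable tensors alongside the network weights is a common pattern in this kind of pipeline. Below is a minimal sketch of that pattern, assuming PyTorch; all names are hypothetical stand-ins, not the repo's actual code.

```python
import torch

# Sketch: initial per-frame body poses (e.g., from ROMP) are only a
# starting point; they are wrapped as parameters and refined jointly
# with the avatar network by the reconstruction loss.
n_frames = 4
init_poses = torch.zeros(n_frames, 72)          # stand-in for ROMP pose estimates
poses = torch.nn.Parameter(init_poses.clone())  # refined during training
net = torch.nn.Linear(72, 3)                    # stand-in for the avatar model

opt = torch.optim.Adam(
    [{"params": net.parameters()}, {"params": [poses]}], lr=1e-2
)
target = torch.randn(n_frames, 3)               # stand-in supervision signal
for _ in range(200):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(net(poses), target)
    loss.backward()
    opt.step()
# Both the network weights and the per-frame poses receive gradients,
# so pose errors in the initialization can be corrected during training.
```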

MoyGcc commented 9 months ago

We never used the ground-truth SMPL poses for training, which is why we always optimize the SMPL parameters together with the avatar training. The ground-truth SMPL poses are only used to align 3D reconstructions that live in different coordinate spaces for quantitative evaluation (e.g., via Procrustes alignment).
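For reference, the Procrustes alignment mentioned above finds a similarity transform (rotation, isotropic scale, translation) between corresponding point sets, with no camera parameters involved. A self-contained sketch using the standard Umeyama closed-form solution (this is the general technique, not the repo's evaluation script):

```python
import numpy as np

def procrustes_align(source, target):
    """Align `source` (N, 3) to `target` (N, 3) with a similarity
    transform (rotation R, scale s, translation t) via the
    Umeyama/Procrustes closed-form solution; returns aligned points."""
    mu_s, mu_t = source.mean(0), target.mean(0)
    S, T = source - mu_s, target - mu_t
    # Optimal rotation from the SVD of the cross-covariance matrix.
    U, D, Vt = np.linalg.svd(T.T @ S)
    sign = np.sign(np.linalg.det(U @ Vt))   # guard against reflections
    R = U @ np.diag([1.0, 1.0, sign]) @ Vt
    # Optimal isotropic scale and translation.
    s = (D * [1.0, 1.0, sign]).sum() / (S ** 2).sum()
    t = mu_t - s * (R @ mu_s)
    return (s * (R @ source.T)).T + t
```

Since only 3D point correspondences are needed, reconstructions in different coordinate spaces can be compared this way even when the true camera parameters were lost.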

alvaro-budria commented 9 months ago

Ok, thanks a lot for explaining.