Closed by alvaro-budria 9 months ago
Hi, they are indeed quasi "ground-truth" poses, as the SMPL is fitted using dense camera rigs and 3D meshes. The only problem is that we don't provide the ground-truth camera parameters, which explains why the 2D overlay isn't ideal. But you could try to align the SMPL with the human meshes in 3D, where they are well aligned. We use the SMPL meshes to align the reconstructed human surfaces across different baselines, which doesn't need camera information. We lost the real camera parameters because we rendered the images in UE5 without saving them... That causes some problems, but the data is still okay for quantitative evaluation.
I see, thanks for the clarification.
However, now I don't understand why you refine all SMPL parameters, including body poses. https://github.com/MoyGcc/vid2avatar/blob/a1ab86a1cafc5a6e6be61bd8ef16c9c19711a415/code/v2a_model.py#L55 The camera parameters (extrinsics and intrinsics) and global rotation and translation estimated by ROMP are inaccurate, whereas body poses are correct. Could you please elaborate on this a bit more?
We never used the ground-truth SMPL poses for training, which is why we always optimize the SMPL parameters together with the avatar training. The ground-truth SMPL poses are only used to align 3D reconstructions that live in different coordinate spaces for quantitative evaluation (e.g., using Procrustes alignment).
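For readers unfamiliar with the alignment step mentioned above, here is a minimal sketch of a similarity Procrustes alignment between two corresponding 3D point sets (for example, SMPL vertices expressed in two different coordinate spaces). This is not the evaluation code from the repository; the function name `procrustes_align` and the assumption of one-to-one point correspondences are illustrative only.

```python
import numpy as np

def procrustes_align(source, target):
    """Similarity-align source (N, 3) onto target (N, 3).

    Hypothetical helper: assumes row i of source corresponds to row i of target.
    Returns the aligned source points plus the estimated scale, rotation, translation.
    """
    mu_s = source.mean(axis=0)
    mu_t = target.mean(axis=0)
    src_c = source - mu_s          # centered source points
    tgt_c = target - mu_t          # centered target points

    # 3x3 cross-covariance between the centered point sets.
    H = src_c.T @ tgt_c
    U, S, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T

    # Least-squares optimal isotropic scale and translation.
    scale = (S * np.diag(D)).sum() / (src_c ** 2).sum()
    t = mu_t - scale * (R @ mu_s)

    aligned = scale * source @ R.T + t
    return aligned, scale, R, t
```

The resulting `scale`, `R`, and `t` can then be applied to a reconstructed surface in the source space before computing distance metrics against the reference geometry. Whether scale should be estimated at all depends on the evaluation protocol, which this thread does not specify.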
Ok, thanks a lot for explaining.
Hi, I was visualizing the body poses provided as 'ground truth' in the SynWild dataset, and came across some very noisy ones, for example:
However, the filename labels these poses as ground truth, which does not seem to be the case. In the paper, you mention that the ground-truth geometry is obtained from a dense MVS system on the dynamic 4D scene, so there shouldn't be any ground-truth poses at all.
So I guess these poses were estimated with some other method, like ROMP. Is this correct?