https://github.com/QitaoZhao/ContextAware-PoseFormer/blob/a2456578e8cd25f9fd99dacdf81d2e3623ca127b/ContextPose/mvn/models/loss.py#L36-L46

`X0 = keypoints_gt - muX` should not be zero, as `muX` is the mean over the joint dimension. We previously found that this issue may happen when running with multiple GPUs. Is that the case for you?
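For reference, a minimal sketch of that centering step (variable names follow the snippet linked above; the exact axis arguments in the repo's `loss.py` may differ). `X0` only vanishes when a sample's pose is degenerate, e.g. every joint is zero:

```python
import numpy as np

# Toy ground truth: (batch, n_joints, 3)
keypoints_gt = np.random.randn(4, 17, 3)

# muX: per-sample mean over the joint dimension (axis=1)
muX = np.mean(keypoints_gt, axis=1, keepdims=True)   # (batch, 1, 3)

# Centered ground truth: all-zero only if every joint of a sample
# coincides with the mean, e.g. when the whole pose is zeros
X0 = keypoints_gt - muX
normX = np.sqrt(np.sum(X0 ** 2, axis=(1, 2), keepdims=True))
print((normX == 0).sum())  # 0 for any non-degenerate batch
```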
https://github.com/QitaoZhao/ContextAware-PoseFormer/blob/a2456578e8cd25f9fd99dacdf81d2e3623ca127b/ContextPose/train.py#L75-L85

https://github.com/QitaoZhao/ContextAware-PoseFormer/blob/a2456578e8cd25f9fd99dacdf81d2e3623ca127b/ContextPose/train.py#L109-L119

In our previous case, the error you mentioned may happen if we use `torch.utils.data.distributed.DistributedSampler` in `val_dataloader` as in `train_dataloader`. Therefore, we removed it from `val_dataloader` in our current implementation, which should already fix the error. Could you please also check this?
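In outline, the arrangement being described looks something like the following (a hedged sketch with a hypothetical `make_dataloaders` helper, not the repo's actual `train.py` code): the training loader keeps `DistributedSampler`, while the validation loader is a plain `DataLoader`, so evaluation runs over the full, unsharded validation set.

```python
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

def make_dataloaders(train_set, val_set, batch_size, is_distributed):
    # Training: shard samples across processes with DistributedSampler
    train_sampler = DistributedSampler(train_set) if is_distributed else None
    train_loader = DataLoader(
        train_set,
        batch_size=batch_size,
        shuffle=(train_sampler is None),  # the sampler handles shuffling
        sampler=train_sampler,
    )
    # Validation: no DistributedSampler, so each process iterates the
    # complete, unmodified validation set
    val_loader = DataLoader(val_set, batch_size=batch_size, shuffle=False)
    return train_loader, val_loader
```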
> `X0 = keypoints_gt - muX` should not be zero, as `muX` is the mean over the joint dimension. We previously found that this issue may happen when running with multiple GPUs. Is that the case for you?
No, I met this case with just one GPU, and my code is consistent with yours in the dataloader part. But some entries of `normX` are still equal to 0: when debugging, I typed `print((normX == 0).sum())` and it returned 10000.
I suppose this happens because some parts of `keypoints_gt` are all zero. You can print them out to check whether this is the case. If so, there might be something wrong with the data processing.
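A quick check along those lines might look like this (a sketch with a toy batch; in practice `keypoints_gt` would come straight from the dataloader):

```python
import numpy as np

# Toy batch (batch, n_joints, 3); the second sample is deliberately
# all zeros to mimic the suspected corrupted ground truth
keypoints_gt = np.random.randn(3, 17, 3)
keypoints_gt[1] = 0

# Samples whose ground truth is entirely zero make muX == 0,
# X0 == 0, and hence normX == 0 downstream
all_zero = np.all(keypoints_gt == 0, axis=(1, 2))
print("all-zero samples:", all_zero.sum(), "of", keypoints_gt.shape[0])
```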
Sorry that I forgot to reply. The bug happened because I didn't train over all batches, and it went away on its own once I trained one complete epoch. Thanks so much!
Thank you for your excellent work! However, I found an error (maybe a bug?) in your implementation. First, you convert the absolute 3D keypoint ground truth to root-relative coordinates via

```python
keypoints_3d_gt[:, :, 1:] -= keypoints_3d_gt[:, :, :1]
keypoints_3d_gt[:, :, 0] = 0
```

in ContextAware-PoseFormer/ContextPose/mvn/datasets/utils.py line 44, so the 0th keypoint's coordinates are set to 0. This then causes an error when evaluating the results after an epoch: in the P-MPJPE loss in ContextAware-PoseFormer/ContextPose/mvn/models/loss.py, `X0` is divided by 0, which produces `nan` in the keypoint coordinates and in turn raises an error in `np.linalg.svd(H)`. Could you please tell me how to resolve this error?
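For context, the failure path being described follows the standard Procrustes-alignment preamble of P-MPJPE. The sketch below mirrors the widely used VideoPose3D-style computation rather than the repo's exact loss.py (the function name is ours); the comments mark where a zero `normX` turns into `nan` and then breaks the SVD:

```python
import numpy as np

def p_mpjpe_front_half(predicted, target):
    """Procrustes preamble of P-MPJPE, up to the SVD (sketch only)."""
    muX = np.mean(target, axis=1, keepdims=True)     # mean over joints
    muY = np.mean(predicted, axis=1, keepdims=True)
    X0 = target - muX
    Y0 = predicted - muY

    normX = np.sqrt(np.sum(X0 ** 2, axis=(1, 2), keepdims=True))
    normY = np.sqrt(np.sum(Y0 ** 2, axis=(1, 2), keepdims=True))

    # If a sample's target is entirely zero, normX == 0 here and this
    # in-place division fills X0 with nan ...
    X0 /= normX
    Y0 /= normY

    H = np.matmul(X0.transpose(0, 2, 1), Y0)
    # ... and a nan-contaminated H makes np.linalg.svd raise
    # LinAlgError ("SVD did not converge")
    return np.linalg.svd(H)
```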