Closed: zhoujun-7 closed this issue 3 months ago
Hello Jun!
Thank you for your careful review of our code. Regarding your observation about the calculation of contact points, I believe there might be a misunderstanding. Your question seems to concern how we calculate `base_pts` within the `GRAB_Dataset_V19` class in the file `dataset_ours_single_seq.py`. I guess it is the following code
```python
dist_rhand_joints_to_obj_pc = torch.sum(
    (rhand_joints.unsqueeze(2) - object_pc_th.unsqueeze(1)) ** 2, dim=-1
)
```
that led you to conclude that we use the ground-truth `rhand_joints` to calculate the contact region. However, it's important to note that `GRAB_Dataset_V19` serves as the data loader for the first stage, where only hand trajectory information is denoised. While we do compute contact-related information within this class, we do not use it in the denoising process.

In the subsequent stage, where the contact-related representation is denoised, the data loader shifts to `GRAB_Dataset_V19_From_Evaluated_Info`. Here, we leverage the hand trajectory predicted by the first stage to compute contact information, as demonstrated in the following code:
```python
if self.wpredverts:
    pert_rhand_joints = self.predicted_hand_joints
    rhand_joints = self.predicted_hand_joints
    rhand_verts = self.predicted_hand_verts
    pert_rhand_verts = self.predicted_hand_verts

pert_rhand_joints = torch.matmul(
    pert_rhand_joints, object_global_orient_mtx_th
) + object_trcansl_th.unsqueeze(1)
rhand_joints = torch.matmul(
    rhand_joints, object_global_orient_mtx_th
) + object_trcansl_th.unsqueeze(1)
rhand_verts = torch.matmul(
    rhand_verts, object_global_orient_mtx_th
) + object_trcansl_th.unsqueeze(1)
pert_rhand_verts = torch.matmul(
    pert_rhand_verts, object_global_orient_mtx_th
) + object_trcansl_th.unsqueeze(1)
```
Here, both `rhand_joints` and `pert_rhand_joints` are set to `predicted_hand_joints`, which is loaded from the file specified by `args.predicted_info_fn`.
The reason contact-related information is computed, even from ground-truth joints, within the `GRAB_Dataset_V19` class in `dataset_ours_single_seq.py` is that this class was adapted from the training data loader, also named `GRAB_Dataset_V19`, in `dataset_ours.py`. During the training phase, only clean trajectories, with contact information derived from ground-truth hand trajectories, are used. In the modified `GRAB_Dataset_V19` in `dataset_ours_single_seq.py`, this calculation remains unchanged because it was carried over directly from the training data loader with some adjustments. However, to re-emphasize: during the evaluation of the first stage, the contact information is no longer used.
I've modified the logic to eliminate this confusion:

```python
dist_rhand_joints_to_obj_pc = torch.sum(
    (pert_rhand_verts.unsqueeze(2) - object_pc_th.unsqueeze(1)) ** 2, dim=-1
)
_, minn_dists_joints_obj_idx = torch.min(dist_rhand_joints_to_obj_pc, dim=-1)
```
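The `torch.min` over the last axis yields, for each hand vertex, the index of its nearest object point. A minimal sketch of using those indices to fetch the nearest points follows; the shapes and variable names are illustrative assumptions:

```python
import torch

T, V, P = 2, 10, 64
hand_verts = torch.randn(T, V, 3)  # assumed hand vertices per frame
object_pc = torch.randn(T, P, 3)   # assumed object point cloud per frame

# Pairwise squared distances, shape (T, V, P), as in the snippet above.
dist = torch.sum((hand_verts.unsqueeze(2) - object_pc.unsqueeze(1)) ** 2, dim=-1)

# Per vertex: distance to, and index of, the nearest object point, shape (T, V).
minn_dist, minn_idx = torch.min(dist, dim=-1)

# Gather the nearest object points themselves, shape (T, V, 3); a distance
# threshold on minn_dist could then mark which vertices are in contact.
nearest_pts = torch.gather(object_pc, 1, minn_idx.unsqueeze(-1).expand(-1, -1, 3))
assert nearest_pts.shape == (T, V, 3)
```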
Again, please note that although we calculate contact information, the first stage only denoises hand trajectories; the contact-related representation is neither used nor denoised there. I'll remove the contact-related calculation from the first stage's dataloader, `GRAB_Dataset_V19`, to avoid similar confusion in the future.

Thank you for bringing this to our attention! The codebase still needs some tidying up and a thorough cleanup; I'll push further commits to make it more organized.
Best regards, Xueyi
Thanks for your patience in answering my question, and sorry for the misunderstanding. My problem is solved.
Hi Xueyi!
Thanks for sharing this interesting work. The proposed method relies on the contact points on the object, but the calculation of those contact points seems to be based on the ground-truth hand pose rather than the noisy one. If the ground-truth hand pose is unavailable, can the proposed method still work?
Looking forward to your reply.
BR, Jun