ChanglongJiangGit / A2J-Transformer

[CVPR 2023] Code for paper 'A2J-Transformer: Anchor-to-Joint Transformer Network for 3D Interacting Hand Pose Estimation from a Single RGB Image'
Apache License 2.0

How to recover the estimated 3D coordinates to a 2D image #16

Closed · Mayo001 closed this 11 months ago

Mayo001 commented 11 months ago

Thanks for your code! I found that the estimated keypoints do not correspond to the ground truth when I visualize them directly from preds. What should I do to recover the estimated 3D coordinates to the 2D image? By the way, I'm a beginner, so please don't hesitate to teach me!

with torch.no_grad():
    for itr, (inputs, targets, meta_info) in enumerate(tqdm(tester.batch_generator, ncols=150)):
        # forward
        start = time.time()
        out = tester.model(inputs, targets, meta_info, 'test')
        end = time.time()

        joint_coord_out = out['joint_coord'].cpu().numpy()
        inv_trans = out['inv_trans'].cpu().numpy()
        joint_valid = out['joint_valid'].cpu().numpy()

        preds['joint_coord'].append(joint_coord_out)
        preds['inv_trans'].append(inv_trans)
        preds['joint_valid'].append(joint_valid)

        timer.append(end - start)

        # visualization
        # focal = meta_info['focal'][0]
        # princpt = meta_info['princpt'][0]
        # for j in range(42):
        #     joint_coord_out[0][j, :2] = trans_point2d(joint_coord_out[0][j, :2], inv_trans[0])
        # joint_coord_out[0][:, 2] = (joint_coord_out[0][:, 2] / cfg.output_hm_shape[0] * 2 - 1) * (cfg.bbox_3d_size / 2)
        # joint_coord_out[0][:21, 2] += float(targets['rel_root_depth'][0])
        # joint_coord_out[0] = pixel2cam(joint_coord_out[0], focal, princpt)

        plt.imshow(inputs['img'][0].permute(1, 2, 0))
        plt.scatter(joint_coord_out[0][:21, 0], joint_coord_out[0][:21, 1])
        plt.scatter(targets['joint_coord'][0][:21, 0], targets['joint_coord'][0][:21, 1])
        plt.savefig('./visualization/result' + str(itr) + '.png')
        plt.close()
[attached image: visualization of predicted vs. ground-truth keypoints]
ChanglongJiangGit commented 11 months ago

Hello, thanks for your comment. Since the model predicts 2.5D coordinates, the xy values in the model's predictions are already the joint coordinates in the image coordinate system, which is exactly what you need. That is why the visualized results look correct once you remove the few lines of coordinate-conversion code after # visualization. Those conversion lines serve a different purpose: they convert the 2.5D coordinates in the image coordinate system into the 3D camera coordinate system. If you are not familiar with 2.5D coordinates, you can read the paper that introduced the InterHand2.6M dataset; all coordinate-transformation methods in our model are the same as in that paper's baseline. Hope this answers your questions!
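
For reference, here is a minimal sketch of that 2.5D-to-3D conversion, following the commented-out lines in your snippet. pixel2cam, trans_point2d, cfg.output_hm_shape, cfg.bbox_3d_size, and rel_root_depth come from that snippet; the wrapper name coords_2p5d_to_cam and the argument hm_depth_bins are made up here for illustration, so treat this as a sketch rather than the exact repo implementation:

import numpy as np

def pixel2cam(pixel_coord, focal, princpt):
    # Standard pinhole back-projection: (u, v, depth) in image space
    # -> (x, y, z) in the camera coordinate system.
    x = (pixel_coord[:, 0] - princpt[0]) / focal[0] * pixel_coord[:, 2]
    y = (pixel_coord[:, 1] - princpt[1]) / focal[1] * pixel_coord[:, 2]
    return np.stack((x, y, pixel_coord[:, 2]), axis=1)

def coords_2p5d_to_cam(joint_coord, rel_root_depth, focal, princpt,
                       hm_depth_bins, bbox_3d_size):
    # joint_coord: (42, 3) prediction for one sample, with x/y already
    # mapped back to original-image pixels (via trans_point2d + inv_trans).
    coord = joint_coord.copy()
    # Undo the depth discretization: z in [0, hm_depth_bins] -> metric
    # root-relative depth in [-bbox_3d_size/2, +bbox_3d_size/2].
    coord[:, 2] = (coord[:, 2] / hm_depth_bins * 2 - 1) * (bbox_3d_size / 2)
    # Shift one hand by the predicted relative root depth, mirroring the
    # snippet above (rows 0-20 are one hand, rows 21-41 the other).
    coord[:21, 2] += rel_root_depth
    # Back-project into the 3D camera coordinate system.
    return pixel2cam(coord, focal, princpt)

With focal and princpt taken from meta_info (as in the commented lines), the returned array lives in camera space and can be compared against the dataset's camera-space ground truth; for drawing on the network input image, the raw xy predictions are enough.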

[attached image: result0]

Mayo001 commented 11 months ago

Thank you very much! I got the desired result after removing the coordinate conversion code.