facebookresearch / InterWild

Official PyTorch implementation of "Bringing Inputs to Shared Domains for 3D Interacting Hands Recovery in the Wild", CVPR 2023

On rendering the mano. #3

Closed aragakiyui611 closed 1 year ago

aragakiyui611 commented 1 year ago

https://github.com/facebookresearch/InterWild/blob/12f68107686d2ad52a151506880c7e3e5f7d0e88/data/InterHand26M/InterHand26M.py#L52

I encountered an exception about a missing file, aid_human_annot_*.txt, in InterHand26M.py. Could you provide these files? Thank you!

mks0601 commented 1 year ago

Hi, could you check this? Thanks https://github.com/facebookresearch/InterWild#start

aragakiyui611 commented 1 year ago

Thank you very much.

aragakiyui611 commented 1 year ago

I am trying to render the MANO mesh on the input images, but the rendered mesh always comes out too small. This is my code. Could you help me if possible?

[attached screenshot of the rendering result]

model.py.txt

demo_hand_vid.py.txt

multip.py.txt

mks0601 commented 1 year ago

So the problem is that rroot_cam and lroot_cam in your model.py establish the correspondence between 1) the 3D mesh of each right/left hand and 2) the cropped image of each right/left hand. Those root_cam values are 3D global translations defined for 1) focal length (5000, 5000), 2) principal point (princpt) (128, 128), and 3) a (256, 256) single-hand image. In other words, root_cam is a virtual 3D global translation that only has physical meaning together with the above focal length, principal point, and hand image size.

So you can render each hand with its separate rroot_cam or lroot_cam, overlaid on the original image, without rel_trans. But this should give a somewhat wrong rendering, since rel_trans is not considered.

On the other hand, rel_trans does not belong to a specific focal length, principal point, or input image shape; it has real physical meaning. So to render the two hands together properly, we need another root_cam for the two-hand image.
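To make this concrete, here is a minimal sketch of the single-hand projection being described, using the virtual camera above (focal (5000, 5000), princpt (128, 128), 256x256 crop). The names mesh_local and project_to_hand_crop are illustrative, not the repo's API:

```python
import numpy as np

# Virtual camera under which rroot_cam / lroot_cam are defined (per the comment above):
# focal length (5000, 5000), principal point (128, 128), 256x256 single-hand crop.
FOCAL = (5000.0, 5000.0)
PRINCPT = (128.0, 128.0)

def project_to_hand_crop(mesh_local, root_cam, focal=FOCAL, princpt=PRINCPT):
    # mesh_local: (V, 3) root-relative hand mesh; root_cam: (3,) virtual global translation.
    mesh_cam = mesh_local + root_cam[None, :]            # move mesh into camera space
    x = mesh_cam[:, 0] / mesh_cam[:, 2] * focal[0] + princpt[0]
    y = mesh_cam[:, 1] / mesh_cam[:, 2] * focal[1] + princpt[1]
    return np.stack([x, y], axis=1)                      # (V, 2) pixel coords in the 256x256 crop
```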

mks0601 commented 1 year ago

I'm not sure whether there is some hacky way to render without an additional root_cam for the two-hand image, but I think I need it, and that is the reason rendering two-hand meshes overlaid on two-hand images is not trivial for my method :(

mks0601 commented 1 year ago

So my plan is to change TransNet to additionally output root_cam. As it takes a two-hand input, the additional root_cam could establish the correspondence between the two-hand meshes and the two-hand 2D input. Using that, we could render the 3D meshes on the images.
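As a rough sketch of how such an extra root_cam could be used (all names here are illustrative, not the actual TransNet output): place the right hand at root_cam, offset the left hand by rel_trans, and project both into the two-hand image.

```python
import numpy as np

def project_two_hands(rmesh_local, lmesh_local, root_cam, rel_trans, focal, princpt):
    # Right hand sits at root_cam; the left hand is offset by rel_trans, which keeps
    # its real-world meaning. Both are then projected with the two-hand image's camera.
    def proj(mesh_cam):
        x = mesh_cam[:, 0] / mesh_cam[:, 2] * focal[0] + princpt[0]
        y = mesh_cam[:, 1] / mesh_cam[:, 2] * focal[1] + princpt[1]
        return np.stack([x, y], axis=1)
    return proj(rmesh_local + root_cam), proj(lmesh_local + root_cam + rel_trans)
```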

aragakiyui611 commented 1 year ago

Found the issue: it was an img2bb_trans problem. I'll detail it later.

mks0601 commented 1 year ago

Yeah, img2bb_trans could be one reason, but fundamentally I think the additional root_cam is necessary, as I tried exactly the same thing you did. Please let me know if you get some results :)

aragakiyui611 commented 1 year ago

I changed the enlarge_ratio to 1.0 in these two lines: https://github.com/facebookresearch/InterWild/blob/f8b753144b9d5230c8136697172efd53f153f35d/main/model.py#L86 and I enlarge the bbox before process_hand_bbox(hand_bbox, img2bb_trans); then the bboxes are consistent when rendering the meshes of the two hands. A sketch of the pre-enlargement step is below.
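A minimal sketch of that step, assuming an (x_min, y_min, width, height) bbox and a hypothetical enlarge_bbox helper; the exact enlargement factor is not stated in the thread, so 1.3 here is just an example:

```python
import numpy as np

def enlarge_bbox(bbox, factor=1.3):
    # Expand an (x_min, y_min, width, height) bbox about its center.
    x, y, w, h = bbox
    cx, cy = x + w / 2.0, y + h / 2.0
    w, h = w * factor, h * factor
    return np.array([cx - w / 2.0, cy - h / 2.0, w, h], dtype=np.float32)

# hand_bbox, img2bb_trans, and process_hand_bbox come from the repo's pipeline;
# only the pre-enlargement step is new:
# hand_bbox = enlarge_bbox(hand_bbox)
# hand_bbox = process_hand_bbox(hand_bbox, img2bb_trans)
```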

[attached screenshots of the overlay results]

Using rroot_cam this way is not strictly accurate, but it is visually acceptable. Is it feasible for 2D evaluation metrics such as PCK? I am trying to compare Hand4Whole and InterWild in terms of 2D joint recall (PCK), but I found it hard to adapt the code 😭. As you mentioned, an additional root_cam would be optimal.

mks0601 commented 1 year ago

As you said, using rroot_cam is not strictly accurate, and projecting the 3D two-hand meshes is not possible at this moment. If you're in a hurry, your approximation could be a hacky way to achieve your purpose. I guess one alternative is to iteratively optimize rroot_cam per frame, which would take a lot of time.
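For reference, a minimal sketch of that per-frame fitting, assuming the original image's focal length and principal point are known; all function and tensor names here are illustrative, not the repo's API:

```python
import torch

def fit_root_cam(joints_3d, joints_2d, focal, princpt, init_trans, iters=200, lr=1e-2):
    # joints_3d: (J, 3) root-relative joints; joints_2d: (J, 2) pixels in the original image.
    trans = init_trans.clone().requires_grad_(True)
    optim = torch.optim.Adam([trans], lr=lr)
    for _ in range(iters):
        cam = joints_3d + trans                           # camera-space joints
        proj = torch.stack([
            cam[:, 0] / cam[:, 2] * focal[0] + princpt[0],
            cam[:, 1] / cam[:, 2] * focal[1] + princpt[1],
        ], dim=1)
        loss = ((proj - joints_2d) ** 2).mean()           # 2D reprojection error
        optim.zero_grad()
        loss.backward()
        optim.step()
    return trans.detach()
```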

aragakiyui611 commented 1 year ago

Yes, that is a hacky way. I'll wait for your code update on the global root_cam, thank you very much! Besides, if I want to calculate single-hand 2D PCK, can I use rroot_cam and lroot_cam respectively?

mks0601 commented 1 year ago

Yeah, for single-hand PCK you could just use them separately.
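For reference, a minimal PCK sketch for that single-hand setup; the function name and pixel threshold are just examples:

```python
import numpy as np

def pck_2d(pred_2d, gt_2d, thr_px=20.0):
    # pred_2d, gt_2d: (J, 2) pixel coordinates for one hand.
    dists = np.linalg.norm(pred_2d - gt_2d, axis=1)
    return float((dists < thr_px).mean())

# Project the right hand's joints with rroot_cam and the left hand's with lroot_cam,
# then score each hand separately with pck_2d.
```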

txytju commented 1 year ago

https://github.com/facebookresearch/InterWild/blob/12f68107686d2ad52a151506880c7e3e5f7d0e88/data/InterHand26M/InterHand26M.py#L52

I encountered an exception about a missing file, aid_human_annot_*.txt, in InterHand26M.py. Could you provide these files? Thank you!

@aragakiyui611 I'm also working on this direction; could we add each other on WeChat, haha! My WeChat is txy13512216389.

mks0601 commented 1 year ago

Hi, the visualization feature has been added here: https://github.com/facebookresearch/InterWild/blob/main/demo/demo.py

aragakiyui611 commented 1 year ago

@mks0601 Hi, I found that in the dataloader, https://github.com/facebookresearch/InterWild/blob/25402806951f273353b0fd2a446346fa7cea5815/common/utils/preprocessing.py#L202 and https://github.com/facebookresearch/InterWild/blob/25402806951f273353b0fd2a446346fa7cea5815/common/utils/preprocessing.py#L267 give joint_img under output_body_hm_shape, but the model outputs joint_img under output_hand_hm_shape: https://github.com/facebookresearch/InterWild/blob/25402806951f273353b0fd2a446346fa7cea5815/common/nets/module.py#L20 Is that correct, or did I miss something?

mks0601 commented 1 year ago

You're correct, and that is why I change cfg.output_body_hm_shape to cfg.output_hand_hm_shape here: https://github.com/facebookresearch/InterWild/blob/25402806951f273353b0fd2a446346fa7cea5815/main/model.py#L129
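For anyone hitting the same mismatch, a minimal sketch of the kind of coordinate rescaling involved; the shape values below are placeholders (the real ones come from cfg.output_body_hm_shape and cfg.output_hand_hm_shape), and this is not the repo's exact code:

```python
import numpy as np

# Placeholder (depth, height, width) heatmap shapes; use the cfg values from the repo.
output_body_hm_shape = (8, 64, 64)
output_hand_hm_shape = (8, 64, 64)

def body_hm_to_hand_hm(joint_img):
    # joint_img: (J, 3) joint coordinates in (x, y, z) heatmap space.
    joint_img = joint_img.copy().astype(np.float32)
    joint_img[:, 0] *= output_hand_hm_shape[2] / output_body_hm_shape[2]  # x / width
    joint_img[:, 1] *= output_hand_hm_shape[1] / output_body_hm_shape[1]  # y / height
    joint_img[:, 2] *= output_hand_hm_shape[0] / output_body_hm_shape[0]  # z / depth
    return joint_img
```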

aragakiyui611 commented 1 year ago

Oh, I missed that line, thanks.