ChanglongJiangGit / A2J-Transformer

[CVPR 2023] Code for paper 'A2J-Transformer: Anchor-to-Joint Transformer Network for 3D Interacting Hand Pose Estimation from a Single RGB Image'
Apache License 2.0

Inference without bounding box #17

Open minsuuuj opened 11 months ago

minsuuuj commented 11 months ago

Hi, I'm a student studying hand pose estimation. First of all, thanks a lot for your great project and paper. I have a question, though: I want to run inference on my own custom data (e.g., extract 3D keypoints in real time from a live webcam). However, I found that the `augmentation` function in `utils.preprocessing` relies on bounding box information provided in advance (from `xxxx_test_data.json`). Is there any way to run inference without a bounding box? Thanks, and have a wonderful day! :)

ChanglongJiangGit commented 10 months ago

Sorry to keep you waiting. Generally speaking, cropping the image with a bbox is a very important step for hand pose estimation: it lets the model focus on the location of the hand rather than the cluttered, irrelevant background. Our model loads the bboxes provided by the dataset, and we have not tried processing images without them. If you want to process live images, I have two suggestions (see the sketch below):

First, you can use a detector such as YOLO to detect hands, and then use the detected bboxes to crop the image. The default input size of our model is 256×256, so you can set the bbox size to 256×256 and crop accordingly.

Second, if you just want to check whether the model runs at all, you can resize the live frame directly to 256×256 and pass a dummy bbox that satisfies the model's bbox format, e.g. an xywh of [0, 0, 256, 256], to effectively skip the cropping step. However, the model's accuracy cannot be guaranteed in this case.

Finally, thank you very much for your interest, and I hope these suggestions are helpful.
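To make the two options concrete, here is a minimal sketch (not part of this repository) of how a live webcam frame could be prepared under the assumptions above. `detect_hand_bbox` is a hypothetical placeholder for a hand detector such as a YOLO model fine-tuned on hands, and the 256×256 input size follows the reply above; the resulting crop and bbox would then be fed through the repo's own preprocessing in place of the bbox read from the json.

```python
# Minimal sketch: prepare a live frame for a 256x256 hand pose model.
# detect_hand_bbox() is a hypothetical stub -- replace it with a real detector.
import cv2
import numpy as np

INPUT_SIZE = 256  # default network input resolution (256x256)


def detect_hand_bbox(frame):
    """Hypothetical hand detector. Should return [x, y, w, h] in pixels
    (e.g. from a YOLO model trained on hands), or None if no hand is found."""
    return None  # stub: no detector wired in, fall back to the dummy bbox


def prepare_frame(frame):
    bbox = detect_hand_bbox(frame)  # option 1: crop with a detected hand bbox
    if bbox is None:
        # Option 2: dummy bbox covering the whole (resized) frame,
        # i.e. xywh = [0, 0, 256, 256]; accuracy is not guaranteed here.
        frame = cv2.resize(frame, (INPUT_SIZE, INPUT_SIZE))
        bbox = [0, 0, INPUT_SIZE, INPUT_SIZE]
    x, y, w, h = [int(v) for v in bbox]
    crop = frame[y:y + h, x:x + w]
    crop = cv2.resize(crop, (INPUT_SIZE, INPUT_SIZE))
    return crop, np.array(bbox, dtype=np.float32)


if __name__ == "__main__":
    cap = cv2.VideoCapture(0)           # live webcam
    ok, frame = cap.read()
    if ok:
        crop, bbox = prepare_frame(frame)
        print(crop.shape, bbox)         # (256, 256, 3) and the xywh bbox used
    cap.release()
```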