Daniil-Osokin / lightweight-human-pose-estimation-3d-demo.pytorch

Real-time 3D multi-person pose estimation demo in PyTorch. OpenVINO backend can be used for fast inference on CPU.
Apache License 2.0
656 stars 138 forks

Converting Camera Space co-ordinates to pixels format #33

Closed aswin-datakalp closed 3 years ago

aswin-datakalp commented 4 years ago

Hi @Daniil-Osokin ,

Thanks for getting back to me on this thread: https://github.com/Daniil-Osokin/lightweight-human-pose-estimation-3d-demo.pytorch/issues/28#issue-665631590

Using the parsing function (https://github.com/Daniil-Osokin/lightweight-human-pose-estimation-3d-demo.pytorch/blob/151822507d5b5d4f8e4e11e09b4d6756c695a4ec/demo.py#L97), I am able to get the 3D poses in camera space as follows:

neck = [-190.62363 -143.5822 589.5959 0.7935022]
nose = [-195.41003 -150.29758 570.4913 0.6902704]
body_center = [-187.9092 -105.43911 617.3712 -1. ]
l_shoulder = [-177.8263 -141.78792 584.78796 0.8494377]
l_elbow = [-169.89859 -121.25131 595.30255 0.8332423]
l_wrist = [-173.83104 -100.761116 599.0281 0.80825585]
l_hip = [-178.46548 -105.983055 618.76904 0.80648357]
l_knee = [-174.8669 -79.69078 633.88434 0.72061825]
l_ankle = [-169.39319 -56.625427 660.8477 -1. ]
r_shoulder = [-201.56178 -144.06097 592.5884 0.83254266]
r_elbow = [-201.56342 -125.039116 607.90967 0.77501464]
r_wrist = [-199.14009 -110.08909 602.84406 0.72150564]
r_hip = [-188.06558 -107.70375 621.0893 0.7721691]
r_knee = [-185.46873 -84.546844 643.06537 0.78837216]
r_ankle = [-1.7721944e+02 -5.9826057e+01 6.6426001e+02 6.0836148e-01]
r_eye = [-192.99725 -151.44644 569.71234 -1. ]
l_eye = [-186.4578 -152.53731 573.75745 -1. ]
r_ear = [-1.9430800e+02 -1.4779059e+02 5.7083228e+02 3.1776953e-01]
l_ear = [-1.9454225e+02 -1.4477330e+02 5.7405463e+02 5.1655698e-01]

Camera space represents coordinates in X, Y, Z format. In the results above, the X and Y values are always negative for the keypoints, and sometimes the Z value is negative as well.

Is there a way, or an existing function, to convert these camera-space coordinates to pixel coordinates?

Note: Since the 2D coordinates obtained from the parsing function are not always accurate, I would like to derive them from the 3D coordinates instead.

Thanks in advance !!

Daniil-Osokin commented 4 years ago

Hi, you can try the usual world-to-pinhole-camera coordinate mapping: https://staff.fnwi.uva.nl/r.vandenboomgaard/IPCV20172018/LectureNotes/CV/PinholeCamera/PinholeCamera.html.
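For reference, the pinhole mapping can be sketched roughly as below. This is only an illustration under stated assumptions, not code from the repository: the function name `project_to_pixels` and the intrinsics `FX`, `FY`, `CX`, `CY` are hypothetical, and the real focal length and principal point have to come from your own camera calibration. The points are taken to be in camera space already (as in the post above), so no rotation or translation is applied, and the fourth value of each keypoint is assumed to be a confidence score and is ignored. In this model, negative X and Y simply mean the point lies on the negative side of the optical axis; adding the principal point shifts the result into pixel coordinates.

```python
from typing import List, Optional, Sequence, Tuple


def project_to_pixels(
    points_cam: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Optional[Tuple[float, float]]]:
    """Project camera-space points (X, Y, Z, ...) to pixel coordinates (u, v).

    Uses the ideal pinhole model without lens distortion:
        u = fx * X / Z + cx
        v = fy * Y / Z + cy
    Points with Z <= 0 lie on or behind the image plane and yield None.
    """
    pixels: List[Optional[Tuple[float, float]]] = []
    for point in points_cam:
        # Only the first three values are used; a trailing confidence
        # value (assumed) is ignored.
        x, y, z = point[0], point[1], point[2]
        if z <= 0:
            pixels.append(None)
            continue
        pixels.append((fx * x / z + cx, fy * y / z + cy))
    return pixels


# Two keypoints copied from the post above.
keypoints = {
    "neck": [-190.62363, -143.5822, 589.5959, 0.7935022],
    "nose": [-195.41003, -150.29758, 570.4913, 0.6902704],
}

# Hypothetical intrinsics for illustration only; replace them with the
# calibrated values of your camera.
FX, FY = 1000.0, 1000.0
CX, CY = 960.0, 540.0

pixels = project_to_pixels(list(keypoints.values()), FX, FY, CX, CY)
for name, pixel in zip(keypoints, pixels):
    print(name, pixel)
```

Returning `None` for points with non-positive Z is one possible design choice; it keeps the output aligned with the input keypoint order so that undetected or invalid joints can be skipped downstream.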

aswin-datakalp commented 3 years ago

Hi @Daniil-Osokin ,

Thanks for the pointers.

Will try implementing it.

Daniil-Osokin commented 3 years ago

Great, glad it helped.