RonLek / FastV2C-HandNet

Repository for the implementation of "FastV2C-HandNet: Fast Voxel to Coordinate Hand Pose Estimation with 3D Convolutional Neural Networks"
https://arxiv.org/abs/1907.06327

Why are the functions pixel2world and world2pixel implemented differently for different datasets? #3

Closed LYKlyk closed 4 years ago

LYKlyk commented 4 years ago

In the V2V GitHub repository (https://github.com/mks0601/V2V-PoseNet_RELEASE), I found that the functions pixel2world and world2pixel are implemented differently for different datasets.

For the MSRA dataset, world2pixel(x, y, z) computes:
local pixelY = imgHeight/2 - fy * torch.cdiv(y, z)

For the ICVL dataset, world2pixel(x, y, z) computes:
local pixelY = imgHeight/2 + fy * torch.cdiv(y, z)

Could you provide your ICVL-related code? And is this formula correct? Thank you!

RonLek commented 4 years ago

As you can see from our paper, we have so far demonstrated results on the MSRA dataset. The code for the ICVL dataset is still in progress. As for the different implementations, the V2V authors simply handle the two datasets in different ways; the meaning is the same. Going by their code, for MSRA:

"""pixel2world(x, y, z)"""
local worldY = (imgHeight/2 - y) * z / fy  -- (i)
"""world2pixel(x, y, z)"""
local pixelY = imgHeight/2 - fy * y / z 

Substituting pixelY for y in (i) gives:

worldY = (imgHeight/2 - (imgHeight/2 - fy * y / z)) * z / fy
worldY = y

which is true!

Similarly, for ICVL:

"""pixel2world(x,y,z)"""
local worldY = (y - imgHeight/2) * z / fy  -- (ii)
"""world2pixel(x,y,z)"""
local pixelY = imgHeight/2 + y / z * fy

Substituting pixelY for y in (ii) gives:

worldY = ((imgHeight/2 + y / z * fy) - imgHeight/2) * z / fy
worldY = y

which is again true!

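Therefore, world2pixel differs between the two datasets because pixel2world differs: each world2pixel is the exact inverse of its own dataset's pixel2world. The two versions differ only in the sign convention for the Y axis, so mathematically both are correct and consistent.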
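The round-trip argument above can also be checked numerically. The following is a minimal Python sketch, not the V2V authors' Lua code: it ports only the Y-axis formulas quoted in this thread, and the function names, IMG_HEIGHT, and FY are hypothetical placeholders chosen for illustration rather than the datasets' actual calibration values.

```python
import math

# Placeholder camera parameters (assumptions for illustration only).
IMG_HEIGHT = 240.0
FY = 240.0


def msra_pixel2world_y(pixel_y, z):
    # MSRA convention from the thread: worldY = (imgHeight/2 - y) * z / fy
    return (IMG_HEIGHT / 2 - pixel_y) * z / FY


def msra_world2pixel_y(world_y, z):
    # MSRA convention from the thread: pixelY = imgHeight/2 - fy * y / z
    return IMG_HEIGHT / 2 - FY * world_y / z


def icvl_pixel2world_y(pixel_y, z):
    # ICVL convention from the thread: worldY = (y - imgHeight/2) * z / fy
    return (pixel_y - IMG_HEIGHT / 2) * z / FY


def icvl_world2pixel_y(world_y, z):
    # ICVL convention from the thread: pixelY = imgHeight/2 + y / z * fy
    return IMG_HEIGHT / 2 + world_y / z * FY


def round_trips(world2pixel, pixel2world, world_y, z):
    # True if projecting to pixels and back recovers the original world Y.
    recovered = pixel2world(world2pixel(world_y, z), z)
    return math.isclose(recovered, world_y, rel_tol=1e-9)


if __name__ == "__main__":
    for world_y in (-75.0, 12.5, 50.0):
        print("MSRA", world_y, round_trips(msra_world2pixel_y, msra_pixel2world_y, world_y, 400.0))
        print("ICVL", world_y, round_trips(icvl_world2pixel_y, icvl_pixel2world_y, world_y, 400.0))
```

Mixing the conventions (for example, MSRA's world2pixel with ICVL's pixel2world) flips the sign of Y, which is why each dataset must use its own matching pair.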

LYKlyk commented 4 years ago

Thank you very much!