anibali / margipose

3D monocular human pose estimation
Apache License 2.0

How to normalize the coordinates to [-1, 1]? #20

Closed (guker closed 3 years ago)

guker commented 4 years ago

https://github.com/anibali/pose3d-utils/blob/8ecab2c9c842d7a41c59f01cb8d00fd3166bb3e0/pose3d_utils/skeleton_normaliser.py#L8

from math import sqrt


def make_projection_matrix(z_ref, intrinsics, height, width):
    """Build a matrix that projects from camera space into clip space.
    Args:
        z_ref (float): The reference depth (will become z=0).
        intrinsics (CameraIntrinsics): The camera object specifying focal length and optical centre.
        height (float): The image height.
        width (float): The image width.
    Returns:
        torch.Tensor: The projection matrix.
    """

    # Set the z-size (depth) of the viewing frustum to be equal to the
    # size of the portion of the XY plane at z_ref which projects
    # onto the image.
    size = z_ref * max(width / intrinsics.alpha_x, height / intrinsics.alpha_y)

    # Set near and far planes such that:
    # a) z_ref will correspond to z=0 after normalisation
    #    z_ref = 2fn / (f + n)
    # b) The distance from z=-1 to z=1 (normalised) will correspond
    #    to `size` in camera space
    #    f - n = size
    far = 0.5 * (sqrt(z_ref ** 2 + size ** 2) + z_ref - size)
    near = 0.5 * (sqrt(z_ref ** 2 + size ** 2) + z_ref + size)

    # Construct the perspective projection matrix.
    # More details: http://kgeorge.github.io/2014/03/08/calculating-opengl-perspective-matrix-from-opencv-intrinsic-matrix
    m_proj = intrinsics.matrix.new([
        [intrinsics.alpha_x / intrinsics.x_0, 0, 0, 0],
        [0, intrinsics.alpha_y / intrinsics.y_0, 0, 0],
        [0, 0, -(far + near) / (far - near), 2 * far * near / (far - near)],
        [0, 0, 1, 0],
    ])

    return m_proj

What is the meaning of size in this code?

anibali commented 4 years ago

We need to specify where the near and far planes of the frustum are (i.e. where z=-1 and z=1 are in clip space). size effectively controls the separation between the z=-1 and z=1 planes (size is that distance in camera space). We calculate size using the width/height in order to obtain a square-ish frustum. z=0 corresponds to z_ref in camera space.
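The properties described above can be checked numerically. The following is a minimal sketch, not code from the repository: the helper names `frustum_planes` and `normalised_z` and the sample camera values are hypothetical, and plain floats stand in for the CameraIntrinsics object. Only the formulas for size, far and near and the z and w rows of the matrix are taken from the quoted function.

```python
from math import isclose, sqrt


def frustum_planes(z_ref, alpha_x, alpha_y, height, width):
    """Return (size, far, near) using the same formulas as the quoted function."""
    size = z_ref * max(width / alpha_x, height / alpha_y)
    root = sqrt(z_ref ** 2 + size ** 2)
    far = 0.5 * (root + z_ref - size)
    near = 0.5 * (root + z_ref + size)
    return size, far, near


def normalised_z(z, far, near):
    """Apply the z and w rows of the projection matrix, then divide by w."""
    z_clip = -(far + near) / (far - near) * z + 2 * far * near / (far - near)
    w_clip = z  # the last matrix row is [0, 0, 1, 0]
    return z_clip / w_clip


# Hypothetical example values (assumed for illustration, not from the thread).
z_ref = 4000.0
size, far, near = frustum_planes(
    z_ref, alpha_x=1200.0, alpha_y=1200.0, height=480.0, width=640.0
)

print("size:", size, "far:", far, "near:", near)
print("z_ref ->", normalised_z(z_ref, far, near))
print("far   ->", normalised_z(far, far, near))
print("near  ->", normalised_z(near, far, near))
```

With these sample values, z_ref should map to 0 and the two planes to -1 and +1, with the planes separated by size in camera space. Note that the variable named far holds the smaller depth (3200 here) and near the larger one (about 5333.33), so near - far equals size.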

guker commented 4 years ago

Got it, thanks.

5cat commented 3 years ago

Hey anibali,

I thought about opening a new issue, but I think it is more appropriate to reopen this one. May I ask what purpose the term sqrt(z_ref ** 2 + size ** 2) serves? I can see there is a triangle somewhere, but I'm not sure why it is there. Wouldn't 0.5 * (z_ref +/- size) get the job done intuitively? All I know is that sqrt(z_ref ** 2 + size ** 2) is needed for z_ref = 2fn/(f+n) to hold.

Also, I don't know why the far side is z_ref - size and the near side of the frustum is z_ref + size. Shouldn't it be the opposite? Or is the z-axis negative from the camera's point of view? When I calculate far - near, it comes out as -size, not size.

anibali commented 3 years ago

There are lots of different ways in which you could define a normalised space if you wanted. However, I decided to adhere to two properties to define mine: the "depth" of the space is based on the width/height (making it a cube assuming those two dimensions are equal), and z=0 in normalised space corresponds to the reference z in camera space. The "flipped" z-axis was a result of wanting a right-handed coordinate system, if I recall correctly.
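On the square root specifically, here is a derivation sketch from the two constraints stated in the code comments. It is a reconstruction, not something the author spelled out in this thread. It writes f and n for the code's far and near and s for size, and it assumes the sign that the code actually produces, n = f + s:

```latex
\begin{aligned}
z_{\text{ref}} &= \frac{2fn}{f+n}, \qquad n = f + s \\
z_{\text{ref}}\,(2f + s) &= 2f\,(f + s) \\
2f^2 + 2\,(s - z_{\text{ref}})\,f - s\,z_{\text{ref}} &= 0 \\
f &= \tfrac{1}{2}\left(z_{\text{ref}} - s + \sqrt{(s - z_{\text{ref}})^2 + 2\,s\,z_{\text{ref}}}\right)
   = \tfrac{1}{2}\left(\sqrt{z_{\text{ref}}^2 + s^2} + z_{\text{ref}} - s\right) \\
n &= f + s = \tfrac{1}{2}\left(\sqrt{z_{\text{ref}}^2 + s^2} + z_{\text{ref}} + s\right)
\end{aligned}
```

So the square root is the discriminant of a quadratic (the positive root keeps f > 0), not a geometric hypotenuse. It appears because the first constraint makes z_ref the harmonic mean of the two plane depths rather than their arithmetic mean. Planes placed symmetrically at z_ref ± s/2 would have harmonic mean z_ref - s²/(4 z_ref), so z_ref would not map to z=0. The derivation also agrees with the observation above that far - near evaluates to -size with the variable names used in the code.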