Arthur151 / ROMP

Monocular, One-stage, Regression of Multiple 3D People and their 3D positions & trajectories in camera & global coordinates. ROMP[ICCV21], BEV[CVPR22], TRACE[CVPR2023]
https://www.yusun.work/
Apache License 2.0

Some clarifications and .npz files visualization #15

Open dmetehan opened 3 years ago

dmetehan commented 3 years ago

I would like to visualize the body landmarks projected onto the 2D image, but I have trouble understanding how the values are stored. When I run the demo script I get an .npz file that I can load with numpy. Inside it I found f->results, where two dictionaries are stored, one for each person in the image. Each dictionary contains 'cam' (3,), 'pose' (72,), 'betas' (10,), 'j3d_smpl24' (24, 3), 'j3d_op25' (25, 3), and 'verts' (6890, 3) (shapes in parentheses).

From your paper I expected the 'pose' variable to be of length 132 (22 landmarks x 6D); I assume you instead saved 24 landmarks in 3D, which gives an array of length 72. From the 'cam' variable (tx, ty = cam[1:]) I computed the center for each person easily, since those numbers are normalized between -1 and 1, just as you mention in your paper. However, when it comes to visualizing the 'pose', 'j3d_smpl24', or 'verts' variables I ran into problems. Could you explain how each of these variables stores its data? Apparently they are not normalized between -1 and 1.

I also have trouble understanding the first number in 'cam', which should correspond to scale according to the paper, where it is said to "reflect the size and depth of the human body to some extent". How can this scale be used to visualize pose points? In my example I get 5.51 for one person and -5.156 for the other. Could you also explain what a negative scale represents?

Arthur151 commented 3 years ago

Good question!

  1. About the 3D-to-2D projection: it is presented here, and the detailed function is here. To get the projected 2D coordinates, just use 'pj2d' here. They are normalized (-1~1) coordinates on the input image (not the original image).
  2. About 'cam': we adopt a 3-dim camera parameter. We don't use the estimated scale value directly; instead we take $(1.1)^{scale}$ to make sure the applied scale is always positive. These camera parameters are used to project the estimated 3D joints or body vertices back to the 2D image via weak-perspective projection.
  3. About 'pose': these are the 72-dim SMPL pose parameters, i.e. the 3D rotation of each of the 24 SMPL joints ('j3d_smpl24') in axis-angle representation (3 dims per joint).
  4. About 'j3d_smpl24' and 'verts': they are in standard SMPL space, not image space. In standard SMPL space, the body center (near the pelvis) is located at the origin.

Don't hesitate to let me know if you have any other questions. Best.
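As a concrete illustration, the weak-perspective projection described above (scale applied as 1.1 raised to the raw value, then x/y translation) can be sketched in a few lines of NumPy. This is a sketch of the math, not the repo's exact code; `cam` is assumed to be (s, tx, ty):

```python
import numpy as np

def weak_perspective_project(j3d, cam):
    """Project 3D joints (N, 3) to normalized (-1~1) 2D coords (N, 2).

    cam = (s, tx, ty). The raw scale s may be negative; 1.1 ** s is the
    always-positive zoom factor actually applied.
    """
    s, tx, ty = cam
    return (1.1 ** s) * j3d[:, :2] + np.array([tx, ty])
```

A negative raw scale (like the -5.156 above) therefore just means a zoom factor between 0 and 1, i.e. a smaller/farther person.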

dmetehan commented 3 years ago

Thank you very much for the detailed info. Is there a way to map j3d_smpl24 onto the image space?

Arthur151 commented 3 years ago

Of course, it is very simple. Add the following line after this line: `pj2d_smpl24 = proj.batch_orth_proj(j3d_smpl24, params_dict['cam'], mode='2d')[:,:,:2]`. 'pj2d_smpl24' is what you want; please add it to the 'outputs' dict for further use.
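Since these projected coordinates are normalized to (-1~1) on the network input image, plotting them on that image takes one extra mapping step. A sketch, assuming the usual linear convention (-1 maps to the left/top edge, +1 to the right/bottom edge):

```python
import numpy as np

def normalized_to_pixels(pj2d, width, height):
    """Map (-1~1)-normalized 2D joints (N, 2) to pixel coordinates
    on an input image of the given width/height."""
    return (pj2d + 1.0) / 2.0 * np.array([width, height])
```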

lisa676 commented 3 years ago

@Arthur151 Hi dear Yu Sun. I'm also trying to get SMPL 3D keypoints in image space. I followed your instructions and got 2D data with 45 entries. I used mode='3d' but I'm still getting 2D data. An example is below: [[0.0744452346641, -0.012984120034], [-0.23053123445656, 0.00345313533], ... up to 45 entries]. How can I get 3D keypoints in image space?

My other question: you mentioned in your reply above that in SMPL space the body center is near the pelvis. What about the body center in image space?

Finally, is there any way to get SPIN keypoints from your repository?

Arthur151 commented 3 years ago
  1. Please set keep_dim=True to keep the Z dimension. Please refer to the projection function.
  2. What do you mean by "body center in image space"? Are you asking how to determine the body center in the center map? We give a detailed description of this in our paper. Briefly speaking, the 2D body center can be regarded as the average of the torso keypoints (left/right shoulders, left/right hips, and so on).
  3. I haven't looked into the "SPIN keypoints" you mentioned, but I think they can be derived from the SMPL body mesh via their regression functions.
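That torso-average idea can be sketched as below. The joint indices are illustrative, assuming OpenPose BODY_25 ordering (2/5 = right/left shoulder, 9/12 = right/left hip); adjust them for whatever keypoint format you use:

```python
import numpy as np

# Illustrative torso indices (OpenPose BODY_25 ordering assumed):
TORSO_IDS = [2, 5, 9, 12]  # R-shoulder, L-shoulder, R-hip, L-hip

def body_center_2d(kp2d):
    """Approximate the 2D body center as the mean of the torso keypoints.

    kp2d: (25, 2) array of 2D keypoints.
    """
    return kp2d[TORSO_IDS].mean(axis=0)
```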
lisa676 commented 3 years ago

> About 'j3d_smpl24' and 'verts': they are in standard SMPL space, not image space. In standard SMPL space, the body center (near the pelvis) is located at the origin.

Thanks for your reply. In that reply (quoted above) you mentioned that in standard SMPL space the body center (near the pelvis) is located at the origin. I was asking about this body center.

Furthermore, I changed keep_dim=True but I'm still getting the same results as I mentioned above.

Arthur151 commented 3 years ago
  1. After projection, there may not be any center located at the origin, because of the X-/Y-translation.
  2. That is because we force 'pj2d' to be 2-dim in this line: we set 'pj2d': pj3d[:,:,:2]. If you want all 3 dims, please change it to 'pj2d': pj3d.
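To make the keep_dim behavior concrete, here is a minimal sketch of what the batched projection does under the assumptions discussed above (scale/translate x and y; keep_dim=True carries the untranslated depth through as a third column). This is assumed behavior for illustration, not the repo's exact function:

```python
import numpy as np

def orth_proj(j3d, cam, keep_dim=False):
    """Batched weak-perspective projection sketch.

    j3d: (B, N, 3); cam: (B, 3) as (s, tx, ty) per sample.
    keep_dim=True keeps the (untranslated) depth as a third column.
    """
    scale = 1.1 ** cam[:, :1]                                  # (B, 1)
    xy = scale[:, None, :] * j3d[:, :, :2] + cam[:, None, 1:]  # (B, N, 2)
    if keep_dim:
        return np.concatenate([xy, j3d[:, :, 2:]], axis=-1)    # (B, N, 3)
    return xy
```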
lisa676 commented 3 years ago

Thanks for the clarification. Now I can get x, y and z data by removing [:,:,:2], but one thing confuses me: the array has 45 entries, i.e. size (1, 45, 3) for one frame, while I think it should be (1, 24, 3).

Arthur151 commented 3 years ago

To get the back-projected 3D joints in SMPL format, I recommend replacing this line, which is pj3d = proj.batch_orth_proj(j3d_op25, params_dict['cam'], mode='2d'), with pj3d = proj.batch_orth_proj(j3d_smpl24, params_dict['cam'], keep_dim=True).

lisa676 commented 3 years ago

@Arthur151 Thanks for your help. I'm following your suggestions but I still get (1, 45, 3). In fact, I can slice the array to take the first 24 entries, as you also did here with j3d_smpl24[:,:24]. But my question is: why do we get 45 entries in the first place?

Arthur151 commented 3 years ago

These 3D body joints are derived from the estimated SMPL body mesh. Each body mesh contains 6890 vertices, from which we can regress the 24 body joints you need. The regression process is quite simple: because these vertices have stable semantic locations, each joint can easily be regressed from them. For instance, to compute an elbow keypoint, we average the vertices near the elbow. In this way, theoretically, we can derive any keypoint you want from the body mesh. The extra entries are redundant keypoints regressed from the SMPL body mesh, which is why you get 45 instead of 24.
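That vertex-to-joint regression boils down to one matrix multiply: each row of a (K, 6890) regressor holds averaging weights over the vertices near one joint. A toy-sized sketch (the real regressor matrices ship with the SMPL model files):

```python
import numpy as np

def regress_joints(verts, regressor):
    """Regress K 3D joints from mesh vertices.

    verts: (V, 3) mesh vertices; regressor: (K, V) non-negative weights,
    each row summing to 1, so each joint is a weighted vertex average.
    """
    return regressor @ verts  # (K, 3)

# Toy example: 3 vertices, 2 joints. Joint 0 averages vertices 0 and 1;
# joint 1 sits exactly on vertex 2.
verts = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
regressor = np.array([[0.5, 0.5, 0.0], [0.0, 0.0, 1.0]])
joints = regress_joints(verts, regressor)
```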