david-svitov / HAHA

HAHA: Highly Articulated Gaussian Human Avatars with Textured Mesh Prior

About .pkl of Smplify-x and SMPLX #10

Open IceFtv opened 1 month ago

IceFtv commented 1 month ago

Very nice job! But when I tried to train my model using the .pkl file produced by Smplify-x, I found that there is a certain difference between the .pkl output of Smplify-x and the SMPLX format expected here:

Smplify-x: {'camera_rotation', 'camera_translation', 'betas', 'global_orient', 'left_hand_pose', 'right_hand_pose', 'jaw_pose', 'leye_pose', 'reye_pose', 'expression', 'body_pose'}

SMPLX: {'betas', 'global_orient', 'body_pose', 'transl', 'left_hand_pose', 'right_hand_pose', 'jaw_pose', 'leye_pose', 'reye_pose', 'expression', 'camera_matrix', 'camera_transform'}

Could you tell me how you handled this at the time? Thank you very much.

david-svitov commented 1 month ago
from smplx.lbs import transform_mat

# Image size in pixels (width, height), taken from the GT mask of this frame.
size_x = gt_masks_list[frame_id].shape[1]
size_y = gt_masks_list[frame_id].shape[0]

# Projection matrix (see get_camera_matrix below) and 4x4 world-to-camera
# transform assembled from the camera's rotation and translation.
camera_matrix = camera.get_camera_matrix("cpu", size_x, size_y)[0]
camera_transform = transform_mat(camera.rotation,
                                 camera.translation.unsqueeze(dim=-1))[0]

result['camera_matrix'] = camera_matrix.detach().cpu().numpy()
result['camera_transform'] = camera_transform.detach().cpu().numpy()

Hi! I added this to fit_single_frame.py somewhere after line 470. But you can assemble these matrices in a dataloader using 'camera_rotation' and 'camera_translation'.
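
For example, a minimal sketch of that dataloader-side assembly (the key names come from the list above; the array shapes are assumptions):

import numpy as np

def assemble_camera_transform(pkl_data):
    # Build the 4x4 world-to-camera transform from the Smplify-x keys,
    # mirroring what transform_mat does in the snippet above.
    R = np.asarray(pkl_data['camera_rotation']).reshape(3, 3)
    t = np.asarray(pkl_data['camera_translation']).reshape(3)
    T = np.eye(4, dtype=np.float32)
    T[:3, :3] = R
    T[:3, 3] = t
    return T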

IceFtv commented 1 month ago

Thank you very much for your reply, I will try it immediately. But I found another issue: body_pose in the Smplify-x .pkl file has 32 values, while SMPLX expects 63. Could you tell me how you handled this? Thank you very much.

david-svitov commented 1 month ago

It seems you need to decode the vector from VPoser: https://github.com/vchoutas/smplify-x/issues/139

result['body_pose'] = vposer.decode(pose_embedding, output_type='aa').detach().cpu().numpy().reshape((1, 63))
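
A minimal sketch of that decoding step, following Smplify-x's own VPoser (v1) usage; the checkpoint directory is a placeholder:

import torch
from human_body_prior.tools.model_loader import load_vposer

# Load the VPoser prior the same way smplify-x does ('vposer_ckpt' stands in
# for the checkpoint directory used during fitting).
vposer, _ = load_vposer('vposer_ckpt', vp_model='snapshot')
vposer.eval()

# 'body_pose' in the Smplify-x .pkl holds the 32-D latent embedding;
# decode it to 21 joints x 3 axis-angle values = 63 numbers.
pose_embedding = torch.tensor(result['body_pose'], dtype=torch.float32)  # (1, 32)
result['body_pose'] = (vposer.decode(pose_embedding, output_type='aa')
                       .detach().cpu().numpy().reshape((1, 63)))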

IceFtv commented 1 month ago
> Hi! I added this to fit_single_frame.py somewhere after line 470. But you can assemble these matrices in a dataloader using 'camera_rotation' and 'camera_translation'.

Is the function camera.get_camera_matrix self-defined? Could you provide me with the code? And if I have my own camera parameters, do I need to modify the contents of the .pkl, or anything else, for it to display correctly? Thank you very much.

zyms5244 commented 1 month ago
> Hi! I added this to fit_single_frame.py somewhere after line 470. But you can assemble these matrices in a dataloader using 'camera_rotation' and 'camera_translation'.

I have encountered the same problem when trying to convert my dataset into this format: https://drive.google.com/file/d/1peE2RNuYoeouA8YS0XwyR2YEbLT5gseW/view?usp=sharing

Could you explain the data processing in detail, especially how to get camera_matrix from Smplify-x? It would be even better if you could provide the code.

david-svitov commented 1 month ago

In camera.py, add this to the PerspectiveCamera class:

NEAR = 0.01
FAR = 100

def get_camera_matrix(self, device, size_x, size_y):
    # Assemble an OpenGL-style perspective projection matrix from the
    # fitted focal lengths and the image size in pixels.
    with torch.no_grad():
        camera_mat = torch.zeros([self.batch_size, 4, 4],
                                 dtype=self.dtype, device=device)
        # Focal lengths normalized to the [-1, 1] NDC range.
        camera_mat[:, 0, 0] = 2.0 * self.focal_length_x / size_x
        camera_mat[:, 1, 1] = 2.0 * self.focal_length_y / size_y
        # Perspective divide by the camera-space depth (+z is forward here).
        camera_mat[:, 3, 2] = 1.0
        # Map depth into NDC using the near/far clipping planes.
        camera_mat[:, 2, 2] = -(self.FAR + self.NEAR) / (self.NEAR - self.FAR)
        camera_mat[:, 2, 3] = (2 * self.FAR * self.NEAR) / (self.NEAR - self.FAR)

    return camera_mat
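
A quick usage sketch (the point coordinates and image size below are made-up values, just to show the projection convention):

P = camera.get_camera_matrix("cpu", size_x=1080, size_y=1080)[0]  # (4, 4)
p_cam = torch.tensor([0.1, -0.2, 3.0, 1.0])  # homogeneous camera-space point
clip = P @ p_cam
ndc = clip[:3] / clip[3]  # x, y land in [-1, 1] when the point is in view
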
IceFtv commented 1 month ago

Thank you for your reply. Are size_x and size_y the width (W) and height (H) of the image here?

david-svitov commented 1 month ago

Yup

IceFtv commented 1 month ago

Another question: how should SMPLX's "transl" be calculated? Is it the "camera_translation" from Smplify-x?

david-svitov commented 1 month ago

Everything should be fine if you just ignore "transl".

zyms5244 commented 1 month ago

> In camera.py, add this to the PerspectiveCamera class: (get_camera_matrix code above)

Thank you for your kindness. I have verified the data preprocessing on male-4-casual using Smplify-x and added your code in camera.py. However, I could not achieve the same results as with the data you provided (the Google Drive link above).

Your data:

Key: camera_matrix:
[[ 4.9223537   0.         -0.08888889  0.        ]
 [ 0.          4.941477    0.05        0.        ]
 [ 0.          0.          1.0001     -0.01010101]
 [ 0.          0.          1.          0.        ]]
Key: camera_transform:
[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]
Key: transl:
[[0.1355158  0.29134032 5.315963  ]]

My processed data:

Key: camera_matrix:
[[ 9.259259   0.         0.         0.        ]
 [ 0.         9.259259   0.         0.        ]
 [ 0.         0.         1.0002    -0.020002  ]
 [ 0.         0.         1.         0.        ]]
Key: camera_transform:
[[1. 0. 0. 0.0371261]
 [0. 1. 0. 0.3118709]
 [0. 0. 1. 8.512359 ]
 [0. 0. 0. 1.       ]]
Key: transl:
[[0. 0. 0.]]

The matrices are different, and the avatar reconstruction is blurry. Could you provide some advice or update the entire data preprocessing code?

IceFtv commented 1 month ago

I also have the same problem. I think it's because Smplify-x has a default focal_length=5000, so I passed Smplify-x a focal_length that comes from camera.pkl.
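
(Sanity check, assuming a 1080-pixel image side: camera_mat[0, 0] = 2 * 5000 / 1080 ≈ 9.2593, which is exactly the 9.259259 in the processed camera_matrix above, so the default focal_length=5000 would fully account for the mismatch with the ~4.92 value in the reference data.)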

IceFtv commented 4 weeks ago
> Hi! I added this to fit_single_frame.py somewhere after line 470. But you can assemble these matrices in a dataloader using 'camera_rotation' and 'camera_translation'.

Hi! Sorry to bother you again. When I tried to train my model using the preprocessing code above, I obtained a very bad result: [render: 00000]. And I found that during training, the human body in the images saved in the val folder is smaller than the body trained from the dataset's data. Using Smplify-x: [image: s000000_b000]. Using SnapshotPeople_SMPLX: [image: s000000_b000_1].

Is this caused by the camera_translation estimated by Smplify-x? How can I solve it? I would greatly appreciate it if you could reply.

david-svitov commented 4 weeks ago

First, it's good to understand whether the problem is with Smplify-x convergence or with the data format.

  1. There should be visualizations in the folder with the Smplify-x results if you set the --visualize="True" flag. Check that they look ok.
  2. This kind of problem often arises when the person's gender is confused. Since you are fitting a woman, set the value here to "female": https://github.com/vchoutas/smplify-x/blob/68f8536707f43f4736cdd75a19b18ede886a4d53/cfg_files/fit_smplx.yaml#L11

david-svitov commented 4 weeks ago

@zyms5244 If the problem is only blurriness, then the problem is not the camera but the quality of the fits. First check the things I pointed out in the post above. You'll likely see in your renderings that Smplify-x doesn't work well for some frames; for example, it restores the pose inaccurately. In that case, you can modify the Smplify-x code as suggested in this article: https://samsunglabs.github.io/MoRF-project-page/ Namely: add a silhouette loss and a temporal loss. In the next few days I will try to upload an unofficial implementation of the fitting from the MoRF article. It's a little slower, but more accurate than Smplify-x.
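
A minimal sketch of what those two extra fitting terms could look like (the names, shapes, and weighting are assumptions; MoRF's exact formulation may differ):

import torch

def silhouette_loss(rendered_mask, gt_mask):
    # L1 between the rendered SMPL-X silhouette and the GT segmentation
    # mask, pushing the fitted body to cover the person in the image.
    return (rendered_mask - gt_mask).abs().mean()

def temporal_loss(pose_t, pose_prev):
    # Penalize frame-to-frame jitter in the pose parameters.
    return ((pose_t - pose_prev) ** 2).sum()

# Added to the per-frame fitting objective with hand-tuned weights:
# total = data_term + w_sil * silhouette_loss(...) + w_temp * temporal_loss(...)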

zyms5244 commented 4 weeks ago

Thanks. I reproduced the result on male-4 with my processed data, and it seems that the positions of the SMPLX model and the 3DGS are not well aligned: https://github.com/david-svitov/HAHA/assets/10354474/d02478fe-44c4-4db4-adf6-a81a78968a3a

For now, I will try to optimize the camera pose and the 3D human to align them, until the results from your MoRF implementation are available.

IceFtv commented 4 weeks ago

> Thanks. I reproduced the result on male-4 with my processed data, and it seems that the positions of the SMPLX model and the 3DGS are not well aligned. […]

Hi @zyms5244! I would like to know how you handled the differences between these two .pkl files, especially camera_matrix and camera_transform. Thank you.

zyms5244 commented 3 weeks ago

> Hi @zyms5244! I would like to know how you handled the differences between these two .pkl files, especially camera_matrix and camera_transform. Thank you.

Add the get_camera_matrix function to camera.py and set the matrices like this:

size_x, size_y = img.shape[1], img.shape[0]
camera_matrix = camera.get_camera_matrix("cpu", size_x, size_y)[0]
camera_transform = transform_mat(camera.rotation,
                                 camera.translation.unsqueeze(dim=-1))[0]

result['camera_matrix'] = camera_matrix.detach().numpy()
result['camera_transform'] = camera_transform.detach().cpu().numpy()
# 'transl' is ignored downstream, so zero it out.
result['transl'] = np.zeros([1, 3])
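
Putting the whole thread together, a hedged end-to-end sketch of the conversion for one frame (the helper names come from the snippets above; everything else is an assumption):

import numpy as np
import pickle

def convert_result(result, camera, vposer, pose_embedding, size_x, size_y, out_path):
    out = dict(result)  # betas, global_orient, hand/eye/jaw poses, expression

    # 63-D axis-angle body pose decoded from the 32-D VPoser embedding.
    out['body_pose'] = (vposer.decode(pose_embedding, output_type='aa')
                        .detach().cpu().numpy().reshape((1, 63)))

    # OpenGL-style projection and 4x4 world-to-camera transform.
    out['camera_matrix'] = camera.get_camera_matrix('cpu', size_x, size_y)[0].numpy()
    T = np.eye(4, dtype=np.float32)
    T[:3, :3] = camera.rotation.detach().cpu().numpy().reshape(3, 3)
    T[:3, 3] = camera.translation.detach().cpu().numpy().reshape(3)
    out['camera_transform'] = T

    # 'transl' is ignored downstream, so zero it.
    out['transl'] = np.zeros([1, 3], dtype=np.float32)

    with open(out_path, 'wb') as f:
        pickle.dump(out, f)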