chungyiweng / humannerf

HumanNeRF turns a monocular video of moving people into a 360 free-viewpoint video.
MIT License

Conversion Script from SPIN code base to Metadata parameters in HumanNeRF! #65

Open avinabsaha opened 1 year ago

avinabsaha commented 1 year ago

@chungyiweng, could you please provide the script used to obtain the metadata parameters for the monocular in-the-wild videos after you run SPIN?

Dipankar1997161 commented 1 year ago

I don't think there is a conversion script in this repo. Take the numpy file you get from SPIN/VIBE and set it up as described in the README.
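As a starting point, here is a minimal sketch of loading VIBE's demo output; the file name and dictionary keys ('pose', 'betas', 'pred_cam') are what the VIBE demo produces as far as I recall, so verify them against your own output file:

```python
import joblib  # VIBE's demo saves its results with joblib

# Hypothetical path; the VIBE demo writes a vibe_output.pkl per video.
output = joblib.load("output/sample_video/vibe_output.pkl")

person = output[1]             # results are keyed by tracked person id
poses = person["pose"]         # (n_frames, 72) SMPL pose parameters
betas = person["betas"]        # (n_frames, 10) SMPL shape parameters
weak_cam = person["pred_cam"]  # (n_frames, 3) weak-perspective (s, tx, ty)
```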

avinabsaha commented 1 year ago

I ran the demo code provided by SPIN: https://github.com/nkolot/SPIN/blob/master/demo.py

Can you tell me how to get the body pose and shape parameters, and the camera intrinsic and extrinsic parameters, from this code?

Thanks for the help!

Dipankar1997161 commented 1 year ago

I did not use SPIN; I used VIBE and ROMP. From ROMP you get the poses and shapes in a numpy file. The camera parameters depend on whether you are using your own data.

For example, in my case I am using my own data, so I have my own camera params. But from ROMP you can still get the cams and cam trans; set them up according to the file format given in humannerf and you are good to go.

Note that these camera params won't be a 4x4 matrix; they give you a 1x3 vector, so adapt it to your use case.
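For what it's worth, here is a minimal sketch of writing one frame's entry in that format; the field names (poses, betas, cam_intrinsics, cam_extrinsics) follow the humannerf wild-video README as I recall it, so verify them against the repo before using this:

```python
import json

import numpy as np

# Placeholder per-frame values; substitute your ROMP/VIBE outputs.
poses = np.zeros(72)   # SMPL pose parameters for this frame
betas = np.zeros(10)   # SMPL shape parameters
K = np.eye(3)          # 3x3 camera intrinsics
E = np.eye(4)          # 4x4 camera extrinsics

metadata = {
    "frame_000000": {
        "poses": poses.tolist(),
        "betas": betas.tolist(),
        "cam_intrinsics": K.tolist(),
        "cam_extrinsics": E.tolist(),
    }
}

with open("metadata.json", "w") as f:
    json.dump(metadata, f, indent=2)
```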

gushengbo commented 1 year ago

> I did not use SPIN; I used VIBE and ROMP. From ROMP you get the poses and shapes in a numpy file. [...] Note that these camera params won't be a 4x4 matrix; they give you a 1x3 vector.

Hello, did you use your own camera params with VIBE?

Dipankar1997161 commented 1 year ago

> Hello, did you use your own camera params with VIBE?

No, you cannot use your own camera params there. From VIBE or ROMP you get a weak-perspective camera model (s, tx, ty); convert that to the extrinsic translation (tx, ty, tz) while the rotation part remains np.eye(3). Then use the intrinsic parameters given by VIBE/ROMP, with cx and cy as your image center and the focal length from their config file.
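As a sketch of how those intrinsics can be assembled (the 5000 px focal length is the constant SPIN/VIBE use for rendering, an assumption to replace with your method's config value):

```python
import numpy as np

def make_intrinsics(focal_length, img_w, img_h):
    # Pinhole intrinsics with the principal point (cx, cy) at the image center.
    return np.array([
        [focal_length, 0.0,          img_w / 2.0],
        [0.0,          focal_length, img_h / 2.0],
        [0.0,          0.0,          1.0],
    ])

K = make_intrinsics(focal_length=5000.0, img_w=512, img_h=512)
```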

gushengbo commented 1 year ago

> No, you cannot use your own camera params there. From VIBE or ROMP you get a weak-perspective camera model (s, tx, ty); convert that to the extrinsic translation (tx, ty, tz) [...]

Thank you for your reply, but I have another question: how do I convert (s, tx, ty) to (tx, ty, tz)?

Dipankar1997161 commented 1 year ago

Apologies for the late response.

Also, here is the code for the conversion (change the values to match yours):

```python
import numpy as np

def convert_weak_perspective_to_perspective(
    weak_perspective_camera,
    focal_length=5000.,
    img_res=224,
):
    # Convert a weak-perspective camera [s, tx, ty] to a camera
    # translation [tx, ty, tz] in 3D, given the bounding box size.
    # This translation can be used in a full-perspective projection.
    perspective_camera = np.stack(
        [
            weak_perspective_camera[1],
            weak_perspective_camera[2],
            2 * focal_length / (img_res * weak_perspective_camera[0] + 1e-9),
        ],
        axis=-1,
    )
    return perspective_camera
```

Then the extrinsic matrix will be:

```
E = [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]
```
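For example, a hypothetical usage for one frame (the camera numbers are placeholders):

```python
import numpy as np

cam = np.array([0.9, 0.1, 0.2])  # (s, tx, ty) from ROMP/VIBE for one frame
t = convert_weak_perspective_to_perspective(cam, focal_length=5000., img_res=512)

E = np.eye(4)   # rotation part stays the identity
E[:3, 3] = t    # translation (tx, ty, tz) from the conversion
```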

gushengbo commented 1 year ago

> Also, here is the code for the conversion [...] Then the extrinsic matrix will be E = [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]

Thank you very much! But what is img_res? There is no img_res in ROMP. Is it the image size? My image size is (1920, 1080), which has two values.

Dipankar1997161 commented 1 year ago

> Thank you very much! But what is img_res? There is no img_res in ROMP. Is it the image size? My image size is (1920, 1080), which has two values.

Yes, it's the image resolution.

gushengbo commented 1 year ago

Which value should I choose, 1920 or 1080?

Dipankar1997161 commented 1 year ago

> Which value should I choose, 1920 or 1080?

It's whichever resolution you want: 512, 1024, and so on.

Try that; otherwise, calculate the image resolution based on pixels. I would try the first method, since HumanNeRF mostly uses 512 or 1080 as the image resolution.

gushengbo commented 1 year ago

> It's whichever resolution you want: 512, 1024, and so on. Try that; otherwise, calculate the image resolution based on pixels. [...]

I have resized the images to 1080x1080 and set img_res to 1080. The results are as follows: [image] This appears to be an error in the SMPL predicted by the ROMP model?

Dipankar1997161 commented 1 year ago

> I have resized the images to 1080x1080 and set img_res to 1080. The results are as follows: [image] This appears to be an error in the SMPL predicted by the ROMP model?

After how many iterations is this? Could you tell me?

gushengbo commented 1 year ago

> After how many iterations is this? Could you tell me?

20,000, but I think the result is wrong.

Dipankar1997161 commented 1 year ago

> 20,000, but I think the result is wrong.

20,000 iterations is far too few for rendering with HumanNeRF. You need at least 60K or more to get accurate results.

Also, how many images do you have? And can you show me the SMPL mesh generated by ROMP for any frame?

gushengbo commented 1 year ago

> 20,000 iterations is far too few for rendering with HumanNeRF. You need at least 60K or more to get accurate results. Also, how many images do you have? And can you show me the SMPL mesh generated by ROMP for any frame?

OK, thank you very much! There are 50 images, but they are all in the same pose. [image]

The hand is wrong.

Dipankar1997161 commented 1 year ago

There are a few issues here:

  1. The image size is larger than the SMPL resolution, so the arm and face accuracy after rendering might be incorrect. Try reducing the image size and running ROMP again to see if there is any improvement (see the resize sketch below).

  2. The pose doesn't matter; what matters is that most of the human body is visible, if not all of it. This ensures a proper rendering; otherwise the back side might be incorrect.

  3. Try to increase the number of images to more than 100 if possible.
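A minimal resize sketch with OpenCV (the paths are hypothetical; adjust the target size as needed):

```python
import glob
import os

import cv2

os.makedirs("frames_512", exist_ok=True)

# Resize every frame to 512x512 before re-running ROMP.
for path in glob.glob("frames/*.png"):
    img = cv2.imread(path)
    img = cv2.resize(img, (512, 512), interpolation=cv2.INTER_AREA)
    cv2.imwrite(os.path.join("frames_512", os.path.basename(path)), img)
```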

gushengbo commented 1 year ago

> There are a few issues here: 1. The image size is larger than the SMPL resolution [...] 3. Try to increase the number of images to more than 100 if possible.

OK, I will try it. Thank you very much.

gushengbo commented 1 year ago

I resized the images to (512, 512), but the hand is still wrong. [images]

Dipankar1997161 commented 1 year ago

> I resized the images to (512, 512), but the hand is still wrong. [images]

Do you have the video of your data? Try running ROMP directly on the video once to see if there is any difference. Don't run it on images; run it on the video.

If that doesn't work, I will suggest more methods.

gushengbo commented 1 year ago

Sorry, I don't have the video. But I successfully rendered the images after 145,000 iterations. I think it benefited from the pose-correction function in HumanNeRF. Thank you very much! Best wishes to you!

gacu068 commented 1 year ago

> No, you cannot use your own camera params there. From VIBE or ROMP you get a weak-perspective camera model (s, tx, ty); convert that to the extrinsic translation (tx, ty, tz) while the rotation part remains np.eye(3). Then use the intrinsic parameters given by VIBE/ROMP, with cx and cy as your image center and the focal length from their config file.

@Dipankar1997161 Hello, how do I get the intrinsic parameters in VIBE? Did you mean the config.yaml in the config folder?

Dipankar1997161 commented 1 year ago

> @Dipankar1997161 Hello, how do I get the intrinsic parameters in VIBE? Did you mean the config.yaml in the config folder?

I was talking about ROMP; for VIBE I need to check.

three-legs commented 1 year ago

> Also, here is the code for the conversion [...] Then the extrinsic matrix will be E = [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]

@Dipankar1997161 Hello, how can I get focal_length, or is it just a random value I set myself?

Dipankar1997161 commented 1 year ago

> @Dipankar1997161 Hello, how can I get focal_length, or is it just a random value I set myself?

It is not a random value; it depends on which method you are using. For ROMP or VIBE, the focal length is predefined in their config files; you can check that. If you are using your own device, there is a FOV formula from which you can calculate the focal length (ROMP has this formula in its config as well).
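For reference, a minimal sketch of that pinhole FOV relation; the 60-degree default here is an assumption that happens to reproduce ROMP's 443.4 px focal length at a 512 px input, so check the config of the method you actually use:

```python
import math

def focal_from_fov(img_size, fov_deg=60.0):
    # Pinhole relation: f = (size / 2) / tan(FOV / 2)
    return (img_size / 2.0) / math.tan(math.radians(fov_deg / 2.0))

print(focal_from_fov(512))  # ~443.4, matching ROMP's default
```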

three-legs commented 1 year ago

> It is not a random value; it depends on which method you are using. For ROMP or VIBE, the focal length is predefined in their config files. [...]

Thank you for your reply! But I still have a problem: where is the config file? Did you mean the files in the configs folder? I didn't find any parameters for focal_length or intrinsics.