Open avinabsaha opened 1 year ago
I don't think there is a conversion script in this repo. Take the numpy file you get from SPIN/VIBE and set it up as described in the README.
I ran the demo code provided by SPIN : https://github.com/nkolot/SPIN/blob/master/demo.py
Can you tell me how to get the body pose and shape parameters, and the camera intrinsic and extrinsic parameters, from this code?
Thanks for the help!
I did not use SPIN, but rather VIBE and ROMP. From ROMP you get the poses and shapes in a numpy file. The camera parameters depend on whether you are using your own data.
For example, in my case I am using my own data, so I have my own camera params. But from ROMP you can still get the cams and cams trans; set them up according to the file format given in humannerf and you are good to go.
Note that these camera params won't be a 4x4 matrix, but rather a 1x3 matrix, so revise them according to your use case.
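The step above (embedding ROMP's 1x3 camera translation into a humannerf-style setup) can be sketched as below. The key names in the metadata dict are assumptions based on humannerf's wild-data format, and the per-frame values here are placeholders, not real ROMP output; verify both against the humannerf README.

```python
import json
import numpy as np

# Hypothetical per-frame values from ROMP: a 72-dim pose, 10-dim betas,
# and a 1x3 camera translation. Real values come from ROMP's numpy output.
frame_name = "frame_000000"
pose = np.zeros(72)
betas = np.zeros(10)
cam_trans = np.array([0.1, -0.2, 5.0])

# Embed the 1x3 translation into a 4x4 extrinsic; rotation stays identity,
# as discussed above.
extrinsic = np.eye(4)
extrinsic[:3, 3] = cam_trans

# Assumed metadata layout (check the repo's README for the exact keys).
metadata = {
    frame_name: {
        "poses": pose.tolist(),
        "betas": betas.tolist(),
        "cam_extrinsics": extrinsic.tolist(),
    }
}
serialized = json.dumps(metadata)  # would be written to metadata.json
```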
Hello, did you use your own camera params with VIBE?
No, you cannot use your own camera params. From VIBE/ROMP you get a weak-perspective camera model (s, tx, ty); convert that to the extrinsic translation (tx, ty, tz), while the rotation part remains np.eye(3). Then use the intrinsic parameters given by VIBE/ROMP, with cx and cy as your image centers and the focal length from their config file.
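The intrinsic/extrinsic setup described above can be sketched as follows. The focal length of 5000 is only a common default in this family of methods (check the actual config), and the image size here is an example:

```python
import numpy as np

def build_intrinsics(focal_length, img_w, img_h):
    """Pinhole intrinsic matrix with the principal point (cx, cy)
    at the image center, as described above."""
    return np.array([
        [focal_length, 0.0, img_w / 2.0],
        [0.0, focal_length, img_h / 2.0],
        [0.0, 0.0, 1.0],
    ])

K = build_intrinsics(5000.0, 1920, 1080)  # 5000 is an assumed default
R = np.eye(3)                             # rotation part stays identity
```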
Thank you for your reply, but I have another question: how do I convert (s, tx, ty) to (tx, ty, tz)?
Apologies for the late response.
Here is the code for the conversion (change the values as needed for your setup):
import numpy as np

def convert_weak_perspective_to_perspective(
        weak_perspective_camera,
        focal_length=5000.,
        img_res=224,
):
    # Convert a weak-perspective camera [s, tx, ty] to a camera translation
    # [tx, ty, tz] in 3D, given the bounding-box size.
    # This camera translation can be used in a full-perspective projection.
    perspective_camera = np.stack(
        [
            weak_perspective_camera[1],
            weak_perspective_camera[2],
            2 * focal_length / (img_res * weak_perspective_camera[0] + 1e-9),
        ],
        axis=-1,
    )
    return perspective_camera
Then the extrinsic parameters will be:
E = [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]
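Putting the two steps together, a minimal usage sketch looks like this. The (s, tx, ty) input is a made-up example, not real ROMP/VIBE output, and the conversion is repeated in compact form so the snippet is self-contained:

```python
import numpy as np

# Compact restatement of the conversion above, so this sketch runs on its own.
def convert_weak_perspective_to_perspective(weak_cam, focal_length=5000., img_res=224):
    s, tx, ty = weak_cam
    return np.array([tx, ty, 2 * focal_length / (img_res * s + 1e-9)])

# Hypothetical (s, tx, ty) from ROMP/VIBE:
t = convert_weak_perspective_to_perspective(np.array([0.9, 0.05, 0.1]))

# Build the 4x4 extrinsic: identity rotation, translation in the last column.
E = np.eye(4)
E[:3, 3] = t
```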
Thank you very much! But what is img_res? There is no img_res in ROMP. Is it the image size? My image size is (1920, 1080), which has two parameters.
Yes, it's the image resolution.
which parameter should I choose, 1920 or 1080?
It's whichever resolution you want: 512, 1024, and so on.
Try that; otherwise, calculate the image resolution based on pixels. I would try the first method, since humannerf mostly uses 512 or 1080 as the image resolution.
I have resized the images to 1080x1080 and set img_res to 1080. The results are as follows: does this look like an error in the SMPL predicted by the ROMP model?
After how many iterations is this? Could you tell me?
20000, but I think the result is wrong
20000 iterations is extremely few for rendering with humannerf. You would need at least 60K or more to get accurate results.
Also, how many images are there in your case? And can you show me the SMPL mesh generated by ROMP for any frame?
OK, thank you very much! There are 50 images, but they are all in the same pose.
The hand is wrong.
There are a few errors here:
- The image size is larger than that of the SMPL, so the arm and face accuracy after rendering might be incorrect. Try reducing the image size and running ROMP again to see if there is any improvement.
- The pose doesn't matter; what matters is that most of the human body is visible, if not all of it. This ensures proper rendering; otherwise the backside might be incorrect.
- Try to increase the number of images to >100 if possible.
OK, I will try it. Thank you very much.
I resized the images to (512, 512), but the hand is still wrong.
Do you have the video of your data? Try running ROMP directly on the video once to see if there's any difference. Don't run it on images; run it on the video.
If that doesn't work, I will tell you more methods.
Sorry, I don't have the video. But I successfully rendered the images after 145000 iterations. I think it benefited from the pose correction function in humannerf. Thank you very much! Best wishes to you!
@Dipankar1997161 Hello, How to get the intrinsic parameters in VIBE? Did you mean the config.yaml in config folder?
I was talking about ROMP; for VIBE, I need to check.
@Dipankar1997161 Hello, how can I get focal_length? Or is it just a random value I set myself?
It is not a random value; it depends on which method you are using. For example, for ROMP or VIBE, the focal length is predefined in their config files. You can check that out. If you are using your own device, there is an FOV formula from which you can calculate the focal length (ROMP also has this formula mentioned in its config).
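The FOV relation mentioned above can be sketched like this. The 60-degree FOV and 512 resolution are example values, not values from any specific config; check the actual numbers in the ROMP repo:

```python
import math

def focal_length_from_fov(fov_degrees, img_res):
    """Compute a pinhole focal length (in pixels) from the field of view,
    using the standard relation f = (img_res / 2) / tan(fov / 2)."""
    return (img_res / 2.0) / math.tan(math.radians(fov_degrees) / 2.0)

# Example values (assumptions, not from any config):
f = focal_length_from_fov(60.0, 512)
```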
Thank you for your reply! But I still have a problem: where is the config file? Did you mean the files in the configs folder? I didn't find any parameters there for focal_length or intrinsics.
@chungyiweng, could you please provide the script used to obtain the Metadata parameters for the Monocular In-the-wild Videos after you run SPIN?