cwwjyh closed this issue 1 year ago
The matrix calculation in gradio_new.py is only for angle-visualization purposes and is independent of the zero123 inference code, so I wouldn't use one to understand the other.
Assuming you are asking about zero123 inference conditioning, here's the code converting a pair of camera matrices (input view and target view) into the 4-dimensional conditioning vector described in appendix section A: https://github.com/cvlab-columbia/zero123/blob/f70ea8c26af7943494f75314df054eb843999add/zero123/ldm/data/simple.py#L257-L272
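Roughly, that conversion works by recovering each camera's position in spherical coordinates and taking the differences. Below is a minimal sketch of the idea, not the exact repo code (the real `get_T` lives in `ldm/data/simple.py` and its spherical convention may differ in details); it assumes 3x4 world-to-camera `[R|t]` matrices:

```python
import numpy as np

def cartesian_to_spherical(xyz):
    """Convert a 3D point to (polar theta, azimuth, radius)."""
    x, y, z = xyz
    radius = np.sqrt(x**2 + y**2 + z**2)
    theta = np.arccos(z / radius)          # polar angle, measured from +z
    azimuth = np.arctan2(y, x)
    return theta, azimuth, radius

def get_T(cond_RT, target_RT):
    """Relative (d_theta, sin(d_az), cos(d_az), d_radius) between
    two 3x4 world-to-camera [R|t] matrices (sketch of the paper's
    4-d conditioning vector)."""
    def camera_center(RT):
        R, t = RT[:3, :3], RT[:3, 3]
        return -R.T @ t                    # camera position in world coords

    theta_c, az_c, r_c = cartesian_to_spherical(camera_center(cond_RT))
    theta_t, az_t, r_t = cartesian_to_spherical(camera_center(target_RT))

    d_theta = theta_t - theta_c
    d_az = (az_t - az_c) % (2 * np.pi)
    d_r = r_t - r_c
    return np.array([d_theta, np.sin(d_az), np.cos(d_az), d_r])
```

The azimuth delta is stored as (sin, cos) rather than the raw angle so the conditioning is continuous across the 0/2π wrap-around.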
Hope it makes sense.
P.S. Camera matrices over-parameterize the transformation, so there will be more than one way to represent an RT operation. I think a better way to compare RTs is to visualize the cameras, or to convert them to Euler angles or axis-angle first.
If I use camera_R to calculate the rotation matrix in gradio_new.py, is camera_R the target image's rotation matrix or not?
Sorry but I don't understand. What do you need camera_R, which is used only for visualization, for?
camera_R is not used only for visualization. I want to input a single RGB image and then synthesize an image from a specified camera viewpoint. If I give many specified camera viewpoints, I can get many target views. Then I want to use each target image and its corresponding extrinsic matrix (e.g., camera_R) as a new dataset. So I want to know whether the camera_R in gradio_new.py is the target image's rotation matrix. If yes, I can calculate the translation matrix T in gradio_new.py.
If you want to generate a novel view based on a sampled target viewpoint described by an RT matrix, assuming your input viewpoint is cond_RT, you can sample a target viewpoint target_RT, and use the get_T function to get d_T which is the conditioning information for zero123: https://github.com/cvlab-columbia/zero123/blob/f70ea8c26af7943494f75314df054eb843999add/zero123/ldm/data/simple.py#L257-L272
Thank you! I will try it. Can I write the above code in gradio_new.py to get the target RT matrix?
This function does (cond_RT, target_RT) -> d_T. Sounds like you want (cond_RT, d_T) -> target_RT. You will need to implement that yourself.
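One possible sketch of that inverse, assuming d_T encodes `[d_theta, sin(d_az), cos(d_az), d_radius]` spherical offsets of the camera position (the convention used in ldm/data/simple.py) and that the camera always looks at the origin. This is not repo code; `look_at_RT` and `invert_get_T` are hypothetical helpers, and the up-vector choice is an assumption:

```python
import numpy as np

def spherical_to_cartesian(theta, azimuth, radius):
    return radius * np.array([np.sin(theta) * np.cos(azimuth),
                              np.sin(theta) * np.sin(azimuth),
                              np.cos(theta)])

def look_at_RT(cam_pos, target=np.zeros(3), up=np.array([0.0, 0.0, 1.0])):
    """Build a 3x4 world-to-camera [R|t] for a camera at cam_pos
    looking at `target` (hypothetical helper, +z up assumed)."""
    forward = target - cam_pos
    forward = forward / np.linalg.norm(forward)
    right = np.cross(forward, up)
    right = right / np.linalg.norm(right)
    true_up = np.cross(right, forward)
    R = np.stack([right, true_up, forward])   # rows = camera axes
    t = -R @ cam_pos
    return np.hstack([R, t[:, None]])

def invert_get_T(cond_RT, d_T):
    """(cond_RT, d_T) -> target_RT, under the spherical-offset assumption."""
    R, t = cond_RT[:3, :3], cond_RT[:3, 3]
    cam = -R.T @ t                            # input camera center
    r = np.linalg.norm(cam)
    theta = np.arccos(cam[2] / r)
    az = np.arctan2(cam[1], cam[0])
    d_theta, sin_daz, cos_daz, d_r = d_T
    target_cam = spherical_to_cartesian(theta + d_theta,
                                        az + np.arctan2(sin_daz, cos_daz),
                                        r + d_r)
    return look_at_RT(target_cam)
```

Note the look-at constraint is what makes the problem well-posed: d_T alone fixes only the camera position, not its full orientation.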
cond_RT means the input image's RT and target_RT means the synthesized view's RT, but what does d_T mean?
It means the relative transformation from cond_RT to target_RT, which is used as input to the diffusion model: https://github.com/cvlab-columbia/zero123/blob/f70ea8c26af7943494f75314df054eb843999add/zero123/gradio_new.py#L81-L82
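If you already have the relative offsets as angles, the gradio demo packs them into the same 4-d vector directly. A minimal sketch of that packing (mirroring the linked gradio_new.py lines; polar in radians, azimuth as sin/cos, radius delta unchanged):

```python
import math

def d_T_from_offsets(d_polar_deg, d_azimuth_deg, d_radius):
    """Pack relative viewpoint offsets (degrees, degrees, distance)
    into zero123's 4-d conditioning vector."""
    return [math.radians(d_polar_deg),
            math.sin(math.radians(d_azimuth_deg)),
            math.cos(math.radians(d_azimuth_deg)),
            d_radius]

print(d_T_from_offsets(30.0, 90.0, 0.0))
```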
I suggest you try to understand the paper/code before raising an issue here. Closing the ticket for now.
I have checked Appendix Section A of the paper regarding the camera coordinate system. In gradio_new.py, is the rotation matrix (camera_R) in w2c format? In gradio_new.py you use camera_R to obtain the camera extrinsic, but the camera's T matrix is not given. So I wanted to use PyTorch3D's look_at_view_transform to compute the extrinsic matrix R|t with: R, T = look_at_view_transform(dist=radius, elev=90 - polar_deg, azim=azimuth_deg, up=((0, 1, 0),)). But I got a different result. In the image below, R_jisuan is computed with PyTorch3D and camera_R_ori is the result from your code. How can I compute the same R with PyTorch3D, or can you give me advice on how to compute the camera's T matrix using your code? Looking forward to your reply!
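One common source of this mismatch, independent of any up-vector or elevation-convention difference: PyTorch3D's cameras use a row-vector convention, `x_view = x_world @ R + T`, so its R is the transpose of a column-vector w2c rotation (`x_cam = R_w2c @ x_world + t`). A small NumPy sketch of the relationship (the R values here are made up for illustration, not PyTorch3D output):

```python
import numpy as np

# Row-vector convention (PyTorch3D style): x_view = x_world @ R + T.
R_row = np.array([[0.0, 0.0, 1.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])   # hypothetical rotation
T = np.array([0.0, 0.0, 2.0])

# Column-vector w2c convention: x_cam = R_w2c @ x_world + t.
R_w2c = R_row.T                       # the two differ by a transpose

x_world = np.array([1.0, 2.0, 3.0])
a = x_world @ R_row + T               # row-vector convention
b = R_w2c @ x_world + T               # column-vector convention
print(np.allclose(a, b))              # same transform, two layouts
```

So before comparing R_jisuan with camera_R_ori, it is worth checking whether transposing one of them (and double-checking the up vector and the elev = 90 - polar mapping) reconciles the two.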