Closed Miles629 closed 1 year ago
Hi, I am also facing the problem of converting ROMP output. How did you find the camera intrinsics? As far as I know, ROMP only outputs 3 values for the camera, which is a weak perspective model. But here a perspective model is required.
Hey, I had asked this question to Arthur in ROMP Repo, he gave me the following suggestion https://github.com/Arthur151/ROMP/issues/421#issue-1600146589
Try this. Let me know if you need anything else
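For anyone landing here later: the conversion suggested in that ROMP issue amounts to placing the person at a depth implied by the weak-perspective scale under an assumed focal length. Here is a minimal sketch, assuming square pixels, the principal point at the image center, and an assumed field of view (the `fov_deg` default and the depth formula are my assumptions, not something ROMP itself outputs):

```python
import numpy as np

def weak_to_perspective(cam, img_h, img_w, fov_deg=60.0):
    """Convert a ROMP-style weak-perspective camera (s, tx, ty) into a
    perspective intrinsic matrix K and a camera-space translation.

    Assumptions (hypothetical, not from ROMP): square pixels, principal
    point at the image center, focal length from an assumed vertical FOV.
    """
    s, tx, ty = cam
    # Focal length in pixels derived from the assumed FOV.
    f = img_h / (2.0 * np.tan(np.radians(fov_deg) / 2.0))
    K = np.array([[f,   0.0, img_w / 2.0],
                  [0.0, f,   img_h / 2.0],
                  [0.0, 0.0, 1.0]])
    # Depth implied by the weak-perspective scale (the common
    # SPIN/ROMP-style formula t_z = 2 * f / (img_size * s)).
    tz = 2.0 * f / (img_h * s + 1e-9)
    trans = np.array([tx, ty, tz])
    return K, trans
```

With `s = 1` and a 512x512 image this puts the person at roughly `t_z ≈ 1.73` under a 60° FOV; the absolute depth scales with whatever focal length you assume, so treat it as a starting point rather than a calibrated value.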
Apologies for the delayed response. I have resolved the issues at hand and have successfully run humannerf, albeit with unsatisfactory results. ROMP struggles to infer SMPL estimates accurately in frame-by-frame analysis of obstructed bullet screen videos, resulting in jittery outputs.
Could you tell me the camera parameters you used? humannerf requires proper camera intrinsics and extrinsics, but ROMP only provides weak-perspective camera values.
Hi @chungyiweng, the results of this work are really amazing! I'm very interested in it, so I prepared a custom video. I extracted the frames and masks successfully and estimated SMPL, intrinsics, and extrinsics with ROMP. Finally I got the wild training process to run after many failures (haha).
I noticed that the main cause of my failures was incorrect intrinsics and extrinsics, so I compared the two versions of prepare_dataset.py (wild and zjumocap). I found some differences and have some questions.
In zjumocap, K, R, D, and T can be obtained directly from the dataset. The cameras are calibrated with a chessboard, so the camera parameters are stable. Rh and Th can also be obtained directly from the dataset; they represent the global_orient and the position of the human in world coordinates (my personal understanding, please correct me if anything is wrong).
In wild, the intrinsics and extrinsics are estimated along with the SMPL estimation (I use ROMP), so the camera parameters are NOT stable. I chatted with the author of ROMP and learned that the output camtrans is the position of the person in camera space. I use camtrans as T, and R is the identity [[1,0,0],[0,1,0],[0,0,1]].
Rh = poses[:3].copy()
which is the global_orient, the same as Rh in zjumocap. But Th seems to be the coordinates of the root joint in the canonical pose, which is different from Th in zjumocap (my understanding might be wrong, please point it out if so).
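The setup described above (identity rotation plus ROMP's camtrans as the translation) can be sketched as follows; this is a minimal illustration, assuming camtrans is already expressed in camera space:

```python
import numpy as np

def build_extrinsic(cam_trans):
    """Build a 4x4 world-to-camera extrinsic with identity rotation and
    ROMP's camtrans as the translation (the setup described above)."""
    E = np.eye(4)
    E[:3, 3] = np.asarray(cam_trans, dtype=float)
    return E
```

With identity R, the "world" frame here is effectively the camera frame shifted by camtrans, which is why the per-frame global_orient (Rh) and translation still have to be handled separately downstream.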
I'm curious how these two inputs with different meanings are used in the code.
I checked the code: `apply_global_tfm_to_camera` can transform E from world2cam to smpl2cam. That is easy to understand for zjumocap, but I can hardly understand how it works when Th and the extrinsic T differ from zjumocap's. BTW, could you explain `rays_intersect_3d_bbox` in more detail? I am not sure about the meaning of `nominator`, `d_intersect`, `p_intersect`, and `p_intervals`.
It seems that my question is a bit long. Thank you for reading it. Please correct me if there is any mistake in my understanding. I am looking forward to your reply.
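For what it's worth, my reading of what `apply_global_tfm_to_camera` effectively computes is a composition of the world→camera extrinsic E with the SMPL→world rigid transform (x_world = R(Rh) · x_smpl + Th). A sketch of that idea, not the repo's exact code (the `rodrigues` helper here is my own stand-in for the axis-angle conversion):

```python
import numpy as np

def rodrigues(rvec):
    """Axis-angle vector (3,) -> rotation matrix (3,3), Rodrigues' formula."""
    rvec = np.asarray(rvec, dtype=float)
    theta = np.linalg.norm(rvec)
    if theta < 1e-8:
        return np.eye(3)
    k = rvec / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def apply_global_tfm_sketch(E, Rh, Th):
    """Compose the world->camera extrinsic E with the SMPL->world transform
    (x_world = R(Rh) @ x_smpl + Th), yielding an SMPL->camera extrinsic."""
    G = np.eye(4)
    G[:3, :3] = rodrigues(Rh)
    G[:3, 3] = np.asarray(Th, dtype=float)
    return E @ G
```

Under this reading, whatever frame Th is expressed in just has to be consistent with the frame E maps from, which may be why the wild and zjumocap conventions can both work.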
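On `rays_intersect_3d_bbox`: it looks like a slab-method ray/axis-aligned-box intersection. Here is a generic sketch using names that match my reading of the variables in question (the function itself is my own illustration, not the repo's implementation):

```python
import numpy as np

def ray_box_intersection_sketch(ray_o, ray_d, bounds):
    """Slab-method ray/AABB intersection, annotated with the names asked about:
      nominator   - (plane coordinate - ray origin), the numerator of each
                    plane-intersection distance
      d_intersect - distances along the ray to each of the 6 bbox planes
      p_intersect - the 6 corresponding 3D plane-hit points
      p_intervals - the (near, far) points where the ray enters/exits the box
    ray_o, ray_d: (3,) origin and direction; bounds: (2, 3) min/max corners.
    Returns (near_point, far_point), or None if the ray misses the box.
    """
    ray_o = np.asarray(ray_o, dtype=float)
    ray_d = np.asarray(ray_d, dtype=float)
    bounds = np.asarray(bounds, dtype=float)
    eps = 1e-10  # avoid division by zero for axis-parallel rays

    nominator = bounds - ray_o                  # (2, 3)
    d_intersect = nominator / (ray_d + eps)     # (2, 3): distances to 6 planes
    p_intersect = ray_o + d_intersect[..., None] * ray_d  # (2, 3, 3) hit points

    # Entry is the farthest "near" plane, exit the nearest "far" plane.
    t_near = np.max(np.minimum(d_intersect[0], d_intersect[1]))
    t_far = np.min(np.maximum(d_intersect[0], d_intersect[1]))
    if t_near > t_far or t_far < 0:
        return None                             # ray misses the box
    p_intervals = (ray_o + t_near * ray_d, ray_o + t_far * ray_d)
    return p_intervals
```

In a NeRF-style pipeline the near/far interval is what bounds the samples taken along each ray, which is presumably why the function returns interval endpoints rather than a hit/miss flag alone.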