anibali / margipose

3D monocular human pose estimation
Apache License 2.0

How do I convert predictions to camera coordinates in the inference phase? #19

Closed guker closed 4 years ago

guker commented 4 years ago

I have a video with no camera intrinsics, so how can I convert the predictions to camera coordinates during the inference phase?

anibali commented 4 years ago

The camera intrinsics are required for that step of the process, so you will need to make a rough guess of what the intrinsics could be. Otherwise you are stuck with the predicted coordinates in normalised space.
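To make "a rough guess" concrete: a common heuristic (not part of margipose, purely illustrative) is to assume a pinhole camera with focal length roughly equal to the image width and the principal point at the image centre. The function name below is hypothetical:

```python
import numpy as np

def guess_intrinsics(width, height):
    """Rough pinhole intrinsics for an unknown camera.

    Assumes focal length ~= image width (a common heuristic for
    consumer cameras) and principal point at the image centre.
    Illustrative sketch only, not part of margipose.
    """
    f = float(width)                 # assumed focal length in pixels
    cx, cy = width / 2.0, height / 2.0
    K = np.array([
        [f,   0.0, cx],
        [0.0, f,   cy],
        [0.0, 0.0, 1.0],
    ])
    return K

K = guess_intrinsics(1920, 1080)
```

The resulting matrix can then stand in for the true intrinsics wherever margipose expects them, at the cost of a scale/perspective error that grows as the guess diverges from the real camera.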

guker commented 4 years ago
```python
def infer_depth(self, norm_skel, eval_scale, intrinsics, height, width, z_upper=20000):
    """Infer the depth of the root joint.

    Args:
        norm_skel (torch.DoubleTensor): The normalised skeleton.
        eval_scale (function): A function which evaluates the scale of a denormalised skeleton.
        intrinsics (CameraIntrinsics): The camera which projects 3D points onto the 2D image.
        height (float): The image height.
        width (float): The image width.
        z_upper (float): Upper bound for depth.

    Returns:
        float: `z_ref`, the depth of the root joint.
    """
    def f(z_ref):
        z_ref = float(z_ref)
        skel = self.denormalise_skeleton(norm_skel, z_ref, intrinsics, height, width)
        k = eval_scale(skel)
        return (k - 1.0) ** 2

    z_lower = max(intrinsics.alpha_x, intrinsics.alpha_y)
    z_ref = float(optimize.fminbound(f, z_lower, z_upper, maxfun=200, disp=0))
    return z_ref
```

In other words, I can assume suitable camera intrinsics, combine them with infer_depth, and then convert the prediction to camera space, right?
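The core of `infer_depth` is a bounded 1D search for the root depth that makes the denormalised skeleton come out life-sized. A toy, self-contained version of that search, with stand-ins for margipose's `denormalise_skeleton` and `eval_scale` (all names and numbers here are illustrative assumptions), might look like:

```python
import numpy as np
from scipy import optimize

TRUE_DEPTH = 3000.0  # mm; the depth the search should recover in this toy setup

def denormalise(norm_skel, z_ref):
    # Toy denormalisation: the metric skeleton grows linearly with depth,
    # mimicking how back-projected size scales with z under a pinhole camera.
    return norm_skel * z_ref

def eval_scale(skel):
    # Returns 1.0 when the denormalised skeleton has the expected metric span.
    span = skel.max() - skel.min()
    expected_span = 2.0 * TRUE_DEPTH  # span of norm_skel (2.0) at TRUE_DEPTH
    return span / expected_span

def infer_depth_toy(norm_skel, z_lower=100.0, z_upper=20000.0):
    # Same structure as margipose's infer_depth: minimise (scale - 1)^2 over z.
    def f(z_ref):
        return (eval_scale(denormalise(norm_skel, float(z_ref))) - 1.0) ** 2
    return float(optimize.fminbound(f, z_lower, z_upper, maxfun=200, disp=0))

norm_skel = np.array([-1.0, 0.0, 1.0])
z = infer_depth_toy(norm_skel)  # converges to ~TRUE_DEPTH
```

With real margipose code, `denormalise` would be `self.denormalise_skeleton(...)` and `eval_scale` one of the skeleton-height functions linked below in this thread.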

anibali commented 4 years ago

Yep. You can use one of these functions to get the eval_scale function parameter:

https://github.com/anibali/margipose/blob/e96d59187dc17651ab184ca263f9a1a150cfa201/src/margipose/data/skeleton.py#L196-L213
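The general idea behind an eval_scale-style function is to compare some measured dimension of the denormalised skeleton against a canonical metric length, returning 1.0 when the skeleton is life-sized. A hypothetical sketch (the joint indices and reference length are assumptions, not margipose's actual choices, which are based on skeleton height):

```python
import numpy as np

CANONICAL_UPPER_ARM_MM = 280.0  # assumed reference length for illustration

def eval_scale_from_bone(skel, shoulder_idx, elbow_idx):
    # Ratio of a measured bone length to its canonical metric length;
    # equals 1.0 when the denormalised skeleton is correctly scaled.
    bone = skel[elbow_idx] - skel[shoulder_idx]
    return np.linalg.norm(bone) / CANONICAL_UPPER_ARM_MM

skel = np.zeros((17, 3))
skel[5] = [0.0, 0.0, 0.0]      # shoulder
skel[6] = [280.0, 0.0, 0.0]    # elbow, exactly one canonical length away
k = eval_scale_from_bone(skel, 5, 6)  # -> 1.0
```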

guker commented 4 years ago

got it, thanks very much!

guker commented 4 years ago

I tried it, and it works well.

sunmengnan commented 2 years ago

[screenshot] Is the depth predicted here in infer_single.py (the step in utils; is it per joint?) in 2D space, not 3D camera space?

anibali commented 2 years ago

I believe that the breakpoint that you are looking at in your debugger corresponds to this line in infer_single.py:

https://github.com/anibali/margipose/blob/2933f30203b3cd5c636917a7c9ff107d02434598/src/margipose/bin/infer_single.py#L75

At this point the joints (norm_skel3d) exist in normalised 3D space. Further up in this issue you can see discussion around denormalising (recovering metric units).
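Once a root depth is known, recovering camera-space coordinates is essentially standard pinhole back-projection. This sketch shows the general idea only; margipose's own denormalisation also undoes its depth normalisation, so this is not its exact code:

```python
import numpy as np

def backproject(points_px, depths, fx, fy, cx, cy):
    """Pinhole back-projection: pixel coordinates + depth -> camera space.

    x = (u - cx) * z / fx,  y = (v - cy) * z / fy.
    Illustrative only; not margipose's actual denormalisation routine.
    """
    u, v = points_px[:, 0], points_px[:, 1]
    x = (u - cx) * depths / fx
    y = (v - cy) * depths / fy
    return np.stack([x, y, depths], axis=1)

pts = np.array([[960.0, 540.0], [1160.0, 540.0]])
xyz = backproject(pts, np.array([2000.0, 2000.0]),
                  fx=1000.0, fy=1000.0, cx=960.0, cy=540.0)
# A point at the principal centre maps to (0, 0, z); the second point
# is offset 200 px, giving x = 200 * 2000 / 1000 = 400 in metric units.
```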

sunmengnan commented 2 years ago

Yes... the key is how to denormalise it. Should I use PoseDataset.denormalise in data/__init__.py? But I don't have the value of untransform in eval_scale...