facebookresearch / pytorch3d

PyTorch3D is FAIR's library of reusable components for deep learning with 3D data
https://pytorch3d.org/

fix: Fix docs for get_projection_transform #1797

Open jeanchristopheruel opened 4 months ago

jeanchristopheruel commented 4 months ago

The `scale_z` term should be positive in the K matrix. Minimal example: https://colab.research.google.com/drive/1od4ql8Y2P0d2-JBxkLp4Yu9z_M_QRy6?usp=sharing
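To illustrate the point (a minimal plain-Python sketch, not PyTorch3D's actual API or its K construction), a positive `scale_z` in an orthographic projection matrix maps depth so that z increases away from the camera, with the near plane at 0 and the far plane at 1:

```python
# Minimal sketch of an orthographic projection matrix with a positive
# depth scale (illustrative only; names and conventions are assumptions,
# not PyTorch3D's real implementation).
def ortho_K(znear, zfar, sx=1.0, sy=1.0):
    # scale_z > 0 maps z = znear -> 0 and z = zfar -> 1
    scale_z = 1.0 / (zfar - znear)
    mid_z = -znear * scale_z
    return [
        [sx,  0.0, 0.0,     0.0],
        [0.0, sy,  0.0,     0.0],
        [0.0, 0.0, scale_z, mid_z],
        [0.0, 0.0, 0.0,     1.0],
    ]

def project(K, p):
    # Apply the 4x4 matrix to a point in homogeneous coordinates.
    x, y, z = p
    v = (x, y, z, 1.0)
    return [sum(K[r][c] * v[c] for c in range(4)) for r in range(3)]

K = ortho_K(znear=1.0, zfar=100.0)
print(project(K, (0.5, -0.5, 1.0)))    # z at the near plane maps to 0
print(project(K, (0.5, -0.5, 100.0)))  # z at the far plane maps to 1
```

With a negative `scale_z` the depth axis would be flipped, which is the sign error this PR targets in the docstring.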

facebook-github-bot commented 4 months ago

Hi @jeanchristopheruel!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

facebook-github-bot commented 4 months ago

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

bottler commented 4 months ago

As written, I think it's still wrong, because a few lines up, on line 910, it defines scale_z = 2 / (far - near). So shouldn't this line actually say 0.5 * scale_z? Or, better, your change could be kept and line 910 changed to scale_z = 1 / (far - near). The comment should also say somewhere that if the input scale_xyz is provided, it scales these default values. Do you agree?

jeanchristopheruel commented 4 months ago

I do not agree. The provided correction is valid if the intention is to normalize Z to [-1, +1] for the orthographic camera. Here is the math. Also, I wouldn't allow the user to use any other variable when K is provided, as it would reduce understandability.

However, I admit that I am a little bit sceptical about the original intention here. It appears that the screen space of PyTorch3D is defined as x ∈ [-1, +1], y ∈ [-1, +1] and z ∈ [znear, zfar]. In that case, there is no need to normalize the Z component at all. Hence, these variables should be set: scale_z = 1, mid_z = 0.

[image: derivation of the Z normalization]
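The two normalization conventions under discussion can be written as an affine map z' = scale_z * z + mid_z (a plain-Python sketch, illustrative only, standing in for the derivation in the attached image):

```python
# The affine depth map under discussion: z' = scale_z * z + mid_z
# (plain Python, illustrative only).
def depth_map(scale_z, mid_z, z):
    return scale_z * z + mid_z

near, far = 1.0, 100.0

# Normalize z in [near, far] to [-1, +1] (the OpenGL-style convention):
s1, m1 = 2.0 / (far - near), -(far + near) / (far - near)
print(depth_map(s1, m1, near), depth_map(s1, m1, far))  # near -> -1, far -> +1

# Normalize z in [near, far] to [0, 1]:
s2, m2 = 1.0 / (far - near), -near / (far - near)
print(depth_map(s2, m2, near), depth_map(s2, m2, far))  # near -> 0, far -> 1
```

Setting scale_z = 1 and mid_z = 0 is the identity map, i.e. leaving z unnormalized in [znear, zfar].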

bottler commented 4 months ago

Depth is being scaled to the interval [0, 1], not [-1, 1]. The actual screen-space depth in PyTorch3D is inconsistent between camera types. The objective here is explicitly [0, 1].

jeanchristopheruel commented 4 months ago

Thanks for the info. However, I observe that there is no z-clipping in the orthographic camera rasterizer. Here is a minimal example using a very large Z translation and unnormalized z coordinates.

https://colab.research.google.com/drive/1KjqjOC15SSG8_ZwRBbPN4yJ4H-IcvzRM?usp=sharing

bottler commented 4 months ago

I haven't looked closely at that code. I see zbuf_min and zbuf_max are both between 1000 and 1001. When I add z_clip_value = 1000.4 to the RasterizationSettings there, the picture changes.

jeanchristopheruel commented 4 months ago

Thanks for your time and interest. Since there is no far z-clipping at all, I suggest setting scale_z = 1 and mid_z = 0 in K to keep the z coordinates right-handed and unnormalized. Do you agree?

Note: I find it odd that z_clip_value clips for z < z_clip_value. Should it be the inverse, i.e. clip the triangles to keep [0, z_clip_value] instead of [z_clip_value, inf]? If not, I will open a different PR to improve the z_clip_value docs.

bottler commented 4 months ago

I don't understand your point about K. Are you suggesting a doc change or a code change? I think all there is to do is to change the doc from -scale_z to 0.5 * scale_z.

As described in clip.py, z_clip_value is for getting rid of things which are too close to the camera: points with z < z_clip_value get ignored. I don't think anything is odd there.
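That near-clipping behavior can be sketched in a few lines (a simplified illustration, not PyTorch3D's actual clip.py, which also splits faces that straddle the clipping plane rather than just discarding them):

```python
# Simplified near-clipping sketch (illustrative only, not clip.py):
# discard faces that lie entirely in front of the near clipping plane,
# i.e. with every vertex at z < z_clip_value.
def cull_near(faces, z_clip_value):
    # faces: list of triangles, each a list of three (x, y, z) vertices
    kept = []
    for tri in faces:
        if all(v[2] < z_clip_value for v in tri):
            continue  # entirely too close to the camera: discard
        kept.append(tri)
    return kept

faces = [
    [(0, 0, 0.1), (1, 0, 0.2), (0, 1, 0.3)],  # fully in front of the plane
    [(0, 0, 5.0), (1, 0, 6.0), (0, 1, 7.0)],  # fully beyond it: kept
]
print(len(cull_near(faces, z_clip_value=1.0)))  # -> 1
```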

jeanchristopheruel commented 4 months ago

I'm suggesting both a code change and a docs change, in order to be consistent with the rest of the docs. The proposed changes fix the depth normalization to [0, 1]. Indeed, the solution shown in the attached derivation is scale_z = 2 / (far - near) and mid_z = -2 * near / (far - near).

However, I don't see the point of normalizing Z in the NDC coordinate system. The convention should be the same for any non-OpenGL camera.

Do you think I should propose another PR to remove the Z normalization, i.e. not account for znear and zfar at all?

bottler commented 4 months ago

I don't think there's a bug in the code here. The class is not consistent with other classes on where the z is, but it doesn't matter for the rest of PyTorch3D. I think it would be best to just fix the documentation to reflect what the code does and be helpful.

jacksonsunny29 commented 3 months ago

Hello, I too agree with @jeanchristopheruel that zfar clipping is not done by the rasterizer in PyTorch3D.

jeanchristopheruel commented 3 months ago

I also find it odd to have the option for znear clipping but not zfar clipping. Zfar clipping is a pretty common strategy to limit the rasterization work.
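In the meantime, the visual effect of far clipping can be approximated after rasterization by masking fragments whose depth exceeds zfar (a hedged sketch: it does not save any rasterization work, and `zbuf` here just stands in for the per-pixel depth buffer, with -1 marking empty pixels, which is an assumption for illustration):

```python
# Hedged sketch: emulate far clipping post hoc by masking depth-buffer
# entries beyond zfar. `zbuf` and the -1 empty-pixel marker are assumed
# names for illustration, not PyTorch3D's guaranteed interface.
def mask_far(zbuf, zfar, empty=-1.0):
    return [[empty if z > zfar else z for z in row] for row in zbuf]

zbuf = [[0.5, 2.0], [1.5, -1.0]]
print(mask_far(zbuf, zfar=1.0))  # -> [[0.5, -1.0], [-1.0, -1.0]]
```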

However, I do not think this is the right place to talk about it. I suggest you open a new issue/discussion (make sure one does not already exist).