Numeric bug on perspective matrix computation

Makogan commented 2 years ago

I think there is a numeric bug in the way nalgebra computes its perspective calculation.

I was trying to render an object through both raytracing and perspective projection and i was getting this:

The perspectives clearly did not lign up. I spent hours trying to figure out why my math was wrong, until i eventually decided to just copy paste and transpile a c++ perspective camera projection code I own.

With it I get this result:

Which matches my expectation.

With my code I get this perspective projeciton matrix:

  ┌                                         ┐
  │ 2.4142134         0         0         0 │
  │         0 2.4142134         0         0 │
  │         0         0  -1.00001 -0.200001 │
  │         0         0        -1         0 │
  └                                         ┘

Nalgebra on the other hand returns this for the exact same parameters:

   ┌                                         ┐
  │ 2.3306494         0         0         0 │
  │         0 1.8304877         0         0 │
  │         0         0  -1.00001 -0.200001 │
  │         0         0        -1         0 │
  └                                         ┘

As you can see they are clearly different.

This is my code:

  fn perspective(fov : Float, aspect : Float, z_near : Float, z_far : Float) -> Mat4
{
    debug_assert!(aspect > Float::MIN_POSITIVE, "");

    let tan_half_fovy = Float::tan(fov / 2.0);

    let mut result = Mat4::zeros();
    result[(0, 0)] = 1.0 / (aspect * tan_half_fovy);
    result[(1, 1)] = 1.0 / (tan_half_fovy);
    result[(2, 2)] = -(z_far + z_near) / (z_far - z_near);
    result[(3, 2)] = -1.0;
    result[(2, 3)] = -(2.0 * z_far * z_near) / (z_far - z_near);
    return result;
}

And this is nalgebra's implementation:

pub fn perspective_rh_no<T: RealNumber>(aspect: T, fovy: T, near: T, far: T) -> TMat4<T> {
    assert!(
        !relative_eq!(far - near, T::zero()),
        "The near-plane and far-plane must not be superimposed."
    );
    assert!(
        !relative_eq!(aspect, T::zero()),
        "The aspect ratio must not be zero."
    );

    let negone = -T::one();
    let one = T::one();
    let two: T = crate::convert(2.0);
    let mut mat = TMat4::zeros();

    let tan_half_fovy = (fovy / two).tan();

    mat[(0, 0)] = one / (aspect * tan_half_fovy);
    mat[(1, 1)] = one / tan_half_fovy;
    mat[(2, 2)] = -(far + near) / (far - near);
    mat[(2, 3)] = -(two * far * near) / (far - near);
    mat[(3, 2)] = negone;

    mat
}

Notice that the code is attempting to compute basically the same things. I don't know which operation is the cause of the discrepancy but something in nalgebra's code is not working properly.

You can verify the computations by using these input parameters:

width 800
height 800
z_near 0.1
z_far 20 000
fov 45

Finomnis commented 2 years ago

Please provide the exact input for both algorithms. They do not have a width and a height input, and the (0,0) and (1,1) values should clearly be identical when aspect is 1.0. I suspect an error on the user side, you must be passing incorrect parameters in.

Note that the order of parameters differs between your method and nalgebra's method. Are you sure you provide arguments in the correct order?

Finomnis commented 2 years ago

My suspicion is confirmed by this minimal example:

use std::f32::consts::PI;

use nalgebra_glm::{perspective_rh_no, Mat4};

fn perspective(fov: f32, aspect: f32, z_near: f32, z_far: f32) -> Mat4 {
    debug_assert!(aspect > f32::MIN_POSITIVE, "");

    let tan_half_fovy = f32::tan(fov / 2.0);

    let mut result = Mat4::zeros();
    result[(0, 0)] = 1.0 / (aspect * tan_half_fovy);
    result[(1, 1)] = 1.0 / (tan_half_fovy);
    result[(2, 2)] = -(z_far + z_near) / (z_far - z_near);
    result[(3, 2)] = -1.0;
    result[(2, 3)] = -(2.0 * z_far * z_near) / (z_far - z_near);
    return result;
}

fn main() {
    let width = 800.0;
    let height = 800.0;
    let z_near = 0.1;
    let z_far = 20000.0;
    let fov = 45.0 / 180.0 * PI;

    let aspect = width / height;

    println!(
        "Yours (correct): {}",
        perspective(fov, aspect, z_near, z_far)
    );
    println!(
        "Yours (incorrect): {}",
        perspective(aspect, fov, z_near, z_far)
    );

    println!(
        "nalgebra's (incorrect): {}",
        perspective_rh_no(fov, aspect, z_near, z_far)
    );
    println!(
        "nalgebra's (correct): {}",
        perspective_rh_no(aspect, fov, z_near, z_far)
    );
}

Yours (correct): 
  ┌                                         ┐
  │ 2.4142134         0         0         0 │
  │         0 2.4142134         0         0 │
  │         0         0  -1.00001 -0.200001 │
  │         0         0        -1         0 │
  └                                         ┘

Yours (incorrect): 
  ┌                                         ┐
  │ 2.3306494         0         0         0 │
  │         0 1.8304877         0         0 │
  │         0         0  -1.00001 -0.200001 │
  │         0         0        -1         0 │
  └                                         ┘

nalgebra's (incorrect): 
  ┌                                         ┐
  │ 2.3306494         0         0         0 │
  │         0 1.8304877         0         0 │
  │         0         0  -1.00001 -0.200001 │
  │         0         0        -1         0 │
  └                                         ┘

nalgebra's (correct): 
  ┌                                         ┐
  │ 2.4142134         0         0         0 │
  │         0 2.4142134         0         0 │
  │         0         0  -1.00001 -0.200001 │
  │         0         0        -1         0 │
  └                                         ┘

Makogan commented 2 years ago

What seems to have happened is, glm's order of parameters is this:

perspectiveRH_NO(T fovy, T aspect, T zNear, T zFar) https://github.com/g-truc/glm/blob/b3f87720261d623986f164b2a7f6a0a938430271/glm/ext/matrix_clip_space.inl#L248

Where nalgebra's is:

perspective_rh_no<T: RealNumber>(aspect: T, fovy: T, near: T, far: T)

The aspect and fov parameters are permuted. So a user porting code from C++ to rust or following a C++ tutorial in rust can very easily end up passing the incorrect parameter, since it is reasonable for a user to expect glm::nalgebra to have the same parameters as C++ glm.

Indeed that was exactly the case here, I had ported code from C++ to rust and since things compiled and seemed to work (the images I was generating were very plausible) it did not occur to me that the order of parameters in glm:nalgebra were different.

I do not know why the order of parameters is different or whether it can be justified to have the same order as glm, but I expect this difference to cause confusion in the future.

Finomnis commented 2 years ago

As a sidenote, they seem to be aware of this issue.

They added an explicit warning to the documentation: https://docs.rs/nalgebra-glm/latest/nalgebra_glm/fn.perspective_rh_no.html#important-note

Finomnis commented 2 years ago

It seems this is not the first time this was brought up: https://github.com/dimforge/nalgebra/issues/423

Might be worth reopening or creating a new issue to remind them of fixing it at the next major version.

Finomnis commented 2 years ago

@Makogan I deleted my comments as they were not relevant for the improvement of this crate.

sebcrozet commented 2 years ago

This is something that in an ideal world, would have been caught on by peer reviewing and would not have been deployed. It was deployed, this suggests that something in the peer review process failed. From the links you shared what failed was making a wrong call the first time the error was caught.

This isn’t a peer-reviewing problem. nalgebra existed way before nalgebra-glm and was shipped with its current argument order. Then nalgebra-glm was developped, and, to avoid confusion for people familiar with nalgebra , used the same argument order as nalgebra rather than glm. So it was a deliberate choice, maybe not the best one, but it was a choice, and is documented explicitly.

Please keep in mind that API design that makes every single person happy is hard and often impossible.

Addressing this by swapping the arguments on both nalgebra and nalgebra-glm is definitely possible. It just requires a version bump, which isn’t that big of a deal, especially since these two crates haven’t reached 1.0.0 yet. The main issue is to find a proper way of making the transition without silently breaking user code.

dimforge / nalgebra

Numeric bug on perspective matrix computation #1152