Is the magicbit version correct?

gnzlbg commented 7 years ago

I have a morton code library in rust that just uses BMI2 and wanted to use magicbits when BMI2 was not available. For testing, I used an invariant of the form:

fn invariant<T: Int>(v: T) {
  { // 2D
    let (x, y) = morton_decode_2d(v);
    let vs = morton_encode_2d(x, y);
    assert_eq!(v, vs);
  }
  { // 3D
    let (x, y, z) = morton_decode_3d(v);
    let vs = morton_encode_3d(x, y, z);
    assert_eq!(v, vs);
  }
}

for many integer ranges. I am testing for all u8, u16, [u16::max as u32, u16::max as u32 - 1000000], [u32::max - 1000000, u32::max], [u32::max as u64, u32::max as u64 - 1000000], [u64::max - 1000000, u64::max].

The BMI2 version always passes all tests correctly so it must be at least be a bug with my implementation.

I needed to change your magicbits implementation for the 2D case to:

// 2D decode 32-bit:
            x = x & MORTON_MASK2D_U32[5];
            x = (x ^ (x >> 1)) & MORTON_MASK2D_U32[4];
            x = (x ^ (x >> 2)) & MORTON_MASK2D_U32[3];
            x = (x ^ (x >> 4)) & MORTON_MASK2D_U32[2];
            x = (x ^ (x >> 8)) & MORTON_MASK2D_U32[1];
            x = (x ^ (x >> 16)) & MORTON_MASK2D_U32[0];
// 2D encode 32-bit:
            x = (x | x << 16) & MORTON_MASK2D_U32[1];
            x = (x | x << 8) & MORTON_MASK2D_U32[2];
            x = (x | x << 4) & MORTON_MASK2D_U32[3];
            x = (x | x << 2) & MORTON_MASK2D_U32[4];
            x = (x | x << 1) & MORTON_MASK2D_U32[5];
// 2D decode 64-bit:
            x = x & MORTON_MASK2D_U64[5];
            x = (x ^ (x >> 1)) & MORTON_MASK2D_U64[4];
            x = (x ^ (x >> 2)) & MORTON_MASK2D_U64[3];
            x = (x ^ (x >> 4)) & MORTON_MASK2D_U64[2];
            x = (x ^ (x >> 8)) & MORTON_MASK2D_U64[1];
            x = (x ^ (x >> 16)) & MORTON_MASK2D_U64[0];
// 2D encode 64-bit:
            x = (x | x << 32) & MORTON_MASK2D_U64[0];
            x = (x | x << 16) & MORTON_MASK2D_U64[1];
            x = (x | x << 8) & MORTON_MASK2D_U64[2];
            x = (x | x << 4) & MORTON_MASK2D_U64[3];
            x = (x | x << 2) & MORTON_MASK2D_U64[4];
            x = (x | x << 1) & MORTON_MASK2D_U64[5];

to make it pass this test. I am still having trouble with the 3D version though...

Just wanted to ping you on the issue that the magic bits implementation might not be 100% correct. It seems to work for small enough morton codes, but if you only need those you should probably be using u8 or u16 specific implementations.

A good test is just to test it against the BMI2 version for the full u8, u16, and u32 integer ranges, and for some subranges of the u64 integer range (testing the whole u64 range takes N * 100 years on a 3Ghz CPU).

If it turns out that this is not a bug in my code, but actually a problem with libmorton, you might want to repeat the benchmarks since if more operations are needed for correctness maybe BMI2 will compare better.

Forceflow commented 7 years ago

Can you give an example of a coordinate (x,y,z) for which it fails?

gnzlbg commented 7 years ago

So for a value = 4293967295

morton_decode_3d_magicbits(v) -> (947, 927, 975)
morton_encode_3d_magicbits(947, 927, 975) -> 1072741823 != 4293967295

The bit pattern of this value is:

0b1111_1111_1111_0000_1011_1101_1011_1111
  yxzy_xzyx_zyxz_yxzy_xzyx_zyxz_yxzy_xzyx

so i expect for 32-bit x, y, z:

x: 0b0011_1011_0011 -> 947.
y: 0b0011_1001_1111 -> 927.
z: 0b0011_1100_1111 -> 975.

which is what morton_decode_3d_magicbits returns. When those coordinates are encoded I expect to obtain back 4293967295 but I get:

0b1111_1111_1111_0000_1011_1101_1011_1111 == 4293967295
0b0011_1111_1111_0000_1011_1101_1011_1111 == 1072741823

the two high-order bits are missing.

gnzlbg commented 7 years ago

The confusion comes from the following:

For a value = 4293967295

morton_decode_3d_bmi2(v) -> (1971, 1951, 975)
morton_encode_3d_bmi2(1971, 1951, 975) -> 4293967295 == 4293967295

So... the BMI2 functions do preserve the last bits of the x and y coordinates for 32-bit, e.g., for x:

0b0011_1011_0011 = 947 (result from magicbits)
0b0111_1011_0011 = 1971 (result from bmi2)

So... I don't know how I feel about this. The last bit of x and y should not be used for 32-bit because z lacks it. It is not part of any coordinate. So from this POV "magicbits is right".

The BMI2 version preserves this bit, which makes encoding/decoding invertible. The magicbits version does not, which means encoding/decoding is not invertible (at least at first, encode -> decode -> encode -> invertible from now on...). So from this POV "bmi2 is right".

Since LUT and magicbits produce the same results here, and only BMI2 is off, I would suggest leaving it like it is.

gnzlbg commented 7 years ago

Those are the issues I can find with the 32-bit version. Here is one issue with the 64-bit version.

Given the 64-bit value 9223372036853775807 (which has its last bit cleared, so magicbit's result should be "invertible", 63 / 3 == 21):

BMI2 decodes x: 2097075, y: 2097055, z: 2097103 which is correct (and also re-encodes to the original value).
Magicbits decodes: x: 4503599627435955, y: 4503599627435935, z: 4503599627435983 which is incorrect. These coordinates re-encode with magicbits to 8070732007223639487 which is also not the original value. Maybe one shift is missing in the 3D decoding and decoding functions for magicbits :/
LUT produces the same results as BMI2 so Magicbits seems to be off here.

Forceflow commented 7 years ago

I will look into this, thanks for reporting!

gnzlbg commented 7 years ago

So I've fixed this. The relevant magic bits module with the 2D/3D morton encoding is here.

IIRC I've changed:

the split_by_second_bits/get_second_bits/split_by_third_bits/get_third_bits 64-bit code (the 32-bit masks and code should be identical, but you might want to check this as well),
the MASK_3D_U64 constants,
the 3D platform-independent wrappers for encoding/decoding here by adding a helper to copy the highest bits that the magic/lut functions miss. This results in a negligible performance cost, within measuring error (for running the benchmarks in benches/morton.rs just run cargo bench), and makes the exported functions invertible, which is a huge usability/correctness win.

Note that I still offer the non-wrapped versions without this.

EDIT: all the code is in Rust, but it should be trivial to translate it to C++ (just copy the masks, and correct the bit operations in split/get_..._bits. Translating the code for copying the high-order bits should be straight forward as well. Hope this helps.

EDIT2: The tests are inline at the end of the module. Most of them just check that bmi2, lut, and magicbits produce the same results though. They don't really check whether these results are correct, and the checking on u64 is "minimal" (I am pretty confident that bmi2 always produces correct results though). I've added you in the Copyright part of the license (which is also MIT), but I want to work in the docs next so I'll mention libmorton in the docs of the morton module as well as soon as I get to them.

Forceflow commented 7 years ago

Fixed - found some other bugs now, but we live and learn, I guess :)

Thanks!

gnzlbg commented 7 years ago

Thank you!

Forceflow commented 7 years ago

And addition fix in 1f36867c640e994d6a914a6a5450752fd703acad

I will link to your library in the readme as well!

Forceflow / libmorton

Is the magicbit version correct? #25