Benchmarking against, and possibly supplementing, `lindel`

DoubleHyphen commented 3 years ago

Could you also include the lindel crate in the benchmarks you produce? Mr Chernoch's code has some… inefficiencies, let's say, which I took upon myself to correct. I don't think it'll come anywhere remotely close to your own speed, but I also suspect it'll at least fall within the same ball-park as the other crates.
One of the parametres to your functions is the so-called “order” of the Hilbert curve. Correct me if I'm wrong: I think that in your specific case all that matters is whether the order is an even or odd number. Is this indeed what's happening?
Do you have any idea if the tricks you used here can be generalised for arbitrary dimensions? I suspect that generalising them to eg u8s or u16s would be trivial, but the part about the arbitrary dimensions will take a bit more thought methinks.

Thanks in advance!

becheran commented 3 years ago

Performance is at about 148 ns. Not bad. You should also better compare your lib with something like https://crates.io/crates/hilbert since fast-hilbert is optimized for 2D only and not higher dimensions.
The order is not well documented. I need to update the docs here. It is basically just the input dimensions/number of bits used. Say you use the order 8, it basically means that the 2D discrete space 2^8 * 2^8 is considered. I guess your lib will always take the full space of the number type use. For example u8 would mean order 8, u16 order 16...
Unfortunately not. I don't think it's that easy either. This crate here is for mapping of 2D => Hilber and backwards only. It is not as generalizable as lindel. Speedup higher dimensions would require another LUT which would soon not fit into memory anymore and might only make sense for dimensions <= 3 I would say. Though if you find a way plz let me know :)

becheran commented 3 years ago

Branch with benchmark of linden: https://github.com/becheran/fast-hilbert/tree/linden-bench

DoubleHyphen commented 3 years ago

Thank you so much! Now, let's see.

My code is actually a refactoring of hilbert's, so I don't think there's much meaning in making further comparisons. In any event, it appears that the speedup was 5x. Significant, but I'll admit I was hoping for more… hilbert's crate maintainer had described the algorithm used therein as very fast, so at the very least I'd have expected it to match the implementation you found on Wikipedia instead of being twice as slow.
I'm pretty sure that what you call the “order” of the Hilbert curve is what hilbert calls “amount of bits to examine”. In the beginning, I thought that Hilbert encoding was like Morton encoding, in that I expected it to be insensitive to all leading zeros. This, in the general case, is not true. What I noticed whilst refactoring, however, is that all leading zeros that come in multiples of D (where D is the amount of dimensions) can be safely ignored without changing the final result. ~~I'll download fast_hilbert and experiment a bit and let you know.~~ I just ran the experiments: The only difference that the order makes is whether it is even or odd. You might as well turn it into a boolean… or, better yet, get rid of it entirely. That way you'd also make it impossible for the user to pass wrong arguments to the function.
What fits in memory and how well is a matter for later examination. For now I'll ask: how exactly was the LUT constructed? What thought process led you to it?

Here is the code I used for the experiments in 2.:

    for _ in 0..1024 {
        let x = rand::random::<u32>() & 65535;
        let y = rand::random::<u32>() & 65535;
        let original_key = fast_hilbert::xy2h(x, y, 32);
        println!("x: {:010}\ny: {:010}\nkey: {}", x, y, original_key);
        for i in (16..32).step_by(2) {
            let new_key = fast_hilbert::xy2h(x, y, i);
            assert_eq!(original_key, new_key);
        }
    }

becheran commented 3 years ago

The algorithm on Wikipedia is for the special 2D case only which is an easier problem to solve than what you try.
Interesting. Yes it is also the amount of bits examined. I did not know that the pattern repeats every 2nd iteration? Anyways, the higher the order the slower it will be. An order of 2 is faster than order 32. So bool is not an option here.
2D Hilbert curves can be constructed bitwise from the input numbers using a state machine. See the docs for the state machine image: https://docs.rs/fast_hilbert/0.1.0/fast_hilbert/. What I did is squeezing's 3 conversion steps (3 x and 3 y input bits => 6 hilbert bits) to one 8bit=>8bit LUT were two bits are used for the 4 possible states of the state machine.

DoubleHyphen commented 3 years ago

Well, the thing is that my kung-fu is strong. I created a new fork of your code that gets rid of the order parametre (which I call, with no small amount of perverse joy, disorderly) and instead calculates the ideal order based on x and y. Benchmark it and see; I suspect the speed won't have changed much, if at all. (And also, as I mentioned: The pattern repeats every D bits, which in this case is 2 because you're examining a 2-D curve.)

becheran / fast-hilbert

Benchmarking against, and possibly supplementing, `lindel` #1