DoumanAsh / xxhash-c-sys

Rust raw bindings to xxHash
Boost Software License 1.0
3 stars 1 forks source link

use bindgen to regenerate bindings at build-time #2

Closed decathorpe closed 2 years ago

decathorpe commented 2 years ago

This keeps compatibility with no_std by using constants from core or by using C types from the libc crate instead of std.

DoumanAsh commented 2 years ago

@decathorpe Can you also show error that prevents you from using this crate as it is?

decathorpe commented 2 years ago

Sure. I'm trying to reproduce the issues I had yesterday right now.

DoumanAsh commented 2 years ago

Looking at how bindgen generates typedef for size_t I assume it might be possible that usize in Rust is not equivalent to size_t on some platforms As otherwise all other integers used by xxhash are fixed size.

So can you point me to sys/types.h used by fedora's toolchain?

DoumanAsh commented 2 years ago

Lol surprisingly it doesn't even work as it is on ubunt and macos in github CI...

decathorpe commented 2 years ago

Looking at how bindgen generates typedef for size_t I assume it might be possible that usize in Rust is not equivalent to size_t on some platforms

That should not be possible, according to Rust's own documentation: https://doc.rust-lang.org/std/os/raw/type.c_size_t.html

The only difference can be that size_t is a 32-bit number on 32-bit systems, but a 64-bit number on 64-bit systems. So bindings that are generated on x86_64 and use u64 instead of usize for size_t are not compatible with 32-bit libraries, but if they use usize, it should do the correct thing.

Lol surprisingly it doesn't even work as it is on ubunt and macos in github CI...

Why is that funny? This is one of the problems I mentioned on GitLab. I couldn't figure out where the invalid memory access happened, though.

DoumanAsh commented 2 years ago

The only difference can be that size_t is a 32-bit number on 32-bit systems, but a 64-bit number on 64-bit systems. So bindings that are generated on x86_64 and use u64 instead of usize for size_t are not compatible with 32-bit libraries, but if they use usize, it should do the correct thing.

I manually checked my bindings, they use size_t

I remember why I stopped using bindgen It incorrectly generated definitions of state. See what is generated on my machine (these are only useful for opaque pointers)

#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct XXH3_state_s {
    _unused: [u8; 0],
}

Real definition:

pub const XXH3_INTERNALBUFFER_SIZE: libc::c_int = 256;
pub const XXH3_SECRET_DEFAULT_SIZE: libc::c_int = 192;
#[repr(C)]
#[repr(align(64))]
#[derive(Debug, Copy, Clone)]
struct Acc([XXH64_hash_t; 8]);
#[repr(C)]
#[repr(align(64))]
#[derive(Debug, Copy, Clone)]
struct CustomSecret([u8; XXH3_SECRET_DEFAULT_SIZE as usize]);
#[repr(C)]
#[repr(align(64))]
#[derive(Debug, Copy, Clone)]
struct Buffer([u8; XXH3_INTERNALBUFFER_SIZE as usize]);
#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct XXH3_state_s {
    acc: Acc,
    customSecret: CustomSecret,
    buffer: Buffer,
    bufferedSize: XXH32_hash_t,
    useSeed: XXH32_hash_t,
    nbStripesSoFar: usize,
    totalLen: XXH64_hash_t,
    nbStripesPerBlock: usize,
    secretLimit: usize,
    seed: XXH64_hash_t,
    reserved64: XXH64_hash_t,
    extSecret: *const u8,
}
pub type XXH3_state_t = XXH3_state_s;
DoumanAsh commented 2 years ago

Test probably segfaults because it allocates state on stack, since struct is invalid, it crashes

decathorpe commented 2 years ago

It incorrectly generated definitions of state.

Yeah, that doesn't look right. But this means that Clang cannot parse xxhash source code correctly, which should be a problem in its own ...

Test probably segfaults because it allocates state on stack, since struct is invalid, it crashes

Makes sense.

Either way: The original issue I had with compiling this crate happened with 0.8.1, but I cannot reproduce the same problem with 0.8.2 or 0.8.3, so some change between 0.8.1 and 0.8.2 seems to have fixed the struct size mismatch on i686.

DoumanAsh commented 2 years ago

Either way: The original issue I had with compiling this crate happened with 0.8.1, but I cannot reproduce the same problem with 0.8.2 or 0.8.3, so some change between 0.8.1 and 0.8.2 seems to have fixed the struct size mismatch on i686.

Ah wait, 0.8.1 is faulty? Hmm not sure why, it should be correct one

Maybe because I removed useless layout tests left from bindgen?

decathorpe commented 2 years ago

Yeah, 0.8.1 is the only one that seems broken on i686. Here are builds of unmodified sources of xxhash-c-sys for multiple architectures: https://copr.fedorainfracloud.org/coprs/decathorpe/sequoia-test-builds/package/rust-xxhash-c-sys/

You can see that 0.8.1 failed on i686: https://copr.fedorainfracloud.org/coprs/decathorpe/sequoia-test-builds/build/4221704/

With this error message:

     Running unittests (target/release/deps/xxhash_c_sys-c76e559b2bcbaa0f)
running 4 tests
test bindings::bindgen_test_layout_XXH128_canonical_t ... ok
test bindings::bindgen_test_layout_XXH32_canonical_t ... ok
test bindings::bindgen_test_layout_XXH128_hash_t ... FAILED
test bindings::bindgen_test_layout_XXH64_canonical_t ... ok
failures:
---- bindings::bindgen_test_layout_XXH128_hash_t stdout ----
thread 'bindings::bindgen_test_layout_XXH128_hash_t' panicked at 'assertion failed: `(left == right)`
  left: `4`,
 right: `8`: Alignment of XXH128_hash_t', src/bindings.rs:267:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
failures:
    bindings::bindgen_test_layout_XXH128_hash_t
test result: FAILED. 3 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

The later versions succeeded.

DoumanAsh commented 2 years ago

@decathorpe Ah I understand why These layout tests are target dependent hence(I mention it again, but I don't trust bindgen to do work properly) I probably removed them because they were useless anyway

    assert_eq!(
        ::core::mem::align_of::<XXH128_hash_t>(),
        8usize,
        concat!("Alignment of ", stringify!(XXH128_hash_t))
    );

Imagine hardcoding 8 instead of doing sum of size_of of each field 😄

DoumanAsh commented 2 years ago

I suggest to use my modified bindgen bindings. They are completely cross-platform

decathorpe commented 2 years ago

These layout tests are target dependent

Well, of course they are. Which is why they are also re-generated for the target platform when running bindgen at build-time ;)

I suggest to use my modified bindgen bindings.

I will consider doing that. Thanks for helping me investigate.

DoumanAsh commented 2 years ago

Ok, I'll yank 0.8.1 just in case. Bindings themself are valid, but tests broken so hopefully no one will encounter it again