felixguendling / cista

Cista is a simple, high-performance, zero-copy C++ serialization & reflection library.
https://cista.rocks
MIT License

Dealing with Data Pointer Offset in Checked Mode #168

Closed Eshnek closed 1 year ago

Eshnek commented 1 year ago

Hello! I have been working with Cista for some time now and find it quite fantastic.

I ran into an interesting problem related to checking:

I am serializing data and sending it over a network. I recently began switching to a new networking library, and hit a failure in Cista's checking of pointer alignment on the receiving end. This networking library gives a std::string_view of the received data, but the pointer is offset to account for an internal header that the library adds, which gives it a strange alignment. This causes a verification check to fail here:

template <typename T>
void check_ptr(T* el, std::size_t const size = type_size<T>()) const {
    ...

    // Fails
    verify((pos & static_cast<intptr_t>(std::alignment_of<decay_t<T>>() - 1U)) == 0U, "ptr alignment");
}
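
For readers unfamiliar with the mask trick in that check, here is a minimal standalone sketch of the same idea. It is not Cista's code: `is_aligned_for` and `payload_aligned_after_header` are hypothetical names, and the header sizes are made up purely to imitate a payload pointer that starts after a library-internal header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of Cista): returns true if `p` is suitably
// aligned to be accessed as a T.
template <typename T>
bool is_aligned_for(void const* p) {
  auto const pos = reinterpret_cast<std::uintptr_t>(p);
  // alignof(T) is always a power of two, so masking with (alignof(T) - 1)
  // yields the remainder of pos divided by alignof(T).
  return (pos & (alignof(T) - 1U)) == 0U;
}

// Imitates a receive buffer whose payload begins `header_size` bytes into an
// 8-byte-aligned allocation. The header size is an assumption for illustration.
inline bool payload_aligned_after_header(std::size_t header_size) {
  alignas(8) static unsigned char buffer[64] = {};
  assert(header_size < sizeof(buffer));
  return is_aligned_for<std::uint64_t>(buffer + header_size);
}
```

With a header size that is a multiple of 8 the payload stays aligned for a `std::uint64_t`; with an odd header size such as 3 it does not, which is the situation described above.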

The potential solutions I see are:

  1. Disable checking entirely (Undesirable, it's nice to have)
  2. Adjust Cista to not do the alignment check in this case only (But maintaining a private fork is undesirable. Could be opening an exploit, etc...)
  3. Adjust the networking library to have the correct offset for the data pointer (Same caveat as 2)

I tried using alignas to get the check to pass, but with the cista::offset::variant I am using I have not been able to make it work.

Is there a solution that I am missing here?

felixguendling commented 1 year ago

In general, unaligned memory access is undefined behavior (UB) in C++, and we try to avoid UB as much as possible. Some architectures (e.g. x86) can handle unaligned access for most instructions. However, even on x86 you need to be careful if you want to use SIMD instruction sets like SSE or AVX: once they are enabled, you cannot be sure where the compiler will apply those instructions. So the only safe thing to do is to make sure you only access memory in an aligned way. This way, you're also future-proof for the ARM or ARM64 instruction sets, which are stricter with regard to alignment.

In your case, I would probably think about making sure that

If you cannot make sure that memory access is always aligned, I would probably try to use something like FlatBuffers or Cap'n Proto, because these make sure that scalars are std::memcpy'ed before access to prevent UB.
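
As a rough illustration of the "memcpy before access" idea, here is a small sketch. It is not taken from FlatBuffers or Cap'n Proto; `load_unaligned` is a hypothetical helper name. It reads a scalar from an address that may be misaligned without dereferencing a misaligned pointer.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper: reads a trivially copyable T from an address that may
// not be aligned for T. Copying byte-wise into a properly aligned local
// avoids the undefined behavior of dereferencing a misaligned T*.
template <typename T>
T load_unaligned(void const* src) {
  static_assert(std::is_trivially_copyable_v<T>,
                "memcpy-based loads only work for trivially copyable types");
  assert(src != nullptr);
  T value;
  std::memcpy(&value, src, sizeof(T));
  return value;
}
```

Compilers typically turn such a fixed-size `std::memcpy` into a plain load on architectures that tolerate unaligned access, so the cost is usually small. That depends on the compiler and target, so it is worth measuring.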

If you're only developing for a very specific hardware architecture that does not and will not have any alignment requirements, it would probably make sense to add a flag to cista that allows you to disable only the alignment checks.

Eshnek commented 1 year ago

I see, thanks for the info.

As I understand it, then, copying the unaligned string before deserializing is a functional workaround because it gives Cista correctly aligned data to work with. I can't think of a way to achieve the right offset without modifying the networking library or copying the data.
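
A minimal sketch of that copy-based workaround, assuming the received bytes arrive as a `std::string_view`. The `aligned_copy` class is a hypothetical helper and is not part of Cista or the networking library. It stores the bytes in a buffer aligned for `std::max_align_t`, which covers every type with fundamental alignment. Types declared with a larger `alignas` would need a different allocation strategy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>
#include <vector>

// Hypothetical helper: owns a copy of the received bytes in storage that is
// aligned for std::max_align_t, regardless of the source pointer's alignment.
class aligned_copy {
public:
  explicit aligned_copy(std::string_view received)
      : size_{received.size()},
        // Round up to whole max_align_t elements so all bytes fit.
        storage_((received.size() + sizeof(std::max_align_t) - 1U) /
                 sizeof(std::max_align_t)) {
    if (size_ != 0U) {
      std::memcpy(storage_.data(), received.data(), size_);
    }
  }

  // Start of the aligned byte range (may be null if the input was empty).
  std::uint8_t* data() {
    return reinterpret_cast<std::uint8_t*>(storage_.data());
  }

  // Number of payload bytes, excluding padding added by the round-up.
  std::size_t size() const { return size_; }

private:
  std::size_t size_;
  std::vector<std::max_align_t> storage_;
};
```

The range `[data(), data() + size())` can then be handed to whichever checked deserialization call is already in use, instead of the offset pointer supplied by the networking library.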

As far as the custom flag goes, it works but like you said it's not future-proof.

I will see if I can find a way to get the networking lib (uWebSockets) to align the memory properly. Luckily, an extra copy is likely not too big of a slowdown in my case, but I will have to profile it to determine how much it really matters.

Eshnek commented 1 year ago

I ended up implementing a compression layer between Cista and the networking library, which amounts to a copy anyway. This allows for minimal overhead and correct alignment.

felixguendling commented 1 year ago

Sounds like a great solution!