felixguendling / cista

Cista is a simple, high-performance, zero-copy C++ serialization & reflection library.
https://cista.rocks
MIT License

Using cista cross platform between emscripten and native #212

Closed opera-aberglund closed 3 months ago

opera-aberglund commented 3 months ago

Hi there, I'm not sure if this is an issue or not, but I was trying to serialize and deserialize data between a native build and an emscripten web build of my application. The web version called cista::serialize, and the native version called cista::deserialize on that data after it was sent over the network. This caused the following check to fail:

if constexpr (!std::is_same_v<std::remove_const_t<T>, void>) {
  verify((pos & static_cast<intptr_t>(std::alignment_of<decay_t<T>>() - 1U)) == 0U,
         "ptr alignment");
}

This doesn't happen web-to-web or native-to-native, only in this cross-platform case. I have been using cista::offset structs for everything. I'm not using any of the cista::mode options though, so I'm wondering whether any of them would help, or whether cista is simply not guaranteed to work cross-platform like this.
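
For readers unfamiliar with the failing check, here is a minimal sketch of the same bit-mask test. It is not cista's code: the helper name is_aligned_for is made up for illustration, and the asserts assume alignof(std::uint32_t) is 4, as on mainstream platforms. Because alignments are powers of two, alignment - 1 is a mask of the low bits that must all be zero for a position to be aligned.

```cpp
#include <cstdint>

// Hypothetical helper (not part of cista): true if `pos` is a multiple of
// alignof(T). Alignments are powers of two, so (alignment - 1) masks exactly
// the low bits that must be zero.
template <typename T>
constexpr bool is_aligned_for(std::uintptr_t pos) {
  return (pos & (static_cast<std::uintptr_t>(alignof(T)) - 1U)) == 0U;
}

// Assumes alignof(std::uint32_t) == 4.
static_assert(is_aligned_for<std::uint32_t>(8), "8 is a multiple of 4");
static_assert(!is_aligned_for<std::uint32_t>(6), "6 is not a multiple of 4");
static_assert(is_aligned_for<std::uint8_t>(7), "alignment 1 accepts any position");
```

If the two builds disagree about where a field sits, a position that is aligned for the writer can look misaligned to the reader, which would produce exactly this error.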

Thankful for any help to investigate this.

felixguendling commented 3 months ago

Are both 32-bit or both 64-bit? Otherwise, this is currently not supported. Serialization on the host can be done with a 32-bit binary.

felixguendling commented 3 months ago

The problem is that the memory layout needs to be the same, which is not the case for pointers on 32-bit vs. 64-bit architectures. So only offset_ptr would work, as it could use 32-bit offsets. However, this would require that offsets are always in the range of a 32-bit integer, which depends on where the allocator puts stuff and whether you mix stack and heap data structures.

I have not tried it, but maybe using offset mode (cista::offset) together with changing cista::offset_t from intptr_t to std::int64_t could help to make at least the offset mode cross-platform across 32-bit vs. 64-bit.
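
To illustrate why the offset type matters, here is a rough sketch of the layout difference. relative_ptr is a hypothetical stand-in, not cista's offset_ptr, and the first assert assumes std::intptr_t has the same size as a pointer, which holds on mainstream platforms.

```cpp
#include <cstdint>

// Hypothetical stand-in for an offset pointer: it stores a distance to the
// target rather than an absolute address. Only the layout is modelled here.
template <typename T, typename Offset>
struct relative_ptr {
  Offset offset_;
};

// Pointer-sized offset: 4 bytes on a 32-bit target, 8 bytes on a 64-bit one.
using native_rel = relative_ptr<int, std::intptr_t>;

// Fixed-width offset: 8 bytes on both kinds of target.
using fixed_rel = relative_ptr<int, std::int64_t>;

static_assert(sizeof(native_rel) == sizeof(void*),
              "size follows the pointer width of the target");
static_assert(sizeof(fixed_rel) == 8,
              "size is identical on 32-bit and 64-bit targets");
```

With the pointer-sized variant, every struct containing such a field changes size between the two builds, so the serialized bytes no longer line up.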

Another option I thought about was to create an allocator: as long as everything that will be serialized is allocated inside one buffer, cista::offset_t could even be set to std::int32_t, provided the buffer doesn't exceed 2GB (or, to be precise, the largest offset stays within the range of std::int32_t).
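
The range condition for the single-buffer idea can be sketched as follows. This is only an illustration of the arithmetic, not an allocator and not anything cista provides; distance and fits_int32 are hypothetical helpers, and positions are modelled as byte indices into one buffer.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: signed distance between two byte positions that both
// lie inside the same buffer.
constexpr std::int64_t distance(std::uint64_t from, std::uint64_t to) {
  return static_cast<std::int64_t>(to) - static_cast<std::int64_t>(from);
}

// Hypothetical helper: true if the offset is representable as std::int32_t.
constexpr bool fits_int32(std::int64_t v) {
  return v >= std::numeric_limits<std::int32_t>::min() &&
         v <= std::numeric_limits<std::int32_t>::max();
}

// A 1 MiB buffer: every offset between two positions fits easily.
static_assert(fits_int32(distance(0, 1048576)), "forward offset fits");
static_assert(fits_int32(distance(1048576, 0)), "backward offset fits");

// Positions 2 GiB apart: the forward offset no longer fits.
static_assert(!fits_int32(distance(0, 2147483648ULL)), "2^31 is out of range");
```

Mixing stack and heap objects breaks this because the distance between them is not bounded by any buffer size.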

Edit: offset_t is defined here: https://github.com/felixguendling/cista/blob/715692faa0d6dea6c878e2e1a64cdbd1324274f7/include/cista/offset_t.h#L10

opera-aberglund commented 3 months ago

Yeah, I just realised WebAssembly is 32-bit by default, so I'm currently looking into whether I can force it to build as 64-bit.

But thanks for the other suggestions, I'll try them in a moment and report back.

felixguendling commented 3 months ago

If I remember correctly, we couldn't get it to compile to 64-bit wasm with emscripten, so we just used a 32-bit x86 binary to produce the dump on the host. But I could imagine that changing offset_t to a 64-bit int instead of intptr_t could also do the trick and might be less work than making everything compile on 32-bit x86.

opera-aberglund commented 3 months ago

Changing to std::int64_t did the trick! I only had to #undef __cpp_lib_bit_cast for it to compile again. Much appreciated, perhaps write something about this architecture incompatibility in the docs? Not sure how common this use case is, but in my case I'm using Cista to serialize the game state in a cross web/native multiplayer game.

macadev commented 3 months ago

Hi @opera-aberglund! I very coincidentally find myself trying to do the same thing you are.

I have a C++ server and a C++ client compiled to WASM using emscripten that I run on the web. I want to use cista to serialize/deserialize messages that I send between them, and I think I ran into the same problem you did:

libc++abi: terminating due to uncaught exception of type cista::cista_exception: ptr alignment

If you have time, could you share the exact lines you changed to get serialization/deserialization working? I tried poking around the header file with the info mentioned on this issue, but I haven't gotten it to work yet. Thanks!

felixguendling commented 3 months ago

It's important to use scalar types with explicitly defined size everywhere. So the moment you start using something like std::size_t, it won't work. Only types like std::int32_t, std::uint64_t have the same size and alignment on all platforms. Also raw pointers (my_struct*) are not allowed - instead use cista::offset::ptr<T> which is an alias for cista::offset_ptr<T>. In general: did you use offset mode consistently? Raw mode will not work for this use case as pointers have a different width.
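
One way to catch layout drift early is to pin the expected layout with static_assert in a header shared by both builds. A minimal sketch, assuming a made-up message type PlayerInput and the usual alignments for fixed-width integers; it is not something cista requires.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical message type built only from fixed-width scalars.
struct PlayerInput {
  std::uint8_t tick;
  std::uint8_t flags;
  std::uint16_t angle;
  std::uint32_t socket_id;
};

// If either build lays the struct out differently, that build fails to
// compile instead of failing at runtime.
static_assert(sizeof(PlayerInput) == 8, "unexpected size");
static_assert(alignof(PlayerInput) == 4, "unexpected alignment");
static_assert(std::is_trivially_copyable<PlayerInput>::value,
              "must be safe to copy byte by byte");

// By contrast, std::size_t follows the pointer width of the target, so its
// size differs between a 32-bit and a 64-bit build.
static_assert(sizeof(std::size_t) == sizeof(void*),
              "holds on mainstream platforms");
```

The same asserts compiled for the native target and for emscripten then document that both sides agree on the layout.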

opera-aberglund commented 3 months ago

Hi @macadev! That's cool you found a similar use case for Cista. As Felix just wrote, there's probably something like that with the types you need to change. I only had to change that one single line in the header for it to work plus undefine __cpp_lib_bit_cast, but I'm also not using size_t anywhere and also only use cista::offset for my types.

macadev commented 3 months ago

Thanks for the information @felixguendling and @opera-aberglund. I think I'm following those guidelines correctly. Here's more sample code demonstrating what I'm doing. Excuse the poor quality code (it is hacky):

struct UserCommandMessage {
   uint8_t tick;
   uint8_t movementFlags;
   uint16_t mouseAngle;
   uint32_t socketId;
};

struct UserCommands {
   cista::offset::vector<UserCommandMessage> commands;
};

std::vector<uint8_t> MessageParserV2::serializeUserCommandMessages(const std::vector<UserCommandMessage> &messages) {
   cista::offset::vector<UserCommandMessage> cistaCommandsVec{};
   for (auto& command : messages) {
      cistaCommandsVec.emplace_back(command);
   }

   UserCommands commandsStruct{};
   commandsStruct.commands = cistaCommandsVec;
   return cista::serialize(commandsStruct);
}

std::vector<UserCommandMessage> MessageParserV2::deserializeUserCommandMessages(std::string_view message) {
   UserCommands commandsStruct = *cista::deserialize<UserCommands>(message);
   ...
}

felixguendling commented 3 months ago

With the assumption that you changed the cista::offset_t type to std::int64_t (for both host + web assembly!), this should work.

You can try to use cista::mode::CAST for deserialization and check if you can use the data. This will skip all checks (deserialization = reinterpret_cast<T const*>(buf.data())) and if there are serious errors in the data, your program will just crash.

macadev commented 3 months ago

I just did a bunch of testing - using cista::mode::CAST in the deserialize call got it to work!

Still not clear to me why it works for @opera-aberglund without that flag, although I think it might be because we are trying to do different things:

I serialize natively and deserialize in WASM compiled code. Opera serializes in WASM compiled code and deserializes in native code (unless I'm interpreting what's on this thread incorrectly).

I'll poke around more tomorrow as I try using cista in other parts of my messaging code.

opera-aberglund commented 3 months ago

I'll mark this as closed for now :)

opera-aberglund commented 3 months ago

@macadev I just checked and it works both ways! Web to native and native to web. What browser and what operating system are you using? I've only tested on Opera, Safari and macOS (Apple silicon) so far.