felixguendling / cista

Cista is a simple, high-performance, zero-copy C++ serialization & reflection library.
https://cista.rocks
MIT License
1.78k stars 113 forks source link

Help: Serialization and inheritance (with standard layout classes) #161

Closed AdelKS closed 1 year ago

AdelKS commented 1 year ago

Hello!

I would like a little bit of help on serializing a class B that inherits A, both standard layout but need custom serialization functions.

For example, let's say we have this

struct A
{
  int _a;
  int _b;

  A(const int a, const int b): _a(a), _b(b) {}

  template <typename Ctx>
  friend inline void serialize(
      Ctx & context, A const* el,
      cista::offset_t const offset) {
    using cista::serialize;
    serialize(context, &el->_a,     offset + offsetof(A, _a));
    serialize(context, &el->_b,     offset + offsetof(A, _b));
  }
}

and this

struct B: A
{
  int _c;
  int _d;

  B(const in a, const int b, const int c, const int d): A(a, b), _c(c), _d(d) {}

  template <typename Ctx>
  friend inline void serialize(
      Ctx & context, B const* el,
      cista::offset_t const offset) {
    // What do I do here ?
  }

}

How do I write the serialization of B ?

Thanks!

AdelKS commented 1 year ago

Oh it looks like having data members in both A and B breaks the standard layout trait, I misread the spec :/

I wonder if we can still serialize nonetheless, with another approach I suppose :thinking:

felixguendling commented 1 year ago

Cista was not really designed with derived classes in mind, but it should work like this: https://gcc.godbolt.org/z/fM1577Yfc

#include "cista.h"

template <typename T, typename Member>
cista::offset_t member_offset(T const* t, Member const* m) {
    return (reinterpret_cast<std::uint8_t const*>(m)
           - reinterpret_cast<std::uint8_t const*>(t));
}

struct A {
  int _a;
  int _b;

  A(const int a, const int b): _a(a), _b(b) {}

  template <typename Ctx>
  friend inline void serialize(
      Ctx & context, A const* el,
      cista::offset_t const offset) {
    using cista::serialize;
    serialize(context, &el->_a, offset + offsetof(A, _a));
    serialize(context, &el->_b, offset + offsetof(A, _b));
  }
};

struct B : A {
  int _c;
  int _d;

  B(const int a, const int b, const int c, const int d): A(a, b), _c(c), _d(d) {}

  template <typename Ctx>
  friend inline void serialize(
      Ctx & context, B const* el,
      cista::offset_t const offset) {
    using cista::serialize;
    serialize(context, static_cast<A const*>(el), offset);
    serialize(context, &el->_c, offset + member_offset(el, &el->_c));
    serialize(context, &el->_d, offset + member_offset(el, &el->_d));
  }
};

int main() {
    B b{1, 2, 3, 4};
    auto const buf = cista::serialize(b);
    auto const ptr = cista::deserialize<B>(buf);
    printf("%d %d %d %d\n", ptr->_a, ptr->_b, ptr->_c, ptr->_d);
}
AdelKS commented 1 year ago

Awesome thanks ! I did not think of the member offset trick ! I found out about it late yesterday to report on it.

It may be worth adding this to an "unsupported" page in the wiki maybe, it looks like it should work as long as no class makes use of the vtable pointer (no virtual inheritance and no virtual functions) :thinking:

felixguendling commented 1 year ago

This is a shorter version that does the same: https://gcc.godbolt.org/z/s7drY1b9M

#include "cista.h"

struct A { int _a, _b; };

struct B : A {
  auto cista_members() { return std::tie(_a, _b, _c, _d); }
  int _c, _d;
};

int main() {
  B b{1, 2, 3, 4};
  auto const buf = cista::serialize(b);
  auto const ptr = cista::deserialize<B>(buf);
  printf("%d %d %d %d\n", ptr->_a, ptr->_b, ptr->_c, ptr->_d);
}

I think it's better to advertise this way in the Wiki.

AdelKS commented 1 year ago

Wait... Does it recurse at compile time, with tuple_cat, if some of the elements of the tuple also offer a cista_members() method ? This is way better than writing the custom serialize function described in the wiki, as even type hashing could go through it !!

AdelKS commented 1 year ago

In the example you gave, it would then be

#include "cista.h"

struct A 
{ 
  int _a, _b;
  auto cista_members() { return std::tie(_a, _b); }
};

struct B : A {
  auto cista_members() { return std::tie(static_cast<A>(*this), _c, _d); }
  int _c, _d;
};

int main() {
  B b{1, 2, 3, 4};
  auto const buf = cista::serialize(b);
  auto const ptr = cista::deserialize<B>(buf);
  printf("%d %d %d %d\n", ptr->_a, ptr->_b, ptr->_c, ptr->_d);
}

That would be AMAZING.

felixguendling commented 1 year ago

Short answer: yes, this works.

(*static_cast<A>(*this) instead of static_cast<A>(*this))

Long answer:

To check if something like this really works, it's always a good idea to use complex types with pointers, allocations, etc. like vector, string, etc. - with those types it will probably crash if your approach would not work.

With scalars like int in this example, it could always "accidentally" work, because the first step for serialization in cista is to memcpy into the serialization buffer anyway. If you stay on a CPU with the same endianess, this will work, even if serialize would not be called properly for each member (which is required for complex types).

I constructed a more complex example to check:

#include "cista.h"

#include "fmt/core.h"
#include "fmt/ostream.h"

namespace data = cista::offset;

struct A { data::vector<data::string> _a, _b; };

struct B : A {
  auto cista_members() { return std::tie(*static_cast<A*>(this), _c, _d); }
  data::vector<data::string> _c, _d;
};

struct C : A {
  auto cista_members() { return std::tie(*static_cast<A*>(this), _e, _f); }
  B _e;
  cista::optional<int> _f;
};

int main() {
  data::string s = "I am a very very very long string!";

  cista::buf b;
  {
    C c;
    c._a = {s};
    c._b = {s, s};
    c._e._a = {s, s, s};
    c._e._b = {s, s, s, s};
    c._e._c = {s, s, s, s, s};
    c._e._d = {s, s, s, s, s, s};
    c._f = 1337;
    cista::serialize(b, c);
  }

  auto const p = cista::deserialize<C>(b);
  fmt::print(
    "a = {}\nb = {}\ne.a = {}\ne.b = {}\ne.c = {}\ne.d = {}\nf = {}",
    p->_a, p->_b, p->_e._a, p->_e._b, p->_e._c, p->_e._d, *p->_f);
}

https://gcc.godbolt.org/z/K7jh15W6W

I hope it helps :)

I will add a section in the documentation.

felixguendling commented 1 year ago

as even type hashing could go through it !!

This does not only apply to type hashing but also for normal hash functions if you want to use those types in hash_map or hash_set.

#include "cista.h"

#include "fmt/core.h"
#include "fmt/ostream.h"

namespace data = cista::offset;

struct A { data::vector<data::string> _a, _b; };

struct B : A {
  auto cista_members() { return std::tie(*static_cast<A*>(this), _c, _d); }
  data::vector<data::string> _c, _d;
};

struct C : A {
  auto cista_members() { return std::tie(*static_cast<A*>(this), _e, _f); }
  B _e;
  cista::optional<int> _f;
};

int main() {
  data::string s = "I am a very very very long string!";

  cista::buf b;
  {
    C c;
    c._a = {s};
    c._b = {s, s};
    c._e._a = {s, s, s};
    c._e._b = {s, s, s, s};
    c._e._c = {s, s, s, s, s};
    c._e._d = {s, s, s, s, s, s};
    c._f = 1337;
    cista::serialize(b, c);
  }

  auto const p = cista::deserialize<C>(b);
  fmt::print(
    R"(type_hash = {}
a = {}
b = {}
e.a = {}
e.b = {}
e.c = {}
e.d = {}
f = {}
)",
    cista::type_hash<C>(), p->_a, p->_b, p->_e._a, p->_e._b, p->_e._c, p->_e._d, *p->_f);

    data::hash_map<B, int> map;
    map.emplace(B{}, 7331);
    fmt::print("entry = {}\n", *map.get(B{}));
}

(see added usage of explicit cista::type_hash as well as using hash_map<B, int> which demonstrates hashing of types for keys in hash_map or hash_set.

https://gcc.godbolt.org/z/ETeqozGh4

AdelKS commented 1 year ago

This is very satisfying. That's what we call a modern serialization library.

(static_cast<A>(this) instead of static_cast(this))

It probably works with simply static_cast<A&> as I believe reference of reference gets squashed down to a reference.

Last question (I can check it on my own but...), if any class has a type_hash(EncryptedRing const& el, cista::hash_t h, std::map<cista::hash_t, unsigned>& done) function, will it be used in priority, when taking the cista_members() approach ?

Thanks!

felixguendling commented 1 year ago

Since cista_members is just another way to make cista::to_tuple_works true, this only enables the default/fallback type hash / (de-)serialization to work. So if you have a custom serialize/deserialize/type_hash function, this to_tuple_works check is irrelevant because the more your more specialized implementation should take precedence over the most generic cista::type_hash/serialize/deserialize anyway.

https://github.com/felixguendling/cista/blob/0f78d0cc2cae1440a2c72bdc1311efbf3a9c753c/include/cista/type_hash/type_hash.h#L41-L73

However, currently I don't think there are cases where you would want a custom type hash but cista_members enabled de/serialization function because the type_hash should reflect the binary compatibility of the de/serialization functions. This is already accomplished with the generic default type_hash that uses cista_members.

felixguendling commented 1 year ago

I updated the docs and the README to reflect the findings in this discussion. Thank you for your feedback!

AdelKS commented 1 year ago

However, currently I don't think there are cases where you would want a custom type hash but cista_members enabled de/serialization function because the type_hash should reflect the binary compatibility of the de/serialization functions. This is already accomplished with the generic default type_hash that uses cista_members.

Agreed, just that I was throwing extra compile time string in the type hash of classes, just like the h = hash_combine(h, hash("struct")); above. I suppose it could be useful for class A that inherits B but without any extra member, although that makes it binary compatible with A.

AdelKS commented 1 year ago

Unfortunately the new approach does not work with std::array at least when trying to hash it, as it tries to make it into a tuple, and it reaches a recursion depth limit when std::array is too large. It because of for_each_field tries to make std::array into a tuple.

Changing for_each_field to the following fixes the issue

template <typename T, typename Fn>
void for_each_field(T& t, Fn&& fn) {
  if constexpr (std::is_pointer_v<T>) {
    if (t != nullptr) {
      for_each_field(*t, std::forward<Fn>(fn));
    }
  } else if constexpr (std::is_invocable_v<Fn, T&>) {
    fn(t);
  } else {
    std::apply([&](auto&&... args) { (fn(args), ...); }, to_tuple(t));
  }
}

Basically to stop recursing into the tuple members as soon as the function can apply, I think this change shouldn't break anything ? :thinking:

AdelKS commented 1 year ago

Also, it's not very easy to have cista_members as a protected method, as one needs to friend at least three long templated cista functions.

felixguendling commented 1 year ago

Not sure how to reproduce. I found a problem but it didn't involve infinite recursion. I tried in #163. I increased the array size in the array test to >64 which leads to the problem that to_tuple does not work anymore. This way I found out that the specialized type_hash function I wrote for array<T, N> is not even called. The PR fixes this problem.

Since there is a specialized type_hash<array<T, N>>(), the goal should be to make sure this one is called instead of the generic one that just calles for_each_field. I tried to ensure this by moving it up. However, best would probably be to forward declare everything. Then every implementation can call every function, regardless of declaration order.

I think the problem is that I didn't forward declare all type_hash functions and it didn't find the more concrete type hash function.

Not sure if you fix for_each_field like you proposed if it will still work for "normal" cases where it's not just about fixing the infinite recursion in the type_hash function.

AdelKS commented 1 year ago

Hello!

Thanks for looking into it, you can reproduce it with std::array< std:array<int, 1000>, 1000>

felixguendling commented 1 year ago

Did the fix work for you?

AdelKS commented 1 year ago

Did the fix work for you?

Yes it did ! Because it avoids making the inner std::array into a tuple.

felixguendling commented 1 year ago

Perfect! Thank you for clarification. :-)

AdelKS commented 1 year ago

Oh sorry, you meant the fix you merged. Gimme a few minutes to try it x)

AdelKS commented 1 year ago

Yes it looks like the fix you merged is all good, thanks!

felixguendling commented 1 year ago

Perfect! Thank you for your feedback. It helps to improve the library.