getml / reflect-cpp

A C++20 library for fast serialization, deserialization and validation using reflection. Supports JSON, BSON, CBOR, flexbuffers, msgpack, TOML, XML, YAML / msgpack.org[C++20]
https://getml.github.io/reflect-cpp/
MIT License

A way to request adding a "type" name member to JSON and other serialization formats #67

Closed Klaim closed 5 months ago

Klaim commented 7 months ago

While rfl::json::write(data) is super useful in many situations, if the types of the structs provided to this function are important to the parser of the resulting JSON, the type names are lost. The same issue applies to other formats which do not specify a type per aggregate.

Feature request: Provide an option for this function, and for the similar functions of other formats, that asks the implementation to inject an additional member into every "struct" being serialized. That member would have a name like "type" (or "typename") and the name of the reflected type as its value, and this would apply recursively to all members. Ideally, such a feature would also allow specifying the name of that "type" member.
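
To make the request concrete, here is a minimal hand-written sketch of the desired output, without reflect-cpp. The structs `Position` and `MoveRequest`, the `type_name` trait and the `to_json` overloads are all hypothetical stand-ins; the point of the request is that the library would generate this automatically, without the structs carrying a type member.

```cpp
#include <cassert>
#include <string>

// Hypothetical message types; note that they carry no "type" member themselves.
struct Position {
  int x;
  int y;
};

struct MoveRequest {
  std::string unit;
  Position target;
};

// Hand-written stand-in for what reflection would provide: the name of each struct.
template <class T>
struct type_name;
template <>
struct type_name<Position> {
  static constexpr const char* value = "Position";
};
template <>
struct type_name<MoveRequest> {
  static constexpr const char* value = "MoveRequest";
};

// Opens a JSON object and writes the injected type member first.
template <class T>
std::string open_object(const std::string& tag) {
  return "{\"" + tag + "\":\"" + type_name<T>::value + "\"";
}

// No string escaping is done here; this is only meant to show the shape of the output.
std::string to_json(const Position& p, const std::string& tag) {
  return open_object<Position>(tag) + ",\"x\":" + std::to_string(p.x) +
         ",\"y\":" + std::to_string(p.y) + "}";
}

// The nested struct goes through the same path, so it receives its own type member.
std::string to_json(const MoveRequest& m, const std::string& tag) {
  return open_object<MoveRequest>(tag) + ",\"unit\":\"" + m.unit +
         "\",\"target\":" + to_json(m.target, tag) + "}";
}

// Usage: the caller chooses the member name, and the tag appears at every level.
void demo_type_injection() {
  const MoveRequest msg{"scout", {3, 4}};
  assert(to_json(msg, "type") ==
         "{\"type\":\"MoveRequest\",\"unit\":\"scout\","
         "\"target\":{\"type\":\"Position\",\"x\":3,\"y\":4}}");
}
```

The member name is passed as an argument here only to mirror the "ideally configurable" part of the request; how a real implementation would expose that choice is left open.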

liuzicheng1987 commented 7 months ago

@Klaim , TaggedUnions allow you to do something like this:

https://github.com/getml/reflect-cpp/blob/main/docs/variants_and_tagged_unions.md

But this is obviously not 100% what you want, because you want this added to the struct by default.

Could you provide a bit more context why you need this?

Klaim commented 7 months ago

Indeed, it's not what I want, also because it can't work recursively, for example for struct members of a struct.

Could you provide a bit more context why you need this?

Yes of course, I'll try to avoid too many details, but if you need more feel free to ask: the heart of the context is that I have a set of structs representing various messages (events or action requests) which need to be passed from some code to another layer of code which is not written in C++. In that specific context, the easiest way to transfer these messages is through JSON serialization on both sides. I use type-erasing types to store the messages in vectors on the C++ side, so I can have a bunch of messages of various types to serialize to JSON and that's it. However, the type_name needs to be read on the other side after deserialization, both to identify the kind of message (I obviously don't want to have to add a message name member to every message instance) and because the members can have various types, which drives how the JSON interpretation will go on the receiving side.

I would prefer if JSON had (maybe optionally) type names for aggregates, or I would love to have the choice to use a format that has a notion of types instead, but right now I have to use JSON.
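
The setup described above can be sketched roughly as follows. This is an illustration under assumptions, not the project's actual code: `PlayerJoined`, `DoorOpened`, `AnyMessage` and the hand-written `to_json` overloads are hypothetical, and the serializers are written by hand where the thread is asking for the library to produce them.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical message types with no type information stored in them.
struct PlayerJoined {
  std::string name;
};
struct DoorOpened {
  int door_id;
};

// Hand-written serializers; each one writes the struct's name as "type_name".
std::string to_json(const PlayerJoined& m) {
  return "{\"type_name\":\"PlayerJoined\",\"name\":\"" + m.name + "\"}";
}
std::string to_json(const DoorOpened& m) {
  return "{\"type_name\":\"DoorOpened\",\"door_id\":" + std::to_string(m.door_id) + "}";
}

// Type-erased message: it only remembers how to serialize the value it was built from.
class AnyMessage {
 public:
  template <class T>
  explicit AnyMessage(T msg)
      : serialize_([m = std::move(msg)] { return to_json(m); }) {}

  std::string json() const { return serialize_(); }

 private:
  std::function<std::string()> serialize_;
};

// Serializes a mixed batch into one JSON array the receiving side can dispatch on.
std::string to_json_array(const std::vector<AnyMessage>& batch) {
  std::string out = "[";
  for (std::size_t i = 0; i < batch.size(); ++i) {
    if (i != 0) out += ",";
    out += batch[i].json();
  }
  return out + "]";
}

// Usage: messages of different types share one vector and keep their type names in JSON.
void demo_message_batch() {
  std::vector<AnyMessage> batch;
  batch.emplace_back(PlayerJoined{"Ann"});
  batch.emplace_back(DoorOpened{7});
  assert(to_json_array(batch) ==
         "[{\"type_name\":\"PlayerJoined\",\"name\":\"Ann\"},"
         "{\"type_name\":\"DoorOpened\",\"door_id\":7}]");
}
```

Once the C++ type is erased, the "type_name" member is the only thing left that tells the receiver which kind of message each array element is, which is why it has to be emitted during serialization.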

Klaim commented 7 months ago

Sorry apparently I hit some shortcut that closed the issue - fixed that, finishing my explanation:

I prototyped the mechanism I'm describing before going with the real project, but at the time I used Boost.Describe (no C++20 serialization function existed yet, except Boost.PFR, but it didn't handle member names yet) to then write my own serialization function, which added the type name as type_name : "that_event" for each struct recursively. That worked perfectly, although it required some scaffolding and had the issue of having to explicitly declare the serializable members of each type in a macro.

My current alternatives:

Then, looking at the JSON write function from reflect-cpp, I thought that it would not be complicated to let it do it for me... but I'm not sure whether it is hard to implement, because I don't know yyjson, nor do I understand all the details of reflect-cpp, so I didn't try a PR.

If you think it's too niche an issue, no problem, I'll try my own serialization instead.

liuzicheng1987 commented 7 months ago

@Klaim, indeed, it is a bit of a niche problem, but if we can think of a solution that doesn’t complicate the code too much and doesn’t create a lot of maintenance issues, I’d be happy to include it.

Here is how I solve problems like this:

struct Number {
  rfl::Literal<"number"> type;
  std::optional<std::string> description;
};

Then I can just go like this:

const auto with_description = Number{.description = "description"};

const auto no_description = Number{};

And for the resulting JSON I will get:

{"type": "number", "description": "description"}
{"type": "number"}

So I don’t have to do it for every instance, just for every class. It seems to me that this is a lot easier than any solution you had suggested.

However, I think it might be possible to write some kind of wrapper that would abstract these kinds of things away. The only trouble I have is how to also do it automatically for all child structs, because that would also include structs contained in all kinds of vectors, etc.

liuzicheng1987 commented 7 months ago

@Klaim, another relatively simple solution would be to write a custom parser for your structs. The advantage here is that it would leave your original structs completely unchanged.

https://github.com/getml/reflect-cpp/blob/main/docs/custom_parser.md

If this is a route you would consider, I can also help you set something up.

Klaim commented 7 months ago

So I don’t have to do it for every instance, just for every class. It seems to me that this is a lot easier than any solution you had suggested.

I don't think so: my goal is specifically to not have any such data in the type. This solution is basically as good as my Boost.Describe-based solution, or even worse: it forces me to add unused/redundant members to all the types (not only the message types, because these also contain other custom type members which need to have their type reflected too, hence the recursive aspect I mentioned). Boost.Describe would force me to write something for every type too, but at least it's not part of the type.

However, I think it might be possible to write some kind of a wrapper that would abstract these kind of things away. The only trouble I have is how to also automatically do it for all child structs. Because that would also include structs included in all kind of vectors, etc.

Indeed, but that's exactly what I need (not for vector, string, int etc., just for the other user-defined structs).

So I don’t have to do it for every instance, just for every class. It seems to me that this is a lot easier than any solution you had suggested.

That's better indeed, although it still seems similar to, and more verbose than, Boost.Describe, and it has to be done per type. It seems the custom parser cannot be generalized for a set of types, but if it's defined inside the serialization function, which is already a template, maybe that works. The general idea corresponds to my second alternative listed above, yes. I will try this and get back to you. 👍🏽

liuzicheng1987 commented 7 months ago

@Klaim , what you should do is to set up the helper struct like this, which is a trick I also use for the TaggedUnions:

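template <class T, rfl::internal::StringLiteral _tag>
struct HelperStruct {
  using TypeName = rfl::Literal<rfl::internal::remove_namespaces<rfl::internal::get_type_name<T>()>()>;
  rfl::Rename<_tag, TypeName> tag;
  rfl::Flatten<T> data;

  // Builds the helper from the original struct (assumes rfl::Flatten<T> can be
  // initialized from a T); the tag keeps its default, and only, value.
  static HelperStruct from_class(const T& _t) { return HelperStruct{.data = _t}; }

  // Recovers the original struct when reading.
  T to_class() const { return data.value(); }
};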

And then you can set up the custom parser like this:

template <class ReaderType, class WriterType, class T,
          rfl::internal::StringLiteral _tag>
struct AddTypeName : public rfl::parsing::CustomParser<ReaderType, WriterType, T, HelperStruct<T, _tag>> {};

Now you can register it for each of your structs like this:

namespace rfl::parsing{

template <class ReaderType, class WriterType>
struct Parser<ReaderType, WriterType, YourStruct >
    : public AddTypeName<ReaderType, WriterType, YourStruct, "type"> {};

template <class ReaderType, class WriterType>
struct Parser<ReaderType, WriterType, YourOtherStruct >
    : public AddTypeName<ReaderType, WriterType, YourOtherStruct, "type"> {};

}

And now a field "type": "YourStruct" or "type": "YourOtherStruct" will be added every time you serialize it.

I think this is much better than the Boost.Describe solution, because you do not have to register all of the field names for your structs. You can just place this somewhere in your code and be done with it.

And if it's still too much boilerplate code for you, you can set up Macros, which would boil it down to this:

namespace rfl::parsing{
ADD_TYPE_NAME(YourStruct, type);
ADD_TYPE_NAME(YourOtherStruct, type);
}

liuzicheng1987 commented 7 months ago

@Klaim, I have been giving this some thought. I think what I could do is add a generalized postprocessing step to rfl::xx::write. For instance, Rust’s serde allows you to automatically transform all of your snake_case field names to camelCase.

It would then be easy to write something like what you want as well, or the library could offer it out of the box. But I have to think about what a good syntax might look like and how to implement it.
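
The kind of field-name transformation meant here (snake_case names rewritten with an upper-case letter at each word boundary) can be sketched in plain C++. The function name `snake_case_to_camel_case` is made up for this illustration and is not taken from reflect-cpp or serde.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Converts a snake_case identifier: each underscore is dropped and the
// character that follows it is upper-cased. Other characters are kept as-is.
std::string snake_case_to_camel_case(const std::string& name) {
  std::string out;
  bool upper_next = false;
  for (const char c : name) {
    if (c == '_') {
      upper_next = true;
    } else if (upper_next) {
      out += static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
      upper_next = false;
    } else {
      out += c;
    }
  }
  return out;
}

// Usage: names without underscores pass through unchanged.
void demo_snake_case_to_camel_case() {
  assert(snake_case_to_camel_case("first_name") == "firstName");
  assert(snake_case_to_camel_case("age") == "age");
}
```

A postprocessing step of this kind only touches field names, so it can be applied to any struct without changing the struct itself.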

Klaim commented 7 months ago

Seems interesting! Meanwhile, I haven't had time yet to try the other solution. Thanks for the guidance, I'll report ASAP.

liuzicheng1987 commented 7 months ago

@Klaim , I have been thinking about the issue and the way I see it is it might be possible to come up with a syntax like this:

rfl::json::write<rfl::AddStructName<"type_name">>(your_struct);

This would add the following to all serializations of any struct (including child structs):

{
    "type_name": "NameOfTheStruct"
}

Likewise, it would be possible to have other such processors:

rfl::json::write<rfl::SnakeCaseToHungarianCase>(your_struct);

This would transform all field names from snake_case to hungarianCase.

Of course, you could combine them as well:

rfl::json::write<
   rfl::AddStructName<"type_name">, 
   rfl::SnakeCaseToHungarianCase>(your_struct);

And then you can do the same for the read operations as well:

rfl::json::read<YourStruct, rfl::HungarianCaseToSnakeCase>(json_string);

I am 95% certain that I could make this work. There are some open questions, but I am confident I can resolve them.
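
One way such compile-time processors could be chained is sketched below in self-contained C++17. It is an assumption-laden illustration, not the library's implementation: everything lives in a hypothetical `sketch` namespace, `Object` stands in for an already-reflected struct, and the tag is passed as a `const char*` template argument rather than as a string literal.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

namespace sketch {

// Stand-in for a reflected struct: its name plus (field name, JSON value) pairs.
struct Object {
  std::string struct_name;
  std::vector<std::pair<std::string, std::string>> fields;
};

// Processor: prepends a member holding the struct name; Tag is the member's name.
template <const char* Tag>
struct AddStructName {
  static Object process(Object obj) {
    obj.fields.emplace(obj.fields.begin(), Tag, "\"" + obj.struct_name + "\"");
    return obj;
  }
};

// Processor: renames every field from snake_case to camelCase.
struct SnakeCaseToCamelCase {
  static Object process(Object obj) {
    for (auto& field : obj.fields) {
      std::string renamed;
      bool upper_next = false;
      for (const char c : field.first) {
        if (c == '_') {
          upper_next = true;
        } else {
          renamed += upper_next
                         ? static_cast<char>(std::toupper(static_cast<unsigned char>(c)))
                         : c;
          upper_next = false;
        }
      }
      field.first = renamed;
    }
    return obj;
  }
};

// Runs the processors left to right (fold over the comma operator), then emits JSON.
template <class... Processors>
std::string write_json(Object obj) {
  ((obj = Processors::process(std::move(obj))), ...);
  std::string out = "{";
  for (std::size_t i = 0; i < obj.fields.size(); ++i) {
    if (i != 0) out += ",";
    out += "\"" + obj.fields[i].first + "\":" + obj.fields[i].second;
  }
  return out + "}";
}

constexpr char type_tag[] = "type_name";

}  // namespace sketch

// Usage: in this sketch the order matters, because a later processor also
// sees the member added by an earlier one.
void demo_processors() {
  const sketch::Object homer{"Person", {{"first_name", "\"Homer\""}, {"age", "45"}}};
  assert((sketch::write_json<sketch::SnakeCaseToCamelCase,
                             sketch::AddStructName<sketch::type_tag>>(homer)) ==
         "{\"type_name\":\"Person\",\"firstName\":\"Homer\",\"age\":45}");
  assert((sketch::write_json<sketch::AddStructName<sketch::type_tag>,
                             sketch::SnakeCaseToCamelCase>(homer)) ==
         "{\"typeName\":\"Person\",\"firstName\":\"Homer\",\"age\":45}");
}
```

Because the processors are template arguments, the set of transformations is fixed at compile time and no options object has to be passed at run time. Whether the library applies processors in the listed order, as this sketch does, is not something the thread states.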

It appears to me that this is what you were asking for, @Klaim . Do you agree?

@jimixxperez , what are your thoughts on the matter?

Klaim commented 7 months ago

Yes I was initially thinking about an object to pass as second argument which would have held all options at runtime, but at least in my case your idea would work as I know the options I need at compile time. So yes that would solve my problem.

Klaim commented 6 months ago

I couldn't touch my project in the past 2 weeks but will resume my work today. I was wondering if you ended up experimenting with your idea already? If not, I'll try the other solution, which should work, today. (Right now I use v0.7.0, but I can use whatever commit if you want me to try something.)

liuzicheng1987 commented 6 months ago

@Klaim , I was working on another issue over the last week. I will do a release over the weekend. But the idea of the processors, as described in this issue, is going to be in the release after that.

liuzicheng1987 commented 5 months ago

Hi @Klaim ,

I have now added the concept of processors, as discussed in this issue. The one that you would be most interested in is AddStructName<...>.

The basic idea is that you can do the following:

struct Person {
  std::string first_name;
  std::string last_name;
  int age;
};

const auto homer = Person{.first_name = "Homer", .last_name = "Simpson", .age = 45};

const auto json_string = rfl::json::write<rfl::AddStructName<"type">>(homer);

const auto homer2 = rfl::json::read<Person, rfl::AddStructName<"type">>(json_string);

This will result in the following JSON:

{"type":"Person","first_name":"Homer","last_name":"Simpson","age":45}

If you want to, you can also transform "first_name" and "last_name" from snake_case to camelCase, that's another thing you can do with processors:

const auto json_string = rfl::json::write<rfl::SnakeCaseToCamelCase, rfl::AddStructName<"type">>(homer);

const auto homer2 = rfl::json::read<Person, rfl::SnakeCaseToCamelCase, rfl::AddStructName<"type">>(json_string);

This will result in the following JSON:

{"type":"Person","firstName":"Homer","lastName":"Simpson","age":45}

It is still in a feature branch (or not, depending on when you read this): https://github.com/getml/reflect-cpp/tree/f/processors

It appears to me that this is pretty much what you were asking for. Do you agree?

Klaim commented 5 months ago

It appears to me that this is pretty much what you were asking for. Do you agree?

Yes it seems to be exactly that, thanks! I suppose one can also add new processor types if they want to, which might help in the future but for now this is just what I needed. I used another way for now but will try this as soon as I can 👍🏽 I'll confirm soon.

liuzicheng1987 commented 5 months ago

@Klaim , no problem.

Thank you for the issue, this was a very interesting challenge to solve. If you have more ideas for processors, let me know. Now that the basic infrastructure is in place, it should be fairly easy to implement new ones.

Klaim commented 5 months ago

I can now confirm that this solves the problem I reported, at least in my project 👍🏽

As for some other related feedback (I'm noting it here because these are not actual issues for now)

Thanks again for the feature 👍🏽