getml / reflect-cpp

A C++20 library for fast serialization, deserialization and validation using reflection. Supports JSON, BSON, CBOR, flexbuffers, msgpack, TOML, XML, YAML / msgpack.org[C++20]
https://getml.github.io/reflect-cpp/
MIT License

Q: deserializing/serializing complex XML documents? #149

Closed ethindp closed 1 month ago

ethindp commented 1 month ago

The title is a bit odd, but I've read through the docs (and this library looks truly awesome) and I have a bit of a (possibly unique) use-case. Though I feel like this is pretty common with XML documents.

If I have a document that has this structure:

<top>
<element
    attributes for element (integer/boolean/single-line string)
>
<multi-line attribute>value</multi-line attribute>
...
</element>
...
</top>

Then I'd like to "move up" the (inner) elements of element into element itself when deserializing (since I don't have control over the actual structure of the document). The docs indicate that it's possible to "flatten" when serializing, but is it possible to go the other way? That is: flatten when deserializing but expand when serializing?

If the type-system constrains this and there isn't any way around it I don't mind using a custom processor with the reflection parts of the library only to handle it. If at all possible though I'd like to just define the attributes once given how many there are for the document I'm trying to parse.

Edit: I'd also like to rename the top-level element when serializing but that should be a lot easier (if I remember right that's just a template string literal when calling rfl::xml::write).

liuzicheng1987 commented 1 month ago

Hi @ethindp ,

I am not entirely sure I understand your requirement. Could you provide a somewhat more detailed example of what you are trying to do?

But if I do understand this correctly, I would do something like this:

struct MultiLine {
    rfl::Attribute<std::string> attribute;
    std::string xml_content; // magic field name, represents "value"
};

struct Element {
    rfl::Attribute<std::string> attribute1;
    rfl::Attribute<bool> attribute2;
    rfl::Attribute<int> attribute3;
    rfl::Rename<"multi-line", std::vector<MultiLine>> multi_line;
};

struct top {
    Element element;
};

The way I understand it, this should generate an XML that looks like what you have asked for.

Does that make sense?

liuzicheng1987 commented 1 month ago

And yes, you can just pass a string literal to rfl::xml::write to rename the top element.

ethindp commented 1 month ago

@liuzicheng1987 Sorry, let me clarify in response to your first comment.

The actual XML document I'm working with is the MUSHclient XML format. It has this general structure:

<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE muclient>

<muclient>
<world
  ... world single-line attributes (eg. ip address) ...
>

  ... world multi-line tags (eg. <connect_text> ... </connect_text>) ...

</world>

<triggers>

  ... individual triggers here ... (eg. <trigger> ... </trigger> )

</triggers>

<aliases>

  ... individual aliases here ... (eg. <alias> ... </alias> )

</aliases>
... Continues like this...

Essentially, the attributes on world itself can be boolean, numeric, color, or single-line strings (i.e. they're all just strings in the actual doc itself). But world can have sub-elements, like so:

<connect_text>Now connected to game</connect_text>

What I'm asking is: How do I represent things like connect_text as another field in the struct (but as a string), like I would for the single-line strings, without needing to create a new struct? Example of what I'm asking to achieve:

struct MCLFile {
// These can appear in any order and are all optional, theoretically
World world;
std::vector<Trigger> triggers;
std::vector<Alias> aliases;
// ...
};

struct World {
// Sample of some string attributes that might (but might not) appear in a document
std::string chat_file_save_directory;
std::string chat_name;
std::string chat_message_prefix;
// Example of some multi-line attributes, but they are child elements in the document itself
std::string connect_text;
std::string filter_aliases;
std::string filter_timers;
};

In theory, this should then be able to read an XML document like:

<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE muclient>

<muclient>
<world
  chat_name = "name"
  chat_message_prefix = "prefix"
>
  <filter_aliases>
This is a
multi-line string
</filter_aliases>
</world>
</muclient>

And it should be able to write a document like that too, if I do it right and it's possible.

liuzicheng1987 commented 1 month ago

@ethindp , you are pretty close.

In an XML context, std::vector fields are optional anyway; for everything else, you can use std::optional. If you want something to be an attribute, use rfl::Attribute.

The order of the fields does not matter: the parser goes by field names, not order.

struct MCLFile {
// These can appear in any order and are all optional, theoretically
std::optional<World> world;
std::vector<Trigger> triggers;
std::vector<Alias> aliases;
// ...
};

struct World {
// Sample of some string attributes that might (but might not) appear in a document
rfl::Attribute<std::optional<std::string>> chat_file_save_directory;
rfl::Attribute<std::optional<std::string>> chat_name;
rfl::Attribute<std::optional<std::string>> chat_message_prefix;
// Example of some multi-line attributes, but they are child elements in the document itself
std::string connect_text;
std::string filter_aliases;
std::string filter_timers;
};

ethindp commented 1 month ago

@liuzicheng1987 Thanks so much! :) One last question: if I have an element like:

<element attr1 attr2>text</element>

What's the appropriate way of reading the text part? The format I'm parsing isn't documented anywhere, so I'm reading the code and trying to reimplement it using this lib, and it has some quirks.

liuzicheng1987 commented 1 month ago

Hi @ethindp,

you can use the magic field xml_content, like this:

struct element{
    rfl::Attribute<std::string> attr1;
    rfl::Attribute<std::string> attr2;
    std::string xml_content; // This represents "text"
};