breese / trial.protocol

Network wire protocols
11 stars 4 forks source link

**WIP DO NOT MERGE IT**: Add automatic serialization support for boost::hana::Struct #55

Open vinipsmaker opened 3 years ago

vinipsmaker commented 3 years ago

So... as I was saying in a previous email exchange of ours, I'll need iarchive::reader() to return a non-const reference so I can use algorithms such as partial::skip() to skip over unwanted extraneous members for instance.

Users that specialize the types to have specific serialization for JSON might want just the same as well.

The code is fairly small and you could already take a look.

breese commented 3 years ago

It is possible with the current API to (1) get a copy of the iarchive reader, (2) use this copy to skip over input, and (3) update the iarchive reader with this copy.

Off the top of my head:

auto copy = ar.reader(); // Makes a copy
skip(copy);
ar.reader(std::move(copy));
vinipsmaker commented 3 years ago

Okay, I've removed the usage of const_cast and it does work: https://github.com/breese/trial.protocol/pull/55/commits/21ea7541fde0b839b3da7262b03549e0a3d81683

However I'm not convinced. What's the rationale for this forced copy here?

And yes, I know that I could have created a single copy at the beginning of load(), but that's still a copy.

breese commented 3 years ago

These function have not been well-considered. The iarchive::reader() getter was added because I needed to access the literal view to get an estimate of the size such that I could reserve space for it during serialization, and I did not want to make the iarchive interface too fat. The iarchive::reader(reader) setter was added for symmetry, without a real use case.

That said, I would feel uncomfortable about giving users direct access to modify the underlying reader. The user may believe he is working on his own copy and thus unintentionally leave the iarchive in an inconsistent state.

vinipsmaker commented 3 years ago

That said, I would feel uncomfortable about giving users direct access to modify the underlying reader. The user may believe he is working on his own copy and thus unintentionally leave the iarchive in an inconsistent state.

If you're worried about the user unintentionally modifying the internal reader, you can still keep separate getter/setter acessor methods. What do you think about changing iarchive::reader(reader)'s signature to:

template<class F>
void reader(F&& f)
{
    f(member.reader);
}

Sounds good enough?

breese commented 3 years ago

I have been thinking about this for a while, and would like to propose a more consistent solution.

Currently the serialization input archive owns the reader, and the output archive owns the writer. Maybe we should start thinking about the archives as "algorithms".

We are gravitating towards a composable solution where different algorithms (such as skip()) works on state (reader) and the user can then compose a solution by combining these algorithms. The serialization archives are just another way of traversing the JSON structure.

We could therefore make the reader a first-class citizen that represent the current state. The user owns this state and can apply different algorithms to it, including serialization.

breese commented 3 years ago

There is a rather easy solution for that. Change json::basic_iarchive<CharT> into json::basic_iarchive<Reader> and create a json::iarchive_view that sets Reader = json::reader& (notice the reference type).

I have some experimental changes in 454bf9b40fb57ed13861f9eacc8ef1d3fc0e4d37 on the featur/compose branch.

With this you can either let the archive own the reader which works like normal:

  json::iarchive archive(input);

or you can use an external reader:

  json::reader reader(input);
  json::iarchive_view(reader);

The above-mentioned patch also removes the json::iarchive::reader(reader) setter as this functionality will be replaced by the external reader.

vinipsmaker commented 3 years ago

On the idea of two archives, I've believed in the same direction in the past. However the experience with {save,load}_overloader makes me believe that a more aggressive approach is required. Here's a snippet from std::map<std::string, V> serialization:

template <typename CharT, typename T, typename Compare, typename MapAllocator>
struct save_overloader< json::basic_oarchive<CharT>,
                        typename std::map<std::string, T, Compare, MapAllocator> >
{
    static void save(json::basic_oarchive<CharT>& archive,
                     const std::map<std::string, T, Compare, MapAllocator>& data,
                     const unsigned int protocol_version)
    {
        archive.template save<json::token::begin_object>();
        for (typename std::map<std::string, T, Compare, MapAllocator>::const_iterator it = data.begin();
             it != data.end();
             ++it)
        {
            archive.save_override(it->first, protocol_version);
            archive.save_override(it->second, protocol_version);
        }
        archive.template save<json::token::end_object>();
    }
};

I wrote similar code for Hana. I don't agree with this approach. Boost.Serialization only defines S-exprs mappings. The correct code would be:

archive.save_override(it->first, protocol_version);
archive.template save<json::token::begin_array>();
archive.save_override(it->second, protocol_version);
archive.template save<json::token::end_array>();

We shouldn't really insert framing on structs having a load/save/serialize member function. We should insert framing by default (including construction/destruction of the archive) and skip them only if we know it's safe. I was thinking in the lines of...

archive.save_override(it->first, protocol_version);
if (!has_native_json_representation<T>::value)
    archive.template save<json::token::begin_array>();
archive.save_override(it->second, protocol_version);
if (!has_native_json_representation<T>::value)
    archive.template save<json::token::end_array>();

This road would not only be safer and led to actual proper support for Boost.Serialization (we still having missing support for versioning, object id, etc), but would also allow to fix #7.

However that alone doesn't help me with the scanf() PR (which is how I got into the journey for this PR we're discussing now). I still do need some kind of iarchive_view that (1) operates on an existing reader and (2) won't insert automatic framing on the most outer nesting level.

To make matters worse mixing has_native_json_representation<T> and Boost.Serialization still is dangerous. It's still a possibility to have a “sandwich” where non-native JSON representation triggers native JSON representation which in turn triggers non-native JSON representation (e.g. smart pointers that use Boost.Serialization object id to dedup objects). If I try to use them in the scanf() PR the code will silently work but introduce bugs at runtime. What if we get the initial idea you dropped:

We could therefore make the reader a first-class citizen that represent the current state. The user owns this state and can apply different algorithms to it, including serialization.

and pushed it a little further? As in... introduce a lightweight "serialization" layer not dependent on Boost.Serialization that deals with protocol layer. We could move the {save,load}_overloader machinery to this layer. In this layer it'd be safe to assume has_native_json_representation<T> all the time (thus making the explicit trait has_native_json_representation<T> unnecessary) and there would be no danger from the “sandwich” pattern I've laid earlier.

The Boost.Serialization glue could fallback to this layer but not the other way around (this being the reason why it's safely avoiding the sandwich pattern).

I could use the new layer (free from Boost.Serialization concerns) on the scanf() PR.