USCiLab / cereal

A C++11 library for serialization
BSD 3-Clause "New" or "Revised" License
4.17k stars 750 forks source link

Optional Fields #30

Open karhu opened 10 years ago

karhu commented 10 years ago

I was wondering if support for optional serialization fields is considered/planned.

In my case I would use this feature when loading/writing settings files from/to json (or xml), where certain entries don't have to be present. But also in the scenario of binary serialization this could lead to memory savings.

Based on the boost_compat_new branch I was able to implement this for my own maybe type and the json archives. For this I had to add another variant of the templated save / load functions of the json-archive and type specific prologue/epilogue functions.

Is there any interest in officially supporting something similar?

AzothAmmo commented 10 years ago

Can you clarify what you mean by optional fields here? Do you mean supporting things like boost::optional (or the was to be std::optional)?

DrAWolf commented 10 years ago

I think it is just the following, which is strongly recommend to work:

e.g. a class is serializing X, Y, and Z the (text) archive only contains an entry for X. It should load (without a crash obviously.) Y and Z have/keep the values assigned by the default contructor of the class

Your unordered loading code for text archives will already handle that (or it will have to handle).

For binary archives it is thinkable but certainly non-standard.

karhu commented 10 years ago

Yes, my proposal is similar to what DrAWolf describes but might differ in some details, so let me give you some examples.

First about the type backing this functionality. boost::optional is definitely possible, but it might be preferable to solve this type independently via some trait, or via a cereal::optional_name_value_pair wrapper or similar (to avoid boost dependencies).

And now to the example. Let's assume we have the following struct:

struct ServerSettings {
   string map;
   int maxPlayers;
   optional<string> password;
}

Now let's look at scenarios of loading various json files.

{
   "map": "The world's end",
   "maxPlayers": 64,
}

This should successfully load the json file, but set the password field to the unset state.

{
   "maxPlayers": 64,
   "password": "abc123"
}

This should probably throw an exception (FieldNotPresentException("map") or similar).

Analogous to this an unset optional field should not be serialized to an json/xml output archive.

As DrAWolf mentioned this is mostly important for json/xml files and less so for binary files, but yes it's still thinkable.

DrAWolf commented 10 years ago

For values that are just 'plain' optional, my considerations above apply

for boost::optional one has to write al little helper file optional.hpp (EXPERIMENTAL!)

#include <cereal/cereal.hpp>
#include <boost/optional>

namespace cereal
{
  //! Serializing for boost::optional
  template <class Archive, class T> inline
  void save( Archive & ar, boost::optional<T> const& object )
  {
      if ( object )
      {
          ar( _CEREAL_NVP( "content", object.get() ) );
      }
  }

  template <class Archive, class T> inline
  void load( Archive & ar, boost::optional<T> & object )
  {
      T value;
      ar( _CEREAL_NVP( "content", value ) );
      object = value;
  }
} // namespace cereal

!!! This will crash in load() if the value is NOT in the file because in the current version no 'plain' optional values exist!

AzothAmmo commented 10 years ago

In the next release of cereal (currently available on develop branch), you will be able to load NVPs in any order you would like, and if the name is not present (and was provided via an NVP), an exception will be thrown.

I want to make sure I completely understand the desired functionality, so here is my interpretation:

If you had some field that you somehow denoted as optional, you would like to be able to ask cereal to load this field and then for some special behavior to occur if the field does not exist, such as some default value being applied to the type.

There are several ways something like this can be implemented. If you have an optional type, wrapping the values in that makes the most sense, and its serialization would look very similar to std::unique_ptr, in the sense that it would need to serialize a valid bit followed by the actual data. In the case when it was valid, this would create an overhead of one byte in the binary version and one field in the text versions, but you'd only pay the one byte when you had nothing to serialize.

Another solution is for us to provide a cereal::make_optional function that will wrap up your type in a thin layer that would act in a similar fashion to actual optional types, with the exception being that in your struct there would be nothing actually optional about your type, e.g.:

struct Hello
{
  int x;
  std::string y;

  template <class Archive>
  void serialize( Archive & ar )
  {
    ar( x, cereal::make_optional( y ) );
  }
};

This solution would have essentially the same implementation as the preceding one with one important difference: the type I serialize is not actually optional, meaning cereal has no way of knowing whether it is not valid for saving, so cereal would likely always try and save something. It also requires that the format of the thing to be loaded fits with the implementation details described above.

The important thing to remember here is that any general solution needs to work for both binary archives and the text archives, otherwise I'd say this is not a feature that would be a part of cereal and something you'd have to write on your own.

randvoorhies commented 10 years ago

What I dislike about the second option is that a user has no idea whether or not the field was actually present after serialization takes place. Optional-ness should a property of the data field, rather than of the serialization itself. For example, a good example of optionally serializable fields would be something like this:

struct camera_params
{
   optional<float> gain, exposure, brightness; 

   template <class Archive> 
   void serialize( Archive & ar )
   { ar( gain, exposure, brightness ); }
};

camera_params params;
cereal::XMLInputArchive ar( ... );
ar(params);

// Adjust the camera parameters if necessary, otherwise leave them as they were.
if(params.gain)       my_camera.set_gain(*params.gain) 
if(params.exposure)   my_camera.set_exposure(*params.exposure) 
if(params.brightness) my_camera.set_brightness(*params.brightness)

As an aside, ZeroC's Slice language has recently added optional values so we may want to take inspiration from their semantics: http://doc.zeroc.com/display/Ice/Optional+Values

karhu commented 10 years ago

I agree with randvoorhies, optional-ness should be a property of the type and his usage scenario exactly describes the functionality I am looking for.

AzothAmmo commented 10 years ago

So the only thing that we can do from cereal's perspective is to possibly add support for boost::optional, but writing support for your own optional implementation should be fairly easy using unique_ptr for inspiration. As an aside, there probably won't be any progress on issues for a few weeks, but we are planning on doing a sweep through everything to push out 1.0 at that point.

DrAWolf commented 10 years ago

From a 'good design' and 'clean code' point of view, I vote +1 for 'optional-ness' being a property of serialization, NOT class.

So, I would be very happy with cereal::make_optional( y ) - which I expect to work with BINARY archives, too.

@randvoorhies : This is a standard problem in serialization. Your example can be cleaned up considerably by dropping the boost::optional and the ifs. You just need an instance of camera_params that has the current (or default in other cases) parameters. Loading from an archive will then overwrite only the ones contained in the file. My guess: You have only been using optional, because you had no serialization that handles optional-ness!?

For keeping existing code, I would also vote +1 for the developers to support boost::optional and maybe other commonly used boost stuff (boost::any???) in a small boost_common.hpp file.

Devacor commented 10 years ago

I'd like to point out, as it was mentioned you wouldn't know if the value was loaded or not for cereal::make_optional. This may be true, but it also probably doesn't matter:

int x = 5; archive(cereal::make_optional(x)); //if x is not present in archive it remains 5. If it is present in archive, it is loaded.

You would need to have a separate save/load method so your serialize doesn't trash the value before saving, however.

This would be great for me too. Often my objects don't have a translateTo, or rotateTo, or rotateOrigin for example, but these types always get saved anyway:

                "translateTo": {
                    "x": 0,
                    "y": 0,
                    "z": 0
                },
                "rotateTo": {
                    "x": 0,
                    "y": 0,
                    "z": 0
                },
                "rotateOrigin": {
                    "x": 0,
                    "y": 0,
                    "z": 0
                },

That's an awful lot of json for stuff that will already be default constructed correctly anyway (I use load_and_allocate for this type, so I'm not loading into an existing constructed value.)

heinermann commented 10 years ago

I think an optional for an archive supporting NVPs makes sense, but ONLY in this case. No extra data should be involved with this feature. (I disagree with its use on binary archives and unnamed data)

Loading NVPs: Instead of handling exceptions, you use make_optional so that if the name isn't found, then the value is constructed based on the arguments given.

Possible ideas:

cereal::make_optional(x) // no change if not found (always serialize)
cereal::make_optional(x, y) // assigned specified value (y) if not found
cereal::make_optional(x, {}) // assigned default-constructed value if not found

Saving NVPs: The biggest issue might be determining what gets serialized. In some cases you may want serialization to occur even if a default value is being used. In other cases you might want to ignore serialization if the value is equal to some other value. (or perhaps I'm just thinking too much and not serializing a given default value is the way to go?)

Use case: @M2tM has provided a very good example. I think there may be an intuitive way of combining this into the single serialize() functionality though.

Again, I'm for the idea of being able to (de)serialize partials but having any additional overhead would be detrimental. For example, you wish to deserialize a document that you don't own or have any control over, then there is no knowledge of the overhead data that is being suggested and an exception would be thrown.

Just as out-of-order loading and NVPs aren't available for binary archives, make_optional could have its own restrictions as well.

benpye commented 10 years ago

Has any progress been made here, in my application I also require this functionality and thus far am yet to find a suitable library that meets the requirement. I have a class which has more members than necessarily required as it is a superset of several 3D primitives. This has to be such as it is passed to C++AMP. I would like to use JSON to allow the scene to be defined externally and requiring the user to set even the unused fields is awkward.

AzothAmmo commented 10 years ago

Other than this discussion, nothing else has been done about this issue yet. It'd be promoted to an actual enhancement once the optional mechanism has been settled on.

I kind of like the idea of making this only for archives that support NVPs, in which case we might name the feature something like make_optional_nvp. Otherwise the name suggests that this might work across all archives. This needs more thought in terms of the overall interface as well as determining whether this requires changes to archives or if it can be implemented on a new type level.

We could potentially have some kind of a mechanism like:

template <class T, class ... Args>
SomeType cereal::make_optional_nvp( const char * name, T && value, Args && ... args );

somewhere along the line we'd check to see if the name was found, and if not, we can do something like value = T(std::forward<Args>(args)...);

Also note that the everything I just typed up really only changes load behavior and doesn't affect saving. You'd have to implement your own way of detecting whether a feature was worth serializing or not.

heinermann commented 10 years ago

cereal::make_optional_nvp is also much cleaner, opposed to cereal::make_optional( cereal::make_nvp(...

Hacking in the feature would just be a wrapper for automagically dealing with the exceptions thrown. This type of behaviour needs to be avoided though.

I agree, the mechanism itself is still up in the air and should be well-established before progress on this issue has been made.

Devacor commented 10 years ago

"Also note that the everything I just typed up really only changes load behavior and doesn't affect saving. You'd have to implement your own way of detecting whether a feature was worth serializing or not." - AzothAmmo

Exactly. It would be trivial to expect the user to do this. This is also where the differing save/load methods comes in. It is necessary to have logic surrounding when a field is optionally saved.

The way I see this working interface-wise (if I were to solve my problem above with a bunch of 0 position items and scale 1):

        template <class Archive>
        void save(Archive & archive){
            if(!translateTo.atOrigin()){
                archive(CEREAL_NVP(translateTo));
            }
            if(!rotateTo.atOrigin()){
                archive(CEREAL_NVP(rotateTo));
            }
            if(!rotateOrigin.atOrigin()){
                archive(CEREAL_NVP(rotateOrigin));
            }
            if(scaleTo != Scale<double>(1.0, 1.0, 1.0)){
                archive(CEREAL_NVP(scaleTo));
            }
            archive(
                CEREAL_NVP(depthOverride), CEREAL_NVP(overrideDepthValue),
                CEREAL_NVP(isVisible),
                CEREAL_NVP(drawType), CEREAL_NVP(drawSorted),
                CEREAL_NVP(points),
                CEREAL_NVP(drawList),
                cereal::make_nvp("texture", ourTexture)
            );
        }

        template <class Archive>
        static void load_and_construct(Archive & archive, cereal::construct<Node> &construct){
            Draw2D* renderer = nullptr;
            archive.extract(cereal::make_nvp("renderer", renderer));
            construct(renderer);
            archive(
                cereal::make_optional_nvp("translateTo", construct->translateTo),
                cereal::make_optional_nvp("rotateTo", construct->rotateTo),
                cereal::make_optional_nvp("rotateOrigin", construct->rotateOrigin),
                cereal::make_optional_nvp("scaleTo", construct->scaleTo, 1.0, 1.0, 1.0),
                cereal::make_nvp("depthOverride", construct->depthOverride),
                cereal::make_nvp("overrideDepthValue", construct->overrideDepthValue),
                cereal::make_nvp("isVisible", construct->isVisible),
                cereal::make_nvp("drawType", construct->drawType),
                cereal::make_nvp("drawSorted", construct->drawSorted),
                cereal::make_nvp("points", construct->points),
                cereal::make_nvp("drawList", construct->drawList),
                cereal::make_nvp("texture", construct->ourTexture)
            );

            for(auto &drawItem : construct->drawList){
                drawItem.second->parent(construct.ptr());
            }
        }

I like the parameter list forwarded to the constructor as well, it is a nice touch. When you do this, however, it becomes impossible to reconstruct state to something not entirely available to a constructor. If you have the optional load simply not do anything to the value then you could construct the default ahead of time, call additional methods, or set the state from some factory method and if there is no serialized value to load from, it could just keep that.

By supplying the parameters in cereal_optional_nvp and constructing the object you have no way of knowing if it was loaded or not.

What you could do, as an advanced use case (and I don't anticipate this being frequently needed) is flagging in the cereal_optional_nvp if the value has been loaded from an archive. In this way you could do something like:

        auto oNVP = cereal_optional_nvp("key", value, someParameter);
        archive(oNVP);
        if(!oNVP.loaded()){ //additional default state initialization
            value.customPostLoadStepForValue();
            value.z = 5; //we could also override the -> operator for oNVP to return the object.
        }

This keeps that cool forwarding convenience/not constructing a default needlessly ahead of time, and also gives you the same flexibility.

Devacor commented 10 years ago

It looks like cereal_optional_nvp could also be handy in situations where a field has been renamed between versions. This would require the ability to either check .loaded(), or to not construct the object if it doesn't exist (instead leave that up to the user to initialize the value).

gramic commented 10 years ago

I did implement a new archive for BSON serialization to use with Mongodb. It all works well for now and the only missing feature is the optional serialization. From this discussion it is not clear what is the best way to work around this.

Catching the exception thrown when the field is missing doesn't seem good enough, but this looks to be the best way for now. For example, in the "Protocol buffers" documentation even suggest that everything should be optional by default and "required" only as exception. Cereal demands the opposite and even without the optional part at all.

Any better work around suggestion for this feature?

AzothAmmo commented 10 years ago

There's no current better workaround. The cereal::make_optional_nvp mechanism will probably be implemented in an upcoming release, but it hasn't been thought about at all yet.

aseering commented 8 years ago

The discussion above seems to imply a straightforward implementation -- implement a load() method that catches and appropriately handles the exception that's thrown in case the field is missing. Is that what y'all're thinking of doing? Or did you have deeper changes in mind?

heinermann commented 8 years ago

The problem with catching exceptions is that the performance will be significantly impacted. And when I write significantly, I'm not exaggerating.

aseering commented 8 years ago

Indeed...  Is there a cheaper way to intercept this failure currently in Cereal? On Mar 31, 2016 10:03 PM, Adam Heinermann notifications@github.com wrote:The problem with catching exceptions is that the performance will be significantly impacted. And when I write significantly, I'm not exaggerating.

—You are receiving this because you are subscribed to this thread.Reply to this email directly or view it on GitHub

Type1J commented 8 years ago

The experience from Protocol Buffer, Cap'n Proto, etc. explicitly states that treating missing fields as an error is a bad thing.

However, the other side of that argument is that state that is expected to be modified on load will not be, and instead would retain its previous value. However, if the object always resets its state on load to some default state, then it wouldn't be a problem.

I'd say that it's easier to work around problems caused by having all fields optional than working around all fields being required. Because it's easier, the exception throw on not finding a field to which a deserialized value can bind should be removed, and Cereal should increment its version number.

@gramic I've been looking for a BSON Cereal archive. Have you released the BSON Cereal archive that you talked about above? If so, where could I get it?

gramic commented 8 years ago

@Type1J I didn't open source BSON Cereal archive, mainly because I use very old versions of both BSON and Cereal libraries. Well, also some of the code is not efficient as much as it should be to show off.

Recently, MongoDB did create a very modern c++11 libraries, which will be perfect for use in a new modern version of BSON Cereal archive that I intent to use and then open source.

About the missing optional field in Cereal, my current use is to require every field to be present. That forces me to update all document fields in MongoDB before using the newly added field via the BSON Cereal archive. A bit inconvenient, but it works.

aseering commented 8 years ago

@Type1J for what it's worth, I've worked with a library where missing fields are not an error, and that can be a bad thing too :-) I think both extremes have significant problems.

I wouldn't mind if, for performance reasons, there were a return code rather than an exception. Or maybe a callback function that could compute a value or do something else.

Type1J commented 8 years ago

@gramic I'll be interested in that archive when you get around to it.

@aseering I've always been bitten by required fields in evolving protocols. Optional fields can be handled manually, but rejecting something because it had a field that might actually be insignificant is probably bad, and the class should probably decide if it really is bad or not. Throwing an exception because something wasn't found is probably acceptable, but even then, it should be optional if the class says "Call this function with each field that wasn't found in the deserialized object" by registering a callback.

Since each use of Cereal will likely have a different set of opinions, I'd say that each object should probably register its own personal opinion of what will be, and what will not be accepted as a serialized object.

Enhex commented 8 years ago

The exception solution is brittle, and doesn't work in some cases. example of a bad case

Another solution that is exception-free and can handle the example I gave is to loop over the existing node names:

load(/*...*/)
{
    while (true)
    {
        const auto namePtr = archive.getNodeName();

        if (!namePtr)
            break;

        const auto name = std::string(namePtr);

        if(name == /*...*/)
            archive(cereal::make_nvp(name, /*...*/));
        else if(name == /*...*/)
            archive(cereal::make_nvp(name, /*...*/));
        //...
    }
}
rcdailey commented 8 years ago

Wow I'm surprised this issue has been unresolved for 3 years. This is a major setback for me. I started using cereal for HTTP response body parsing and some response properties (in JSON) that I get back may not be present.

In my humble opinion, if the field is missing all cereal has to do is quietly return without mutating the variable passed into make_nvp(). In this case, whatever value that member had when the object was constructed will be what is used by consumers of the serialized object.

Note this is for loading serialized data (JSON input archive).

I feel it's the responsibility of the serializable object to handle fields that may potentially be optional (empty). Forcing the user to use pointers or boost::optional to represent this is flawed and unnecessary. Using exceptions is fundamentally wrong because this is not an "exceptional" situation. Certain contracts make this a perfectly acceptable and expected schema (that is, missing fields). I don't even need to go over the performance implications, as plenty of others have pointed that out.

So far I like the CEREAL_OPTIONAL_NVP and cereal::make_optional_nvp() ideas a lot. However, the underlying implementation should avoid throwing exceptions when this particular function is used.

erichkeane commented 8 years ago

I did an optional value at one point like this (where "v" is the value):

try
    {
        ar(v);
    }
    catch(cereal::Exception&)
    {
        ar.setNextName(nullptr);
        // Loading a key that doesn't exist results in an exception
        // Since "Not Found" is a valid condition for us, we swallow
        // this exception and the archive will not load anything
}
joshhaines commented 7 years ago

@erichkeane This is exactly what I was looking for in my case. We have a bunch of optional values we are try/catching in a vector of structs. I noticed if the last object failed deserializing in the load block that it would just not even bother deserializing the rest of the structs in the vector. Clearing nextName with ar.setNextName(nullptr) was exactly what I was looking for. Thanks!

Enhex commented 7 years ago

I wrote a header that provides an elegant solution for optional NVPs: https://github.com/Enhex/Cereal-Optional-NVP

LowCostCustoms commented 7 years ago

Here I wrote some methods to make work with optional values easier. https://github.com/LowCostCustoms/cereal-optional

Dysl3xik commented 6 years ago

@AzothAmmo If you guy are willing to add a simple function to the XML && JSON archives this problem becomes really simple.

As a public function in XMLInputArchive (mostly repeat code from Start Node but WITHOUT throwing):

//! Check if node with given name exists
inline bool hasName(const char * searchName)
{
    auto next = itsNodes.top().child; 
    if (searchName && (next == nullptr || std::strcmp(next->name(), searchName) != 0))
    {
        next = itsNodes.top().search(searchName);
    }
    return next != nullptr;
}

As a public function in JSONInputArchive (most repeat code from search but WITHOUT throwing!):

inline bool hasName(const char* name)
{
    // The name an NVP provided with setNextName()
    if (name)
    {
        // The actual name of the current node
        auto const actualName = itsIteratorStack.back().name();

        // Do a search if we don't see a name coming up, or if the names don't match
        if (!actualName || std::strcmp(name, actualName) != 0)
            return itsIteratorStack.back().hasName(name); //YOU NEED TO COPY SEARCH to hasName and remove the throw bits
            return true;
    }
}

Finally the optional NVP similar to @Enhex but now this works out of order Also I used C++17 here but you can convert it to template meta code without much hassle:

namespace cereal
{
    /// <summary>
    /// Save/Load an NVP if its name is located inside the text archive
    /// Returns TRUE if the value is loaded or saved
    /// </summary>
    template <typename Archive, typename T>
    bool make_optional_nvp(Archive& ar, const char* name, T&& value)
    {
        constexpr bool isTextArchive= traits::is_text_archive<Archive>::value;
        constexpr bool isInputArchive = std::is_base_of_v<InputArchive<Archive>, Archive>;

        //Do check for node here, if not available then bail!
        if constexpr  (isTextArchive&& isInputArchive)
        {
            if (!ar.hasName(name))
                return false;
        }

        ar(make_nvp(name, std::forward<T>(value)));

        return true;
    }

}

#define CEREAL_OPTIONAL_NVP(ar, T) ::cereal::make_optional_nvp(ar, #T, T)
Dysl3xik commented 6 years ago

Also if you are interested in seeing this more fully fleshed out I can fork and share - but this snippet covers most of it really.

avibahra commented 6 years ago

I need this functionality. But are we going to see this in any release soon ? Also is there any c++11 solution for this.

tankorsmash commented 5 years ago

I'd love to see an implementation of this @Dysl3xik, having non-required fields in cereal would be great.