AcademySoftwareFoundation / OpenTimelineIO

Open Source API and interchange format for editorial timeline information.
http://opentimeline.io
Apache License 2.0
1.44k stars 279 forks source link

`null` semantics in JSON serialization #726

Open palemieux opened 4 years ago

palemieux commented 4 years ago

In the JSON serialization, is there an intended difference in semantics between:

As a corollary, can all optional properties be set to null, or only some of them?

palemieux commented 4 years ago

For example, what are the semantics of "name" : null at https://github.com/PixarAnimationStudios/OpenTimelineIO/blob/master/tests/sample_data/clip_example.otio ?

Shouldn't name simply be absent?

meshula commented 4 years ago

I agree with the upvoted comment: https://stackoverflow.com/questions/9619852/what-is-the-convention-in-json-for-empty-vs-null -

paraphrasing that discussion: Arrays have an empty convention, ie [] is legitimate, so it's considered good form to not have to check an array for null before checking if it empty.

The other types do not have an empty form, so checking for null is considered idiomatic.

meshula commented 4 years ago

I believe the OTIO stance is that unauthored, required, fields must be marked with null or [] as appropriate, and unauthored optional fields can be omitted.

palemieux commented 4 years ago

I believe the OTIO stance is that unauthored, required, fields must be marked with null or [] as appropriate

Looking at https://github.com/PixarAnimationStudios/OpenTimelineIO/blob/eaccea3b61455705286ede0de239dd84406352bb/src/opentimelineio/item.h#L23 , name is not optional but cannot also be null since the it is stored as a non-optional value on the stack.

Is that the correct understanding?

Is clip_example.otio valid since "name" : null?

jminor commented 4 years ago

Our original thinking was as follows:

  1. When writing an OTIO, the implementation writes all properties of each object even if their value is null, empty, or default.

    • This increases ease of learning, since you can trivially look at a file to see what properties are available, rather than needing to refer to documentation.
    • This also protects older archived files from accidentally changing semantics when a new/different default value is added to a type of object. In principle this should only happen when the schema is versioned, but in practice it has happened due to temporary bugs in the implementation. Having this protection in place was helpful.
  2. When reading an OTIO, the implementation treats any missing properties as though they had the default value.

    • This allows folks who wrote OTIO files with tech other than our implementation or who hand-edited files, to get a reasonable behavior in practice.

We have heard feedback from several people that having these empty, or default values in the OTIO files is a waste of space. Being permissive when reading files was a concession to that feedback. Our stance has also been that if storage space of these files is an issue, then they can be easily gzipped instead of making them harder to learn & work with.

For the specification, we could tighten up these restrictions to help with conformance, but we should consider the practical use cases carefully, since a core design goal of OTIO is that it should help you get work done in a production pipeline.

palemieux commented 4 years ago

When reading an OTIO, the implementation treats any missing properties as though they had the default value.

Are properties set to null in JSON handled the same as properties that are missing, i.e. the default value is read?

Doesn't this also mean that, for property of type "string", the empty string is equivalent to both (i) null and (ii) absence of the property?

davidbaraff commented 4 years ago

From some notes on the C++ implementation I wrote:

Next, for any of the above atomic types T, excepting for SerializableObject, an any can store a type of optional. (Supporting serialization of an optional<SerializableObject> would be ambiguous and unneeded; putting a null pointer of type SerializableObject* in an any, is written as a null to the JSON file.)

From the header file serializableObject.h:

bool read(std::string const& key, optional<bool>* dest);
    bool read(std::string const& key, optional<int>* dest);
    bool read(std::string const& key, optional<double>* dest);
    bool read(std::string const& key, optional<RationalTime>* dest);
    bool read(std::string const& key, optional<TimeRange>* dest);
    bool read(std::string const& key, optional<TimeTransform>* dest);
    // skipping std::string because we translate null into the empty
    // string, so the conversion is somewhat ambiguous

    // no other optionals are allowed:
    template <typename T>
    bool read(std::string const& key, optional<T>* dest) = delete;

On Jun 4, 2020, at 2:14 PM, Pierre-Anthony Lemieux notifications@github.com wrote:

When reading an OTIO, the implementation treats any missing properties as though they had the default value.

Are properties set to null in JSON handled the same as properties that are missing, i.e. the default value is read?

Doesn't this also mean that, for property of type "string", the empty string is equivalent to both (i) null and (ii) absence of the property?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/PixarAnimationStudios/OpenTimelineIO/issues/726#issuecomment-639119725, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADAWJJPF3OVYATENFOUEQHLRVAFCVANCNFSM4NSUJYEA.

jminor commented 4 years ago

Out of curiosity, I checked the clip_example.otio that you linked above, @palemieux, to see what the current implementation does. That file was written back in our older pure-python code base. We can see some real world examples of what happens when the code changes, but a file does not.

Running that file through otiocat causes it to change in these notable ways:

  1. The keys are no longer sorted alphabetically. This is a regression, since we intended them to be sorted to facilitate visual comparison.
  2. The top-level Timeline object gains a "global_start_time": null property. This is an example of a missing property turning into the default value.
  3. Clip-001's media_reference's name changes from null to an empty string "". If I recall correctly, this was changed in the C++ port to reduce the number of optionals.
  4. The RationalTime value and rate properties are written as floats 1.0 instead of integers 1. This was an intentional change during the C++ port. The pure-python implementation relied on Python's runtime type system in unexpected ways, and we decided to enforce this more strictly.

Each of these probably warrants a follow up discussion.

davidbaraff commented 4 years ago

On Jun 4, 2020, at 4:36 PM, Joshua Minor notifications@github.com wrote:

Out of curiosity, I checked the clip_example.otio that you linked above, @palemieux https://github.com/palemieux, to see what the current implementation does. That file was written back in our older pure-python code base. We can see some real world examples of what happens when the code changes, but a file does not.

Running that file through otiocat causes it to change in these notable ways:

The keys are no longer sorted alphabetically. This is a regression, since we intended them to be sorted to facilitate visual comparison. From the documentatin:

Note

Properties are written to the JSON file in the order they are written to from within write_to(). But the reading code need not be in the same order, and changes in the ordering of either the reading or writing code will not break compatability with previously written JSON files.

This pertains to properties written programatically (i.e. where there is a well-defined field in the C++ file itself, with a call to write_to). In the case of arbitrary metadata that the C++ implementation doesn’t know about (“blind data”) the keys should be written in alphabetic order, because we don’t want arbitrary changes to mess up baseline comparison tests.

The top-level Timeline object gains a "global_start_time": null property. This is an example of a missing property turning into the default value. Clip-001's media_reference's name changes from null to an empty string "". If I recall correctly, this was changed in the C++ port to reduce the number of optionals. yes. a null for a string vs the empty string is asking for pain. as i posted previously, we don’t support optional strings. so a string is either empty or it has a value. it should never be written as null (so only an old file can have null). The RationalTime value and rate properties are written as floats 1.0 instead of integers 1. This was an intentional change during the C++ port. The pure-python implementation relied on Python's runtime type system in unexpected ways, and we decided to enforce this more strictly. Each of these probably warrants a follow up discussion.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/PixarAnimationStudios/OpenTimelineIO/issues/726#issuecomment-639172029, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADAWJJMKKPUU4GCOOBTBDCDRVAVZJANCNFSM4NSUJYEA.

jminor commented 4 years ago

Thanks for the clarifications @davidbaraff. Did this answer your questions @palemieux ?

If so, then we can now go two directions:

palemieux commented 4 years ago

To summarize, only properties of the following types can be null in the file:

In the long run, is the objective that a file that conform to version a of OTIO always conform to version b of OTIO, where b > a?

davidbaraff commented 4 years ago

optional is not an allowed type.

On Jun 5, 2020, at 5:23 PM, Pierre-Anthony Lemieux notifications@github.com wrote:

To summarize, only properties of the following types can be null in the file:

bool int double RationalTime TimeRange TimeTransform string In the long run, is the objective that a file that conform to version a of OTIO always conform to version b of OTIO, where b > a?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/PixarAnimationStudios/OpenTimelineIO/issues/726#issuecomment-639913840, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADAWJJNRXR6JIIPK7YOQYKTRVGD7DANCNFSM4NSUJYEA.

palemieux commented 4 years ago

optional is not an allowed type.

@davidbaraff SerializableObject::Reader::read(std::string const& key, optional<std::string> value) is not defined but SerializableObject::Reader::read(std::string const& key, std::string* value) does not fail upon reading null and instead returns the empty string -- see https://github.com/PixarAnimationStudios/OpenTimelineIO/blob/0a35894d72b14c7cb8373a00f0dfa594dc493751/src/opentimelineio/deserialization.cpp#L487 .

... so string properties can be null in the JSON serialization, right?

davidbaraff commented 4 years ago

Sorry, my answer was a bit on the terse side, but more confusingly, I believe what I typed in was somehow mangled somewhere. What I typed was “optional is not an allowed type". in case it gets mangled again, i am typing

"optional (less-than-symbol) string (greater than symbol) is not an allowed type"

What I meant is that going forward, we don’t want to ever write a null into the json to indicate an empty string, in that we don’t want C++ code doing the “is it a null value or an empty string” dance.

When we were developing the C++ code base, we had a lot of existing OTIO files produced by the Python API where names were indeed represented as None. This isn’t a big problem in python, where one can write:

if not name:
    # passes if name is the empty string or None

But for C++ this would be an issue, and we didn’t want to inflict the pain of getting this kind of test wrong. So going forward, I said that we should agree to take in the null value if we saw it, when reading a string, but treat it as if we had read in the empty string. By disallowing optional-of-string as an allowed type, it made the code never have to decide if when a null was encountered it should encode that as an empty string or an optional string (there’s a lot of data that goes into the C++ code as blind data, i.e. in the C++ any type) — since there is no optional string type, what to do is clear.

  1. The clear implication is that any files written by C++ will never have null for a string value. they will explicitly have the empty string, and slowly over time, null for a string value in otio files will fade away. and vanish forever.

except for test cases i guess.

On Jun 5, 2020, at 8:44 PM, Pierre-Anthony Lemieux notifications@github.com wrote:

optional is not an allowed type.

@davidbaraff https://github.com/davidbaraff SerializableObject::Reader::read(std::string const& key, optional value) is not defined but SerializableObject::Reader::read(std::string const& key, std::string* value) does not fail upon reading null and instead returns the empty string -- see https://github.com/PixarAnimationStudios/OpenTimelineIO/blob/0a35894d72b14c7cb8373a00f0dfa594dc493751/src/opentimelineio/deserialization.cpp#L487 https://github.com/PixarAnimationStudios/OpenTimelineIO/blob/0a35894d72b14c7cb8373a00f0dfa594dc493751/src/opentimelineio/deserialization.cpp#L487 .

... so string properties can be null in the JSON serialization, right?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/PixarAnimationStudios/OpenTimelineIO/issues/726#issuecomment-639972379, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADAWJJMP5MHBDLXS432VKCLRVG3RBANCNFSM4NSUJYEA.

palemieux commented 4 years ago

@davidbaraff Thanks for the background.

null for a string value in otio files will fade away. and vanish forever.

Understood. In the meantime, it is not an error for an OTIO file to contain string types set to null, i.e. such a file must not be rejected upon ingest, right?

davidbaraff commented 4 years ago

On Jun 5, 2020, at 9:25 PM, Pierre-Anthony Lemieux notifications@github.com wrote:

@davidbaraff https://github.com/davidbaraff Thanks for the background.

null for a string value in otio files will fade away. and vanish forever.

Understood. In the meantime, it is not an error for an OTIO file to contain string types set to null, i.e. such a file must not be rejected upon ingest, right?

Correct, given the need to deal with legacy files.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/PixarAnimationStudios/OpenTimelineIO/issues/726#issuecomment-639978847, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADAWJJMWHHIYHF4DUQI4A7DRVHALJANCNFSM4NSUJYEA.

palemieux commented 4 years ago

It looks like no change to the code is required, aside from perhaps noting that null strings are permitted even though there are no optional strings.

palemieux commented 4 years ago

@davidbaraff Quick follow-up: looking at serializableObjectWithMetadata.h, the metadata property of SerializableObjectWithMetadata is never null but can be equal to an empty object. Is that an artifact of the C++ representation, or is this intrinsic to the OTIO data model?

davidbaraff commented 4 years ago

I don’t know how to answer that. Since the metadata is of type AnyDictionary, when you serialize it, if empty you’ll get an empty dictionary, which is valid json, and distinct from null. Is that a C++ artifact? you tell me.

On Jun 10, 2020, at 2:22 PM, Pierre-Anthony Lemieux notifications@github.com wrote:

@davidbaraff https://github.com/davidbaraff Quick follow-up: looking at serializableObjectWithMetadata.h, the metadata property of SerializableObjectWithMetadata is never null but can be equal to an empty object. Is that an artifact of the C++ representation, or is this intrinsic to the OTIO data model?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/PixarAnimationStudios/OpenTimelineIO/issues/726#issuecomment-642274602, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADAWJJKBHITU7JNUZBL3USLRV72QFANCNFSM4NSUJYEA.

davidbaraff commented 4 years ago

I would note that the reading routine in the .cpp file does “read_if_present” meaning that you can omit from the otio json file an empty dictionary for the metadata, but I believe that is to allow backward compatibility for old files. Certainly new files with no metadata get an empty dictionary written out.

On Jun 10, 2020, at 2:24 PM, David Baraff davidbaraff@icloud.com wrote:

I don’t know how to answer that. Since the metadata is of type AnyDictionary, when you serialize it, if empty you’ll get an empty dictionary, which is valid json, and distinct from null. Is that a C++ artifact? you tell me.

On Jun 10, 2020, at 2:22 PM, Pierre-Anthony Lemieux <notifications@github.com mailto:notifications@github.com> wrote:

@davidbaraff https://github.com/davidbaraff Quick follow-up: looking at serializableObjectWithMetadata.h, the metadata property of SerializableObjectWithMetadata is never null but can be equal to an empty object. Is that an artifact of the C++ representation, or is this intrinsic to the OTIO data model?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/PixarAnimationStudios/OpenTimelineIO/issues/726#issuecomment-642274602, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADAWJJKBHITU7JNUZBL3USLRV72QFANCNFSM4NSUJYEA.

palemieux commented 4 years ago

@davidbaraff It sounds like (i) an absent metadata property is a (legacy) artifact of JSON serialization and (ii) the metadata property is always present and always equal to a (possibly empty) JSON object.

[edit: removed the possibility of metadata property being equal to null per the discussion above.]

palemieux commented 4 years ago

In the following example , media_reference is set to null. Is that still expected in current files?

https://github.com/PixarAnimationStudios/OpenTimelineIO/blob/afc69b96c7908b29c2334994e6c4404073d95fc7/tests/test_stack_algo.py#L51