AcademySoftwareFoundation / OpenTimelineIO

Open Source API and interchange format for editorial timeline information.
http://opentimeline.io
Apache License 2.0

Backwards and forwards compatibility #1295

Closed · jminor closed this 1 year ago

jminor commented 2 years ago

Currently OTIO can read old .otio files that were serialized with older schema versions, because there is a built-in schema upgrade system. However, if an .otio file is written with a brand new schema revision, then software linked with an older version of OTIO will not be able to read those files. If you attempt this, a SCHEMA_VERSION_UNSUPPORTED error occurs.

In a one-way pipeline this is manageable, as long as you are careful to upgrade the OTIO version in your software starting at the most downstream end of the pipeline and working your way back to the start. For example, a pipeline that passes .otio files from A to B to C is upgraded C first, then B, then A.

However, one of OTIO's stated goals is to support round-trip pipelines. Even if a pipeline is only two pieces of software, A and B, they must be upgraded in lock step any time an OTIO schema version change is introduced.

To solve this problem, we must do one of the following:

- Allow modern OTIO software to write out old OTIO schema (controlled by a configuration option)

Are there other options? Can we look at other file formats for inspiration or guidance on this?

meshula commented 2 years ago

There is quite a body of work around schema migration for XML (cf. https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.174.6520&rep=rep1&type=pdf). Taking inspiration from what the XML gang has historically done:

With XSLT (for XML), one would set up a matcher for rev A with a corresponding translation rule to rev B, so:

new = xform(old, A_B_migrate)

One could also set up a matcher for rev B, with a translation rule back to A:

old = xform(new, B_A_migrate)

Now, this pair of transforms does not round-trip losslessly, i.e.

new != xform(xform(new, B_A), A_B)

So we introduce a further rule for elements that do not exist in A, such that the B_A migration moves the A-non-existent data into a B_A_migration metadata section.

Finally, we modify the A_B migration to recognize the presence of B_A_migration metadata and use it to fully reconstitute the B fields, and we achieve

new == xform(xform(new, B_A), A_B)
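As a minimal sketch of that round trip in Python, over plain JSON-style dicts (the schema names and the B-only field are illustrative, not a real migration API):

```python
import copy

B_ONLY_FIELDS = ("active_media_reference_key",)  # fields rev B added over rev A

def xform_B_A(obj):
    """Downgrade B -> A, stashing B-only data in a B_A_migration metadata section."""
    old = copy.deepcopy(obj)
    old["OTIO_SCHEMA"] = "Clip.1"
    stash = {f: old.pop(f) for f in B_ONLY_FIELDS if f in old}
    if stash:
        old.setdefault("metadata", {})["B_A_migration"] = stash
    return old

def xform_A_B(obj):
    """Upgrade A -> B, reconstituting the B fields from the migration metadata."""
    new = copy.deepcopy(obj)
    new["OTIO_SCHEMA"] = "Clip.2"
    new.update(new.get("metadata", {}).pop("B_A_migration", {}))
    return new

clip_b = {"OTIO_SCHEMA": "Clip.2", "name": "shot_010", "metadata": {},
          "active_media_reference_key": "DEFAULT_MEDIA"}
assert xform_A_B(xform_B_A(clip_b)) == clip_b  # new == xform(xform(new, B_A), A_B)
```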

A strong candidate for the JSON equivalent of XSLT is jq, which has a zero-dependency C implementation. https://stedolan.github.io/jq/

jp is also nice, but the Go dependency puts it out of scope for our purposes, since our current core language requirements are C/C++ and Python. https://github.com/go-jsonfile/jp/

reinecke commented 2 years ago

There are some versioning notions in the HLS and ISO BMFF (MP4) file formats.

For instance, HLS has a version tag which declares the maximum specification version used in the file. The specification then outlines which features are available at each version. The implication is that you should tag the file with the minimum version that includes all the specification features used in that file.

The ISO BMFF (a.k.a. MP4 file format) specification discusses schema and versioning as well. At the beginning of the file is an ftyp "box" containing a list of "compatible brands" used within the file. From the spec:

The presence of a brand in the compatible_brands list of the ftyp box is a claim and a permission. It is a claim that the file conforms to all the requirements of that brand, and a permission to a reader implementing potentially only that brand to read the file.

The "box" types in ISO BMFF can be thought of as roughly equivalent to schema names in OTIO. Additionally, many of the "box" types also include a version field which would be roughly equivalent to our schema version. A "brand" specifies a collection of "box" types as well as valid versions of "boxes" used. The constraints of each "brand" are outlined in Annex E of the spec. Many of the "brands" are "supersets" of previous "brand" versions - support of the newer brand implies support of the older "brand".

The note included about box versioning is:

Boxes with an unrecognized version shall be ignored and skipped.

The net result is that the compatibility model in ISO BMFF is functionally similar to OTIO - there is no implied forward compatibility and any individual schema version that's unsupported is ignored.

These file formats are slightly different from OTIO in that the specification and implementations are very decoupled, so the ecosystem they exist in may be a bit more fragmented.

Some relevant rules/discussions we've had about schema version are:

  1. Parsers should be tolerant of unexpected fields in the serialization.
  2. Schema version shouldn't be incremented unless it would be a breaking change for parsers.
  3. We should allow ad-hoc access to unsupported schema.
  4. The implementation is encouraged to "upgrade" to the latest schema on write - this behavior is codified in both the implementation and the specification.

Points 1 and 2 imply that adding new fields to a schema is a non-breaking change. Hopefully, the addition of fields accounts for most of our schema changes and should be low-impact. We should consider a mechanism like point 3 for making "extra" fields accessible - they currently round-trip when read into memory and re-written, but aren't accessible.
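For illustration, a minimal sketch of the tolerant behavior described in points 1 and 3, over raw JSON dicts (the known-field table and function are hypothetical, not the actual OTIO parser):

```python
KNOWN_FIELDS = {
    "Clip.1": {"OTIO_SCHEMA", "name", "media_reference", "source_range", "metadata"},
}

def tolerant_parse(data):
    """Split a serialized object into known fields and 'extra' fields.

    Unknown fields are tolerated (point 1) and kept accessible (point 3),
    so they survive a read/modify/write round trip instead of being dropped.
    """
    known_keys = KNOWN_FIELDS[data["OTIO_SCHEMA"]]
    known = {k: v for k, v in data.items() if k in known_keys}
    extra = {k: v for k, v in data.items() if k not in known_keys}
    return known, extra
```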

The behaviors in points 1, 2, and 3 are very conducive to maximum compatibility. Point 4 tends to force-upgrade any file "touched" by the implementation. A common theme when reading between the lines of ISO BMFF and HLS is that implementors of the specifications should write files in the "most compatible" way possible - that is, write files conforming to the lowest specification version that can express the data you need to express. We may consider something that is a bit of a riff on "Allow modern OTIO software to write out old OTIO schema (controlled by a configuration option)", but instead go for "Modern OTIO software writes the newest schema necessary, and allows a forced downgrade to old schema (controlled by a configuration option)".

For example, in Clip.2 we introduced the "multi-reference", where a clip can have multiple media references instead of a single one. In reality, there will be many contexts where this feature isn't used and only a single reference is included. By default, at write time, clips with a single media reference under the key DEFAULT_MEDIA would be written losslessly as the Clip.1 schema. When a clip does contain multiple references, the default behavior would be to write a Clip.2. However, if a user tells the serializer to force Clip.1, then the serializer would write a Clip.1 with just the reference selected by active_media_reference_key, and the additional references could either be ignored or optionally persisted in the metadata of the clip.
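A hedged sketch of that write rule over the raw JSON (media_references and active_media_reference_key follow the Clip.2 serialization; the metadata key used for stashing extras is hypothetical):

```python
def write_clip_most_compatible(clip, force_clip1=False, keep_extra_refs=True):
    """Emit a Clip.2 dict as Clip.1 when that loses nothing, else keep Clip.2."""
    refs = clip.get("media_references", {})
    active_key = clip.get("active_media_reference_key", "DEFAULT_MEDIA")
    single_default = set(refs) <= {"DEFAULT_MEDIA"} and active_key == "DEFAULT_MEDIA"
    if not (single_default or force_clip1):
        return clip  # multiple references in use: keep the Clip.2 schema

    out = {k: v for k, v in clip.items()
           if k not in ("media_references", "active_media_reference_key")}
    out["OTIO_SCHEMA"] = "Clip.1"
    out["media_reference"] = refs.get(active_key)

    extra = {k: v for k, v in refs.items() if k != active_key}
    if extra and keep_extra_refs:
        # Optionally persist the unselected references in the clip metadata.
        out.setdefault("metadata", {})["extra_media_references"] = extra
    return out
```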

In that particular example there is a graceful degradation in the fidelity of the representation; future schema updates may not go so well, however. Also, for this to be really manageable, we'd need to define something like the "brand" construct in ISO BMFF to give users control over groupings of schema versions - it's probably too much mental overhead for users to specify every individual schema version.

jminor commented 2 years ago

Here's a proof-of-concept schema downgrader script. Note that it does not use the opentimelineio module at all.

https://gist.github.com/jminor/4ab918836f1dba78789b29e0479931ca

jminor commented 2 years ago

Another side effect of our current strategy: if you open an old OTIO file in otioview, you'll see the upgraded schema in the JSON sidebar. This can be very confusing, since otioview is logically a viewer, and the user (me, without enough caffeine) might reasonably assume that it is showing the contents of the file unaltered.

meshula commented 2 years ago

I like the downgrader, and that no fancy dependencies are required to pull it off. Since we can rev schemas piecemeal, to make a downgrader viable, do we need a schema-to-OTIO-version table? Like:

["0.15", "clip.v2"],
["1.2", "track.v2"],
["1.5", "clip.v3"]

so that if I am targeting OTIO 0.12, a migrator would know to downgrade track.v2 to track.v1, or that if I find a clip.v3, it must be moved to clip.v2, and then repeat, to get to clip.v1?

Targeting 1.0 would mean one would only downgrade clips to v2, and tracks to v1.
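Sketched in Python over raw JSON dicts (the release numbers, target versions, and one-step downgrade functions are all illustrative, not a real OTIO table):

```python
# One-step downgrade functions, keyed by (schema name, source version).
# Real steps would also move fields around; these only rewrite the schema tag.
DOWNGRADERS = {
    ("Clip", 3): lambda d: {**d, "OTIO_SCHEMA": "Clip.2"},
    ("Clip", 2): lambda d: {**d, "OTIO_SCHEMA": "Clip.1"},
    ("Track", 2): lambda d: {**d, "OTIO_SCHEMA": "Track.1"},
}

# The schema versions each OTIO release understands (illustrative).
TARGETS = {
    "0.12": {"Clip": 1, "Track": 1},
    "1.0":  {"Clip": 2, "Track": 1},
}

def downgrade_for_release(obj, release):
    """Repeatedly apply one-step downgrades until the release's target is reached."""
    name, _, ver = obj["OTIO_SCHEMA"].partition(".")
    ver = int(ver)
    target = TARGETS[release].get(name, ver)
    while ver > target:  # e.g. clip.v3 -> clip.v2 -> clip.v1 when targeting 0.12
        obj = DOWNGRADERS[(name, ver)](obj)
        ver -= 1
    return obj
```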

jminor commented 2 years ago

Yeah, bundling the set of schema versions that match each OTIO release is a good idea. That is similar to the "brand" concept that Eric mentioned above, I think.

@reinecke I like the idea of writing out the oldest schema needed to represent a specific file, but I worry that it would make the serialization code more complex. Would it also mean that you would get a mix of versions of the same schema type in the same file? Lots of Clip.1 and a few Clip.2 in one file might lead to more confusion.

A downgrader could be packaged as an adapter, a standalone script, or a "post-write" hook so that newer OTIO could effectively write out old schema.

I also wonder if it could work as a "pre-read" hook, so that it could be installed at runtime into software using older versions of OTIO after they ship.

reinecke commented 2 years ago

@ssteinbach: We should huddle with the USD team and ask about how they're thinking about this too.

apetrynet commented 2 years ago

Hi! Reading through this again makes me realize I've probably come to the same conclusions as you, but here are my thoughts anyway :)

The way I look at this is that if an application with an older OTIO needs to read a newer file, it depends on a more recent version of OTIO to do the downgrade: either at write time, through an adapter argument specifying the desired OTIO version, or via a dedicated tool. Since both approaches depend on getting hold of an updated version of OTIO or the tool, I think it would make sense to add up/downgrade functions in migration files for each schema shipped with OTIO. These files could live in their own folder under src.

src
└── migrations
    ├── Clip.2.migration
    ├── Clip.3.migration
    └── ImageSequenceReference.1.migration

(Inspired by SQLAlchemy's Alembic.) Each new version of a schema only provides functions for moving back and forth between itself and its previous version. This way a migration tool can traverse its way to the desired version based on a table like @meshula describes; a sketch of one such migration file follows below. In cases where completely new schemas are introduced, the migration functions could provide the closest representation, like ImageSequenceReference.1 -> ExternalReference.1, or default to the unknown-schema functionality.
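As a sketch, a hypothetical migrations/Clip.2.migration written as a Python module might look like this (the field moves mirror the Clip.1/Clip.2 change discussed above; everything else is illustrative):

```python
# migrations/Clip.2.migration (hypothetical): one-step functions relative
# to the previous revision, operating on raw JSON dicts.

def upgrade(data):
    """Clip.1 -> Clip.2: wrap the single reference in the multi-reference form."""
    data["media_references"] = {"DEFAULT_MEDIA": data.pop("media_reference", None)}
    data["active_media_reference_key"] = "DEFAULT_MEDIA"
    data["OTIO_SCHEMA"] = "Clip.2"
    return data

def downgrade(data):
    """Clip.2 -> Clip.1: keep only the active reference."""
    key = data.pop("active_media_reference_key", "DEFAULT_MEDIA")
    data["media_reference"] = data.pop("media_references", {}).get(key)
    data["OTIO_SCHEMA"] = "Clip.1"
    return data
```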

Since we don't keep old versions of schemas in the codebase, it makes sense to manipulate the JSON objects directly, as in @jminor's example.

In my opinion the code for migrating schemas should be a part of OTIO so it’s easy to write hooks etc. and we can either provide a new console application (otiomigrate ?) or bake this into otioconvert like @reinecke suggested.

ssteinbach commented 2 years ago

I have a proposal for this: https://gist.github.com/ssteinbach/36f477d32add838a0746fbc0346e7ec3

I'll be working on a branch to put a version of this out there for feedback.

ssteinbach commented 2 years ago

Being addressed in #1387

JeanChristopheMorinPerso commented 2 years ago

@ssteinbach in case you want this issue to be closed when your PR gets merged, you have to link the other way around: it's in your PR description that you need to mark your PR as closing this issue. See https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue.

ssteinbach commented 2 years ago

Yup. It's going to be a little while before that lands; I just wanted to leave a breadcrumb here for anyone interested.

ssteinbach commented 1 year ago

Closed by #1387