clamsproject / mmif

MultiMedia Interchange Format
Apache License 2.0

specification versions #12

Closed keighrim closed 4 years ago

keighrim commented 4 years ago

In essence, the MMIF specification has three versioned components:

  1. JSON schema, which defines the syntactic elements of MMIF
  2. LinkedData context, which defines shortcuts for URIs
  3. Type hierarchy (vocabulary), which defines concepts and their ontological relations

Currently the MMIF draft (written as 1.0) describes the overall relations between these components and gives some details on the syntactic structure of MMIF. In doing so, the document refers to a not-yet-existing vocabulary and schema as 1.0, but before we concretely define them, I think we first need to decide how to version these different elements, as well as how to version MMIF as an overarching entity. In lappsgrid and LIF we used semantic versioning, although there have been only a small number of versioned changes to the LIF specification. The synchronization of versions across sub-components was never rigorously enforced or discussed.

I'm proposing we use semantic versioning, starting from 0.1.0 (as a draft), synchronized across all three MMIF elements plus clams-python-sdk, since its serialization module uses the same version number as the underlying data model.

marcverhagen commented 4 years ago

I think you are saying that there are actually four things: schema, context, vocabulary and then MMIF. And I think you propose that a particular version of MMIF is tied to particular versions of the others. Not sure I totally agree here in that we may want to allow use of old vocabulary with new MMIF syntax, but this sure warrants discussion.

Yes, I can live with the full semantic version as long as we can define what the three digits all mean. I was using a simpler scheme because I had not thought through what major versus minor versus patch level would mean here. When you suggest using 0.1.0 for a first version, I assume you mean that the first stable version we publish would be 1.0.0. I used 1.0 for existing and non-existing contexts for everything so far just to have an initial version, but really for now it is still more like a SNAPSHOT version.

Another thing on versioning. I was thinking of using directories to do the versioning, so that all versions are available at all times and you do not need to do a checkout to go to an old version. But we probably also want to come up with a system for tagging the releases with tags like "mmif-0.1.0" and "vocabulary-0.1.0".

Each versioned component should be in its own directory and that directory should have a VERSION.md file.

Finally, I have not fully thought through where things should live. Given that I propose links to URLs inside http://mmif.clams.ai, these things should live in the docs directory, but it would be nice to maintain the data outside of docs and export/publish them to docs. And maybe have an additional mechanism to put up the SNAPSHOT-like things, but that may require pushing to the master branch, which I am not in favour of.

keighrim commented 4 years ago

In the long run, I think MMIF should be the definition of a set of abstract APIs that rely on the other three technical specifications, and the implementation of those APIs should be clams-python-sdk::serialize (or an SDK for another language, maybe). In that sense, we can say something like

MMIF 1.0.0 is made up of schema 1.0.0, context 1.0.0, and vocab 1.0.0. (all versions are synchronized)

or

MMIF 1.1.0 is based on schema 1.1.0, context 1.0.1, and vocab 1.2.0. (if versions are not tied to each other).
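
The two options above can be sketched as a small data structure. This is only an illustration: the names `SpecRelease`, `parse_version`, and `is_synchronized` are hypothetical and are not part of any existing MMIF or CLAMS code.

```python
from dataclasses import dataclass
from typing import Tuple

Version = Tuple[int, int, int]


def parse_version(text: str) -> Version:
    """Parse a 'MAJOR.MINOR.PATCH' string into a tuple of three ints."""
    parts = text.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {text!r}")
    major, minor, patch = (int(p) for p in parts)
    return (major, minor, patch)


@dataclass(frozen=True)
class SpecRelease:
    """Hypothetical record of which component versions make up an MMIF release."""
    mmif: str
    schema: str
    context: str
    vocab: str

    def is_synchronized(self) -> bool:
        """True when every component carries the same version as MMIF itself."""
        target = parse_version(self.mmif)
        components = (self.schema, self.context, self.vocab)
        return all(parse_version(v) == target for v in components)


# Option 1: all versions are synchronized.
synced = SpecRelease(mmif="1.0.0", schema="1.0.0", context="1.0.0", vocab="1.0.0")
# Option 2: versions are not tied to each other.
independent = SpecRelease(mmif="1.1.0", schema="1.1.0", context="1.0.1", vocab="1.2.0")
```

With synchronized versions the record collapses to a single number; with independent versions something like this mapping would have to be published and maintained for every MMIF release.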


About the SDK, I think we eventually need to publish it to PyPI or other common library repositories. To that end, we might want to separate the serialize module into an independent library (plus all MMIF-related code, e.g. as mmif-python) that clams-sdk then depends on.


On publishing multiple versions, we can use git tags and a build tool that iterates over all tags, generates individual HTML files, and merges them into master to be published to the website. That way, no one has to manually merge or push to master, and we get a SNAPSHOT-like mechanism by manually re-tagging a newer version of the SNAPSHOT.
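
A rough sketch of the planning step such a builder might perform, assuming tags follow the "mmif-0.1.0" / "vocabulary-0.1.0" naming suggested earlier in this thread. The function name `plan_builds` and the `docs/<version>/<component>` output layout are hypothetical; the sketch takes a plain list of tag names and does not call git.

```python
import re
from typing import Dict, List

# Assumed tag shape: <component>-<MAJOR.MINOR.PATCH>, e.g. "mmif-0.1.0".
TAG_PATTERN = re.compile(r"^(?P<component>[a-z]+)-(?P<version>\d+\.\d+\.\d+)$")


def plan_builds(tags: List[str], docs_root: str = "docs") -> Dict[str, str]:
    """Map each release tag to the directory its generated files would go to.

    Tags that do not match the assumed naming scheme are skipped.
    """
    plan = {}
    for tag in tags:
        match = TAG_PATTERN.match(tag)
        if match is None:
            continue
        plan[tag] = f"{docs_root}/{match['version']}/{match['component']}"
    return plan


plan = plan_builds(["mmif-0.1.0", "vocabulary-0.1.0", "work-in-progress"])
```

A real builder would obtain the tag list from the repository and check out each tag before generating files; that part is left out here.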

marcverhagen commented 4 years ago

I am starting to think that having one version to rule them all would be the best, but I am still not totally convinced that there are no unwanted consequences.

When we have some version of MMIF, say 1.2.4, then specifications, schema, context and vocabulary are all part of that version. And that makes sense. Some parts, like the vocabulary and the schema, don't really talk about the same things, which is fine, yet others are intimately connected (schema and specifications). The context will refer to a vocabulary of the same version. So if the context is in http://miff.clams.ai/1.2.4/context/miff-context.json (or whatever path we come up with) then its vocabulary line will read

"@vocab" : "http://miff.clams.ai/1.2.4/vocab"

And with that line if you have Segment in the @type property then it will be expanded to the right version (http://miff.clams.ai/1.2.4/vocab/Segment). If the tool produces a full URL then it needs to make sure it is the right one.
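
The expansion described here can be sketched in a few lines. This is a simplified stand-in for what a JSON-LD processor does, not an actual JSON-LD implementation, and `expand_type` is a hypothetical helper. Because JSON-LD prepends the `@vocab` value to the term by plain string concatenation, the sketch assumes the `@vocab` value ends in a slash.

```python
def expand_type(type_value: str, context: dict) -> str:
    """Expand a short type name against @vocab; leave full URLs untouched."""
    if type_value.startswith(("http://", "https://")):
        # The tool supplied a full URL, so it is responsible for the version.
        return type_value
    # Plain concatenation, which is how @vocab is applied to a term.
    return context["@vocab"] + type_value


# Host and path copied from the example in this comment, with a trailing slash.
context = {"@vocab": "http://miff.clams.ai/1.2.4/vocab/"}

short_form = expand_type("Segment", context)
full_form = expand_type("http://miff.clams.ai/1.1.0/vocab/Segment", context)
```

The short form picks up the version of the context, while a full URL passes through unchanged, which is where a version mismatch can enter.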

The one issue I am toying with is what happens when you have a tool that adheres to the new schema and specification, but that for some reason wants to refer to an older version of the vocabulary, maybe because the tool uses an annotation type that is not in the new vocabulary anymore.

Not sure whether this will ever occur or even whether we should allow that, but since vocabulary and schema are somewhat disconnected the tool could just put a full URL in the @type property with a different version. This will however give rise to an expanded JSON file where annotation types and properties of annotation types are out of sync, for example, you could have an annotation that looks like

{
    "@type": "http://miff.clams.ai/1.1.0/vocab/Segment",
    "http://miff.clams.ai/1.2.4/mmif/properties": {
        "http://miff.clams.ai/1.2.4/vocab/Segment#segmentType": "bars-and-tones"
    }
}

The tool wanted the 1.1.0 version of Segment but the context expanded segmentType to the property in the current vocabulary.
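
One way to spot this kind of mismatch is to collect every version string that appears in the URLs of an expanded annotation. The sketch below is an assumption-laden illustration: `versions_in` is a hypothetical helper, and it assumes the version is always the first path segment after the host, as in the URLs above.

```python
import re
from typing import Set

# Assumed URL shape: http(s)://<host>/<MAJOR.MINOR.PATCH>/...
VERSION_IN_URL = re.compile(r"^https?://[^/]+/(\d+\.\d+\.\d+)/")


def versions_in(node) -> Set[str]:
    """Recursively collect spec versions from URL keys and values."""
    found: Set[str] = set()
    if isinstance(node, dict):
        for key, value in node.items():
            found |= versions_in(key)
            found |= versions_in(value)
    elif isinstance(node, list):
        for item in node:
            found |= versions_in(item)
    elif isinstance(node, str):
        match = VERSION_IN_URL.match(node)
        if match:
            found.add(match.group(1))
    return found


annotation = {
    "@type": "http://miff.clams.ai/1.1.0/vocab/Segment",
    "http://miff.clams.ai/1.2.4/mmif/properties": {
        "http://miff.clams.ai/1.2.4/vocab/Segment#segmentType": "bars-and-tones"
    },
}

mixed = versions_in(annotation)
is_out_of_sync = len(mixed) > 1
```

An SDK could use a check like this either to warn about mixed versions or to refuse to serialize the output.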

keighrim commented 4 years ago

I guess removing items from the reserved vocabulary must be an extremely prudent decision, but as we experienced with the lapps vocab, we can't really disallow deprecation. However, we can at least officially disallow mixing different versions of the specs (or even different versions of the vocab in a single MMIF output) by simply not supporting that behavior in the API implementation (SDK). If someone really wants to use an old version of a specific concept, they can always hack around the API and hard-code an unsupported version string in the output, and I don't think there's anything we can do in such a situation. We can recommend using the old version of the entire specification set to keep the old definition of certain vocab items, and hope for the best....

keighrim commented 4 years ago

The serialize+vocab modules from the prototype Python SDK are now separated (https://github.com/clamsproject/clams-python/issues/11, and https://github.com/clamsproject/mmif/pull/18), and both the clams sdk (clams-python) and the mmif sdk (mmif-python) are now published on PyPI under the name clamsproject. Currently both use 0.0.x to indicate their pre-release stage, but since we used 0.1.x for the current draft of the MMIF specs, they will soon be bumped to 0.1.0 with the implementation of draft 0.1.0.

Version numbers and their meanings are drafted in #14, and I think everyone agreed on that suggestion, so it should also be integrated into the official documentation.

keighrim commented 4 years ago

done via #35