clamsproject / mmif

MultiMedia Interchange Format
Apache License 2.0

document ID in view metadata #152

Closed keighrim closed 3 years ago

keighrim commented 3 years ago

(also see #9)


We need a place to store the ID of the document that a view is based on (the document to which all its annotations are anchored). A related question: what if a view processes two or more documents? By the way, the current JSON schema (0.2.2) does not accommodate such a field anywhere in view or viewMetadata.

marcverhagen commented 3 years ago

We have this for the view metadata (as written in https://mmif.clams.ai/0.2.2/):

{
  "app": "http://apps.clams.ai/bars-and-tones/1.0.5",
  "timestamp": "2020-05-27T12:23:45",
  "contains": {
    "http://mmif.clams.ai/0.2.2/vocabulary/TimeFrame": {
      "unit": "seconds",
      "document": "m1"
    }
  }
}

A single document identifier in the view metadata cannot cover the case where new annotations refer to two documents. In that case we either stop using "document" in the view metadata and store it on each individual annotation, or we treat one document as the default and put that one in the metadata.
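To make the second option concrete, here is a minimal sketch of a "default in the metadata, override on the annotation" lookup. The helper name `resolve_document` is hypothetical and not part of any MMIF SDK, and the view layout (a `metadata.contains` object plus annotations carrying `@type` and `properties`) is an assumption based on the example above.

```python
TIME_FRAME = "http://mmif.clams.ai/0.2.2/vocabulary/TimeFrame"

# Illustrative view (assumed layout): "m1" is the default document
# for every TimeFrame annotation in this view.
view = {
    "id": "v1",
    "metadata": {
        "app": "http://apps.clams.ai/bars-and-tones/1.0.5",
        "contains": {TIME_FRAME: {"unit": "seconds", "document": "m1"}},
    },
    "annotations": [
        # No "document" property: falls back to the metadata default.
        {"@type": TIME_FRAME, "properties": {"id": "tf1", "start": 0, "end": 5}},
        # Explicit "document" property: overrides the default.
        {"@type": TIME_FRAME,
         "properties": {"id": "tf2", "start": 0, "end": 3, "document": "m2"}},
    ],
}


def resolve_document(view, annotation):
    """Return the document ID an annotation is anchored on, or None.

    Hypothetical helper: the annotation-level "document" property wins;
    otherwise the default recorded under "contains" for the annotation's
    type is used.
    """
    props = annotation.get("properties", {})
    if "document" in props:
        return props["document"]
    contains = view.get("metadata", {}).get("contains", {})
    return contains.get(annotation.get("@type"), {}).get("document")


resolved = {a["properties"]["id"]: resolve_document(view, a)
            for a in view["annotations"]}
```

With this rule, a view that touches two documents only needs the "document" property on the annotations that deviate from the default.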

I was just looking at the schema: the metadata for the view is declared as a plain object, although it could use the "viewMetadata" definition, which is currently not used. Even with the latter, the value of contains would be an object and we might not be able to specify "document" in the schema, but I am not quite sure of that.

keighrim commented 3 years ago

Aha, now I understand that the IDs of the documents on which annotations are based are currently recorded in two places: the contains metadata in viewMetadata and the individual annotation objects. (The implementation of get_views_for_document (#129) also relies on those two places.) So in conclusion, this issue doesn't exist and we don't need to take an action (except for fixing the schema to use the viewMetadata object for view metadata).
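As a rough illustration of what "relying on those two places" means, the sketch below collects the views that mention a given document either in their contains metadata or on an individual annotation. It is not the actual get_views_for_document code from #129. The function name `views_for_document` and the dictionary layout are assumptions made for this example.

```python
TIME_FRAME = "http://mmif.clams.ai/0.2.2/vocabulary/TimeFrame"

# Illustrative MMIF-like structure (assumed layout, trimmed to what is needed).
mmif = {
    "views": [
        {   # Document recorded in the "contains" metadata.
            "id": "v1",
            "metadata": {"contains": {TIME_FRAME: {"document": "m1"}}},
            "annotations": [
                {"@type": TIME_FRAME, "properties": {"id": "tf1"}},
            ],
        },
        {   # Document recorded on the individual annotation only.
            "id": "v2",
            "metadata": {"contains": {TIME_FRAME: {"unit": "seconds"}}},
            "annotations": [
                {"@type": TIME_FRAME,
                 "properties": {"id": "tf1", "document": "m2"}},
            ],
        },
    ]
}


def views_for_document(mmif, doc_id):
    """Return IDs of views that reference doc_id in either of the two places."""
    matches = []
    for view in mmif.get("views", []):
        contains = view.get("metadata", {}).get("contains", {})
        # Place 1: per-type metadata under "contains".
        in_metadata = any(meta.get("document") == doc_id
                          for meta in contains.values())
        # Place 2: the "document" property of individual annotations.
        in_annotations = any(
            ann.get("properties", {}).get("document") == doc_id
            for ann in view.get("annotations", [])
        )
        if in_metadata or in_annotations:
            matches.append(view["id"])
    return matches
```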

keighrim commented 3 years ago

Should we consider specifying contains in the json schema?

marcverhagen commented 3 years ago

> So in conclusion, this issue doesn't exist and we don't need to take an action

I agree that this isn't a big issue. However, it would be good to point out in the informal specifications what happens in such cases.

By the way, in my mind there is a third way of specifying the document an annotation refers to: just have one document in the documents list and not using the document feature in contains or annotations.

marcverhagen commented 3 years ago

> Should we consider specifying contains in the json schema?

It is specified, but not used I think. I will look at the schema, because I think it is a mix of two versions that are not integrated well.
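For reference, a minimal sketch of how contains could be pinned down: the JSON Schema keyword `additionalProperties` accepts a subschema, so values under arbitrary annotation-type keys can still be constrained. The fragment below is hypothetical and is not the actual MMIF schema. The `conforms` function is a toy checker covering only the keywords used here; it does not stand in for a real JSON Schema validator.

```python
# Hypothetical schema fragment for view metadata (not the real MMIF schema).
VIEW_METADATA_SCHEMA = {
    "type": "object",
    "properties": {
        "app": {"type": "string"},
        "timestamp": {"type": "string"},
        "contains": {
            "type": "object",
            # Keys are arbitrary annotation-type URIs; every value must
            # match this subschema, which types the "document" field.
            "additionalProperties": {
                "type": "object",
                "properties": {"document": {"type": "string"}},
            },
        },
    },
}


def conforms(instance, schema):
    """Toy check supporting only: type (object/string), properties,
    additionalProperties. Anything else is accepted unchecked."""
    expected = schema.get("type")
    if expected == "string":
        return isinstance(instance, str)
    if expected == "object":
        if not isinstance(instance, dict):
            return False
        props = schema.get("properties", {})
        extra = schema.get("additionalProperties", True)
        for key, value in instance.items():
            if key in props:
                if not conforms(value, props[key]):
                    return False
            elif isinstance(extra, dict):
                if not conforms(value, extra):
                    return False
            elif extra is False:
                return False
        return True
    return True


good_metadata = {
    "app": "http://apps.clams.ai/bars-and-tones/1.0.5",
    "timestamp": "2020-05-27T12:23:45",
    "contains": {
        "http://mmif.clams.ai/0.2.2/vocabulary/TimeFrame": {
            "unit": "seconds",
            "document": "m1",
        }
    },
}
# Same metadata, but with a non-string document ID.
bad_metadata = {
    "app": "http://apps.clams.ai/bars-and-tones/1.0.5",
    "contains": {
        "http://mmif.clams.ai/0.2.2/vocabulary/TimeFrame": {"document": 1}
    },
}
```

Under these assumptions, `conforms(good_metadata, VIEW_METADATA_SCHEMA)` accepts the example from the specification, while the variant with a numeric document ID is rejected.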

keighrim commented 3 years ago

> By the way, in my mind there is a third way of specifying the document an annotation refers to: just have one document in the documents list and not using the document feature in contains or annotations.

I don't think it's a good idea to allow this. I believe it would add significant complexity to the implementation of an SDK (and would be really easy to overlook) in exchange for only a small saving of space. The document property is defined for all annotation types anyway, and at the end of the day it would save only one line in the resulting MMIF.

marcverhagen commented 3 years ago

The only reason I had that in mind is that it works that way in LIF. But I see your point and have no objection to requiring a document property on either the view metadata or the annotation.

keighrim commented 3 years ago

No further action is required, so I am closing this issue. (The last revert was a mis-click of mine.)

keighrim commented 3 years ago

I can't remember why I re-opened this. Closing again.