clarin-eric / standards

work space for the Standards and Interoperability Committee
https://www.clarin.eu/content/standards

Cataloguing unregistered media types and parameters #146

Open bansp opened 2 years ago

bansp commented 2 years ago

We want to catalogue media types and parameters that are registered with some working pipeline architectures or recognized by the Switchboard, because they may be (or already are) emerging best practices and then we

bansp commented 2 years ago

At the meeting today, we gave the green light to this. Setting the priority so that we try to handle this by the next meeting (no promises).

bansp commented 2 years ago

To be more precise, next to

<mimeType>application/xml</mimeType>

we would have something like

<mimeType status="unregistered">application/folia+xml</mimeType>

and then, instead of what would, by analogy, be:

<mimeType status="unregistered">application/tei+xml;format-variant=tei-iso-spoken</mimeType>

-- which is not fully precise, because "application/tei+xml" itself is definitely registered -- we could, with a twist, do the following:

<mimeType value="application/tei+xml">
  <mimeParam name="format-variant" value="tei-iso-spoken" status="unregistered"/>
</mimeType>

and we could (or should) go a step further, and also do:

<mimeType value="application/tei+xml">
  <mimeParam name="format-variant" value="tei-iso-spoken" status="unregistered"/>
  <mimeParam name="tokenized" value="{0,1}" status="unregistered"/>
</mimeType>

-- with the convention that braces {} enclose a set of values.
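
To make the brace convention concrete, here is a minimal Python sketch that reads the proposed structure and expands it into plain media type strings, one per combination of parameter values. The function names `expand_values` and `media_type_strings` are hypothetical, and so is the choice of joining parameters with ";" in document order; none of this is an agreed design.

```python
import itertools
import xml.etree.ElementTree as ET

# The snippet proposed above, used here as test input.
SNIPPET = """
<mimeType value="application/tei+xml">
  <mimeParam name="format-variant" value="tei-iso-spoken" status="unregistered"/>
  <mimeParam name="tokenized" value="{0,1}" status="unregistered"/>
</mimeType>
"""


def expand_values(raw):
    """Return the values denoted by raw, reading {a,b} as the set a, b."""
    raw = raw.strip()
    if raw.startswith("{") and raw.endswith("}"):
        return [v.strip() for v in raw[1:-1].split(",") if v.strip()]
    return [raw]


def media_type_strings(xml_text):
    """List every full media type string that a <mimeType> element covers."""
    root = ET.fromstring(xml_text)
    base = root.get("value")
    params = [
        (p.get("name"), expand_values(p.get("value")))
        for p in root.findall("mimeParam")
    ]
    names = [name for name, _ in params]
    results = []
    # One output string per combination of parameter values.
    for combo in itertools.product(*(values for _, values in params)):
        suffix = "".join(f";{n}={v}" for n, v in zip(names, combo))
        results.append(base + suffix)
    return results


if __name__ == "__main__":
    for line in media_type_strings(SNIPPET):
        print(line)
```

For the snippet above this yields two strings, one ending in tokenized=0 and one in tokenized=1. A `<mimeType>` without any `<mimeParam>` children yields just the bare type.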

A further step is to identify the registering entity (rather than just saying "yes" or "no"), make the @registered attribute obligatory, and do:

<mimeType registered="IANA">application/xml</mimeType>

and then we would have something like

<mimeType registered="CLARIN">application/folia+xml</mimeType>

(it could also be @registered="no", of course, but see below)

Let's consider one further hypothetical step:

<mimeType value="application/tei+xml" registered="IANA">
  <mimeParam name="format-variant" value="tei-iso-spoken" registered="CLARIN"/>
  <mimeParam name="tokenized" value="{0,1}" registered="CLARIN"/>
</mimeType>
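
As a rough illustration of what an obligatory @registered attribute would let us check, the following Python sketch reports the registering entity for a media type and each of its parameters, and rejects input where the attribute is missing. The function name `registrars` and the "type;parameter" labels are hypothetical, and raising an error on a missing attribute is only one possible reading of "obligatory".

```python
import xml.etree.ElementTree as ET

# The hypothetical form shown above.
EXAMPLE = """
<mimeType value="application/tei+xml" registered="IANA">
  <mimeParam name="format-variant" value="tei-iso-spoken" registered="CLARIN"/>
  <mimeParam name="tokenized" value="{0,1}" registered="CLARIN"/>
</mimeType>
"""


def registrars(xml_text):
    """Map the media type and each of its parameters to its registering entity."""
    root = ET.fromstring(xml_text)
    # Accept both proposed forms: the type in @value or as element text.
    base = root.get("value") or (root.text or "").strip()
    items = [(base, root)]
    items += [(f"{base};{p.get('name')}", p) for p in root.findall("mimeParam")]
    report = {}
    for label, element in items:
        registrar = element.get("registered")
        if registrar is None:
            # Assumption: "obligatory" means the attribute may never be absent.
            raise ValueError(f"missing @registered on {label}")
        report[label] = registrar
    return report


if __name__ == "__main__":
    for label, registrar in registrars(EXAMPLE).items():
        print(f"{label}: {registrar}")
```

The same function also accepts the simpler text-content form, such as the application/folia+xml example with registered="CLARIN".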

The final form

bansp commented 2 years ago

Meeting decision: we'll see if the CSC has anything to suggest here, and may at some point split this into individual tasks (processing in various ways -- at least two tasks: redefining the XPaths and providing a way to aggregate the parameters in some way; redoing the schema; checking what we can provide via the API and how).
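
One possible shape for the "aggregate the parameters" task, sketched in Python under several assumptions: the wrapper elements `<formats>` and `<format>`, the ids, and the value "example-variant" are all hypothetical and only serve as test input, and the function name `aggregate_params` is made up. The sketch groups parameter values by base media type across several format descriptions, using the limited XPath that ElementTree supports.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical catalogue: wrapper elements, ids and "example-variant" are
# invented for illustration and are not part of any existing schema.
CATALOGUE = """
<formats>
  <format id="exampleA">
    <mimeType value="application/tei+xml" registered="IANA">
      <mimeParam name="format-variant" value="tei-iso-spoken" registered="CLARIN"/>
      <mimeParam name="tokenized" value="{0,1}" registered="CLARIN"/>
    </mimeType>
  </format>
  <format id="exampleB">
    <mimeType value="application/tei+xml" registered="IANA">
      <mimeParam name="format-variant" value="example-variant" registered="CLARIN"/>
    </mimeType>
  </format>
</formats>
"""


def aggregate_params(xml_text):
    """Collect, per base media type, every value seen for each parameter."""
    root = ET.fromstring(xml_text)
    table = defaultdict(lambda: defaultdict(set))
    # Only <mimeType> elements that use the @value form carry parameters.
    for media_type in root.findall(".//mimeType[@value]"):
        for param in media_type.findall("mimeParam"):
            raw = param.get("value", "").strip()
            if raw.startswith("{") and raw.endswith("}"):
                values = raw[1:-1].split(",")  # {a,b} denotes a set of values
            else:
                values = [raw]
            table[media_type.get("value")][param.get("name")].update(
                v.strip() for v in values
            )
    # Convert to plain dicts with sorted lists for stable output.
    return {
        base: {name: sorted(vals) for name, vals in params.items()}
        for base, params in table.items()
    }


if __name__ == "__main__":
    print(aggregate_params(CATALOGUE))
```

Whether the aggregation should happen in XQuery on the server side or in a client of the API is left open here; the sketch only shows the grouping itself.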

bansp commented 2 years ago

Meeting decision: I'll hack at this in a separate branch.

bansp commented 2 years ago

Meeting decision: move to the next milestone.

bansp commented 2 years ago

I fully concur with the above decision: move to the next milestone.

bansp commented 1 year ago

Re-read, potentially open a new branch for this.

bansp commented 1 year ago

This is connected to #157

bansp commented 1 year ago

I am prioritising this one for myself, so as not to lose sight of it.

bansp commented 1 year ago

This ticket is connected to #157.

bansp commented 2 months ago

One stumbling block above was consultation with the SIC. I think the order should be different: first a prototype, then consultation. Otherwise the members will not be sure what they are being consulted about.

BUT the prototype has to come post-CAC, or in a separate branch.