How to include custom fields?

letmaik commented 8 years ago

For cases where the spec doesn't define properties, for example, how to express certain statistical information, how should these be included in a custom extension-like way?

I don't really like the way of having a separate container like "properties" or "extensions", it feels a bit heavyweight.

How about forcing a certain naming scheme for custom properties? Like browser vendors do for CSS fields.

For example:

  "observedProperty" : {
    "label" : {
      "en": "Sea surface temperature daily mean"
    },
    "uor_statisticalMeasure": "http://www.uncertml.org/statistics/mean",
    "uor_statisticalPeriod": "P1D"
  },

So the requirement would be to have an arbitrary prefix (identifying the organization/person that created the custom property) followed by an underscore followed by the actual property name. If a property gets included in the spec, then people would transition to the official name, while keeping support for the custom name in their clients for compatibility. Since we use camel case everywhere, those custom properties would not collide with anything.
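A minimal sketch of how a client might recognise such prefixed properties, assuming the convention described above (prefix, one underscore, camel-case name). The function name `split_custom_property` is hypothetical and not part of any CovJSON library.

```python
def split_custom_property(key):
    """Return (prefix, name) for keys like 'uor_statisticalMeasure', else None.

    Core CovJSON names are camel case and contain no underscore, so the
    first underscore is taken as the prefix separator.
    """
    prefix, sep, name = key.partition("_")
    if not sep or not prefix or not name:
        return None
    return prefix, name


observed_property = {
    "label": {"en": "Sea surface temperature daily mean"},
    "uor_statisticalMeasure": "http://www.uncertml.org/statistics/mean",
    "uor_statisticalPeriod": "P1D",
}

# Keys that follow the custom-property naming scheme.
custom_keys = [k for k in observed_property if split_custom_property(k)]
```

A client that does not know the `uor` prefix could simply skip every key in `custom_keys`.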

jonblower commented 8 years ago

Looks like a reasonable plan

letmaik commented 8 years ago

I'm not convinced yet, I think people don't want to use prefixes like that, especially for cases when such custom fields never make it into an "approved" spec. They would always be second-class citizens which should not be the case.

So... how about looking at it from a software library dependency management point of view.

Let's say everyone is allowed to define an extension (bunch of properties etc.) and publish that somewhere, maybe github. Each extension is versioned and each version has a URI (which should be a resolvable URL), like https://github.com/Reading-eScience-Centre/stats-for-covjson/releases/tag/0.1.0

How could someone express that they are using that extension in a CovJSON document? I would say this is what profiles are for:

{
  "type" : "Coverage",
  "profiles": ["GridCoverage", "https://github.com/Reading-eScience-Centre/stats-for-covjson/releases/tag/0.1.0"],
  ...
}

So maybe GridCoverage etc. are not really profiles after all? Maybe it really is something like "coverageType" and "domainType". And maybe, the basic CovJSON structure itself is just a profile:

{
  "profiles": ["http://coveragejson.org/1.0", "https://github.com/Reading-eScience-Centre/stats-for-covjson/releases/tag/0.1.0"],
  "type" : "Coverage",
  ...
}

I would even go so far as to take the profiles apart, turn it into an object and make version handling easier:

  "profiles": {
    "http://coveragejson.org": "1.0",
    "https://github.com/Reading-eScience-Centre/stats-for-covjson": "1.0"
  }

If you then require extensions to follow semantic versioning, then it is easy for clients to know which versions they understand (e.g. if they have support for 1.0, then they will understand 1.1 as well, just not all the new features, but 2.0 would be breaking).
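The compatibility rule can be sketched roughly as follows. This assumes version strings of the form MAJOR.MINOR (optionally with a patch number) and treats a profile the client has never heard of as unsupported; both are assumptions for illustration, and the function names are hypothetical.

```python
def parse_version(text):
    """Parse 'MAJOR.MINOR' or 'MAJOR.MINOR.PATCH' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def understands(supported, declared):
    """True if a client supporting `supported` can read `declared`.

    Under semantic versioning only a different major version is breaking;
    a newer minor version just adds features the client will not know.
    """
    return parse_version(supported)[0] == parse_version(declared)[0]


def unsupported_profiles(doc_profiles, client_profiles):
    """List profile URIs in the document that the client cannot read."""
    return [uri for uri, version in doc_profiles.items()
            if uri not in client_profiles
            or not understands(client_profiles[uri], version)]


doc_profiles = {
    "http://coveragejson.org": "1.0",
    "https://github.com/Reading-eScience-Centre/stats-for-covjson": "1.0",
}
client_profiles = {"http://coveragejson.org": "1.0"}
missing = unsupported_profiles(doc_profiles, client_profiles)
```

Here `missing` contains only the statistics extension, so the client could still read the core document and ignore the extension fields.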

How does this fit in with the 'profile' Link relation type (as in HTTP Link headers or the "profile" parameter in the media type)? I would say it fits very well. You could imagine that one logical coverage is offered in multiple profiles, e.g. different CovJSON versions or different statistics extensions. A client can request those variants:

curl http://example.com/cov -H "Accept: application/prs.coverage+json; profile=\"http://coveragejson.org/2.0 https://github.com/Reading-eScience-Centre/stats-for-covjson/1.0\""
# could redirect to http://example.com/cov_with_uor_stats.covjson2

So each extension/profile must also define a URI for each version, but also a base URL for use in the CovJSON document itself for easy handling. If the client didn't send any preference for a certain profile then the server can send whatever it thinks is best. If the server doesn't have a perfect version match available, then again it sends the best it can, so it would actually parse the profile URI and extract the version number to do calculations on it (I think this is fine, even though URIs are opaque... in that case, there is a clear idea and purpose of all that and the extension base URI will be the same between versions). And that's content negotiation.
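One possible reading of "sends the best it can" is sketched below. It assumes every profile URI has the shape BASE/VERSION with a dotted numeric version, and the selection policy (prefer the same major version, otherwise the newest version of the same base) is an assumption rather than anything the thread settles. The function names are hypothetical.

```python
def split_profile_uri(uri):
    """Split 'BASE/VERSION' into (base, (major, minor, ...))."""
    base, _, version = uri.rpartition("/")
    return base, tuple(int(p) for p in version.split("."))


def best_match(requested_uri, available_uris):
    """Pick the available profile URI closest to the requested one.

    Prefers the highest version sharing the requested base and major
    version, then the highest version with the same base. Returns None
    when the base is not offered at all.
    """
    base, wanted = split_profile_uri(requested_uri)
    same_base = [u for u in available_uris if split_profile_uri(u)[0] == base]
    if not same_base:
        return None
    same_major = [u for u in same_base
                  if split_profile_uri(u)[1][0] == wanted[0]]
    candidates = same_major or same_base
    return max(candidates, key=lambda u: split_profile_uri(u)[1])


available = [
    "http://coveragejson.org/1.0",
    "http://coveragejson.org/1.2",
    "http://coveragejson.org/2.0",
]
```

With these variants, a request for version 1.1 would be answered with 1.2, which a semantic-versioning client for 1.x can still read.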

About JSON-LD... the point of all this is to be independent of any complicated JSON-LD processing. However, it does not mean that there can't be a JSON-LD context for all the used extensions that are embedded in the document. For example, someone might put a lot of provenance metadata in the root of a CoverageJSON coverage document which could use the PROV-O ontology as it is, and this is a perfect fit for embedding a JSON-LD context since it's clean JSON-LD/RDF by nature, but in addition the profile would prescribe a certain structure so that as many clients as possible can use that provenance data in an easy way.

Opinions please!

letmaik commented 8 years ago

One more thing: By using extensions in that way, it would also be easy for people to define new domain types and not be forced to use full URIs when referring to them. Same for domain value data types. There would never be a conflict since it is always clear which domain types etc. are in scope by explicitly importing an extension/profile.

jonblower commented 8 years ago

I can't pretend I've understood all of that, but it strikes me that we're probably not the first people to have a problem that could be addressed in this way. Is there any precedent for this approach? I think Rob Atkinson has been interested in modular profiles for a while.

Also is there a danger that all this becomes a bit too "meta" and makes client development harder because all clients will have to anticipate the possibility of multiple profiles being in play?

For CovJSON v1, is it perhaps safer just to say that custom fields can be included but will probably be ignored by most clients? Perhaps in future we will see what kinds of extensions people propose (or whether they propose them at all) and use this to inform the design of a more sophisticated mechanism.

letmaik commented 8 years ago

Very related, especially the comments: https://www.mnot.net/blog/2011/10/12/thinking_about_namespaces_in_json

letmaik commented 8 years ago

And https://tools.ietf.org/html/draft-saintandre-json-namespaces-00

letmaik commented 8 years ago

If we would use profiles for identifying extensions/conventions then this is really a rehash of JSON-LD contexts and probably not ideal. The problem with JSON-LD however is that in the current version with a root context you cannot define a property that is only valid in some subtree, it is always defined globally. And of course, it would still add complexity if clients would need to parse that JSON-LD context, so we possibly have to put some restrictions on it, like only being allowed to link to contexts, and not define definitions in-place:

"@context": [
   "http://example.com/my-extension"
]

This is similar to the profiles idea with the advantage that the terms could be described in a more technical manner. So, a client could then check for an extension with

if (doc['@context'].includes('http://example.com/my-extension')) { /* ...use extension... */ }
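The same check, sketched in Python with one extra precaution: in JSON-LD the "@context" value may be a single string or object rather than an array, so the sketch normalises it to a list first. The helper name `uses_extension` is hypothetical.

```python
def uses_extension(doc, context_url):
    """True if `doc` links to the remote context `context_url`."""
    context = doc.get("@context", [])
    if not isinstance(context, list):
        # A lone string or object is treated as a one-element list.
        context = [context]
    return context_url in context


doc = {"@context": ["http://example.com/my-extension"], "type": "Coverage"}
```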

letmaik commented 8 years ago

I thought about this again and re-read the relevant sections in the Activity Streams 2.0 core spec: 2.1 JSON-LD and especially 5. Extensibility.

I think in the longer run we are better off just doing what they do. We have to adapt the text slightly since CovJSON is not JSON-LD-only in its core, but I don't see a problem with that. And they also mention the case that extensions may include properties not formally defined with JSON-LD:

While implementations are free to use such constructs as extensions within an Activity Streams 2.0 document, consumers that use the standard JSON-LD Processing Algorithms will be required to either ignore such extensions or map those to alternative compatible constructs prior to applying the JSON-LD algorithms. Simple GeoJSON Points, for instance, can be mapped to Place objects, while more complex geometries can be converted to GeoSparql "Well-Known Text" representations as illustrated in the non-normative examples below.

One detail I really like about how they handle non-JSON-LD defined extensions is that they have "@vocab": "_:" in their normative JSON-LD context. This means that by default any additional property not defined in the JSON-LD context is mapped to a "blank node", e.g. "foo": 5 becomes "_:foo": 5 when processed with JSON-LD, and it becomes "foo": 5 again when applying the JSON-LD compaction algorithm, essentially losing no data.

So in the end this means extension developers have multiple options, starting from just adding random properties as they like, then maybe putting them under a namespace with a prefix, like "prov:generated": ... or directly assigning a non-prefix property name in the JSON-LD context. The main issue to discuss would be how we handle the evolution of the format ourselves, since we don't want to introduce conflicts later on by arbitrarily extending our own core JSON-LD context and risking property name collisions with extensions. ActivityStreams 2.0's core normative context will never change as far as I see. Currently the only solution I see to this is to add a version number to the context URL like http://covjson.org/context-1.jsonld and then state that this context always applies (even if missing in the document; Activity Streams does the same) if not explicitly overridden by a later version. I think versioning contexts is fine, since this is different than putting version numbers in concept URIs themselves, which is not a good idea. I'll have to think this through a bit more though.
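The "always applies unless overridden" rule could look roughly like this on the client side. The URL pattern `http://covjson.org/context-N.jsonld` is taken from the comment above; the `context-2.jsonld` URL used in the example is purely hypothetical, as is the function name.

```python
DEFAULT_CONTEXT = "http://covjson.org/context-1.jsonld"
CORE_CONTEXT_PREFIX = "http://covjson.org/context-"


def effective_context(doc):
    """Return the list of contexts a client should assume for `doc`.

    The version 1 core context is prepended unless the document already
    names some version of the core context itself.
    """
    context = doc.get("@context", [])
    if not isinstance(context, list):
        context = [context]
    names_core = any(isinstance(c, str) and c.startswith(CORE_CONTEXT_PREFIX)
                     for c in context)
    return context if names_core else [DEFAULT_CONTEXT] + context


plain_doc = {"type": "Coverage"}
newer_doc = {"@context": "http://covjson.org/context-2.jsonld",  # hypothetical
             "type": "Coverage"}
```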

letmaik commented 8 years ago

I think we just have to take a decision here. There's no clear "best" solution, so I think the way to go is to have a solution which allows a backwards-compatible evolution of the format (meaning, old documents are still valid in newer spec versions) and at the same time allows people to add custom extensions in varying degrees of being interoperable/forward-compatible. And this is completely independent of RDF/JSON-LD, though it integrates with it.

My suggestion would be:

- Document creators may add new fields as they see fit, however it is recommended to use a namespace prefix followed by a colon (a colon to make it JSON-LD compatible as well) if interoperability is considered important. So, for example, "dct:publisher": "bla".
- Document creators may use custom values for the following type fields: domain type, axis data type, reference system type, range type. Again, it is recommended to use a namespace prefix if interoperability is important. [So this would be a spec change since we currently require full URIs if custom types are used, but I think it's better / more convenient that way]
- A namespace prefix is recommended to avoid conflicts with future CovJSON versions.
- To increase interoperability without having to parse a JSON-LD context, namespace prefixes should be taken from https://prefix.cc/ and new ones be registered there [see note below for a more normative approach]. This makes combination of multiple extensions easy and prevents conflicts.
- For increased self-description, any namespace that is used and that has a URI should be added to the root-level @context field (directly within a context object, not by referencing a remote JSON-LD context). This is not just for JSON-LD clients, but also for humans to immediately see what the extension fields are about.
- If an extension would be beneficial for a majority of CovJSON documents, then it should be suggested for inclusion into the standard.
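As an illustration of the self-description recommendation, the sketch below reports namespace prefixes that a document uses in field names but does not declare in its root-level @context. It only inspects top-level fields and only handles an inline context object; the function name is hypothetical, and the "prov:generated" value is a placeholder.

```python
def undeclared_prefixes(doc):
    """Return prefixes used in top-level field names but missing from @context."""
    context = doc.get("@context", {})
    declared = set(context) if isinstance(context, dict) else set()
    used = {key.split(":", 1)[0] for key in doc
            if ":" in key and not key.startswith("@")}
    return sorted(used - declared)


doc = {
    "@context": {"dct": "http://purl.org/dc/terms/"},
    "type": "Coverage",
    "dct:publisher": "bla",
    "prov:generated": "...",
}
problems = undeclared_prefixes(doc)
```

Here `problems` names only `prov`, because `dct` is declared in the inline context object.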

Based on that, there are several levels at which people can add/use extensions:

- Low interoperability and not future-proof: Add an additional field / use a new type (domain type, domain data type, reference system type, range type) without a namespace prefix and hard-code your client to use that. Suitable if the CovJSON document is not meant to be used by others or there is a strong understanding by all parties about the new field/type name and its meaning.
- Medium interoperability and future-proof: Add an additional namespaced field / use a namespaced type and associate the namespace prefix to the full URI of the namespace within @context. This prevents breakages with future CovJSON versions and is suitable for draft versions of an extension.
- High interoperability and future-proof: Add an additional namespaced field / use a namespaced type, add the namespace to @context, register/use a namespace from https://prefix.cc/, and if appropriate create public documentation of the extension together with example documents to show how to use it. The documentation does not have to live under the namespace URI. Documentation is necessary if a new field has an object or other complex type as value, and in general for any new type.
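The three levels can be approximated mechanically, as in the sketch below. It is only a rough aid: the code cannot tell whether public documentation exists, and treating a prefixed field whose prefix is missing from @context as "low" is an assumption, since the levels above do not cover that case. The `myorg` prefix and the set of registered prefixes are hypothetical.

```python
def interoperability_level(field, context, registered_prefixes):
    """Classify an extension field name as 'low', 'medium' or 'high'."""
    prefix, sep, _ = field.partition(":")
    if not sep or prefix not in context:
        # No namespace at all, or a prefix the document never explains.
        return "low"
    if prefix in registered_prefixes:
        return "high"
    return "medium"


context = {
    "dct": "http://purl.org/dc/terms/",
    "myorg": "http://example.org/ns#",  # hypothetical draft extension
}
registered = {"dct"}  # hypothetical registry content
levels = {name: interoperability_level(name, context, registered)
          for name in ("publisher", "myorg:quality", "dct:publisher")}
```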

So, in a way, this extension concept is quite simple because it comes down to two options: either use non-namespaced extensions (use case: unshared data used in own web app etc.), or use a namespace (use case: be future-proof and/or share data).

The only problem may be the reliance on https://prefix.cc since this is meant for RDF and wouldn't fit for things like new domain or range types I think. And apart from that it is not normative and a moving target. The alternative would be to have our own registry in the form of a JSON-LD context file (just containing namespace prefixes) which people could extend via GitHub pull requests and which would be imported by the main CovJSON context file so that newly registered extension namespaces are automatically available for JSON-LD clients. Not a bad idea actually.

Also, an advantage of using namespaces for new types and then registering the namespaces, e.g. "domainType": "metoffice:Corridor" is that the namespace URI itself can be changed centrally (in our new http://covjson.org/prefixes.jsonld file, imported by http://covjson.org/context.jsonld) without having to modify the document itself. I think with registered namespaces there would be no need to explicitly include the namespace + URI in the "@context" field in each document since by default the CovJSON main context is included implicitly even if missing, and this would include all the registered prefixes automatically. Also, we could automatically generate an HTML page out of the prefixes.jsonld file which lists the registered prefixes together with instructions on how to add a new one. A page within covjson.org that links to external extension documentations is probably a good idea as well.
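A sketch of why central registration makes the namespace URI changeable without touching documents: the client resolves compact values through a prefix table that stands in for the proposed prefixes.jsonld file. The namespace URIs for `metoffice` are made up for the example, and letting a document's own @context override the registry is an assumption.

```python
# Stand-in for the registry file; the URI is hypothetical.
REGISTERED_PREFIXES = {"metoffice": "http://example.org/metoffice/"}


def expand_compact(value, doc_context=None, registry=REGISTERED_PREFIXES):
    """Expand 'prefix:name' to a full URI using registry plus document context."""
    prefixes = {**registry, **(doc_context or {})}
    prefix, sep, name = value.partition(":")
    if sep and prefix in prefixes and not name.startswith("//"):
        return prefixes[prefix] + name
    # Simple names and full URIs are returned unchanged.
    return value


doc = {"type": "Domain", "domainType": "metoffice:Corridor"}
before = expand_compact(doc["domainType"])

# The registry entry moves; the document itself stays as it is.
moved = {"metoffice": "http://example.org/v2/metoffice/"}
after = expand_compact(doc["domainType"], registry=moved)
```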

What do you think?

jonblower commented 8 years ago

That all sounds sensible to me. I tend to think that having our own registry as a JSON-LD context file is better than using prefix.cc, for the reasons you give.

letmaik commented 8 years ago

I just thought about this again and I think the strong focus on namespace prefixes is only a good idea for new fields, but not necessarily for types. For example, if someone invents a new domain type and has a single URI for that, then it probably doesn't make much sense to force that person to invent a prefix/namespace and register that with us, as this may be a bit arbitrary. So, I think for types it should be either a compact URI (prefix:name), a full URI, or a simple name. The important part is that the extension author has to decide for one of those which is then considered the normative one for the extension. And only the first two variants are allowed to be added to our extensions registry.
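The three allowed forms of a type value can be told apart with a simple heuristic, sketched below. Detecting a full URI by looking for "://" is an assumption that works for http(s) URIs but would misclassify schemes such as `urn:`; the function names are hypothetical.

```python
def classify_type(value):
    """Classify a type value as 'full URI', 'compact URI' or 'simple name'."""
    if "://" in value:
        return "full URI"
    if ":" in value:
        return "compact URI"
    return "simple name"


def registrable(value):
    """Only compact and full URIs may be added to the extensions registry."""
    return classify_type(value) != "simple name"


kinds = [classify_type(v) for v in
         ("metoffice:Corridor", "http://example.com/Corridor", "Corridor")]
```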

jonblower commented 8 years ago

Sounds reasonable. Could it be generalised - e.g. "use prefix:name for property names, and full URIs for property values"?

letmaik commented 8 years ago

I'm just thinking about the same. But it may not be appropriate in some cases, for example if you have:

"dct:publisher": {
  "type": "http://xmlns.com/foaf/0.1/Organization",
  "foaf:name": "Vista GmbH"
}

instead of

"dct:publisher": {
  "type": "foaf:Organization",
  "foaf:name": "Vista GmbH"
}

jonblower commented 8 years ago

Yes, true.

letmaik commented 8 years ago

I've tried to change the spec accordingly and added an Extensions section, together with adjusting the JSON-LD section afterwards. What do you think? I think it could work.

jonblower commented 8 years ago

I think it looks good. To summarise, does it work like this:

1. Use ad-hoc property names if you're not too worried about interoperability or clashes
2. Use the Extensions mechanism (i.e. register a prefix with us) for better interop / avoiding clashes
3. Use a custom JSON-LD context if you want to enable conversion to RDF.

Can you combine 2 and 3, i.e. use a compact URI whose definition is given in JSON-LD? Is this the purpose of the example in section 8? I guess people might ask why they would need to use "dct:license" when the context file could simply define the meaning of plain "license".

letmaik commented 8 years ago

Yes, it works like that. I added a paragraph in the JSON-LD section on your last comment, hopefully clarifying it.

letmaik commented 8 years ago

I added this page now: http://covjson.org/prefixes/ Should be fairly simple for people to add new namespace prefixes. Let me know if you think it's too complicated.

jonblower commented 8 years ago

Looks v good to me

letmaik commented 8 years ago

OK, let's close this one then. Finally!

covjson / specification

How to include custom fields? #50