frictionlessdata / datapackage

Data Package is a standard consisting of a set of simple yet extensible specifications to describe datasets, data files and tabular data. It is a data definition language (DDL) and data API that facilitates findability, accessibility, interoperability, and reusability (FAIR) of data.
https://datapackage.org
The Unlicense

Use Namespaces to add terms from other Metadata Schemes #403

Closed ericbusboom closed 4 years ago

ericbusboom commented 7 years ago

See also:

There is a wide variety of metadata schemes. They share a common core of terms, but also have many disjoint terms. To facilitate interoperability, it would be valuable to have a way to link the common terms across these schemes, and when foreign properties are added to data package metadata, to identify the source scheme.

A simple way to do this would be to create a translation table for common terms, and add a namespace feature to prefix foreign terms.

For reference to the terms in other schemes, here is a spreadsheet that links terms between Data Package, Metatab, POD, Dublin Core, CKAN, DCAT and Schema.Org.

A possible implementation of this feature would be to add a 'namespaces' property:

{
  "namespaces": {
    "pod": "http://example.com/pod/namespace/url",
    "mt": "http://metatab.org"
  }
}

Then, foreign terms could be prefixed:

{
  "name": "my-unique-datapackage",
  "title": "My Unique Data Package",
  "pod:conformsTo": "conformity-spec",
  "mt:keywords": ["kw1", "kw2"]
}
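
As a rough illustration of how a tool might read such a descriptor, here is a minimal Python sketch that splits each property name into a namespace URL and a local name. The helper `expand_key` is hypothetical and not part of any Data Package library; it assumes a prefix is everything before the first colon and only counts if it is declared in "namespaces".

```python
def expand_key(key, namespaces):
    """Return (namespace_url, local_name) for a property name.

    namespace_url is None when the key has no declared prefix.
    Hypothetical helper, not part of any Data Package library.
    """
    prefix, sep, local = key.partition(":")
    if sep and prefix in namespaces:
        return namespaces[prefix], local
    return None, key


namespaces = {
    "pod": "http://example.com/pod/namespace/url",
    "mt": "http://metatab.org",
}

descriptor = {
    "name": "my-unique-datapackage",
    "title": "My Unique Data Package",
    "pod:conformsTo": "conformity-spec",
    "mt:keywords": ["kw1", "kw2"],
}

# Map every property name to its (namespace URL, local name) pair.
resolved = {key: expand_key(key, namespaces) for key in descriptor}
```

Un-prefixed names such as "name" resolve to `(None, "name")`, so the core Data Package properties are left untouched.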

Processing With Namespaces

Tools that process datapackage metadata can use the namespaces to produce fully-qualified JSON, in which every property name is prefixed. Property names that are originally un-prefixed and are specified by a Data Package spec would be given a default datapackage prefix, for instance "dp:". Terms that are linked could be re-mapped into the Data Package namespace. For instance, the property name "mt:Title" could be remapped to "dp:Title".
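
The processing step described above could be sketched in Python roughly as follows. This is only one possible reading of the proposal: the `qualify` function, the `DEFAULT_PREFIX` constant, and the `TERM_LINKS` table are all hypothetical names, and the single link entry simply mirrors the "mt:Title" example.

```python
DEFAULT_PREFIX = "dp"  # assumed default prefix for core Data Package terms

# Hypothetical link table: foreign term -> equivalent Data Package term.
TERM_LINKS = {"mt:Title": "dp:Title"}


def qualify(descriptor, links=TERM_LINKS):
    """Return a copy of the descriptor in which every property name is prefixed."""
    namespaces = descriptor.get("namespaces", {})
    qualified = {}
    for key, value in descriptor.items():
        if key == "namespaces":
            continue  # the declaration itself is not a metadata term
        prefix, sep, _ = key.partition(":")
        if not (sep and prefix in namespaces):
            # No declared prefix: treat the name as a core term.
            key = f"{DEFAULT_PREFIX}:{key}"
        # Re-map linked foreign terms into the Data Package namespace.
        key = links.get(key, key)
        qualified[key] = value
    return qualified


descriptor = {
    "namespaces": {"mt": "http://metatab.org"},
    "name": "my-unique-datapackage",
    "mt:Title": "My Unique Data Package",
    "mt:keywords": ["kw1", "kw2"],
}

result = qualify(descriptor)
```

The sketch does not handle the case where a remapped term collides with a core term that is already present; a real tool would need a rule for that.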

The namespaces feature should be optional. Tools that don't recognize namespaces can use the defined data package properties without prefixes.
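
To illustrate the backwards-compatible case, a tool that knows nothing about namespaces could simply skip anything it does not recognize. The sketch below assumes that core Data Package property names never contain a colon; `core_properties` is a hypothetical helper.

```python
def core_properties(descriptor):
    """Keep only un-prefixed properties, dropping the namespace declaration.

    Assumes core Data Package property names never contain a colon.
    """
    return {
        key: value
        for key, value in descriptor.items()
        if ":" not in key and key != "namespaces"
    }


descriptor = {
    "namespaces": {"pod": "http://example.com/pod/namespace/url"},
    "name": "my-unique-datapackage",
    "title": "My Unique Data Package",
    "pod:conformsTo": "conformity-spec",
}

core = core_properties(descriptor)
```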

Impact

If the defined Data Package terms are never prefixed, then this feature could be implemented as a customization, and does not need to be incorporated into the spec. That is, the "namespaces" property can be added by end users as a custom property, and foreign terms can be prefixed, without changes to the core spec. So, this feature could be implemented by end users as a common practice, rather than a change to the spec.

Separability

Even if Data Package does not incorporate namespaces, maintaining a term map like the one referenced above may be valuable to allow data package metadata to be transformed into other schemes.
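
A term map of this kind could drive a simple translation step. The Python sketch below is only illustrative: the `TERM_MAP` entries are assumed correspondences chosen for the example, not taken from the spreadsheet, and `translate` is a hypothetical helper.

```python
# Illustrative term map: Data Package property -> term in another scheme.
# These correspondences are assumptions for the example, not an agreed mapping.
TERM_MAP = {
    "title": {"dublin_core": "dc:title", "schema_org": "schema:name"},
    "description": {"dublin_core": "dc:description", "schema_org": "schema:description"},
    "keywords": {"dublin_core": "dc:subject", "schema_org": "schema:keywords"},
}


def translate(descriptor, scheme, term_map=TERM_MAP):
    """Rename mapped properties to the target scheme; drop unmapped ones."""
    translated = {}
    for key, value in descriptor.items():
        target = term_map.get(key, {}).get(scheme)
        if target is not None:
            translated[target] = value
    return translated


descriptor = {
    "name": "my-unique-datapackage",
    "title": "My Unique Data Package",
    "keywords": ["kw1", "kw2"],
}

as_schema_org = translate(descriptor, "schema_org")
as_dublin_core = translate(descriptor, "dublin_core")
```

Properties without an entry for the target scheme (here, "name") are dropped rather than guessed at.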

Criticisms

Here is a discussion about why adding namespaces to JSON is a bad idea, or at least an accounting of all of the ways to do it wrong.

rufuspollock commented 7 years ago

@ericbusboom to start with, it should be stated that the specs allow for extension as is, so there is nothing to stop a community trialling this approach or any other metadata extension.

In terms of adding namespaces to the core spec: my concern would be that it is a significant departure from our aim of zen-like simplicity. To be justified, it would have to deliver significant value to publishers and/or implementors and/or consumers. For implementors this seems an additional burden.

For publishers and consumers I'm not so clear. It would be good to hear from you about your thoughts on the specific use cases where this would be valuable.

ericbusboom commented 7 years ago

"zen-like simplicity": completely agree. The feature could be implemented as an optional extension, and if core terms aren't prefixed, it is backwards compatible and doesn't require implementers to support the feature.

In fact, the feature could entirely be a convention for adding new terms, with no changes to the main spec, and no implementation requirements. It would just need to be documented as the best way to add custom terms.

The main situations where qualified terms would be valuable are:

For use by a single party -- the creator and consumer are part of the same organization -- only (c) is really important. But if the metadata is used outside the creator's org, all of these cases can result in conflict and ambiguity.

So, maybe we ( ok, I ) write this up as a formal convention?

rufuspollock commented 7 years ago

@ericbusboom agree on all points.

So, maybe we ( ok, I ) write this up as a formal convention?

That would be great 😄

eocarragain commented 6 years ago

@rufuspollock @ericbusboom I know this has been discussed before (see #110; #218), but isn't JSON-LD a simple, widely adopted, unobtrusive, developer-friendly way of tackling this problem? Apologies if I'm missing something obvious!

rufuspollock commented 4 years ago

DUPLICATE. I'm closing this in favour of the newer issue that covers the same core ground #663.