frictionlessdata / datapackage

Data Package is a standard consisting of a set of simple yet extensible specifications to describe datasets, data files and tabular data. It is a data definition language (DDL) and data API that facilitates findability, accessibility, interoperability, and reusability (FAIR) of data.
https://datapackage.org
The Unlicense

rdfProp and hierarchical rdfType, proposal #451

Open ppKrauss opened 7 years ago

ppKrauss commented 7 years ago

This issue proposes adopting the hierarchical principle in the interpretation of rdfType, and adding its complementary field descriptor, rdfProp. The aim is to enhance the flexibility and expressive power of the Frictionless Data specs.

Summary

The proposal is:

  1. Keep rdfType as the "semantic Class" of a field description (its value is the URL of a vocabulary item); see frictionlessdata/specs#343 and frictionlessdata/specs#217.

  2. Following the hierarchical principle (see frictionlessdata/frictionlessdata.io#852 and frictionlessdata/frictionlessdata.io#866), adopt rdfType as an optional descriptor on resources.
    It is interpreted as the "default rdfType" for all fields.
    PS: when a field has its own rdfType, the default is overridden by it (or, as an open question, the field's rdfType is interpreted as a specialization of the default).

  3. Add the descriptor rdfProp as the "semantic Property", expressing the semantics of a field value (in the context of rdfType, when one exists).
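
A minimal sketch of the "override" reading of item 2, assuming descriptors are plain Python dicts. The helper name `effective_rdf_type` is hypothetical and is not part of any Frictionless library; the "specialization" reading is not modelled here.

```python
def effective_rdf_type(resource, field):
    """Return the field's own rdfType, else the resource-level default, else None."""
    return field.get("rdfType") or resource.get("rdfType")


# Descriptor fragment written for this illustration only.
resource = {
    "name": "movies.csv",
    "rdfType": "http://schema.org/Movie",
    "schema": {
        "fields": [
            {"name": "Movie name", "rdfProp": "http://schema.org/name"},
            {
                "name": "Price",
                "rdfType": "http://schema.org/Offer",
                "rdfProp": "http://schema.org/price",
            },
        ]
    },
}

# Map each field name to the rdfType that applies to it.
resolved = {
    field["name"]: effective_rdf_type(resource, field)
    for field in resource["schema"]["fields"]
}
print(resolved)
```

Under this reading, a field without its own rdfType inherits the resource value, and a field that declares one keeps it.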

Example

HTML5-RDFa 1.1 Lite equivalent:

<table vocab="http://schema.org/" typeof="Movie">
  <tr><th>Movie name</th>   <th>Director</th> <th>Price</th></tr>
  <tr> <td property="name">Avatar</td> <td property="director">James Cameron</td>
          <td typeof="Offer" property="price">US$3.00</td> 
  </tr>
  <tr> <td property="name">Metropolis</td> <td property="director">Fritz Lang</td>
          <td typeof="Offer" property="price">US$2.50</td>
  </tr>
</table>

CSV and datapackage files:

Movie name,Director,Price
Avatar,James Cameron,US$3.00
Metropolis,Fritz Lang,US$2.50
{
  "name": "movies",
  "...": "...",
  "resources": [{
      "name": "movies.csv",
      "mediatype": "text/csv",
      "rdfType": "http://schema.org/Movie",
      "schema": {
        "fields": [
          {
            "name": "Movie name",
            "rdfProp": "http://schema.org/name",
            "type": "string"
          },
          {
            "name": "Director",
            "rdfProp": "http://schema.org/director",
            "type": "string"
          },
          {
            "name": "Price",
            "rdfType": "http://schema.org/Offer",
            "rdfProp": "http://schema.org/price",
            "type": "currency"
          }
        ]
      }
  }]
}

Another example: datasets-br/state-codes (see br-state-codes.csv).
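
One way to read the movies example is as a set of (row, class, property, value) statements, mirroring the typeof/property pairs of the RDFa table. The sketch below is only an illustration: it assumes the CSV header carries a Price column, uses dict descriptors, and the function name `row_statements` and the `row1`, `row2` identifiers are hypothetical.

```python
import csv
import io

CSV_TEXT = """Movie name,Director,Price
Avatar,James Cameron,US$3.00
Metropolis,Fritz Lang,US$2.50
"""

RESOURCE = {
    "rdfType": "http://schema.org/Movie",
    "schema": {
        "fields": [
            {"name": "Movie name", "rdfProp": "http://schema.org/name"},
            {"name": "Director", "rdfProp": "http://schema.org/director"},
            {
                "name": "Price",
                "rdfType": "http://schema.org/Offer",
                "rdfProp": "http://schema.org/price",
            },
        ]
    },
}


def row_statements(resource, row, row_id):
    """Yield (row_id, class, property, value) tuples for one CSV row."""
    default_type = resource.get("rdfType")
    for field in resource["schema"]["fields"]:
        prop = field.get("rdfProp")
        if prop is None:
            continue  # fields without rdfProp carry no semantic statement
        rdf_type = field.get("rdfType", default_type)
        yield (row_id, rdf_type, prop, row[field["name"]])


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
statements = [
    statement
    for index, row in enumerate(rows, start=1)
    for statement in row_statements(RESOURCE, row, f"row{index}")
]
for statement in statements:
    print(statement)
```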

jyutzler commented 6 months ago

I believe that support for rdf:Property references is a useful enough capability that I have experimented with allowing more flexibility in rdfType in my internal research. However, long-term I prefer adding rdfProp over changing rdfType to have a range of either rdfs:Class or rdf:Property.

In one of these discussions, there was a question of what should take priority if there is a disconnect between type, rdfType, and rdfProp. This is complicated by the fact that a) the MUST stipulation of rdfType being an rdfs:Class is impractical if not impossible to test and b) clients are less likely to implement rdfType support and are even less likely to implement rdfProp support. If it were my system, I would prioritize [rdfs:range of rdfProp] > rdfType > type. However, considering the real-world implications, I believe that the only pragmatic thing to do is to not mandate any behavior but perhaps recommend my proposal.

I would flag this sort of disconnect with a warning if I could detect it.
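
A rough sketch of that priority order with a warning on mismatch, under stated assumptions: `PROP_RANGES` is a hypothetical hand-written lookup (a real client would have to load rdfs:range from the vocabulary), the `example.org` URLs are placeholders, and the comparison is plain string equality, so subclass relations are ignored.

```python
import warnings

# Hypothetical rdfs:range lookup, keyed by property URL.
PROP_RANGES = {
    "http://example.org/vocab#price": "http://example.org/vocab#Money",
}


def semantic_type(field, prop_ranges=PROP_RANGES):
    """Pick the range of rdfProp, then rdfType, then type; warn on a mismatch."""
    prop_range = prop_ranges.get(field.get("rdfProp"))
    rdf_type = field.get("rdfType")
    if prop_range and rdf_type and prop_range != rdf_type:
        warnings.warn(
            f"field {field.get('name')!r}: rdfType {rdf_type} differs from "
            f"range {prop_range} of rdfProp {field['rdfProp']}"
        )
    return prop_range or rdf_type or field.get("type")


field = {
    "name": "Price",
    "type": "string",
    "rdfType": "http://example.org/vocab#Offer",
    "rdfProp": "http://example.org/vocab#price",
}

# Record warnings so the mismatch can be inspected instead of printed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    chosen = semantic_type(field)

print(chosen, len(caught))
```

Nothing here is mandated behavior; it only shows that the recommendation can be implemented as a fallback chain plus a non-fatal warning.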

rjgladish commented 3 months ago

I don't think recasting rdfType (or rdfProp) as the default type for unspecified fields[].rdfType is particularly useful. Instead, consider using a table-level rdfType property to define the type of the row object (aligning with CSVW tableschema datatype).

IFF a table-level type and format default is deemed useful, that structure should duplicate the field descriptor, sans name, title, and description, to include type, format, rdfType and rdfProp et al.
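
A sketch of what such a table-level default structure could look like when merged into fields. The key name `fieldDefaults` and the function `apply_field_defaults` are hypothetical, chosen for this illustration only; no such key exists in the specs.

```python
# Keys that identify a single field and therefore cannot be defaulted.
EXCLUDED = {"name", "title", "description"}


def apply_field_defaults(schema):
    """Return field descriptors with table-level defaults filled in."""
    defaults = {
        key: value
        for key, value in schema.get("fieldDefaults", {}).items()
        if key not in EXCLUDED
    }
    # Values set on the field itself win over the defaults.
    return [{**defaults, **field} for field in schema["fields"]]


schema = {
    "fieldDefaults": {"type": "string", "rdfType": "http://schema.org/Movie"},
    "fields": [
        {"name": "Movie name", "rdfProp": "http://schema.org/name"},
        {"name": "Price", "rdfType": "http://schema.org/Offer"},
    ],
}

merged = apply_field_defaults(schema)
print(merged)
```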