frictionlessdata / datapackage

Data Package is a standard consisting of a set of simple yet extensible specifications to describe datasets, data files and tabular data. It is a data definition language (DDL) and data API that facilitates findability, accessibility, interoperability, and reusability (FAIR) of data.
https://datapackage.org
The Unlicense

rdfType attribute allows only one Class to be provided. #686

Status: Closed (proccaserra closed 2 weeks ago)

proccaserra commented 4 years ago

The rationale here is to enable types from different ontologies/vocabularies to be provided, for example:

Side note: in the case of complex headers (as made available by the tabulator function), a more complex expression may need to be supplied to the 'rdfType' attribute. For example, for a field such as experimental_condition1.auc, setting rdfType to the 'area under curve' class, "http://purl.obolibrary.org/obo/STATO_0000209", tells only part of the story and does not cover the notion of 'experimental condition'.

lwinfree commented 4 years ago

Hi @proccaserra Could you write out an example of what this would look like? Is your idea to have 2 (or more) entries for rdfType in say an array? (I know we talked about this before, but now I can't remember the exact use case). I think an example would be helpful for the FD team to think about and also for other users to see & comment on. Thanks!

pwalsh commented 4 years ago

Relates to:
https://github.com/frictionlessdata/specs/issues/451
https://github.com/frictionlessdata/frictionlessdata.io/issues/852
https://github.com/frictionlessdata/specs/issues/437
https://github.com/frictionlessdata/frictionlessdata.io/issues/854
https://github.com/frictionlessdata/specs/issues/218

And maybe most specifically to https://github.com/frictionlessdata/specs/issues/343

proccaserra commented 4 years ago

@lwinfree sure, let me try the following (assuming non-repeating keys):

{
  "name": "HMET-experiment1",
  "...": "...",
  "resources": [
    {
      "name": "metabolite_profile.csv",
      "mediatype": "text/csv",
      "rdfType": ["http://schema.org/Dataset", "http://purl.obolibrary.org/obo/IAO_0000100"],
      "schema": {
        "fields": [
          {
            "name": "chemical name",
            "rdfType": ["http://schema.org/Substance", "http://purl.obolibrary.org/obo/CHEBI_23367"],
            "type": "string"
          },
          {
            "name": "inchi",
            "rdfType": ["http://schema.org/Text", "http://nmrML.org/nmrCV#NMR:1000412"],
            "type": "string"
          },
          {
            "name": "experimental_condition1.auc",
            "rdfType": ["http://schema.org/PropertyValue", "http://purl.obolibrary.org/obo/STATO_0000209"],
            "type": "float"
          }
        ]
      }
    }
  ]
}

In the arrays above, http://purl.obolibrary.org/obo/IAO_0000100 is "Data set" in the IAO ontology, http://purl.obolibrary.org/obo/CHEBI_23367 is "Chemical Entity" in the CHEBI ontology, and http://purl.obolibrary.org/obo/STATO_0000209 is "area under curve" in the STATO ontology.
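
To illustrate what the array form above might mean for a consumer of the descriptor, here is a minimal Python sketch. It assumes a hypothetical helper, `rdf_types`, which is not part of any Frictionless library or of the spec. The helper accepts both the current single-string `rdfType` and the proposed array of strings, and always returns a list.

```python
from typing import Any, Dict, List


def rdf_types(descriptor: Dict[str, Any]) -> List[str]:
    """Return the rdfType values of a resource or field descriptor as a list.

    Hypothetical helper: accepts the current single-string form and the
    array form proposed in this issue. A missing rdfType yields [].
    """
    value = descriptor.get("rdfType")
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    if isinstance(value, list) and all(isinstance(v, str) for v in value):
        return list(value)
    raise TypeError("rdfType must be a string or a list of strings")


# Current form: a single class URI.
single = {"name": "inchi", "rdfType": "http://schema.org/Text", "type": "string"}

# Proposed form: one class per ontology/vocabulary.
multiple = {
    "name": "experimental_condition1.auc",
    "rdfType": [
        "http://schema.org/PropertyValue",
        "http://purl.obolibrary.org/obo/STATO_0000209",
    ],
}

print(rdf_types(single))
print(rdf_types(multiple))
```

Treating the string form as a one-element list would keep existing descriptors readable by the same code, assuming the array form were ever adopted.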

But then, going over issue frictionlessdata/specs#451 referenced by @pwalsh, do I understand correctly that additional RDF elements are being considered (e.g. rdfProp)? Which other elements?

The reason for asking is the following:

"experimental_condition1.auc" is more than just 'auc' as currently mapped. I'd like to specify 'auc' 'computed_over' 'experimental_condition1', which is currently lost in the representation.

lwinfree commented 4 years ago

A related question here is: how can we validate these RDF values (starting with rdfType)?
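
One possible starting point, sketched here under assumptions, is a purely syntactic check that each value is an absolute http(s) IRI. The function names are hypothetical. The check does not confirm that the IRI resolves or that it denotes an RDF class, either of which would need an ontology lookup.

```python
from typing import Iterable, List
from urllib.parse import urlsplit


def is_absolute_http_iri(value: str) -> bool:
    """Syntactic check only: the value has an http(s) scheme and a host."""
    try:
        parts = urlsplit(value)
    except ValueError:
        # urlsplit rejects some malformed inputs, e.g. unbalanced IPv6 brackets.
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def invalid_rdf_types(values: Iterable[object]) -> List[object]:
    """Return the values that fail the syntactic check, in input order."""
    return [
        v for v in values
        if not isinstance(v, str) or not is_absolute_http_iri(v)
    ]


candidates = [
    "http://schema.org/PropertyValue",
    "http://purl.obolibrary.org/obo/STATO_0000209",
    "http://nmrML.org/nmrCV#NMR:1000412",
    "STATO_0000209",  # bare identifier, not an IRI
]
print(invalid_rdf_types(candidates))
```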

nichtich commented 10 months ago

The use of rdfType is limited to a very simple mapping from tabular data to RDF data. Everything beyond that is a deep "rabbit hole": RDF data can be expressed in many complex ways. There are several RDF standards and technologies to model, process, express, and extend RDF. If you want RDF, better use RDF instead of tabular data.

The current use case can be solved by stating (with proper RDF technology e.g. OWL) that