frictionlessdata / datapackage

Data Package is a standard consisting of a set of simple yet extensible specifications to describe datasets, data files and tabular data. It is a data definition language (DDL) and data API that facilitates findability, accessibility, interoperability, and reusability (FAIR) of data.
https://datapackage.org
The Unlicense

JSON-LD context for Data Package and Tabular Data Package #218

Open rufuspollock opened 8 years ago

rufuspollock commented 8 years ago

This issue is about creating a valid JSON-LD context for a Data Package / Tabular Data Package.

Previous discussion on related topics in #110

pwalsh commented 8 years ago

@rgrp

I'd really like to see if we can move forward on this.

I've gone over thread #110, and while it is interesting background information, it does very little to give me a clear picture of how to move forward practically.

If I summarise that thread for myself, it is basically:

  1. rgrp: JSON-LD, RDF and linked data are good, and we do want datapackage.json to be compatible. However, datapackage.json mostly targets the fact that data is published in CSV and Excel (and the like), and therefore we don't want to MUST in the direction of JSON-LD (> RDF), in the interest of keeping the core spec as minimal as possible.
  2. others: linked data is the future, and adoption in datapackage.json can be automated (no friction for users). datapackage.json MUST be valid JSON-LD to be a thing.

What I'm completely lacking there is actual examples and evidence for JSON-LD adoption - the conversation focusses on the spec itself and linked data / semantic web, but not at all on the actual state of data publication and reuse. The only example I could extract out of that thread is around search engines and schema.org. As someone who has been involved in search engine optimisation extensively in the past, I'm not convinced that schema.org is at all a success in terms of the web as a data catalog due to extremely low adoption (please show me otherwise if I am out of touch there).

My position is that datapackage.json should not MUST on JSON-LD, but, that we should definitely focus on documentation, specific, real world examples, and possibly a MAY "spec" for making datapackage.json valid JSON-LD.

I'd be interested in setting up a small group of contributors with a specific interest here to outline the shape of such work. I believe this is about much more than @context and @id as discussed in #110, and at the least we should directly address @type, which relates to the JSON Table Schema spec used in Tabular Data Package.

Is there anyone regularly working with linked data at a practical level who is interested in this?

@jbenet @sballesteros do you still have interest?

@marek-dudas @jindrichmynarz is this something you would be interested in contributing to? It ties in very well with the https://github.com/openbudgets project, where we represent data in both Fiscal Data Package and the OpenBudgets Data Model (RDF), and are exploring ways to transform data between the two formats. As you are both Linked Data experts, and we have here a clear use case around real data published by governments, I'm interested in your thoughts on datapackage.json as JSON-LD.

jindrichmynarz commented 8 years ago

I would consider Schema.org a success. According to a recently published paper by Google and Microsoft, Schema.org can be found in 31.3% of web pages (based on a 10M sample from the Google index and WebDataCommons). JSON-LD is used in many widely deployed applications, such as GMail actions.

However, it seems to me that using JSON-LD for FDP is quite indirect. A more appropriate tool may be the recent CSV2RDF W3C recommendation, which uses JSON-LD in part.

We at the University of Economics in Prague will work on converting FDP to RDF. Whether doing so would require data package in JSON-LD is unclear.

pwalsh commented 8 years ago

@jindrichmynarz, thanks for the link to the article.

Schema.org has seen a huge jump in adoption in the last 12 months, that's awesome!

I would like to see a deeper analysis of this without Wordpress themes that simply markup blog posts and other types in WP, which, while very cool, probably skews these figures a bit considering how much of the web runs on Wordpress.

sballesteros commented 8 years ago

@pwalsh I have lost interest and have decided to stick with schema.org Dataset class. Regarding tooling and interop, the W3C CSV on the web effort covers all my needs, and I complement schema:Dataset with what I need from the W3C Metadata Vocabulary for Tabular Data.

In addition to the paper that @jindrichmynarz mentioned, I find things like https://developers.google.com/knowledge-graph/ pretty exciting! I really hope that one day we can easily use mainstream search engines (and their hypermedia APIs) to query things like clinical trials (hence my interest in schema.org).

Hopefully sooner rather than later, the schema.org Dataset class will evolve and adopt a significant part of the W3C Metadata Vocabulary for Tabular Data (or maybe it will be a schema.org extension). Who knows, maybe schema.org potential Actions (see blog post and overview doc) will be used to expose potential data transformation (or sync options) to the user in a nice interoperable hypermedia API.

Anyway, I don't think this is helping the thread, so feel free to delete, but I just wanted to answer the question from @pwalsh regarding loss of interest.

pwalsh commented 8 years ago

@sballesteros thanks for your response, and, no, there is no need to delete it :) - your post and Jindrich's have prompted me to look more deeply into recent work around schema.org. I'm quite interested in schema.org generally, but for a number of reasons, quite skeptical of mainstream search engines as generic data catalogues in the way you describe.

As an aside, you mentioned clinical trials - are you aware of the large project we are working on in this domain (opentrials.net)? Maybe it is worth syncing on that in another channel?

pwalsh commented 8 years ago

@jindrichmynarz about your points on JSON-LD for Fiscal Data Package: yes, I see it is indirect, but here we are talking about a general pattern for making datapackage.json JSON-LD compatible, which is a simpler case. The next step after that might be, of course, to convert between Tabular Data Package and CSVW, which, as I see it now, is quite straightforward once we establish the pattern for metadata in JSON-LD, considering the common basis of each.

jindrichmynarz commented 8 years ago

@pwalsh: You're right. I assumed the narrower context of Fiscal Data Package. However, here the discussion is about data packages in general. In this context, it surely makes sense to see what steps need to be taken to turn datapackage.json into JSON-LD.

Regarding the mapping from FDP to RDF, @marek-dudas recently started sketching the process here.

pwalsh commented 8 years ago

So @jindrichmynarz if I could pick your brain on making datapackage.json JSON-LD compatible in the new year that would be great. We have lots of developments happening around datapackage.json and I'm quite keen to get clear alignment.

ppKrauss commented 8 years ago

+1 vote to this issue!


For the present (not the far future), is there a directive or a new Dataprotocols convention for expressing semantics in fields (resources/schema/fields in the tabular-data-package standard)? Example: the W3C's propertyUrl and aboutUrl would be useful for expressing semantics in datasets.


(sorry, can I post this kind of comment here?)

pwalsh commented 8 years ago

@ppKrauss could you provide an example here?

ppKrauss commented 8 years ago

@pwalsh, it is a REST concept based on endpoints that are cool URLs, for use in URI templates. Well-known examples are the so-called URN resolvers.

NOTE: the VAT number of an organization (e.g. www.outlandish.com is in the UK and has the VAT number 102018679) may be resolved by a template URL, but not all countries offer a REST system for VAT resolution. For example, http://ec.europa.eu/taxation_customs/vies/vatRequest.html resolves via XML POST, so it is not valid as a template URL.
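
As a rough illustration of the URI-template idea (a sketch only, not taken from the CSVW or Data Package specifications): the W3C `aboutUrl` and `propertyUrl` properties are URI templates that are expanded per row. The helper name `expand_template`, the template, and the sample row below are hypothetical, and only simple `{name}` expressions are handled.

```python
import re
from urllib.parse import quote


def expand_template(template, row):
    """Expand simple {name} expressions using values from a row dict."""
    def substitute(match):
        # Percent-encode everything outside the unreserved character set.
        return quote(str(row[match.group(1)]), safe="")
    return re.sub(r"\{([A-Za-z0-9_]+)\}", substitute, template)


# Hypothetical row and template, for illustration only.
row = {"id": "FRA", "label_en": "France"}
about = expand_template("https://countries.example/country/{id}", row)
print(about)
```

A resolver that only accepts POST requests, such as the VIES form mentioned above, cannot be addressed this way because there is no URL to fill in.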

pwalsh commented 8 years ago

@ppKrauss thanks. What I meant was, an example of a datapackage.json or a JSON Table Schema portion that demonstrates your suggestion.

hbruch commented 3 years ago

When republishing inaccessible datasets, I started using JSON-LD/schema.org to make them findable, e.g. via Google Dataset Search. JSON-LD and datapackage.json overlap considerably, though not completely. Data packages lend themselves more to an in-depth description of the data and provide support for easily processing the data (at least tabular data).

Would be nice to create a (basic) data package from json-ld or inversely generate json-ld from a datapackage.json. This issue seems to have gone stale since 2016. Are there any current plans to either make them more compatible or provide conversion utilities?
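
A minimal sketch of the second direction (generating schema.org JSON-LD from a datapackage.json), assuming a hand-picked property mapping that neither specification defines. The function name `datapackage_to_schema_org` and the sample descriptor are hypothetical, and only a few properties are mapped (`title`/`name`, `description`, `keywords`, `licenses`, `resources`).

```python
import json


def datapackage_to_schema_org(descriptor):
    """Map a few datapackage.json properties onto a schema.org Dataset.

    The mapping is an illustrative assumption, not part of either spec.
    """
    dataset = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        # Prefer the human-readable title, fall back to the package name.
        "name": descriptor.get("title") or descriptor.get("name"),
    }
    if descriptor.get("description"):
        dataset["description"] = descriptor["description"]
    if descriptor.get("keywords"):
        dataset["keywords"] = list(descriptor["keywords"])

    # Data Package licenses are objects; keep only those that carry a URL.
    license_urls = [lic["path"] for lic in descriptor.get("licenses", []) if lic.get("path")]
    if license_urls:
        dataset["license"] = license_urls[0] if len(license_urls) == 1 else license_urls

    distribution = []
    for resource in descriptor.get("resources", []):
        path = resource.get("path")
        if not isinstance(path, str):
            continue  # skip inline data and multi-file resources in this sketch
        download = {"@type": "DataDownload", "contentUrl": path}
        encoding = resource.get("mediatype") or resource.get("format")
        if encoding:
            download["encodingFormat"] = encoding
        distribution.append(download)
    if distribution:
        dataset["distribution"] = distribution
    return dataset


# Hypothetical descriptor used only for illustration.
example = {
    "name": "country-labels",
    "title": "Country labels",
    "keywords": ["countries", "labels"],
    "licenses": [{"name": "CC0-1.0", "path": "https://creativecommons.org/publicdomain/zero/1.0/"}],
    "resources": [
        {"name": "countries", "path": "https://countries.example/countries.csv", "mediatype": "text/csv"}
    ],
}

result = datapackage_to_schema_org(example)
print(json.dumps(result, indent=2))
```

Relative resource paths would need to be resolved against the package's base URL before being used as `contentUrl`; the sketch leaves them untouched.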

ioggstream commented 1 year ago

@hbruch I am interested in generating json-ld from datapackage.json too. Currently there are a lot of possible options and a simple solution could even be based on passing a JSON-LD context in datapackage.json

rufuspollock commented 1 year ago

@hbruch @ioggstream this is really welcome - we just need someone to step up to make it happen 😄

ioggstream commented 1 year ago

@rufuspollock Ok, so for now I have a proposal loosely based on this I-D https://datatracker.ietf.org/doc/draft-polli-restapi-ld-keywords/

The general idea I'm working on is this one:

Given this CSV

id,label_it,label_en
FRA,Francia,France
ITA,Italia,Italy

I have this DataPackage

    schema:
      fields:
        - { name: id,  type: string }
        - { name: label_it,  type: string }
        - { name: label_en, type: string }
      # Extension keyword to provide a json-ld context  
      x-jsonld-context:  
        "@vocab": https://countries.example/
        skos: http://www.w3.org/2004/02/skos/core#

        id:
          "@type": "@id"
        # Localize labels. Order is relevant.
        label_en:
          "@id": skos:prefLabel
          "@language": en
        label_it:
          "@id": skos:prefLabel
          "@language": it
      missingValues:
        - ""

The CSV can easily be transformed into JSON, and then into JSON-LD by adding the above context (open in json-ld playground):

{
  "@context": {
    "@vocab": "https://countries.example/",
    "skos": "http://www.w3.org/2004/02/skos/core#",

    "id": { "@type": "@id", "@id": "@id" },
    "label_en": { "@id": "skos:prefLabel", "@language": "en" },
    "label_it": { "@id": "skos:prefLabel", "@language": "it" }
  },
  "@graph": [
    { "id": "ITA", "label_it": "Italia", "label_en": "Italy" },
    { "id": "FRA", "label_it": "Francia", "label_en": "France" }
  ]
}

This approach introduces #451 without having to define specific behavior, and delegates all the LD processing to the JSON-LD specifications: this means that if JSON-LD adds new features to contexts, we just inherit them.
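
The CSV-to-JSON-LD step described above can be sketched roughly as follows, using only the Python standard library. This is an assumption-laden illustration rather than a reference implementation: `x-jsonld-context` is the proposed extension keyword (not part of the published Table Schema), `csv_to_jsonld` is a hypothetical helper name, the context is transcribed from the JSON-LD document above, and the rows are a small inline copy of the sample with English labels.

```python
import csv
import io
import json

CSV_TEXT = """id,label_it,label_en
FRA,Francia,France
ITA,Italia,Italy
"""

# Python transcription of the schema; "x-jsonld-context" is the proposed
# extension keyword and is an assumption of this sketch.
SCHEMA = {
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "label_it", "type": "string"},
        {"name": "label_en", "type": "string"},
    ],
    "x-jsonld-context": {
        "@vocab": "https://countries.example/",
        "skos": "http://www.w3.org/2004/02/skos/core#",
        "id": {"@type": "@id", "@id": "@id"},
        "label_en": {"@id": "skos:prefLabel", "@language": "en"},
        "label_it": {"@id": "skos:prefLabel", "@language": "it"},
    },
    "missingValues": [""],
}


def csv_to_jsonld(csv_text, schema):
    """Wrap CSV rows in a JSON-LD document using the schema's context."""
    missing = set(schema.get("missingValues", [""]))
    names = [field["name"] for field in schema["fields"]]
    graph = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        node = {}
        for name in names:
            value = record.get(name)
            # Drop absent cells and cells that match a missing-value marker.
            if value is not None and value not in missing:
                node[name] = value
        graph.append(node)
    return {"@context": schema["x-jsonld-context"], "@graph": graph}


doc = csv_to_jsonld(CSV_TEXT, SCHEMA)
print(json.dumps(doc, indent=2, ensure_ascii=False))
```

The sketch does no linked-data processing itself; expansion, compaction, or conversion to RDF would be left to a JSON-LD processor, which is the delegation the proposal relies on.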

WDYT?

cc: @mfortini @giorgialodi @hbruch

rufuspollock commented 1 year ago

@ioggstream seems good and i'm happy to have any concrete proposal to move things forward 😄

ioggstream commented 1 year ago

I am drafting this document to better analyse the possible choices https://docs.google.com/document/d/1ACMG0dbzHt1NSXxeJ2pHf8zFnnbl7pSiZ6X_-uggdQI/edit?usp=drivesdk