Closed rufuspollock closed 1 month ago
@rgrp
I'd really like to see if we can move forward on this.
I've gone over this thread #110 and while interesting background information, it does very little to help me get a clear picture on how to move forward practically.
If I summarise that thread for myself, it is basically:
- JSON-LD, RDF and linked data is good, and we do want to have datapackage.json be compatible. However, datapackage.json mostly targets the fact that data is published in CSV and Excel (and the like), and therefore, we don't want to MUST in the direction of JSON-LD (> RDF), in the interest of keeping the core spec as minimal as possible.
- datapackage.json can be automated (no friction for users).
- datapackage.json MUST be valid JSON-LD to be a thing.
What I'm completely lacking there is actual examples and evidence for JSON-LD adoption - the conversation focusses on the spec itself and linked data / semantic web, but not at all on the actual state of data publication and reuse. The only example I could extract out of that thread is around search engines and schema.org. As someone who has been involved in search engine optimisation extensively in the past, I'm not convinced that schema.org is at all a success in terms of the web as a data catalog due to extremely low adoption (please show me otherwise if I am out of touch there).
My position is that datapackage.json should not MUST on JSON-LD, but that we should definitely focus on documentation, specific, real world examples, and possibly a MAY "spec" for making datapackage.json valid JSON-LD.
I'd be interested to set up a small group of contributors with specific interest here and outline the shape of such work. I believe this is about much more than @context and @id as discussed in #110, and at least we should directly address @type, which relates to the JSON Table Schema spec used in Tabular Data Package.
Is there anyone regularly working with linked data at a practical level who is interested in this?
@jbenet @sballesteros do you still have interest?
@marek-dudas @jindrichmynarz is this something you would be interested in contributing to? It ties in very well with the https://github.com/openbudgets project, where we represent data in both Fiscal Data Package and the OpenBudgets Data Model (RDF), and are exploring ways to transform data between both formats. As you are both Linked Data experts, and we have here a clear use case around real data published by governments, I'm interested in your thoughts on datapackage.json as JSON-LD.
I would consider Schema.org a success. According to a recently published paper by Google and Microsoft, Schema.org can be found in 31.3% of web pages (based on a 10M sample from Google index and WebDataCommons). JSON-LD is used in many widely deployed applications, such as Gmail actions.
However, it seems to me that using JSON-LD for FDP is quite indirect. A more appropriate tool may be the recent CSV2RDF W3C recommendation, which uses JSON-LD in part.
We at the University of Economics in Prague will work on converting FDP to RDF. Whether doing so would require data package in JSON-LD is unclear.
@jindrichmynarz, thanks for the link to the article.
Schema.org has seen a huge jump in adoption in the last 12 months, that's awesome!
I would like to see a deeper analysis of this that excludes WordPress themes that simply mark up blog posts and other types in WP, which, while very cool, probably skews these figures a bit considering how much of the web runs on WordPress.
@pwalsh I have lost interest and have decided to stick with schema.org Dataset class.
Regarding tooling and interop, the W3C CSV on the web effort covers all my needs, and I complement schema:Dataset with what I need from the W3C Metadata Vocabulary for Tabular Data.
In addition to the paper that @jindrichmynarz mentioned, I find things like https://developers.google.com/knowledge-graph/ pretty exciting! I really hope that one day we can easily use mainstream search engines (and their hypermedia APIs) to query things like clinical trials (hence my interest in schema.org).
Hopefully sooner rather than later, the schema.org Dataset class will evolve and adopt a significant part of the W3C Metadata Vocabulary for Tabular Data (or maybe it will be a schema.org extension). Who knows, maybe schema.org potential Actions (see blog post and overview doc) will be used to expose potential data transformation (or sync options) to the user in a nice interoperable hypermedia API.
Anyway, I don't think this is helping the thread, so feel free to delete, but I just wanted to answer the question from @pwalsh regarding loss of interest.
@sballesteros thanks for your response, and, no, there is no need to delete it :) - your post and Jindrich's have prompted me to look more deeply into recent work around schema.org. I'm quite interested in schema.org generally, but for a number of reasons, quite skeptical of mainstream search engines as generic data catalogues in the way you describe.
As an aside, you mentioned clinical trials - are you aware of the large project we are working on in this domain (opentrials.net)? Maybe it is worth syncing on that in another channel?
@jindrichmynarz about your points on JSON-LD for Fiscal Data Package: yes, I see it is indirect, but here we are talking about a general pattern for making datapackage.json JSON-LD compatible, which is a simpler case. The next step after that might be, of course, to convert to/from Tabular Data Package <> CSVW, which, as I see it now, is quite straightforward once we establish the pattern for metadata in JSON-LD, considering the common basis of each.
@pwalsh: You're right. I assumed the narrower context of Fiscal Data Package. However, here the discussion is about data packages in general. In this context, it surely makes sense to see what steps need to be taken to turn datapackage.json into JSON-LD.
Regarding the mapping from FDP to RDF, @marek-dudas recently started sketching the process here.
So @jindrichmynarz if I could pick your brain on making datapackage.json JSON-LD compatible in the new year that would be great. We have lots of developments happening around datapackage.json and I'm quite keen to get clear alignment.
+1 vote to this issue!
For the present (not the far future), is there any directive or new Dataprotocols convention for expressing semantics in fields (resources/schema/fields in the tabular-data-package standard)? For example, the W3C's propertyUrl and aboutUrl would be useful for expressing semantics in datasets.
(sorry, can I post this kind of comment here?)
@ppKrauss could you provide an example here?
@pwalsh, it is a REST concept based on endpoints that are cool URLs, to be used in URI templates. Well-known examples are the so-called URN resolvers:
- http://dx.doi.org/{DOI} where the placeholder {DOI} is a DOI, like doi:10.1038/ncomms7368, so http://dx.doi.org/10.1038/ncomms7368 is the URL resulting from template+value.
- http://www.lexml.gov.br/urn/{fullURN} where the placeholder {fullURN} is a valid URN, like urn:lex:br:federal:lei:2014;13019, so http://www.lexml.gov.br/urn/urn:lex:br:federal:lei:2014;13019 is the URL resulting from template+value.
NOTE: the VAT number of an organization (e.g. www.outlandish.com is in the UK and has the VAT number 102018679) may be resolved by a template URL, but not all countries offer a REST system for VAT resolution. For example, http://ec.europa.eu/taxation_customs/vies/vatRequest.html resolves by XML POST, so it is not valid as a template URL.
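The template+value expansion in the DOI and LexML examples above can be sketched in a few lines of Python. This is only a minimal illustration, assuming simple {name} placeholders; the `expand_template` helper is hypothetical and does not implement the full URI Template syntax of RFC 6570.

```python
import re


def expand_template(template: str, values: dict) -> str:
    """Replace each {name} placeholder with its value (hypothetical helper)."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value for placeholder {name!r}")
        return values[name]

    # Only plain {name} placeholders are handled, no RFC 6570 operators.
    return re.sub(r"\{(\w+)\}", substitute, template)


doi_url = expand_template("http://dx.doi.org/{DOI}", {"DOI": "10.1038/ncomms7368"})
lex_url = expand_template(
    "http://www.lexml.gov.br/urn/{fullURN}",
    {"fullURN": "urn:lex:br:federal:lei:2014;13019"},
)
print(doi_url)
print(lex_url)
```

Note that the values are inserted as-is, without percent-encoding, which matches the two resolver URLs quoted above but may not suit every template.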
@ppKrauss thanks. What I meant was, an example of a datapackage.json or a JSON Table Schema portion that demonstrates your suggestion.
When republishing inaccessible datasets, I started using json-ld/schema.org to make them findable e.g. via google dataset search. Json-ld and datapackage.json have much overlap, though not completely. Datapackages lend themselves more to an in-depth description of the data and provide support to easily process the data (at least tabular data).
Would be nice to create a (basic) data package from json-ld or inversely generate json-ld from a datapackage.json. This issue seems to have gone stale since 2016. Are there any current plans to either make them more compatible or provide conversion utilities?
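To make the "generate json-ld from a datapackage.json" direction concrete, here is a rough Python sketch. It is not an agreed mapping: the choice of which Data Package properties map to which schema.org Dataset properties is my assumption, the `datapackage_to_schema_org` function is hypothetical, and the sample descriptor is invented for illustration.

```python
import json


def datapackage_to_schema_org(descriptor: dict) -> dict:
    """Build a schema.org Dataset from a Data Package descriptor (assumed mapping)."""
    dataset = {"@context": "https://schema.org/", "@type": "Dataset"}

    # Prefer the human-readable title, fall back to the machine name.
    name = descriptor.get("title") or descriptor.get("name")
    if name:
        dataset["name"] = name
    if descriptor.get("description"):
        dataset["description"] = descriptor["description"]

    # Data Package licenses are objects; only their URLs are carried over.
    license_urls = [lic["path"] for lic in descriptor.get("licenses", []) if lic.get("path")]
    if license_urls:
        dataset["license"] = license_urls

    # Each resource with a single string path becomes a DataDownload.
    distributions = []
    for resource in descriptor.get("resources", []):
        path = resource.get("path")
        if not isinstance(path, str):
            continue  # inline data and multi-file paths are skipped in this sketch
        download = {"@type": "DataDownload", "contentUrl": path}
        if resource.get("name"):
            download["name"] = resource["name"]
        if resource.get("format"):
            download["encodingFormat"] = resource["format"]
        distributions.append(download)
    if distributions:
        dataset["distribution"] = distributions
    return dataset


# Invented example descriptor, for illustration only.
example = {
    "name": "countries",
    "title": "Country labels",
    "licenses": [{"name": "CC0-1.0", "path": "https://creativecommons.org/publicdomain/zero/1.0/"}],
    "resources": [{"name": "labels", "path": "labels.csv", "format": "csv"}],
}
dataset = datapackage_to_schema_org(example)
print(json.dumps(dataset, indent=2))
```

The field-level schema is dropped entirely here, which is exactly the part where the two formats do not overlap.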
@hbruch I am interested in generating json-ld from datapackage.json too. Currently there are a lot of possible options and a simple solution could even be based on passing a JSON-LD context in datapackage.json
@hbruch @ioggstream this is really welcome - we just need someone to step up to make it happen 😄
@rufuspollock Ok, so for now I have a proposal loosely based on this I-D https://datatracker.ietf.org/doc/draft-polli-restapi-ld-keywords/
The general idea I'm working on is this one:
Given this CSV
id,label_it,label_en
FRA,Francia,France
ITA,Italia,Italie
I have this DataPackage
schema:
  fields:
    - { name: id, type: string }
    - { name: label_it, type: string }
    - { name: label_en, type: string }
  # Extension keyword to provide a json-ld context
  x-jsonld-context:
    "@vocab": https://countries.example/
    skos: http://www.w3.org/2004/02/skos/core#
    id:
      "@type": "@id"
    # Localize labels. Order is relevant.
    label_en:
      "@id": skos:prefLabel
      "@language": en
    label_it:
      "@id": skos:prefLabel
      "@language": it
  missingValues:
    - ""
The CSV can easily be transformed into JSON, and then into JSON-LD by adding the above context (open in json-ld playground):
{
  "@context": {
    "@vocab": "https://countries.example/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "id": { "@type": "@id", "@id": "@id" },
    "label_en": { "@id": "skos:prefLabel", "@language": "en" },
    "label_it": { "@id": "skos:prefLabel", "@language": "it" }
  },
  "@graph": [
    { "id": "ITA", "label_it": "Italia", "label_en": "Italie" },
    { "id": "FRA", "label_it": "Francia", "label_en": "France" }
  ]
}
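The CSV-to-JSON-LD step can be sketched with the Python standard library alone. This is a minimal illustration, not a reference implementation: the `csv_to_jsonld` function and its `missing_values` parameter are hypothetical names, the context is copied from the JSON example above, and the code only assembles the document without running any JSON-LD processing on it.

```python
import csv
import io
import json

CSV_TEXT = """id,label_it,label_en
FRA,Francia,France
ITA,Italia,Italie
"""

# Context copied from the JSON-LD example above.
CONTEXT = {
    "@vocab": "https://countries.example/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "id": {"@type": "@id", "@id": "@id"},
    "label_en": {"@id": "skos:prefLabel", "@language": "en"},
    "label_it": {"@id": "skos:prefLabel", "@language": "it"},
}


def csv_to_jsonld(csv_text: str, context: dict, missing_values=("",)) -> dict:
    """Wrap CSV rows in a JSON-LD document (hypothetical helper)."""
    graph = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        # Drop cells listed in missingValues so they produce no triples.
        graph.append({key: value for key, value in record.items() if value not in missing_values})
    return {"@context": context, "@graph": graph}


document = csv_to_jsonld(CSV_TEXT, CONTEXT)
print(json.dumps(document, indent=2, ensure_ascii=False))
```

Everything semantic stays in the context; the row-to-object step knows nothing about linked data, which is the point of delegating to JSON-LD.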
This approach introduces #451 without having to define specific behavior, and delegates all the LD processing to the JSON-LD specifications: this means that if JSON-LD adds new features in context, we just inherit them.
WDYT?
cc: @mfortini @giorgialodi @hbruch
@ioggstream seems good and i'm happy to have any concrete proposal to move things forward 😄
I am drafting this document to better analyse the possible choices https://docs.google.com/document/d/1ACMG0dbzHt1NSXxeJ2pHf8zFnnbl7pSiZ6X_-uggdQI/edit?usp=drivesdk
This issue is about creating a valid JSON-LD context for a Data Package / Tabular Data Package.
Previous discussion on related topics in #110