Closed CharlesNepote closed 5 years ago
@CharlesNepote very useful and interesting suggestion.
Much of this metadata looks like it is very similar to the generic descriptive metadata on Data Package and Data Resource so we can probably reuse somewhat.
As a first step would you like to tidy this up into a "pattern" -- our approach atm is first to get a solid pattern, do some community review and then after there's been time to see this is solid and useful to look at whether it goes into the main spec.
@rufuspollock Do you mean I should propose a new section of the pattern page (with a clear specification)?
[...] do some community review
How? Where?
@CharlesNepote once you have a draft we'll put it on patterns. Review will then
Here is a very first draft. I wrote it the way the other patterns were written. Is it OK? My English is not as good as I would like, so there are probably mistakes in my proposal.
I tried to keep things simple but many aspects can be discussed:
- `author` as in the other frictionless specs while Dublin Core says `creator`
- `name`, not sure it is relevant
- `url` could be `identifier` as in Dublin Core
- `previous` should be named `previous_version`
===========================================
Documentation is a fundamental aspect of data sharing. Table Schema needs contextual metadata to let people and software understand and manage schemas.
That way, people creating or reusing schemas would have a better understanding of them. Some software would be able to produce user documentation based on schemas. In the future, other software would be able to use the metadata for tasks such as verifying schema integrity and provenance, crawling and cataloguing schemas, versioning schemas, etc.
There are no known implementations at present.
To allow schema documentation, implementations `MUST` start with the `schema` object. This object contains the already defined `fields` array, as in a classical Tabular Data Package. It also contains some other properties useful for schema documentation. These other properties come from widely adopted practices for describing a resource.
Each schema `MUST` have a `title`, which represents the human-readable title of the schema. The `title` is the only required property.
All other properties `SHOULD` be implemented:
- `author` is the author of the schema; one or more people and/or one or more organisations
- `publisher` is the publisher of the schema; one or more organisations or one or more people
- `contributor` is a list of contributors of this schema
- `version` is a version number of the schema to let people be sure they talk about the same thing
- `date` is the release date of the schema in a free format
- `description` is where authors can describe the schema in a few sentences; `description` is highly recommended; it can be a place to use keywords with #hashtags; Markdown format is encouraged
- `homepage` is the home on the web that is related to this schema (not the schema itself); it's a well-formed URL
- `url` is the web address where the schema can be retrieved
- `previous` is the URL of the previous version of the current schema (allowing tools to produce schema diffs)
A user might build a `schema.json` as follows:
{
  "schema": {
    "title": "Postal codes list schema",
    "author": "Jacques Facteur",
    "publisher": "Postal codes committee",
    "contributor": "Julie Martin, Max Dupont, Estelle Bois",
    "version": "0.1 beta",
    "date": "2017/01/31",
    "description": "Postal codes list schema defines the raw list of postal codes in France.",
    "homepage": "http://example.com/postal-code-list-schema.html",
    "url": "http://example.com/2017/03/13/schema.json",
    "previous": "http://example.com/2017/02/21/schema.json",
    "fields": [
      {
        "name": "postal_code",
        "title": "postal code",
        "type": "string"
      }
    ]
  }
}
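The required/recommended split in this draft could be checked mechanically. Here is a minimal Python sketch, assuming the property names of the draft above; `check_schema_metadata` is a hypothetical helper written for illustration, not part of any Frictionless Data library.

```python
import json

# Property names taken from the draft above; only `title` is required.
REQUIRED = ["title"]
RECOMMENDED = ["author", "publisher", "contributor", "version", "date",
               "description", "homepage", "url", "previous"]


def check_schema_metadata(descriptor):
    """Return (errors, warnings) for a descriptor holding a `schema` object."""
    schema = descriptor.get("schema")
    if not isinstance(schema, dict):
        return ["missing `schema` object"], []
    # A missing or empty required property is an error.
    errors = [f"missing required property `{name}`"
              for name in REQUIRED if not schema.get(name)]
    # A missing recommended property is only a warning.
    warnings = [f"missing recommended property `{name}`"
                for name in RECOMMENDED if name not in schema]
    return errors, warnings


# Hypothetical, deliberately incomplete descriptor.
descriptor = json.loads("""
{
  "schema": {
    "title": "Postal codes list schema",
    "version": "0.1 beta",
    "fields": [{"name": "postal_code", "title": "postal code", "type": "string"}]
  }
}
""")
errors, warnings = check_schema_metadata(descriptor)
print(errors)
print(warnings)
```

Treating missing recommended properties as warnings rather than errors mirrors the `MUST`/`SHOULD` distinction in the text.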
Thanks @CharlesNepote for this great suggestion. A few remarks:
- I'm not sure whether the `contributor` field should be an array or a single string
- in the `date` field, what do you mean "in a free format"? Do you mean there's no standard format to be followed? In that case I would be worried this data can not be used.
I'm not sure whether the contributor field should be an array or a single string.
If it's an array, what is the use case? I think this field is just there to thank contributors, not to enable new usages. Keep it simple.
in the date field, what do you mean "in a free format"? Do you mean there's no standard format to be followed? In that case I would be worried this data can not be used.
I agree. A standard format will allow many interesting use cases: schema cataloguing by date, RSS feeds, etc. I'll specify the "ISO-8601" format.
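To illustrate why a fixed format helps: a date written as `2017-01-31` parses directly with the Python standard library, while the slash-separated `2017/01/31` used in the first draft does not. This is only a sketch; `parse_release_date` is a hypothetical helper name.

```python
from datetime import date


def parse_release_date(value):
    """Parse an ISO 8601 calendar date such as YYYY-MM-DD; return None on failure."""
    try:
        return date.fromisoformat(value)
    except ValueError:
        return None


iso_value = parse_release_date("2017-01-31")
free_value = parse_release_date("2017/01/31")
print(iso_value)
print(free_value)
```

Once parsed, the values are ordinary `date` objects, so a catalogue could sort schemas by release date without any format guessing.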
the metadata for a given version of the schema lets you know if there is a previous version but not if there are more up-to-date versions...
The `homepage` field should be used for this purpose. It should be the web page where the schema environment is described: history, changelog, versions, etc. I'll specify that in the documentation.
finally, some of the metadata you can find on schema pages (e.g. http://specs.frictionlessdata.io/table-schema/) is not in your documentation
Yes, it's intentional. This documentation would take place:
Here is the updated spec. @rufuspollock do I have to make a pull request?
=============================================
Documentation is a fundamental aspect of data sharing. Table Schema needs contextual metadata to let people and software understand and manage schemas.
That way, people creating or reusing schemas would have a better understanding of them. Some software would be able to produce user documentation based on schemas. In the future, other software would be able to use the metadata for tasks such as verifying schema integrity and provenance, crawling and cataloguing schemas, versioning schemas, etc.
There are no known implementations at present.
To allow schema documentation, implementations `MUST` start with the `schema` object. This object contains the already defined `fields` array, as in a classical Tabular Data Package. It also contains some other properties useful for schema documentation. These other properties come from widely adopted practices for describing a resource.
Each schema `MUST` have a `title`, which represents the human-readable title of the schema. The `title` is the only required property.
All other properties `SHOULD` be implemented:
- `author` is the author of the schema; one or more people and/or one or more organisations
- `publisher` is the publisher of the schema; one or more organisations or one or more people
- `contributor` is a string which lists the contributors of the schema
- `version` is a version number of the schema to let people be sure they talk about the same thing
- `date` is the release date of the schema (ISO-8601 format, YYYY-MM-DD should be enough)
- `description` is where author(s) can describe the schema in a few sentences; `description` is highly recommended; it can be a place to use keywords with #hashtags; Markdown format is encouraged
- `homepage` is the home on the web that is related to this schema (not the schema itself); it should be where the schema environment is described: history, changelog, versions, etc.; this field is a well-formed URL
- `url` is the web address where the schema can be retrieved
- `previous` is the URL of the previous version of the current schema (allowing tools to produce schema diffs)
A user might build a `schema.json` as follows:
{
  "schema": {
    "title": "Postal codes list schema",
    "author": "Jacques Facteur",
    "publisher": "Postal codes committee",
    "contributor": "Julie Martin, Max Dupont, Estelle Bois",
    "version": "0.1 beta",
    "date": "2017-01-31",
    "description": "Postal codes list schema defines the raw list of postal codes in France.",
    "homepage": "http://example.com/postal-code-list-schema.html",
    "url": "http://example.com/2017/03/13/schema.json",
    "previous": "http://example.com/2017/02/21/schema.json",
    "fields": [
      {
        "name": "postal_code",
        "title": "postal code",
        "type": "string"
      }
    ]
  }
}
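One use named in the motivation is producing user documentation from a schema. The following Python sketch shows how a tool might render such metadata as a short Markdown page. It assumes the property names of this draft; `render_markdown` and the page layout are hypothetical choices, not an existing implementation.

```python
def render_markdown(descriptor):
    """Render a short Markdown page from a descriptor with a `schema` object."""
    schema = descriptor["schema"]
    lines = [f"# {schema['title']}", ""]
    if "description" in schema:
        lines += [schema["description"], ""]
    # Metadata properties from the draft, printed only when present.
    for name in ("author", "publisher", "contributor", "version", "date",
                 "homepage", "url", "previous"):
        if name in schema:
            lines.append(f"- **{name}**: {schema[name]}")
    # One table row per field of the schema.
    lines += ["", "| name | title | type |", "| --- | --- | --- |"]
    for field in schema.get("fields", []):
        lines.append(f"| {field['name']} | {field.get('title', '')} "
                     f"| {field.get('type', 'string')} |")
    return "\n".join(lines)


# Shortened, hypothetical version of the example above.
example = {
    "schema": {
        "title": "Postal codes list schema",
        "version": "0.1 beta",
        "date": "2017-01-31",
        "fields": [
            {"name": "postal_code", "title": "postal code", "type": "string"}
        ],
    }
}
page = render_markdown(example)
print(page)
```

Because only `title` is required, every other property is looked up defensively, so a schema with sparse metadata still renders.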
The family of specs already has fields that deal with title, author/contributor/publisher (contributors), date (created), description. We could reuse these on the table schema spec.
The family of specs already has fields that deal with title, author/contributor/publisher (contributors), date (created), description. We could reuse these on the table schema spec.
Thanks @pwalsh, I appreciate your help.
I did a global search across each spec, and here are my observations.
- `title` and `description` properties share the same semantics and format as in the frictionlessdata specs
- `date` might be the same as `created` and I could switch to it: I understand that RFC 3339 comes from ISO-8601 but does it allow "1985-04-12" instead of "1985-04-12T23:20:50.52Z"?
- `url` property might be the same as the `uri` property in the frictionless data specs and I could switch to it: I know a URL is a URI but I would like to be sure there are no consequences
- `author`: it is strangely mentioned in one of the data package examples, but it is not specified (is it a bug? should I open an issue?)
- `publisher`, `version` and `previous` properties, which can be discussed. I was particularly surprised that `publisher` has not been used by the frictionlessdata specs.
- `homepage` is not as sophisticated as `homepage` in data package properties; I can change it (even if I think it is more complicated to have `"homepage": { "name": "My web page", "uri": "http://example.com/" }`)
- `contributor` is not as sophisticated as `contributors` in data package properties: I wonder why it is so rich. (By the way, your `role` is not specified at all, should I open an issue?)
Hi @CharlesNepote if you could add the problems / bugs to this issue it would be great https://github.com/frictionlessdata/specs/issues/385
@pwalsh @rufuspollock
I've updated my spec below, making an effort to take existing properties into account. My proposal creates 3 new properties:
- `publisher` but I think we can abandon it if necessary
- `version`: I think it is important to let people be sure they are speaking of the same thing
- `previous`: is the previous version of the schema, which I find important to ensure schema traceability
I'm still uncomfortable with 2 properties of frictionlessdata:
- `homepage` is too sophisticated in my opinion; it should be only a URI; schema structure should be as flat as possible to let people build schemas easily
- `contributors` is worse and not correctly documented; implementations have to test what the role of the contributor is to know what to do with it; again it would be better, IMHO, to have a flat structure with `author`, `contributor`, etc. to keep it simple. So I kept `author` and `contributor` for the moment.
Are `homepage` and `contributors` definitely decided in the frictionless data specs?
========================
Documentation is a fundamental aspect of data sharing. Table Schema needs contextual metadata to let people and software understand and manage schemas.
That way, people creating or reusing schemas would have a better understanding of them. Some software would be able to produce user documentation based on schemas. In the future, other software would be able to use the metadata for tasks such as verifying schema integrity and provenance, crawling and cataloguing schemas, versioning schemas, etc.
There are no known implementations at present.
To allow schema documentation, implementations `MUST` start with the `schema` object. This object contains the already defined `fields` array, as in a classical Tabular Data Package. It also contains some other properties useful for schema documentation. These other properties come from widely adopted practices for describing a resource.
Each schema `MUST` have a `title`, which represents the human-readable title of the schema. The `title` is the only required property.
All other properties `SHOULD` be implemented:
- `author` is the author of the schema; one or more people and/or one or more organisations
- `publisher` is the publisher of the schema; one or more organisations or one or more people
- `contributor` is a string which lists the contributors of the schema
- `version` is a version number of the schema to let people be sure they talk about the same thing
- `created` is the release date of the schema (ISO-8601 format, YYYY-MM-DD should be enough)
- `description` is where author(s) can describe the schema in a few sentences; `description` is highly recommended; it can be a place to use keywords with #hashtags; Markdown format is encouraged
- `homepage` is the home on the web that is related to this schema (not the schema itself); it should be where the schema environment is described: history, changelog, versions, etc.; this field is a well-formed URL
- `uri` is the web address where the schema can be retrieved
- `previous` is the URL of the previous version of the current schema (allowing tools to produce schema diffs)
A user might build a `schema.json` as follows:
{
  "schema": {
    "title": "Postal codes list schema",
    "author": "Jacques Facteur",
    "publisher": "Postal codes committee",
    "contributor": "Julie Martin, Max Dupont, Estelle Bois",
    "version": "0.1 beta",
    "created": "2017-01-31",
    "description": "Postal codes list schema defines the raw list of postal codes in France.",
    "homepage": "http://example.com/postal-code-list-schema.html",
    "uri": "http://example.com/2017/03/13/schema.json",
    "previous": "http://example.com/2017/02/21/schema.json",
    "fields": [
      {
        "name": "postal_code",
        "title": "postal code",
        "type": "string"
      }
    ]
  }
}
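The `previous` property is described as allowing tools to produce schema diffs. As a rough Python sketch of what such a tool might compute, the function below compares the `fields` arrays of two already loaded schema objects; fetching the document behind `previous` is left out. `diff_fields` and the two sample schemas are hypothetical and only serve the illustration.

```python
def diff_fields(previous_schema, current_schema):
    """Compare the `fields` arrays of two schema objects by field name."""
    old = {f["name"]: f for f in previous_schema.get("fields", [])}
    new = {f["name"]: f for f in current_schema.get("fields", [])}
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        # A field present in both versions counts as changed if any
        # of its properties differ.
        "changed": sorted(n for n in new.keys() & old.keys()
                          if new[n] != old[n]),
    }


# Hypothetical earlier and current versions of a schema.
previous_schema = {"fields": [
    {"name": "postal_code", "type": "integer"},
    {"name": "city", "type": "string"},
]}
current_schema = {"fields": [
    {"name": "postal_code", "type": "string"},
    {"name": "region", "type": "string"},
]}
result = diff_fields(previous_schema, current_schema)
print(result)
```

Matching fields by `name` is one possible design choice; a renamed field would show up as one removal plus one addition.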
Just to let you know, we generated Gitbook documentation for some schemas we are using, based on the enhancements proposed by @CharlesNepote, slightly adapted.
For example:
I intend to propose a "pattern" as suggested above, in the coming weeks, to reopen the discussion about adding metadata to table schemas on a concrete basis.
@cbenz great - that would be amazing!
@rufuspollock @pwalsh @frictionlessdata Our team at @jailbreak-paris has increasingly used Table Schema for the past year and a half and we're seeing more and more adoption around us (including at @Etalab). We think it has a bright future. So we've rolled up our sleeves and (finally!) got to work on this. Our draft proposal and (most importantly) questions are here: https://github.com/frictionlessdata/specs/pull/627
@johanricher and @cbenz this is awesome, thank-you. I will comment on the PR. I also note you could reuse @CharlesNepote proposal for text (which I think was great - we just did not get it as an actual PR against the patterns page https://github.com/frictionlessdata/specs/blob/master/specs/patterns.md)
Hey @rufuspollock, thanks for the feedback! We followed your instructions by starting this new PR against the patterns page: #630.
As with #627, we built upon @CharlesNepote's idea but also tried to remain closer to the other Frictionless Data specs by using properties from `common.yml` (e.g. `author` and `publisher` as proposed by Charles are redundant with the `contributors` property).
I would find it very useful to build schema documentation based on Table Schema. That way documentation and data would stay close together, because the schema can also be used for validation with CSV Lint.
To do that, we need contextual information about the schema: its title, author, version, and so on... Here is an example to show what I would find useful:
I made a very quick and (very) dirty tool to show that. Unfortunately, it's only in French for the moment; I'll translate it if you're interested: http://dataliteracyconference.net/specificator/demo3.html (link will change).
Adding some contextual information won't break anything, I think. And the open data movement needs more professional yet simple tools. Thanks for your efforts!