frictionlessdata / datapackage

Data Package is a standard consisting of a set of simple yet extensible specifications to describe datasets, data files and tabular data. It is a data definition language (DDL) and data API that facilitates findability, accessibility, interoperability, and reusability (FAIR) of data.
https://datapackage.org
The Unlicense

Triple checking json schemas are in sync with specs #490

Closed rufuspollock closed 7 years ago

rufuspollock commented 7 years ago

See comment here https://github.com/ropenscilabs/datapkg/issues/12#issuecomment-315184298 for a list of out of sync items:

Schemas vs. specs

As the tip of the iceberg, here are the differences I see between the data-package schema and the data-package specs. Am I looking at the wrong schema files?

| Specs | Schema |
| --- | --- |
| name recommended | name required |
| id | --- |
| profile | --- |
| created | --- |
| licenses | license |
| --- | dataDependencies |
| --- | author |

ezwelty commented 7 years ago

Some more for Data Resource:

| Specs | Schema |
| --- | --- |
| path | url and path (also see anyOf) |
| licenses | license |
| --- | dialect |
| profile | --- |
| "minItems": 1 | "minItems": 0 |
| "required": "name" | --- |

Looks like Tabular Data Package needs similar updates.

Also, http://schemas.datapackages.org/definitions.json#/define/schema makes no mention of missingValues, primaryKey, foreignKeys, field-specific properties, or any of their definitions.
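A check like this can be scripted. The following is a minimal sketch, assuming the schema has already been loaded into a Python dict; `missing_properties` and `sample_schema` are hypothetical names, and the sample fragment is invented for illustration rather than copied from definitions.json.

```python
def missing_properties(schema, expected):
    """Return the expected property names that `schema["properties"]` does not declare."""
    declared = schema.get("properties", {})
    return [name for name in expected if name not in declared]


# Hypothetical, trimmed-down schema fragment for illustration only;
# it is NOT the real content of definitions.json.
sample_schema = {
    "type": "object",
    "properties": {
        "fields": {"type": "array"},
        "primaryKey": {"type": ["string", "array"]},
    },
}

# Property names mentioned in the comment above.
EXPECTED = ["missingValues", "primaryKey", "foreignKeys"]

missing = missing_properties(sample_schema, EXPECTED)
print(missing)
```

To run it against a real schema, load the file with `json.load` and pass the relevant sub-object in place of `sample_schema`.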

pwalsh commented 7 years ago

Unfortunately the text of the specs was completely rewritten post v1.rc1, and the schemas were not. The way the specs are now written takes us back to having to update every change to the specifications in the spec document itself, and in the schema for each spec.

There is no clear changelog for the changes made to the specs post rc1, so the best we can do is fix issues as people find them.

I'm going to assign this issue to myself for now, but @rufuspollock any help you can give here is much appreciated.

pwalsh commented 7 years ago

@rufuspollock @ezwelty

I'm trying to follow your comments, but:

Every single issue you have both raised is incorrect, so clearly you are looking in the wrong place. I'd like to understand how you got there. All up-to-date specs are available at the specs.frictionlessdata.io/schemas/* location, for example data-resource.json, and the source descriptors for all of these specs are located here.

I'm closing this issue now.

ezwelty commented 7 years ago

@pwalsh Well this is frustrating.

I've never known about http://specs.frictionlessdata.io/schemas/data-resource.json because it isn't linked to from any of the following:

For my Data Resource table https://github.com/frictionlessdata/specs/issues/490#issuecomment-315433188, I was looking at the Resources schema embedded within https://schemas.frictionlessdata.io/data-package.json. Shouldn't that match http://specs.frictionlessdata.io/schemas/data-resource.json or refer to it in some way?

My original table for Data Package still holds true https://github.com/frictionlessdata/specs/issues/490#issue-242992335. I compared https://schemas.frictionlessdata.io/data-package.json to http://specs.frictionlessdata.io/data-package/:

name

Specs

Recommended Properties In addition to the required properties, the following properties SHOULD be included in every package descriptor: name

Schema

"required": [ "name", "resources" ],

id

Specs

id A property reserved for globally unique identifiers. Examples of identifiers that are unique include UUIDs and DOIs.

Schema

... nada ...

And so on.
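The property-by-property comparison above could be automated along these lines. This is only a sketch: the `spec_status` mapping is assumed to be transcribed by hand from the spec text (where "recommended" maps to optional in JSON Schema terms), the `properties` part of `sample_schema` is invented, and the helper names are hypothetical.

```python
def schema_status(schema, name):
    """Classify a property as 'required', 'optional', or 'absent' in a JSON Schema dict."""
    if name in schema.get("required", []):
        return "required"
    if name in schema.get("properties", {}):
        return "optional"
    return "absent"


def out_of_sync(schema, spec_status):
    """Return {name: (status per spec, status per schema)} for every mismatch."""
    diffs = {}
    for name, expected in spec_status.items():
        actual = schema_status(schema, name)
        if actual != expected:
            diffs[name] = (expected, actual)
    return diffs


# The "required" list is the one quoted above; "properties" is a hypothetical stand-in.
sample_schema = {
    "required": ["name", "resources"],
    "properties": {"name": {"type": "string"}, "resources": {"type": "array"}},
}

# Hand-transcribed from the spec text (an assumption of this sketch).
spec_status = {"name": "optional", "id": "optional", "resources": "required"}

diffs = out_of_sync(sample_schema, spec_status)
print(diffs)
```

Keeping the spec side as a small hand-maintained mapping is a deliberate simplification; it avoids parsing the spec prose, which the thread suggests was rewritten freely.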

ezwelty commented 7 years ago

@pwalsh Argh, OK, I've caught the problem.

http://specs.frictionlessdata.io/schemas/data-package.json != https://schemas.frictionlessdata.io/data-package.json

So for the sake of all our sanity, probably the following should happen:

pwalsh commented 7 years ago

@ezwelty

How are you getting to

and not to

Both need to exist, as we still have most libraries at pre-v1 now until we close off the remaining v1 issues.

ezwelty commented 7 years ago

@pwalsh See my comment sent right before yours.

For example:

http://specs.frictionlessdata.io/data-package/

JSON Schema (for spec) https://schemas.frictionlessdata.io/data-package.json

pwalsh commented 7 years ago

@rufuspollock I'm really having trouble keeping up with the things that broke from your changes post rc1. The above issue arises because the rc1 templates generated the correct schema reference from the slug of the spec, whereas the post-rc1 specs added a jsonschema key with incorrect links to the pre-v1 schemas.
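Deriving the schema link from the spec slug might look roughly like the sketch below. It is not the actual template code; `SCHEMA_BASE` and `schema_url_for` are hypothetical names, and the URL pattern is inferred from the spec and schema URLs quoted earlier in this thread.

```python
# Base location of the up-to-date schemas, as quoted in this thread.
SCHEMA_BASE = "http://specs.frictionlessdata.io/schemas"


def schema_url_for(spec_url):
    """Build the schema URL from the last path segment (the slug) of a spec page URL."""
    slug = spec_url.rstrip("/").rsplit("/", 1)[-1]
    return f"{SCHEMA_BASE}/{slug}.json"


url = schema_url_for("http://specs.frictionlessdata.io/data-package/")
print(url)
```

Generating the link this way leaves no hand-written URL in each spec page to drift out of date.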

rufuspollock commented 7 years ago

@pwalsh so let's just correct the jsonschema links. I only added that key because I did not know how to add the link otherwise. At the time I just did my best looking around and found the schemas.* URLs. I've now gone through and fixed all the links on these.