frictionlessdata / datapackage

Data Package is a standard consisting of a set of simple yet extensible specifications to describe datasets, data files and tabular data. It is a data definition language (DDL) and data API that facilitates findability, accessibility, interoperability, and reusability (FAIR) of data.
https://datapackage.org
The Unlicense

Fixing display of v1 specs #385

Closed pwalsh closed 7 years ago

pwalsh commented 7 years ago

This is a placeholder. All known bugs have already been fixed, but there may still be issues with readability, as the spec generation has changed significantly.

Some of the changes were very purposeful.

Example: each Table Schema field is self-contained and repeats data. This was done to make some things very explicit per field, such as exactly which constraints each field supports. In the old presentation much of this was ambiguous (as evidenced by some questions in issues, and even by the implementations themselves).

However, we obviously need to ensure that things are clear and understandable.

@stevage @rufuspollock @roll @Stiivi @akariv

I'd really appreciate input from all of you, if you have time, by detailed comments on this thread.

Please do not hold back - if I lost too much readability in shifting to a generated spec rather than a strictly narrative one, then I also need to resolve this (and it is easy enough to resolve).

CharlesNepote commented 7 years ago

Yes, the example you give is indeed very unclear. It took me some time to understand that the repetition was for each field: at the very least, the schema field sections should be clearly separated visually from each other.

akariv commented 7 years ago

Nevertheless, there are some fields which are mandatory regardless of type (e.g. name). From the current presentation it's nearly impossible to understand that.

pwalsh commented 7 years ago

I guess the major problem is Table Schema fields. I have some points on that, but I'll wait for more feedback, and see what else comes up in general.

roll commented 7 years ago
roll commented 7 years ago
pwalsh commented 7 years ago

format: mm-dd-yyyy

roll commented 7 years ago

Here it was decided to apply constraints only to cast values - https://github.com/frictionlessdata/specs/issues/296#issuecomment-268471008

It seems to be a kind of mechanical mistake that pattern constraints, intended only for strings, have leaked to other types in the current Table Schema v1.0.0rc1.

pwalsh commented 7 years ago

@roll it was not decided there - it was a suggestion I made that was never confirmed by anyone else in that thread. It also needs to be field specific, as only applying on cast values cannot work for date/time fields at least. The solution in a call following that thread was to just be explicit on constraints for every single type.

roll commented 7 years ago

by @pwalsh

there is a bug here (about missingValues defaults) - https://github.com/frictionlessdata/specs/blob/master/sources/dictionary/tableschema.yml#L167

roll commented 7 years ago

From Table Schema:

Form
The descriptor MUST be valid JSON, as described in RFC 4627, and SHOULD be in one of the following forms:
- A file named tableschema.json.
- An object, either on its own or nested in another data structure.

Doesn't that sound too restrictive (OK, it's only SHOULD, not MUST, but anyway)? I suppose the most common way is to name your table schema after the data file, like:

roll commented 7 years ago
roll commented 7 years ago

PS. We have an explanation of the basic concepts (name, constraint etc.) at the very beginning of the spec, but there is a big distance in pages between it and the fields section.

roll commented 7 years ago

@pwalsh So what I think we're missing in the current Table Schema spec is something like a Concepts section inside the Specification section, with a list of key spec concepts. It could also solve the problem with clarity on how constraints apply that we were discussing in Slack. So something like this:


Concepts

tabular data

There is a note about tabular data, but should we provide a quick introduction to what can be described by the spec?

data value

What is a data value that can be described by the field object? How is it related to the field type/format? If a data value conforms to a field, it means that it must conform to the type/format of the field. Should we describe the concept of raw and typed (cast, parsed) data values?

null value

As in SQL, the null value is an important concept of the spec. So we should clarify what it means if a value is listed in missingValues. In SQL, null is not an implementation concept but a spec concept. The same goes for Table Schema, I suppose.

constraints

A field can have constraints, but what does that mean? What is a constraint value (in relation to a data value)? It means something like: a data value only conforms to a field if it satisfies all of the field's constraints, where satisfying a constraint means:


I do understand there is a good chance of touching on implementation details we don't want to get into. But with good wording I suppose we could find a good balance between clarifying core concepts and not being implementation-specific.
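
To make the concepts above concrete, here is a minimal Python sketch of one possible reading: a raw value is first checked against missingValues, then cast, then checked against the field constraints. The function name `conforms` and the restriction to integer fields are assumptions made for illustration only; this is not how the spec or any implementation defines it, and (as noted earlier in the thread) applying constraints only to cast values may not work for date/time fields.

```python
def conforms(raw: str, field: dict, missing_values=("",)) -> bool:
    """Return True if a raw string conforms to an integer field descriptor.

    Hypothetical helper: only the "integer" type and the "required",
    "minimum" and "maximum" constraints are handled.
    """
    constraints = field.get("constraints", {})

    # Null value: a raw value listed in missingValues is treated as null,
    # so it conforms unless the field is required.
    if raw in missing_values:
        return not constraints.get("required", False)

    # Data value: the raw value must cast to the field type.
    try:
        value = int(raw)
    except ValueError:
        return False

    # Constraints: the cast value must satisfy every constraint present.
    if "minimum" in constraints and value < constraints["minimum"]:
        return False
    if "maximum" in constraints and value > constraints["maximum"]:
        return False
    return True


age = {"name": "age", "type": "integer",
       "constraints": {"required": True, "minimum": 0}}
print(conforms("42", age))   # value casts and satisfies the constraints
print(conforms("", age))     # null value on a required field
```

Passing `missing_values` separately from the field reflects that missingValues is declared on the schema rather than on each field.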

CharlesNepote commented 7 years ago

In the Table Schema spec, the following example is wrong: { "name": "extra" "type": "object" }.

A comma is missing.
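
A quick way to confirm this kind of problem is to run the example through a JSON parser. The sketch below uses Python's standard `json` module; the helper name `is_valid_json` is made up for illustration.

```python
import json

# The example as it appears in the spec, and the same example with the comma added.
broken = '{ "name": "extra" "type": "object" }'
fixed = '{ "name": "extra", "type": "object" }'


def is_valid_json(text: str) -> bool:
    """Return True if the text parses as JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(is_valid_json(broken))
print(is_valid_json(fixed))
```

Running every example in the generated spec through a check like this would catch similar slips.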

roll commented 7 years ago

moved to https://github.com/frictionlessdata/specs/issues/393

pwalsh commented 7 years ago

@roll the number docs are all old docs, nothing new or changed there, so maybe the above comment is a good candidate for a distinct issue (it is not something I'd want to address as part of fixing v1 + display issues).

roll commented 7 years ago

@pwalsh done!)


PS. I updated the comment so as not to spam people too much.

CharlesNepote commented 7 years ago

author is mentioned in one of the Data Package examples, but it doesn't seem to be specified anywhere.

CharlesNepote commented 7 years ago

In data package properties, role is not specified at all.

roll commented 7 years ago

Table Schema

Optional properties

A Table Schema descriptor SHOULD include the following properties.

It's optional. Shouldn't it be MAY?

primaryKey

Items

Each item in the array is a string. The property is required, and other defined properties are optional.

It's not clear what the second sentence means.

foreignKeys

The whole section should be reviewed, I suppose (it just looks unfinished).

roll commented 7 years ago
CharlesNepote commented 7 years ago

HTML code is not valid, see:

amercader commented 7 years ago

@pwalsh Here are my 2 cents (or perhaps a bit more than 2) after a full read of the specs:

Table Schema

Formatting and readability

Specs language

Data Resource

Tabular Data Resource

Data Package

Tabular Data Package

stevage commented 7 years ago

@amercader

So if a Data Package has 3 resources, 2 CSVs and 1 PDF is not a Tabular Data Package

Hmm, good question. My understanding was that it would have 2 resources (the CSV files) defined in the datapackage.json, while the PDF file would be included in the bundle of files but not be referred to in the JSON. But now that doesn't sound right to me.

pwalsh commented 7 years ago

@amercader @stevage

So if a Data Package has 3 resources, 2 CSVs and 1 PDF is not a Tabular Data Package

Correct, a Tabular Data Package requires that each resource is a Tabular Data Resource. Until v1, Data Resource was not a top-level concept, so, practically speaking, it would not have been possible to have non-tabular data as resources in a TDP. However, now that we have Data Resources specified, and they too have profiles, it is easier to declare a generic Data Package where some Data Resources are of one type, and others of another.
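
As a rough illustration of that rule, the Python sketch below checks whether every resource in a descriptor declares the tabular profile. The function name is hypothetical, the profile identifier `tabular-data-resource` is assumed to be the one used by the v1 specs, and the check only looks at the declared profile, not at the data itself.

```python
def is_tabular_data_package(descriptor: dict) -> bool:
    """Return True if the package has resources and all declare the tabular profile."""
    resources = descriptor.get("resources", [])
    return bool(resources) and all(
        resource.get("profile") == "tabular-data-resource"
        for resource in resources
    )


# The case discussed above: 2 CSVs and 1 PDF (names and paths are made up).
mixed = {
    "name": "example",
    "resources": [
        {"name": "a", "path": "a.csv", "profile": "tabular-data-resource"},
        {"name": "b", "path": "b.csv", "profile": "tabular-data-resource"},
        {"name": "report", "path": "report.pdf", "profile": "data-resource"},
    ],
}
print(is_tabular_data_package(mixed))
```

A package like `mixed` would then be declared as a generic Data Package, with each resource carrying its own profile.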

roll commented 7 years ago

The data in the JSON Pointer seems to reference the own data property in the Resource itself. After some thought, and if I understood the specs correctly this complete example would be

Yes and no. That's the gotcha of JSON Pointers: they go against the newly introduced composability of the specs (schema-resource-package).

Example from Data Resource:

because those descriptors have different roots.

rufuspollock commented 7 years ago

Comment: I'm finding this thread sort of tough to follow as we get more interleaving comments. What would people think of a hackmd doc (e.g. this blank one) where we could consolidate things?

roll commented 7 years ago

The description for currency is really confusing: "A number that may include additional currency symbol". Do you need to provide an actual number? The currency symbol or name?

Maybe it means currency is a boolean flag?

pwalsh commented 7 years ago

@roll let's try to separate new display issues from things like this currency issue, which is simply wording from the old specs

roll commented 7 years ago

@pwalsh (cc @amercader) prev1:

gyearmonth
A specific month in a specific year as per XMLSchema gYearMonth.
Usual lexical representation is: YYYY-MM. There are no format options.

v1:

Year Month Field
A calendar year month, being an integer with 1 or 2 digits. Equivalent to gYearMonth in XML Schema

Update: the pre-v1 text is correct.
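
For reference, the YYYY-MM representation from the pre-v1 text can be sketched in Python as below. The helper name is hypothetical, and the pattern is a simplification: XML Schema gYearMonth also allows a leading minus sign, years with more than four digits, and an optional timezone, none of which are handled here.

```python
import re

# Simplified YYYY-MM pattern: four-digit year, two-digit month 01..12.
YEARMONTH = re.compile(r"(\d{4})-(0[1-9]|1[0-2])")


def parse_yearmonth(text: str) -> tuple:
    """Parse a YYYY-MM string into a (year, month) tuple of integers."""
    match = YEARMONTH.fullmatch(text)
    if match is None:
        raise ValueError(f"not a YYYY-MM value: {text!r}")
    return int(match.group(1)), int(match.group(2))


print(parse_yearmonth("2017-03"))
```

Under this reading, a bare integer with 1 or 2 digits such as "3" is rejected, which is the mismatch with the v1 wording quoted above.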

roll commented 7 years ago

Table Schema datetime default format:

default: An ISO8601 format string for datetime.

It should provide a concrete pattern, as it did in pre-v1 - https://pre-v1.frictionlessdata.io/json-table-schema/#date. Just "ISO8601" is not concrete enough.
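
To illustrate why a concrete pattern matters, here is a small Python sketch. It assumes the intended default is `YYYY-MM-DDThh:mm:ssZ` (my recollection of the pre-v1 wording, not something confirmed in this thread); the constant and function names are made up.

```python
from datetime import datetime

# Assumed concrete default pattern: YYYY-MM-DDThh:mm:ssZ
DEFAULT_DATETIME_PATTERN = "%Y-%m-%dT%H:%M:%SZ"


def parse_default_datetime(text: str) -> datetime:
    """Parse a datetime string against the assumed default pattern only."""
    return datetime.strptime(text, DEFAULT_DATETIME_PATTERN)


print(parse_default_datetime("2017-03-01T12:30:00Z"))

# ISO 8601 also allows other shapes, e.g. the basic format without separators.
# With a single concrete pattern, such a value is rejected:
try:
    parse_default_datetime("20170301T123000Z")
except ValueError as error:
    print("rejected:", error)
```

With only "ISO8601" in the spec, two implementations could reasonably disagree on whether the second value is acceptable.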

rufuspollock commented 7 years ago

REQUEST: no more commenting in this issue as the comment thread is becoming unreadable.

Please post stuff in the hackmd https://hackmd.io/CwUwzAbAxgDAJjAtNOZHAGYYIyIBxgBGAhusHtlIQOxRwCcEcQA=

pwalsh commented 7 years ago

DUPLICATE. Info went elsewhere e.g. #420