peterdesmet opened this issue 9 months ago
+1 -- When I heard the v2 announcement, I immediately assumed it would include breaking changes and was surprised to find it was going to be backwards compatible.
Was v2 chosen because v1.1 felt like it wasn't communicating enough "distance" from v1.0 given the new website, dplib, etc.? If so, a jump to v1.5 might be another option to create separation before/after this initiative, which I would interpret as "major overhaul but no breaking changes".
... that said my opinion isn't very strong on this, so I'm happy to defer to whatever strategy has the most consensus/momentum.
Sidenote: will we version Data Package (the collection of standards) as a whole or will the 4 standards be versioned separately (current approach)? I see benefits and downsides with both approaches.
I think this is an excellent question and definitely warrants further discussion. How it is handled seems intertwined with the standard's governance structure / processes moving forward... Is this the sort of thing we want to/are planning to cover in the working group?
I would not be surprised if there were an edge case of some artificial piece of data being compliant with 1.0 but not with the new version, because the existing wording allows things that were not planned to be allowed. Moreover, I think a version 2.0 will attract rather than discourage use.
I don't even think we will need artificial data to hit this problem. https://github.com/frictionlessdata/specs/issues/379 and https://github.com/frictionlessdata/specs/issues/697 are breaking changes likely to be discussed which at some point were added[^20231222T082231] to frictionless-py v5.

[^20231222T082231]: I think https://github.com/frictionlessdata/specs/issues/379 was removed after https://github.com/frictionlessdata/frictionless-py/issues/868, but frictionless-py 5.16.0 converts `"dialect": {"delimiter": ";"}` to `"dialect": {"csv": {"delimiter": ";"}}` unless `system.standards = "v1"` is specified. I noticed this after having some difficulties in creating data packages that would play nice with both frictionless-py and frictionless-r.
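For readers unfamiliar with the dialect change mentioned in the footnote, here is a minimal sketch of the two descriptor shapes. The `delimiter_of` helper is illustrative, not part of frictionless-py or any spec:

```python
# v1-style descriptor: dialect properties sit directly under "dialect"
v1 = {"dialect": {"delimiter": ";"}}

# Shape produced by frictionless-py 5.16.0 (as described above):
# properties are nested under a per-format key
v2_like = {"dialect": {"csv": {"delimiter": ";"}}}

def delimiter_of(descriptor):
    """Illustrative helper that reads the delimiter from either shape."""
    dialect = descriptor.get("dialect", {})
    if "csv" in dialect:                      # nested, frictionless-py 5.x style
        return dialect["csv"].get("delimiter", ",")
    return dialect.get("delimiter", ",")      # flat, v1 style

print(delimiter_of(v1), delimiter_of(v2_like))  # both read ";"
```

A tool that only understands one of the two shapes will misread (or reject) descriptors written in the other, which is exactly the interoperability difficulty described above.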
> Sidenote: will we version Data Package (the collection of standards) as a whole or will the 4 standards be versioned separately (current approach)? I see benefits and downsides with both approaches.
Thinking about "communication simplicity" I think they should be versioned as a whole. This quote from @roll captures the problem quite well:
> For example, we would like to make our Python libs 100% compatible/implementing the specs. TBH at the moment, I don't really understand what it means: whether there is a frozen v1 of the specs to be compatible with, and where all the current spec changes go (a v1.1/v2 branch of the specs, etc.)
To give another example, I can see how frictionless-r could support Tabular Data Resource v2 with https://github.com/frictionlessdata/specs/issues/379 but not support CSV/Table Dialect v2 with https://github.com/frictionlessdata/specs/issues/697. However, this creates an explosion in the number of ways a client could be "standard compliant", creating confusion for users.
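The combinatorial concern above can be made concrete with a quick count. The sketch assumes each of the four specs could independently be supported at v1 or v2, which is precisely the situation being criticized:

```python
from itertools import product

# The four specs under the Data Package umbrella, versioned separately
specs = ["Data Package", "Data Resource", "Table Schema", "Table Dialect"]
versions = ["v1", "v2"]

# Every combination of supported versions is a distinct "compliance profile"
combos = list(product(versions, repeat=len(specs)))
print(len(combos))  # 2**4 = 16 ways a client could mix supported versions
```

Versioning the collection as a whole collapses those 16 profiles back to 2.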
I think it's a valid point, and as a Working Group, we can vote on the version when we have finished the changelog.
Peter outlined the pros of staying on v1.1 so I'll add some arguments in favor of v2:
- Even a small addition like `package.propX` still means, by semver, updating to v1.3. So we will get two versions, v1.2 and v1.3 (and following), that are not comparable in size and importance. I think v2 and the following small v2.1, v2.2, etc. will communicate the structure of changes better.

TBH, I'm not sure the specs need 100% compliance with semver, as it's not software. For example, JSON Schema versioning was `Draft X` for years and is now `yyyy-mm` based. Honestly speaking, those `Draft X` labels looked really weird, but they kinda worked: implementors just thought about being compliant with draft "version X".
@peterdesmet I think we need to treat the core standard and domain-specific extensions as separate projects, so it will be `core vX`, `camtrap vY`, `fiscal vZ`, etc. So I would just version the `datapackage` repository as a whole (I guess you do the same for `camtrap`).

PS. Fiscal Data Package, as a domain-specific extension, has moved to its own project: https://github.com/frictionlessdata/datapackage-fiscal
I just realized "backwards compatibility" / "no breaking changes" has different levels/types of strictness, and I'm not clear where we stand:
1) An implementation designed for v2 spec should be equally capable of reading v1 data packages
2) An implementation designed for v1 spec should be capable of reading v2 data packages (albeit with reduced features)
Different types of modifications to the spec break in different ways:
- adding a new optional prop in v2 does not break either type of compatibility
- removing a prop in v2 breaks (1) but not (2)
- changing a prop type from `integer` in v1 to `integer | string` in v2 breaks (2) but not (1)
- etc.
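The third breakage type can be sketched in a few lines, assuming a hypothetical v1 field reader (the function and field are illustrative, not from any implementation):

```python
# Hypothetical v1 reader for a field whose v1 schema says "type": "integer".
def read_count_v1(field_value):
    if not isinstance(field_value, int):
        raise TypeError("v1 schema says 'integer'")
    return field_value

read_count_v1(42)   # fine under both v1 and v2

# A v2 producer allowed "integer | string" may emit "42",
# which the v1 reader rejects -> compatibility type (2) is broken:
try:
    read_count_v1("42")
except TypeError:
    print("a v1 implementation cannot read this v2 data")
```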
In general, it's easier to upgrade software than existing data artifacts... so I'd argue we should hold to (1) and relax (2) to give us more freedom for v2 improvements. It also puts me squarely in the v2 semver camp because although a given v2 spec implementation will be "backwards compatible with v1 data", it still is "breaking" in that v2 data will not necessarily work with a v1 implementation.
Thanks @khusmann for the summary, I completely agree that we should hold to (1) and relax (2), i.e. future software implementations should still be able to read v1 data packages (since those will be around for a long time), but can be slow in adopting new features of v2.
I draw a different conclusion regarding the versioning though, since a v2 spec sounds (to me) that software implementations can at some point give up on v1. A v1.1 indicates that this is still within the same major version of the spec.
@peterdesmet Answering https://github.com/frictionlessdata/datapackage/pull/12#issuecomment-1881247519 as I think it will be good to have everything related to the versioning discussion in one place.
> Why is it structurally non-breaking for implementations?
By "structurally breaking change" I mean something that will fail all the implementations on the next nightly build. It will happen if we make a breaking change to one of the JSON Schema profiles, e.g. changing `schema.fields` to be a mapping instead of an array.
Unfortunately, as the specs were written very broadly in some places, we also have a grey zone. Maybe `finiteNumber` was a bad example of it, but something like the `any` format for dates: the specs just say that it's implementation-specific, so e.g. changing this will be implementation-specific breaking.
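A small sketch of how the implementation-specific `any` date format plays out in practice. Both parsers here are hypothetical, standing in for two implementations that each comply with the broad wording:

```python
from datetime import datetime

def impl_a_parse(value):
    """One implementation: treats `any` as ISO dates only."""
    return datetime.strptime(value, "%Y-%m-%d")

def impl_b_parse(value):
    """Another implementation: tries several layouts for `any`."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            pass
    raise ValueError(value)

print(impl_b_parse("25/01/2024"))   # accepted by B...
# ...while impl_a_parse("25/01/2024") raises ValueError: rejected by A.
```

Tightening this wording in v2 would therefore be breaking for some implementations and a no-op for others, which is what makes the grey zone hard to classify.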
So in my head for v2 I have these tiers (and my opinion on change possibility):
Also, it's a peculiarity of working on standards that many kinds of new features (e.g. an added property) don't have full forward compatibility, as e.g. a new constraint will, in a way, break the validation completeness of current implementations. So maybe this kind of change might differentiate major and minor in our case. E.g.:

- `source.version` -> minor, as it's a part of JSON Schema validation
- `constraints.inclusiveMaximum` -> major, as it requires implementation updates and affects validation completeness

@roll, since you wanted everything related to versioning to be part of this discussion, I'm also referring to this comment by @khughitt and me regarding implementations retrieving or detecting the version of the Data Package spec:
> Tangential but, this makes me wonder whether it would make sense to modify the validation machinery to support validating against earlier versions of the spec?

That would be useful, but rather than implementations (or users) guessing what version of the spec was used for a `datapackage.json`, it would likely be good if that was indicated. I don't think this is currently possible?
I think on the Standard side, we need to decide whether we provide standard version information for an individual descriptor, e.g. as proposed in https://github.com/frictionlessdata/specs/issues/444.

I think every implementation is free to decide how to handle it, as it's just a matter of resources. E.g. some implementations can have a feature that validates against versions X, Y, and Z, and some just against Y.

Note that currently we consider `datapackage.json` to be versionless.
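One way an implementation could offer the multi-version validation described above, sketched with illustrative required-property sets rather than the real profiles:

```python
# Illustrative per-version rules; NOT the actual Data Package profiles.
PROFILES = {
    "v1": {"required": ["name", "resources"]},
    "v2": {"required": ["resources"]},   # e.g. suppose "name" were relaxed in v2
}

def versions_satisfied(descriptor):
    """Report which spec versions a descriptor satisfies."""
    return [version for version, profile in PROFILES.items()
            if all(prop in descriptor for prop in profile["required"])]

print(versions_satisfied({"resources": []}))              # only the relaxed profile
print(versions_satisfied({"name": "x", "resources": []})) # both profiles
```

Since `datapackage.json` is versionless, reporting the set of satisfied versions (rather than a single pass/fail) is one pragmatic option for such a feature.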
I think the rules for changing the Data Package spec should be declared (on the spec website or elsewhere). I currently find it difficult to assess if PRs follow the rules. Here's a first attempt (in line with @khusmann's statement that software is easier to update than data artifacts: https://github.com/frictionlessdata/specs/issues/858#issuecomment-1885401944):
General rules:

1. A `datapackage.json` that is valid MUST NOT become invalid in the future.
2. A `datapackage.json` MAY be invalid because a software implementation does not support the latest version of the specification (yet).

Because of these rules, a `datapackage.json` does not have to indicate what version of Data Package it uses (i.e. it is versionless). Implementations have no direct way of assessing the version (even though this would make things easier, https://github.com/frictionlessdata/specs/issues/858#issuecomment-1909977780, it is not something that we can require from data publishers, imo).

Specific rules:

1. Extending the `type` of a property to a `type` array is allowed. @roll you want to avoid this as a rule, but it does offer flexibility, cf. https://github.com/frictionlessdata/specs/issues/804#issuecomment-1913486995
2. Making a `required` property optional is allowed.
3. Extending an `enum` to a broader `enum` is allowed. Example: https://github.com/frictionlessdata/specs/pull/809
4. Adding `enum` values is allowed; removing `enum` values is not.
5. Adding a `format` is allowed. Example: does https://github.com/frictionlessdata/datapackage/pull/23 align with this?
6. Adding `format` pattern options is allowed; removing `format` pattern options would make a `datapackage.json` invalid (because of general rule 2). Example: https://github.com/frictionlessdata/datapackage/pull/24
7. Making a property `required` would make a `datapackage.json` invalid (because of general rule 1).

Thanks for taking the time to put this together, @peterdesmet! This seems like a great idea.
I think it would be useful to use this as a starting point for a description of the revision process in the docs.
I'll create a separate issue so that it can be tracked separately from the issue discussion here.
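To make general rule 1 ("a valid `datapackage.json` MUST NOT become invalid in the future") concrete, a small sketch; the property names and validators are illustrative, not the real profiles:

```python
def valid_v1(descriptor):
    # Illustrative v1 rule: "name" is optional
    return "resources" in descriptor

def valid_hypothetical_v2(descriptor):
    # Hypothetical rule change: "name" becomes required
    return "resources" in descriptor and "name" in descriptor

old_package = {"resources": []}                # published under v1
assert valid_v1(old_package)
assert not valid_hypothetical_v2(old_package)  # was valid, now invalid
print("making a property required violates rule 1")
```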
My 2 cents here:
- the `profile` attribute could refer to this: https://raw.githubusercontent.com/frictionlessdata/specs/v2.0/package.json. If `profile == tabular-data-package`, this would mean this is a v1 datapackage.
- a `_cache` property could also be used to cache the jsonschema

@peterdesmet Regarding provisional properties, I think we have an even more elegant solution: for example, using a special Data Package Draft/Next extension (or a profile per feature) where we can test new features and ideas without actually affecting the core specs themselves. Users will just need to use a `draft` Data Package profile to join testing.

And then, if we have an established release cycle, we can merge tested features into the core specs on schedule. Actually, using this approach, feature development can even be decentralized.
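A rough sketch of the draft-profile opt-in idea described above. All identifiers here (the profile id, property names) are illustrative, not an actual profile:

```python
CORE_PROPS = {"name", "resources"}
DRAFT_PROPS = CORE_PROPS | {"provisionalProp"}   # feature under test

def allowed_properties(descriptor):
    """Descriptors opting into the hypothetical draft profile get the
    experimental properties; everyone else stays on the stable core."""
    if descriptor.get("profile") == "data-package-draft":   # illustrative id
        return DRAFT_PROPS
    return CORE_PROPS

print(allowed_properties({"profile": "data-package-draft"}))
print(allowed_properties({"name": "x"}))
```

The appeal is that nothing changes for users who don't opt in, so experiments never risk invalidating existing descriptors.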
@roll sounds promising, would have to see it in action to fully understand. 😄
Hi all, the communication on the Frictionless specs update names it `v2` (version 2, see also #853 #857). The announcement blog post also states (emphasis mine):

I'm very happy no breaking changes will be introduced, I think that should be a guiding principle. But following semantic versioning, the specs update should then be a minor version. Given that all major specs† are currently v1, I would argue that the upcoming release is `v1.1`.

I understand that v2 indicates that there is serious momentum behind the current development (dedicated project, new website). But to anyone who's not closely following Frictionless, v2 seems like a major overhaul without backward compatibility. A `v1.1` would (correctly) communicate that, while Data Package is now its own standard, most things will work as expected. It also sets us on a path to incorporate more changes in future (minor) releases.

Sidenote: will we version Data Package (the collection of standards) as a whole or will the 4 standards be versioned separately (current approach)? I see benefits and downsides with both approaches.

†All major specs are v1: Data Package, Tabular Data Package, Data Resource, Tabular Data Resource and Table Schema. The exception is CSV Dialect, which is v1.2, but it seems this one is renamed to Table Dialect, so one could argue to start over. Some of the other experimental specs (like Fiscal Package or Views) have other version numbers, like 1.0-rc.1 and 1.0-beta.