dat-ecosystem-archive / dat.json

The WIP specification for the dat.json meta format [ DEPRECATED - More info on active projects and modules at https://dat-ecosystem.org/ ]
MIT License

Consider datapackage.json? #3

Open pwalsh opened 7 years ago

pwalsh commented 7 years ago

I'd be happy to know why you're starting a new format rather than using datapackage.json or something similar. We at Open Knowledge were definitely hoping to be able to integrate our higher-level tooling with dat via a common metadata format.

okdistribute commented 7 years ago

Hey pwalsh! We definitely intend to be compatible with datapackage.json, but we might need some fields that are specific to dat, and we aren't sure yet what those might be. Right now this is experimental. It may well turn out that we don't need any special fields, in which case we will also encourage using datapackage.json as the metadata file name.

pwalsh commented 7 years ago

Hey @karissa, no problem. It can be easier to go your own way, but I have a strong interest in ensuring we remain compatible here, so please do keep in touch about it.

Just for information: datapackage.json allows for custom fields beyond what is declared in the spec, and while we have not formalised the "profiles" spec, we do have custom "profiles" that define additional MUST, SHOULD and MAY properties based on particular use cases, such as Tabular Data Package and Fiscal Data Package. So, there would be no problem to either:

joehand commented 7 years ago

Hi @pwalsh, I was at Dan Fowler's presentation on data packages yesterday and have been thinking about this a bit since. I'm definitely excited about datapackage.json, but also a bit hesitant to use it as our metadata file.

First, I want to note that a user can always put a datapackage.json file in their dat and share it; it will be synced with the rest of the files. So dats will always be compatible with data packages, regardless of whether we have a special dat.json metadata file. The question, then, is whether dat.json itself will be compatible (which is perhaps what you meant).

In my mind a "dat" is not necessarily synonymous with a "datapackage" as defined in the spec:

A Data Package (or DataPackage) is a coherent collection of data and possibly other assets in a single ‘package’

We could have a single dat that contains many datapackages (for example, we could distribute all of github.com/datasets in a single dat). We will also have many use cases where users aren't sharing data or packages as described in the spec, but simply collections of files.
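As a rough illustration of one dat holding several data packages, the sketch below lists every datapackage.json found anywhere under a dat's root directory. It assumes only that the dat is available as an ordinary directory on disk; the function name `find_datapackages` is hypothetical and not part of any dat tooling.

```python
from pathlib import Path


def find_datapackages(root):
    """Return the sorted, POSIX-style relative paths of every datapackage.json under root.

    Hypothetical helper: a dat is treated here as a plain directory tree, and each
    datapackage.json inside it is taken to mark one data package.
    """
    root_path = Path(root)
    return sorted(
        p.relative_to(root_path).as_posix()
        for p in root_path.rglob("datapackage.json")
        if p.is_file()
    )
```

Run against a checkout that mirrors several datasets, this would return one entry per package (for example `gdp/datapackage.json` and `population/datapackage.json`), which is the "one dat, many datapackages" case described above.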

The other hesitation comes from resources being a required field. That is not something we want to add programmatically, and requiring users to add it themselves would cause too much friction. We might be able to define a special dat profile, but using a custom profile to remove a required field seems at odds with the whole profile idea.
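To make the tension concrete, here is a minimal sketch that models profiles the way they are described earlier in the thread: a profile adds required fields on top of the base Data Package requirements. It assumes `resources` is the base requirement, as discussed above; the `"dat"` profile and its extra `title` requirement are purely hypothetical.

```python
# Base requirement for any Data Package descriptor, per the discussion above.
BASE_REQUIRED = {"resources"}

# Profiles only ADD requirements on top of the base set.
PROFILE_EXTRA = {
    "data-package": set(),
    "dat": {"title"},  # hypothetical dat profile, for illustration only
}


def required_fields(profile):
    """Union of the base requirements and the profile's own additions."""
    return BASE_REQUIRED | PROFILE_EXTRA[profile]


def missing_fields(descriptor, profile="data-package"):
    """Return the sorted required fields that the descriptor lacks."""
    return sorted(required_fields(profile) - set(descriptor))


# A dat.json-style descriptor carrying only descriptive metadata.
dat_json = {"title": "My dat", "description": "A collection of files"}
```

Here `missing_fields(dat_json, "dat")` still reports `resources`: under an additive model, no profile can make the field optional, which is exactly why dropping it from a custom profile feels odd.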

pwalsh commented 7 years ago

Hi,

The resources field is just a collection of files, so its entries could be 'pointers' to other 'packages' (possibly represented by their descriptors as entry points) or 'just' files: neither the descriptor nor the spec cares what a 'resource' points to.
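A minimal sketch of that reading: one top-level descriptor whose resources mix plain files with pointers to nested package descriptors. It assumes each resource has a single string `path`, and it uses a hypothetical convention that a path ending in `datapackage.json` points at another package; the spec itself does not define such a rule.

```python
def classify_resources(descriptor):
    """Split a descriptor's resources into plain files and nested-package pointers.

    Hypothetical convention: a resource whose path ends in "datapackage.json"
    is treated as a pointer to another package; everything else is a file.
    """
    files, packages = [], []
    for resource in descriptor.get("resources", []):
        path = resource["path"]
        if path.endswith("datapackage.json"):
            packages.append(path)
        else:
            files.append(path)
    return files, packages


# Illustrative descriptor for a collection that bundles one nested package.
collection = {
    "name": "datasets-collection",
    "resources": [
        {"name": "gdp", "path": "gdp/datapackage.json"},
        {"name": "readme", "path": "README.md"},
    ],
}
```

With this layout, `classify_resources(collection)` separates `README.md` from the nested `gdp/datapackage.json`, so a single descriptor can act as an entry point for a whole collection.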

Anyway, sure: if your use cases require something that diverges too far from what datapackage.json tries to do, it makes total sense to be driven by them. However, as I see it, the particular cases you describe above fit it pretty closely.

I guess I jumped in here too early though, and we can come back to this as you iterate on what is being done here, as relevant.

juliangruber commented 7 years ago

I share the concern about the resources field too. Even if we made it just a list of files, it would need to be built into the tooling everywhere so that it's maintained automatically. The semantic difference between a datapackage and a dat doesn't bother me too much, but yeah, it is also a point.
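To show roughly what "maintained automatically" would involve, here is a minimal sketch that rebuilds a resources list by walking a directory. It is not how any dat tool actually works. The name rule (lowercase, with characters outside `a-z0-9._-` replaced by `-`) is a conservative approximation of the Data Package naming constraints, and skipping the top-level metadata files is an assumption.

```python
import re
from pathlib import Path


def build_resources(root, exclude=("datapackage.json", "dat.json")):
    """Rebuild a "resources" list from every file under root.

    Each entry gets a POSIX-style relative "path" and a "name" derived from it.
    Top-level metadata files listed in `exclude` are skipped (an assumption).
    """
    root_path = Path(root)
    rel_paths = sorted(
        p.relative_to(root_path).as_posix()
        for p in root_path.rglob("*")
        if p.is_file()
    )
    resources = []
    for rel in rel_paths:
        if rel in exclude:
            continue
        # Approximate a spec-safe name: lowercase, unsafe characters become "-".
        name = re.sub(r"[^a-z0-9._-]", "-", rel.lower())
        resources.append({"name": name, "path": rel})
    return resources
```

The catch is that this list goes stale as soon as a file is added, renamed, or removed, so every tool that writes to a dat would have to rerun something like this on each change.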