Swirrl / datahost-prototypes

Eclipse Public License 1.0

Find better names for entities in the publishing/versioning model #321

Open RickMoynihan opened 10 months ago

RickMoynihan commented 10 months ago

We currently have

(catalogue) -> [dataset series] -> [release] -> [revision] -> [commit]

The entities inside [ ] are the ones in scope for renaming.
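To make the containment implied by the arrows concrete, here is a minimal sketch of the current hierarchy as nested Python dataclasses. The class and field names (`slug`, `number`, `message`) and the example data are hypothetical and chosen for illustration only; they are not taken from the datahost codebase.

```python
from dataclasses import dataclass, field

# Hypothetical model of (catalogue) -> [dataset series] -> [release] -> [revision] -> [commit].
# Field names are illustrative, not the real datahost schema.

@dataclass
class Commit:
    message: str

@dataclass
class Revision:
    number: int
    commits: list[Commit] = field(default_factory=list)

@dataclass
class Release:
    slug: str
    revisions: list[Revision] = field(default_factory=list)

@dataclass
class DatasetSeries:
    slug: str
    releases: list[Release] = field(default_factory=list)

@dataclass
class Catalogue:
    series: list[DatasetSeries] = field(default_factory=list)

def commit_paths(catalogue: Catalogue) -> list[tuple[str, str, int, str]]:
    """List every commit together with the chain of entities that contains it."""
    return [
        (s.slug, r.slug, rev.number, c.message)
        for s in catalogue.series
        for r in s.releases
        for rev in r.revisions
        for c in rev.commits
    ]

cat = Catalogue([
    DatasetSeries("population", [
        Release("2023", [
            Revision(1, [Commit("initial load")]),
            Revision(2, [Commit("correct regional totals")]),
        ]),
    ]),
])
print(commit_paths(cat))
```

Whichever names are chosen, the renaming only changes the labels on these levels; the nesting itself stays the same.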

Release and revision are a little too close terminologically, and it doesn't help that they both begin with "Re".

Dataset series (or just "series") may also be a little unfamiliar.

We may also wish to consider a new name for commit, to avoid confusion with git terminology. In git, a commit identifies the state of the whole repository, i.e. it references everything prior to it. For us it doesn't: a commit is more like a delta with a metadata/message, and it is the revision that identifies the state of the whole repository.
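The distinction between a git-style snapshot and a delta can be sketched in a few lines. This assumes, purely for illustration, that a commit carries a set of appended rows and a set of retracted rows; the `Commit` shape, the `state_at` helper and the row format are hypothetical, not the actual datahost change model. The point is that a single commit's contents are not the dataset; the full state a revision identifies is recovered only by replaying every commit up to it in order.

```python
from dataclasses import dataclass

Row = tuple[str, int]  # hypothetical row shape: (period, value)

@dataclass(frozen=True)
class Commit:
    """A delta plus metadata; on its own it does NOT describe the full dataset."""
    message: str
    appends: frozenset[Row] = frozenset()
    retractions: frozenset[Row] = frozenset()

def state_at(commits: list[Commit]) -> frozenset[Row]:
    """Replay deltas in order to recover the full state a revision identifies."""
    rows: set[Row] = set()
    for commit in commits:
        rows -= commit.retractions  # apply removals first...
        rows |= commit.appends      # ...then additions
    return frozenset(rows)

c1 = Commit("initial load", appends=frozenset({("2020", 10), ("2021", 12)}))
c2 = Commit("correct 2021 value",
            appends=frozenset({("2021", 13)}),
            retractions=frozenset({("2021", 12)}))

revision_1 = state_at([c1])
revision_2 = state_at([c1, c2])
print(sorted(revision_2))  # [('2020', 10), ('2021', 13)]
```

Here `c2` alone only says "replace one value"; a git commit, by contrast, would identify the equivalent of `revision_2` directly.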

Some options (unchanged entities are in ()):

1. (catalogue) -> [dataset series] -> [release] -> [edition] -> (commit)

Like what we have at the minute, except that revision is renamed to edition.

2.1 (catalogue) -> [publication] -> [dataset] -> [edition] -> (commit)
2.2 (catalogue) -> [publication] -> [dataset] -> [edition] -> [delta]

Make publication the catalogued entry, dataset the stable release with its schema, and edition a precise version of that dataset with amendments.

3.1 (catalogue) -> [dataset] -> [publication] -> [revision] -> (commit)
3.2 (catalogue) -> [dataset] -> [publication] -> [edition] -> (commit)

An inversion of 2: make dataset the catalogue entry, and publication the packaged/stable release of it with schema/methodology. A publication then has a revision or edition that locks in any amendments and gives you access to the commit/delta log.

Discussion

Of the above, I think there are arguments for 2 and 3.2. Initially I leaned towards 3.2, because we've always thought of a publication as a collection of related resources that should be grouped with the stable package, i.e. the methodology/schema should be stable within a publication.

However, on reflection I think 2.2 might be the better choice: there is no reason a dataset can't have supporting methodology/schema, and a publication could represent the series. The only thing I dislike is that "publication" sounds like an artifact rather than a series; arguably "dataset" has this problem too.

ricroberts commented 10 months ago

I think I prefer 2.1 or 2.2.

Happy to think of an alternative name for publication (but dunno what)!