Swirrl / datahost-prototypes

Eclipse Public License 1.0

Change how changes and the change-log works #246

Open RickMoynihan opened 11 months ago

RickMoynihan commented 11 months ago

The change log on the API is currently quite opaque.

For example, in a release with two committed appends, requesting the list of revisions with:

GET /data/crimes-3021/releases/2020/revisions
Accept: application/ld+json

returns:

{
  "contents": [
    {
      "dcterms:description": "A second revision",
      "@type": "dh:Revision",
      "dh:appliesToRelease": "https://example.org/data/crimes-3021/releases/2020",
      "dcterms:title": "Rev 2",
      "dh:hasChange": "https://example.org/data/crimes-3021/releases/2020/revisions/2/changes/1",
      "@id": "crimes-3021/releases/2020/revisions/2"
    },
    {
      "@type": "dh:Revision",
      "dcterms:title": "Rev 1",
      "dcterms:description": "A test revision 1",
      "dh:hasChange": "https://example.org/data/crimes-3021/releases/2020/revisions/1/changes/1",
      "dh:appliesToRelease": "https://example.org/data/crimes-3021/releases/2020",
      "@id": "crimes-3021/releases/2020/revisions/1"
    }
  ],
  "@context": {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dh": "https://publishmydata.com/def/datahost/",
    "dcat": "http://www.w3.org/ns/dcat#",
    "dcterms": "http://purl.org/dc/terms/",
    "csvw": "http://www.w3.org/ns/csvw#",
    "appropriate-csvw": "https://publishmydata.com/def/appropriate-csvw/",
    "@base": "https://example.org/data/",
    "contents": {
      "@id": "dh:collection-contents",
      "@container": "@set"
    }
  }
}

There are a few issues here:

  1. The biggest is that we don't know what kind of change we're dealing with (append/retract/correction); that's important information to have in a change log.
  2. It's strange (I think it's an artifact of the early changesets idea) that the changes are named with a number (here always 1), e.g. /crimes-3021/releases/2020/revisions/1/changes/1. It would be better for them to be named appends, retracts or corrections.
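
Here is a minimal Python sketch of the first problem, run against a trimmed copy of the response above. A client walking the listing can only pull the numeric change slug out of dh:hasChange and finds no change kind. The dh:changeKind key it looks for is the property rosado mentions later in this thread; the summarise helper is hypothetical.

```python
import json

# Trimmed copy of the GET .../revisions response shown above.
response = json.loads("""
{
  "contents": [
    {"@id": "crimes-3021/releases/2020/revisions/2",
     "@type": "dh:Revision",
     "dh:hasChange": "https://example.org/data/crimes-3021/releases/2020/revisions/2/changes/1"},
    {"@id": "crimes-3021/releases/2020/revisions/1",
     "@type": "dh:Revision",
     "dh:hasChange": "https://example.org/data/crimes-3021/releases/2020/revisions/1/changes/1"}
  ]
}
""")

def summarise(revision):
    """Return (revision id, change slug, change kind or None)."""
    change_url = revision["dh:hasChange"]
    # The change slug is just a counter ("1"), not a kind like "appends".
    slug = change_url.rstrip("/").rsplit("/", 1)[-1]
    # The listing does not say whether the change is an append,
    # retraction or correction, so this lookup finds nothing.
    kind = revision.get("dh:changeKind")
    return revision["@id"], slug, kind

summary = [summarise(r) for r in response["contents"]]
for row in summary:
    print(row)
```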

Additional considerations: URI space/routes

I think a better pattern for the routes would be to remove the /changes route and have it be represented by three routes:

  1. GET /data/:series/releases/:release/revisions/:revision-id/:change-type, where :change-type MUST be one of appends, retracts or corrections. These will return the plain delta/data in the user schema.
  2. Rename GET and POST .../revisions to /data/:series/releases/:release/latest; in each case it should ALWAYS redirect to the :revision-id route for the latest revision, whatever the content type.
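
Here is a rough Python sketch of how the proposed routes could be resolved. The regexes, the resolve function and the latest_revision_id argument are hypothetical. The 307 status is also an assumption, chosen because it preserves the request method so a POST stays a POST after the redirect. None of this reflects the actual datahost routing code.

```python
import re

CHANGE_TYPES = {"appends", "retracts", "corrections"}

# Hypothetical path patterns for the proposed routes.
CHANGE_ROUTE = re.compile(
    r"^/data/(?P<series>[^/]+)/releases/(?P<release>[^/]+)"
    r"/revisions/(?P<revision_id>\d+)/(?P<change_type>[^/]+)$")
LATEST_ROUTE = re.compile(
    r"^/data/(?P<series>[^/]+)/releases/(?P<release>[^/]+)/latest$")

def resolve(method, path, latest_revision_id):
    """Return (status, detail) for a request under the proposed scheme.

    latest_revision_id stands in for a lookup of the release's newest revision.
    """
    m = LATEST_ROUTE.match(path)
    if m and method in ("GET", "POST"):
        # Always redirect to the concrete revision, whatever the content type.
        location = (f"/data/{m['series']}/releases/{m['release']}"
                    f"/revisions/{latest_revision_id}")
        return 307, location
    m = CHANGE_ROUTE.match(path)
    if m and method == "GET":
        if m["change_type"] not in CHANGE_TYPES:
            return 404, "change-type must be appends, retracts or corrections"
        # A real handler would return the delta in the user schema here.
        return 200, (m["revision_id"], m["change_type"])
    return 404, "no matching route"
```

For example, resolve("GET", "/data/crimes-3021/releases/2020/latest", 2) yields a 307 pointing at /data/crimes-3021/releases/2020/revisions/2, while the old .../changes slug no longer matches a valid change type.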

rosado commented 11 months ago

Each change has a dh:changeKind, which specifies whether it is an append, retraction or correction. We should just update db/get-revisions to include that bit of information (e.g. see db/get-changes-info-query).
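
A Python analogue of that suggestion follows; the real db/get-revisions and db/get-changes-info-query live in the Clojure codebase. The input shapes, the kind strings and the with_change_kind helper are all hypothetical. The sketch assumes one change per revision, as is currently the case.

```python
# Hypothetical stand-ins for the results of db/get-revisions and
# db/get-changes-info-query; shapes and kind values are illustrative only.
revisions = [
    {"@id": "crimes-3021/releases/2020/revisions/1", "dcterms:title": "Rev 1"},
    {"@id": "crimes-3021/releases/2020/revisions/2", "dcterms:title": "Rev 2"},
]
changes_info = [
    {"revision": "crimes-3021/releases/2020/revisions/1", "dh:changeKind": "append"},
    {"revision": "crimes-3021/releases/2020/revisions/2", "dh:changeKind": "retraction"},
]

def with_change_kind(revisions, changes_info):
    """Copy each revision, adding the dh:changeKind of its single change."""
    kind_by_revision = {c["revision"]: c["dh:changeKind"] for c in changes_info}
    # Build new dicts so the original revision maps are left untouched.
    return [
        {**rev, "dh:changeKind": kind_by_revision.get(rev["@id"])}
        for rev in revisions
    ]

enriched = with_change_kind(revisions, changes_info)
```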

rosado commented 11 months ago

As for the .../changes route: it makes sense if we were to allow multiple changes per revision. Currently we don't, but if that's the ultimate plan, then I think having a generic 'changes' route is more future proof. Also, forcing the user to supply the 'kind' of change to request it doesn't seem best no matter how many changes per revision we allow:

RickMoynihan commented 11 months ago

> As for the .../changes route: it makes sense if we were to allow multiple changes per revision. Currently we don't, but if that's the ultimate plan, then I think having a generic 'changes' route is more future proof.

This was the original idea: to allow revisions to be like stable published states, a bit like PRs in that they can have their own descriptions etc., with the commits within them describing precisely how you got there. However, we removed that layer whilst I was away, and tbh I was never sure whether people would like it.

> Also, forcing the user to supply the 'kind' of change to request it doesn't seem best no matter how many changes per revision we allow... 1 change per revision: user needs to know which of appends | retractions | corrections to use.

Yes, I know what you mean, but that's not quite how I see it. I don't think of :change-type as a change type per se, but rather as just being there to differentiate the data from the metadata; it was redundant, but a hint for users. Though this makes me think that we should really just not have /changes (or /appends, /retracts, /corrections slugs/routes) at all.

Nor do we need the dh:hasChange links, because a revision is now essentially a change. It has two representations/parts (metadata in application/json and data in text/csv), but logically a revision is both of those representations combined.
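
To make the one-revision-two-representations idea concrete, here is a minimal Python sketch of a handler picking a representation of a single revision URI from the Accept header. The in-memory revision dict and the represent function are hypothetical. Accept matching is exact (no q-values or wildcards), and the CSV is written without a header row to match the CSV examples in this thread.

```python
import csv
import io
import json

# Hypothetical in-memory revision: metadata plus the delta in the user schema.
revision = {
    "metadata": {
        "@id": "crimes-3021/releases/2020/revisions/2",
        "@type": "dh:DeleteRevision",
        "dcterms:title": "Rev 2",
    },
    "rows": [["accidental", "data", "delete-me"]],
}

def represent(revision, accept):
    """Return (content_type, body) for one revision URI, chosen by Accept."""
    if accept in ("application/json", "application/ld+json"):
        # Metadata representation of the revision.
        return accept, json.dumps(revision["metadata"])
    if accept == "text/csv":
        # Data representation: the delta rows themselves.
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerows(revision["rows"])
        return "text/csv", buf.getvalue()
    # A real handler would respond 406 Not Acceptable here.
    raise ValueError(f"unsupported media type: {accept}")
```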

Instead you should just do:

GET /data/:series/releases/:release/revisions
Accept: application/json
{
  "contents": [
    {
      "dcterms:description": "A second revision",
      "@type": "dh:DeleteRevision",
      "dh:appliesToRelease": "https://example.org/data/crimes-3021/releases/2020",
      "dcterms:title": "Rev 2",
      "@id": "crimes-3021/releases/2020/revisions/2"
    },
    {
      "@type": "dh:AppendRevision",
      "dcterms:title": "Rev 1",
      "dcterms:description": "A test revision 1",
      "dh:appliesToRelease": "https://example.org/data/crimes-3021/releases/2020",
      "@id": "crimes-3021/releases/2020/revisions/1"
    }
  ],
...

This tells you what kind each revision is (append, retract or correction). Then for each one you make a request:

GET /data/:series/releases/:release/revisions/1
Accept: text/csv

foo,bar,baz
accidental,data,delete-me
blah,blah,blah

GET /data/:series/releases/:release/revisions/2
Accept: text/csv

accidental,data,delete-me

And use the metadata to tell you how to apply the delta/data.
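
Here is a minimal Python sketch of that client-side flow, using the listing and CSV bodies above as canned responses instead of real HTTP calls. It assumes revisions are applied in ascending numeric order, that every CSV line is a data row, and that a delete removes one matching row per deleted row. Corrections are not handled because the thread doesn't define their shape. The replay and revision_number helpers are hypothetical.

```python
import csv
import io

# Listing as returned by GET .../revisions (trimmed to the fields used here).
listing = [
    {"@id": "crimes-3021/releases/2020/revisions/2", "@type": "dh:DeleteRevision"},
    {"@id": "crimes-3021/releases/2020/revisions/1", "@type": "dh:AppendRevision"},
]

# Canned bodies of GET .../revisions/:id with Accept: text/csv, keyed by @id.
deltas = {
    "crimes-3021/releases/2020/revisions/1":
        "foo,bar,baz\naccidental,data,delete-me\nblah,blah,blah\n",
    "crimes-3021/releases/2020/revisions/2":
        "accidental,data,delete-me\n",
}

def revision_number(rev):
    """Extract the trailing numeric id from a revision's @id."""
    return int(rev["@id"].rsplit("/", 1)[-1])

def replay(listing, deltas):
    """Rebuild the release's rows by applying each revision in order."""
    rows = []
    for rev in sorted(listing, key=revision_number):
        delta = list(csv.reader(io.StringIO(deltas[rev["@id"]])))
        if rev["@type"] == "dh:AppendRevision":
            rows.extend(delta)
        elif rev["@type"] == "dh:DeleteRevision":
            # Remove one matching row per deleted row; list.remove raises
            # ValueError if a deleted row is not present.
            for row in delta:
                rows.remove(row)
        else:
            raise NotImplementedError(rev["@type"])
    return rows

print(replay(listing, deltas))
```

Sorting by revision number rather than trusting the listing order matters here, since the example listing returns the newest revision first.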

This use case is better expressed in the tx schema layout of course, and you should be able to ask for those things too (once we've implemented the reified transactions proposal). The reason to do this though is to let people see the deltas in the user schema.