Closed: kirahowe closed this issue 1 year ago.
@kiramclean: I'm not sure 1. is framed quite right as an issue; it's useful context on what we mean by a TX log, though it's worth noting that the format you suggest assumes #253 (which we might not want to do longer term on the main branch).

Essentially 2. is how the API supports reading the TX log 👍 we definitely need that. But we need at least two more issues, in my mind, to cover how we create that TX log by POSTing TX format data, and then also how we create that TX data/delta.
So let me propose creating the following issues:

1. `POST /my-data/my-release` with `Content-Type: text/csv` and `Accept: application/x-datahost-tx-csv`, to consult the latest revision/commit and calculate the delta from the whole file provided in the user schema. The return value is then in `application/x-datahost-tx-csv` and should contain just the subsets of rows that are appended/deleted/corrected in TX format. This will depend on deciding the revision commit/change format (for the main branch), but if we're to apply this to the ons branch then on that branch at least it would want to follow the decision in #253. For main we will need to do #256 first. NOTE: we don't need a `/delta` slug on the route because the `Accept` header identifies it as being a delta.
2. `POST /my-data/my-release` with `Content-Type: application/x-datahost-tx-csv`, for taking a delta (a sequence of commits created via 1.) and appending them to the TX log.
The only point I'd make relating to those is that the `GET /my-data/release` routes should all redirect to the latest revisions and delegate to them to return the actual data.

I hope that makes sense. I'm not around this afternoon, but we can sync up on Monday if needed.
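The delta calculation in issue 1 can be thought of as a set comparison between the latest materialized release and the newly uploaded full file. Here is a minimal, hedged sketch of that idea; the `op` column name and the TX row shape are illustrative assumptions, not the format to be decided in #253/#256:

```python
import csv
import io

def calculate_delta(previous_csv: str, uploaded_csv: str) -> str:
    """Compare the latest materialized release against a full upload
    and emit only the changed rows in a hypothetical TX format."""
    old_rows = list(csv.reader(io.StringIO(previous_csv)))
    new_rows = list(csv.reader(io.StringIO(uploaded_csv)))
    header = old_rows[0]
    old_data = {tuple(r) for r in old_rows[1:]}
    new_data = {tuple(r) for r in new_rows[1:]}

    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["op", *header])  # the "op" column is an assumption
    for row in sorted(old_data - new_data):
        writer.writerow(["delete", *row])  # rows absent from the upload
    for row in sorted(new_data - old_data):
        writer.writerow(["append", *row])  # rows new in the upload
    return out.getvalue()
```

Note that under this sketch a correction naturally falls out as a delete/append pair, which is consistent with flattening corrections into their logical operations.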
I have broken this issue up into three as described in my previous comment.
The issues are:
This development should happen against main and not the (soon to be) frozen ONS branch.
This is an issue to capture Rick's updated delta tool proposal here. It is for discussion and improvement leading up to (maybe) implementing it in the future.
There are likely a few issues to break out here if/when we get to the point of implementing it.
1. Accrete changes as one continuous/flat log
All file uploads ("commits") get parsed and added to a flat log of changes on a release, as opposed to grouping them (as one append, one retraction, one correction) per revision. E.g. corrections are not necessarily added verbatim; they would be broken down into the logical operations required to apply them. A single correction, for example, would be recorded as two log entries: one append (the corrected row) and one delete (the old row).
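That decomposition can be sketched in code. This is a hedged illustration only; the `(op, row)` entry shape and the ordering of the pair are assumptions, not the agreed TX format:

```python
def accrete_correction(log: list, old_row: tuple, new_row: tuple) -> list:
    """Accrete a correction onto a release's flat change log as its two
    logical operations: delete the old row, append the corrected row.
    The ("delete"/"append", row) entry shape is an assumption."""
    log.append(("delete", old_row))
    log.append(("append", new_row))
    return log

# One flat log per release: an initial append, then one correction,
# which lands in the log as two entries rather than verbatim.
log = []
log.append(("append", ("1", "original value")))
accrete_correction(log, ("1", "original value"), ("1", "corrected value"))
```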
2. Implement new routes to retrieve these transaction logs
All routes to the data will support accessing both schemas/representations of the data via content negotiation, e.g.

```
GET /my-data/my-release Accept: text/csv
GET /my-data/my-release Accept: application/x-datahost-tx-csv
```

Revisions would work the same way:

```
GET /my-data/my-release/revision/1 Accept: text/csv
GET /my-data/my-release/revision/1 Accept: application/x-datahost-tx-csv
```
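A minimal sketch of how a handler might branch on the `Accept` header to serve both representations from the same route. The function shape and the log-replay materialization are assumptions for illustration; only the two media types come from the proposal above:

```python
def get_release(accept: str, tx_log: list) -> tuple:
    """Serve either the raw TX log or the materialized table for the
    same route, depending on the Accept header (content negotiation)."""
    if accept == "application/x-datahost-tx-csv":
        # Return the change log itself, in the (assumed) TX row shape
        body = "\n".join(f"{op},{','.join(row)}" for op, row in tx_log)
        return ("application/x-datahost-tx-csv", body)
    # Default: replay the flat log to materialize the current table
    rows = []
    for op, row in tx_log:
        if op == "append":
            rows.append(row)
        elif op == "delete":
            rows.remove(row)
    body = "\n".join(",".join(r) for r in rows)
    return ("text/csv", body)
```

Serving a `/revision/N` route would be the same replay, truncated to the log entries up to revision N.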