gbv / bartoc.org

Source code of BARTOC.org user interface
https://bartoc.org/

Track changes in dumps in Git? #117

Open stefandesu opened 3 years ago

stefandesu commented 3 years ago

We are doing daily dumps of all the data in our main instance, but as far as I can see, the dumps are not easily browsable and only the latest one is linked on the site (https://bartoc.org/data/dumps/latest.ndjson).

Should we maybe have a separate Git repository that tracks the latest dump so that we can use the Git history to refer to older versions of the dump? Not sure if it should be all vocabularies in one file like the current dump or one file per vocabulary (which would allow more granular tracking of changes, but we'd have a ton of files).
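To make the "one file per vocabulary" variant concrete, here is a rough sketch of splitting the NDJSON dump. It assumes every record has a unique `uri` field; the names `fileNameFor`, `stableStringify` and `splitDump` are made up for illustration and are not part of bartoc.org. Keys are sorted so that a Git diff only shows real changes, not reordered fields.

```javascript
// Turn a record URI into a file name.
// Assumption: replacing unsafe characters with "_" keeps names unique.
function fileNameFor(uri) {
  return uri.replace(/[^A-Za-z0-9.-]+/g, "_") + ".json";
}

// Recursively sort object keys so the serialization is stable.
function sortKeys(value) {
  if (Array.isArray(value)) return value.map(sortKeys);
  if (value && typeof value === "object") {
    return Object.fromEntries(
      Object.keys(value).sort().map((key) => [key, sortKeys(value[key])])
    );
  }
  return value;
}

// Pretty-print with sorted keys and a trailing newline (friendlier to Git).
function stableStringify(value) {
  return JSON.stringify(sortKeys(value), null, 2) + "\n";
}

// Split NDJSON text into a Map of file name -> file content.
// Assumption: each non-empty line is one JSON record with a "uri" field.
function splitDump(ndjson) {
  const files = new Map();
  for (const line of ndjson.split("\n")) {
    if (line.trim() === "") continue; // skip blank lines
    const record = JSON.parse(line);
    files.set(fileNameFor(record.uri), stableStringify(record));
  }
  return files;
}
```

Each entry of the returned Map could then be written into the working tree of the separate repository and committed once per dump.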

What do you think?

nichtich commented 3 years ago

There are two use cases of data dumps:

  1. Provide dumps for download and analysis. For this it is enough to keep the dumps of the last n days plus weekly or monthly snapshots.
  2. Show what has changed
    • A list of most recent changes (who created or modified which record when)
    • filterable by contributor and/or by record
    • A diff between two versions of a record
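As a minimal sketch of the last point, a diff between two versions of a record could compare top-level fields. The function name `diffRecords` and the result shape are assumptions for illustration, not existing code. Nested values are compared via `JSON.stringify`, so a pure reordering of keys inside a nested object would be reported as a change.

```javascript
// Compare two versions of a record field by field (top level only).
function diffRecords(before, after) {
  const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);
  const changes = { added: {}, removed: {}, changed: {} };
  const keys = new Set([...Object.keys(before), ...Object.keys(after)]);
  for (const key of keys) {
    if (!has(before, key)) {
      changes.added[key] = after[key]; // field only in the new version
    } else if (!has(after, key)) {
      changes.removed[key] = before[key]; // field only in the old version
    } else if (JSON.stringify(before[key]) !== JSON.stringify(after[key])) {
      changes.changed[key] = { before: before[key], after: after[key] };
    }
  }
  return changes;
}
```

With one file per record in Git, the two versions would simply be the file contents at two commits.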

Maybe split the use cases and solve 1 first by providing simple dump files.
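For use case 1, the retention rule could look roughly like this. It is only a sketch: it assumes dumps are identified by `YYYY-MM-DD` date strings (a hypothetical naming scheme) and uses monthly snapshots, keeping the earliest dump of each month. `dumpsToKeep` is a made-up name.

```javascript
// Keep every dump from the last `days` days, plus the earliest dump
// of each month for everything older than that.
function dumpsToKeep(dates, today, days) {
  const dayMs = 24 * 60 * 60 * 1000;
  // Date-only ISO strings are parsed as UTC, so there are no timezone shifts.
  const cutoff = Date.parse(today) - days * dayMs;
  const keep = [];
  const seenMonths = new Set();
  for (const date of [...dates].sort()) {
    const month = date.slice(0, 7); // "YYYY-MM"
    if (Date.parse(date) > cutoff) {
      keep.push(date); // recent dump, always kept
    } else if (!seenMonths.has(month)) {
      seenMonths.add(month); // first old dump seen for this month
      keep.push(date);
    }
  }
  return keep;
}
```

Anything not in the returned list could be deleted by the daily dump job.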

On the backend, versioning could be supported in JSKOS-Server itself, but this might be too complex. Just using Git might be a good option, and one file per record allows for better querying.

The script bin/dump.js has some versioning capabilities, but only at the command line.