aspiers / book-indices

Indices for music books

research Dolt and DoltHub #39

Open: aspiers opened this issue 1 year ago

aspiers commented 1 year ago

I have discovered Dolt (https://github.com/dolthub/dolt) and DoltHub, which look like very promising technologies. Perhaps they could be used as an alternative way of storing the data contained here. Certainly, a full-blown SQL database would bring a ton of power and flexibility to this project, and combined with a GitHub-like collaboration model with pull requests, it could be perfect.

aspiers commented 1 year ago

See also my "git sourcing" idea: https://github.com/githubocto/flat/discussions/64

strk commented 2 months ago

I'm not convinced a new format is needed. The biggest value of a project like this is the stability of its format specification, since that allows multiple projects to depend on it for both reading and writing. That is, unless I have misunderstood what Dolt is about (it isn't clear to me).

Note that PostgreSQL supports querying CSV files as if they were tables in the database, via Foreign Data Wrappers (https://www.postgresql.org/docs/current/file-fdw.html). See for example https://gist.github.com/NikolayS/a819f139c37e0d54ad4a4ca70764f225
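As a rough, runnable analogue of that idea (not file_fdw itself, which is configured in SQL inside PostgreSQL), the sketch below loads a CSV into an in-memory SQLite table using only Python's standard library and then queries it with ordinary SQL. The column names and sample rows are hypothetical, not taken from this repository. Unlike file_fdw, which reads the file at query time, this approach copies the data into the database.

```python
import csv
import io
import sqlite3

# Hypothetical sample data; the real book-indices CSV columns may differ.
SAMPLE_CSV = """Title,Book,Page
Autumn Leaves,Book A,12
Blue Bossa,Book A,30
Autumn Leaves,Book B,7
"""


def load_csv_into_sqlite(conn: sqlite3.Connection, table: str, text: str) -> None:
    """Create `table` from the CSV header and insert every row as TEXT."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    cols = ", ".join(f'"{c}" TEXT' for c in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    placeholders = ", ".join("?" for _ in header)
    # The remaining rows from the reader are inserted in one batch.
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)


conn = sqlite3.connect(":memory:")
load_csv_into_sqlite(conn, "indices", SAMPLE_CSV)

# Once loaded, ordinary SQL works on the CSV data.
rows = conn.execute(
    'SELECT "Book", "Page" FROM indices WHERE "Title" = ? ORDER BY "Book"',
    ("Autumn Leaves",),
).fetchall()
print(rows)  # [('Book A', '12'), ('Book B', '7')]
```

Every value is stored as TEXT here for simplicity, which is why page numbers come back as strings.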

aspiers commented 2 months ago

The stability of the schema is largely orthogonal to whether we use CSV or an RDBMS like Dolt. For example, we could use CSV but create undesirable instability by regularly reordering or renaming columns. Or we could use an RDBMS and keep it very stable by never changing the schema.

The main attractions of Dolt stem from an RDBMS being much more flexible than CSV, e.g.:

Also, DoltHub gives us a very nice frontend for free which, unlike GitHub, is specifically designed for decentralized collaboration on data sets.

But I admit it would also increase complexity. Another option is to introduce CI that validates the existing data.
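A minimal sketch of what such a CI check might look like, assuming a hypothetical pinned header (`Title`, `Book`, `Page`) rather than the repository's actual columns. It flags a reordered or renamed header (the kind of schema instability described earlier in this comment thread), rows with the wrong number of fields, and page values that are not plain digits.

```python
import csv
import io
import re
import sys

# Hypothetical pinned schema; the project's actual columns may differ.
EXPECTED_HEADER = ["Title", "Book", "Page"]


def validate_csv(text: str, expected_header: list[str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems: list[str] = []
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header != expected_header:
        # Catches reordered, renamed, added or dropped columns (or an empty file).
        problems.append(f"header {header!r} != expected {expected_header!r}")
        return problems
    page_idx = expected_header.index("Page")
    for row in reader:
        lineno = reader.line_num  # physical line number, even with quoted newlines
        if len(row) != len(expected_header):
            problems.append(
                f"line {lineno}: expected {len(expected_header)} fields, got {len(row)}"
            )
            continue
        if not re.fullmatch(r"[0-9]+", row[page_idx]):
            problems.append(f"line {lineno}: Page {row[page_idx]!r} is not all digits")
    return problems


def main(paths: list[str]) -> int:
    """Validate each file; return a non-zero exit status if any problem is found."""
    failed = False
    for path in paths:
        with open(path, newline="", encoding="utf-8") as f:
            for problem in validate_csv(f.read(), EXPECTED_HEADER):
                print(f"{path}: {problem}")
                failed = True
    return 1 if failed else 0


# Only run as a script when file paths are passed on the command line.
if __name__ == "__main__" and sys.argv[1:]:
    sys.exit(main(sys.argv[1:]))
```

A CI job could then run something like `python validate_csv.py path/to/*.csv` (the script name is hypothetical) and fail the build whenever the script exits non-zero.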