belaudiobooks / website

Catalog of Belarusian audiobooks
GNU General Public License v3.0

Sync database with GitHub #78

Closed nbeloglazov closed 10 months ago

nbeloglazov commented 1 year ago

Regularly sync the database to GitHub.

frombrest commented 1 year ago

As far as I understand, you want to sync the data for testing purposes. If so, does it make sense to simply use the pg_dump utility to create a .sql file with the data? To run tests, we would be able to start a Docker container and populate it from the .sql dump. If you are worried about sensitive data, there are two options to manage that (a rough sketch of both follows the list):

  1. Using the --exclude-table-data=pattern param of the pg_dump utility, we can copy the schema but skip the data of the user_* and django_admin_log tables. This is probably the best option, because after restoring the dump into a test db instance we will be able to test newly introduced db migrations during PR pipelines. In addition, we can insert a synthetic superuser into the user_user table to be able to quickly and easily run the application locally.

  2. Run pg_dump twice: first with the --schema-only param to dump the db structure, and second with --data-only in combination with the --table=pattern or --exclude-table=pattern params to copy data only for the books_* and django_migrations tables. This option is very similar, but I think a bit more secure, since we do not copy data from any Django service tables except the migration history. Theoretically, db migrations should still work too.
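For concreteness, here is a rough sketch of both options. The database name, credentials, and container settings are placeholders, not the project's actual config:

```sh
# Option 1: single dump -- full schema, but no data for sensitive tables.
pg_dump \
  --exclude-table-data='user_*' \
  --exclude-table-data='django_admin_log' \
  --dbname=books_db --username=books_user --host=localhost \
  --file=dump.sql

# Option 2: two dumps -- full schema, then data for selected tables only.
pg_dump --schema-only \
  --dbname=books_db --username=books_user --host=localhost \
  --file=schema.sql
pg_dump --data-only \
  --table='books_*' --table='django_migrations' \
  --dbname=books_db --username=books_user --host=localhost \
  --file=data.sql

# Restore into a throwaway Postgres container for tests
# (for option 2, run schema.sql first, then data.sql):
docker run -d --name test-db -e POSTGRES_PASSWORD=test -p 5433:5432 postgres:15
psql 'postgresql://postgres:test@localhost:5433/postgres' -f dump.sql
```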

pg_dump doc: https://www.postgresql.org/docs/current/app-pgdump.html

nbeloglazov commented 1 year ago

I had a few reasons in mind when setting up the repo to hold the data alongside the code:

  1. Production data is available when starting the server locally for development. It's easier to debug issues, and there's no need to worry about creating fake data.
  2. Readily available data for tests. No need to create fake data.
  3. The JSON format allows for easier local manipulation/migration of data. This is useful when the schema changes significantly and the change is hard to express via migrations, though I don't think we've had a case like that (a toy example follows the list).
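As a toy illustration of (3), a one-off transformation could be done locally with jq. The file path and field names here are hypothetical, not the repo's actual layout:

```sh
# Hypothetical one-off migration: rename the "voice" field to "narrator"
# across all records. data/books.json and both field names are made up.
jq 'map(.narrator = .voice | del(.voice))' data/books.json > data/books.tmp.json
mv data/books.tmp.json data/books.json
```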

That said, I'm changing (2) in #90 to use fake data. For (1), I also think a minimal fixture with 10-20 books in different configurations will be better long term (e.g. a Django fixture, as sketched below).
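For reference, the standard Django workflow for such a fixture would look roughly like this; the `books` app label is inferred from the books_* table names and may not match the actual app:

```sh
# Export a small, hand-curated dataset as a fixture:
python manage.py dumpdata books --indent 2 > books/fixtures/minimal_books.json

# Load it into a fresh development database:
python manage.py loaddata minimal_books.json
```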

So I don't think we should prioritize this issue at the moment.