alphagov / trade-tariff-backend

Enabling the population and distribution via API of UK Customs tariffs and duties

Data migrations #91

Closed saulius closed 11 years ago

saulius commented 11 years ago

This change is for https://www.pivotaltracker.com/story/show/43132661

Separates schema migrations from data migrations. Schema migrations that also carry data are usually problematic when we recreate a database snapshot from scratch: they fill in data before the Taric and CHIEF imports run, which can cause integrity issues later. Data migrations should be run after all schema changes and data imports have been applied.

This adds several new rake commands:

Check current db state:

bundle exec rake db:data:status
[APPLIED] Convert MeasurementUnit code from ASX to ASV
[APPLIED] Rename San Marino to Italy on national measures: This is for excluded geographical areas. Related to commit #3951ff024bc , invalid original data.
[APPLIED] Remove national measures of type SPL
[APPLIED] Fix CHIEF le_tsmps
[APPLIED] Populates HiddenGoodsNomenclature with data: Certain GoodsNomenclatures should be hidden in UK Tariff.
[APPLIED] Updates Footnote descriptions 04005 and 04018 with up to date data
[APPLIED] Add hydrocarbon oils footnote 05976
[APPLIED] Fix CHIEF hectolitre mappings
[APPLIED] Updates Footnote description 04003

Rollback last applied migration (based on timestamp in filename):

bundle exec rake db:data:rollback
[ROLLBACK] Updates Footnote description 04003

Apply all pending migrations:

bundle exec rake db:data:migrate
[APPLIED] Updates Footnote description 04003

Rollback and apply last migration:

bundle exec rake db:data:redo
[ROLLBACK] Updates Footnote description 04003
[APPLIED] Updates Footnote description 04003
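
To make the semantics of the four commands concrete, here is a minimal in-memory sketch of a timestamp-ordered data migration runner. It is not the code in this pull request: `DataMigration`, `DataMigrator`, `redo_last`, the `[PENDING]` label and the example timestamp are all hypothetical names chosen for illustration, and a real implementation would presumably record applied migrations in the database rather than in an array.

```ruby
# Hypothetical sketch only: names and storage are assumptions, not the PR's code.
DataMigration = Struct.new(:timestamp, :name, :up, :down)

class DataMigrator
  def initialize(migrations)
    # Order by the timestamp that would normally prefix the file name.
    @migrations = migrations.sort_by(&:timestamp)
    @applied = [] # timestamps of applied migrations (in memory for the sketch)
  end

  # One line per migration; the "[PENDING]" label is an assumption.
  def status
    @migrations.map do |m|
      "[#{applied?(m) ? 'APPLIED' : 'PENDING'}] #{m.name}"
    end
  end

  # Apply every pending migration, oldest first.
  def migrate
    @migrations.reject { |m| applied?(m) }.map { |m| apply(m) }
  end

  # Revert the applied migration with the latest timestamp, if any.
  def rollback
    last = last_applied
    return [] unless last
    last.down.call
    @applied.delete(last.timestamp)
    ["[ROLLBACK] #{last.name}"]
  end

  # Roll back the last applied migration and apply that same migration again.
  def redo_last
    last = last_applied
    return [] unless last
    rollback + [apply(last)]
  end

  private

  def applied?(migration)
    @applied.include?(migration.timestamp)
  end

  def last_applied
    @migrations.select { |m| applied?(m) }.last
  end

  def apply(migration)
    migration.up.call
    @applied << migration.timestamp
    "[APPLIED] #{migration.name}"
  end
end

# Usage with a made-up timestamp and an in-memory stand-in for the footnote data.
footnotes = { "04003" => "old description" }
migrator = DataMigrator.new([
  DataMigration.new(
    "20130101000000",
    "Updates Footnote description 04003",
    -> { footnotes["04003"] = "new description" },
    -> { footnotes["04003"] = "old description" }
  )
])
puts migrator.migrate
puts migrator.redo_last
puts migrator.status
```

Keeping the applied list separate from the schema migration bookkeeping is what would let data migrations run after the schema changes and the Taric and CHIEF imports.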

Things to note/questions:

matthewford commented 11 years ago

We would only need migrations to rewrite history, e.g. to add in missing data, so I'm not too fussed about that.

But could you elaborate on the second point: what's the issue there? If we change the schema, will the migration fail? Whitehall suffers from model changes that leave past migrations simply broken, so commenting out is an option, but I'm not sure I understand whether that's the issue you're raising.

saulius commented 11 years ago

What should we do about old migration files that cover data changes, e.g. https://github.com/alphagov/trade-tariff-backend/blob/master/db/migrate/20130418141944_change_footnote_04003.rb ? These are now duplicated in data migrations and I would prefer not to run them with the schema changes, so should I comment them out and keep the files?

matthewford commented 11 years ago

Hmm, sounds like we should move them into data migrations, delete the old migrations, and then repopulate the db. Unless they cannot be run twice, I don't see the issue; otherwise we would need to reload the db.

saulius commented 11 years ago

So they are already in data migrations. I cannot delete the files because they are recorded in the schema_migrations table and Sequel will then complain before each run. So I'm thinking of just commenting out their content for now, so that #up and #down are empty?

matthewford commented 11 years ago

If we need to reload the db anyway, can we remove and recreate the db? If not then commenting out is fine.

saulius commented 11 years ago

Ah yes, we can remove them if we will be reloading anyway. I will update the pull request and ping you.

saulius commented 11 years ago

@matthewford I pushed a few commits that remove the data-related migrations from the schema migrations altogether. But with these changes we will now definitely need to reload the snapshot.

matthewford commented 11 years ago

Right, merging with the view that we do a full reload.