alphagov / trade-tariff-backend

Enabling the population and distribution via API of UK Customs tariffs and duties
MIT License

National quantities #61

Closed. saulius closed this 11 years ago

saulius commented 11 years ago

This adds support for national quantities. It adds two new CHIEF tables (chief_comm and chief_tbl9): COMM is a join table between goods_nomenclatures and TBL9, and TBL9 contains the descriptions for national quantity units.
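
Roughly, the lookup chains together like this. This is a sketch only: the chief_tbl9 column names (uoq_code, uoq_description) are placeholders, assuming the join key is the national unit-of-quantity code; only cmdty_code and the uoq_code_cdu* columns on chief_comm appear in the real query below.

-- Sketch: resolve a commodity's national quantity unit description.
-- The chief_tbl9 column names here are placeholders, not the real schema.
SELECT gn.goods_nomenclature_item_id,
       comm.uoq_code_cdu2,
       tbl9.uoq_description
FROM goods_nomenclatures gn
INNER JOIN chief_comm comm ON comm.cmdty_code = gn.goods_nomenclature_item_id
INNER JOIN chief_tbl9 tbl9 ON tbl9.uoq_code = comm.uoq_code_cdu2;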

National quantities apply to national excise measures. Commodities that have national quantities (for checking after deploy):

mysql> SELECT DISTINCT(goods_nomenclatures.goods_nomenclature_item_id)
       FROM chief_comm
       INNER JOIN goods_nomenclatures ON chief_comm.cmdty_code = goods_nomenclatures.goods_nomenclature_item_id
       INNER JOIN measures ON goods_nomenclatures.goods_nomenclature_sid = measures.goods_nomenclature_sid
       WHERE uoq_code_cdu2 IS NOT NULL
         AND uoq_code_cdu3 IS NOT NULL
         AND le_tsmp IS NULL
         AND goods_nomenclatures.validity_end_date IS NULL
         AND goods_nomenclatures.validity_start_date <= NOW()
         AND measures.measure_type IN ('DAA','DAB','DAC','DAE','DAI','DBA','DBB','DBC','DBE','DBI','DCA','DCC','DCE','DCH','DDJ','DDA','DDB','DDC','DDD','DDE','DDF','DDG','DEA','DFA','DFB','DFC','DGC','DHA','DHC','DHE','EAA','EAE','EBJ','EBA','EBB','EBE','EDA','EDB','EDE','EDJ','EEA','EEF','EFJ','EFA','EGA','EGB','EGJ','EHI','EIJ','EIA','EIB','EIC','EID','EIE','FAA','FAE','FAI','FBC','FBG','LAA','LAE','LBJ','LBA','LBB','LBE','LDA','LEA','LEF','LFA','LGJ');
+----------------------------+
| goods_nomenclature_item_id |
+----------------------------+
| 2207200090                 |
| 2205909000                 |
| 2205109000                 |
| 2207100090                 |
| 2207100011                 |
| 2207200019                 |
| 2207200011                 |
| 2207100019                 |
| 2204299790                 |
| 2204299651                 |
| 2204219551                 |
| 2204219751                 |
| 2204219895                 |
| 2204299495                 |
| 2204299451                 |
| 2204109300                 |
| 2204219851                 |
| 2206005900                 |
| 2204210800                 |
| 2204219695                 |
| 2204219651                 |
| 2206003900                 |
| 2204299695                 |
| 2204210900                 |
| 2204109800                 |
| 2204219495                 |
| 2204299895                 |
| 2204219451                 |
| 2204299390                 |
| 2204210700                 |
| 2204291000                 |
| 2204210600                 |
| 2204109600                 |
| 2204299851                 |
| 2204109400                 |
| 2206008900                 |
| 2204299590                 |
| 2204219351                 |
| 2710125110                 |
| 2710124990                 |
| 2710124190                 |
| 2710124510                 |
| 2710124110                 |
| 2710124590                 |
| 2710125910                 |
| 2710125190                 |
| 2710125990                 |
| 2710124910                 |
| 2208909919                 |
| 2208909911                 |
| 2106902000                 |
| 2208909110                 |
| 2205101000                 |
| 2205901000                 |
| 2204301000                 |
| 2206001000                 |
| 2710129010                 |
| 2710121510                 |
| 2710122591                 |
| 2710121110                 |
| 2710127010                 |
| 2710122110                 |
| 3814009070                 |
| 2710123110                 |
+----------------------------+
64 rows in set (0.01 sec)

There may be fewer or more of these, since I'm not checking the measures' validity dates in this query, but it is a good reference.
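
To be stricter, the measure validity dates could be filtered too, e.g. by adding something along these lines to the WHERE clause (assuming measures carries the same validity_start_date/validity_end_date columns as goods_nomenclatures):

  AND measures.validity_start_date <= NOW()
  AND (measures.validity_end_date IS NULL OR measures.validity_end_date >= NOW())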

There are two problems, however:

jabley commented 11 years ago

Pivotal story?

saulius commented 11 years ago

Sorry, I forgot it. It's this one: https://www.pivotaltracker.com/story/show/38026757.

jabley commented 11 years ago

It feels like there are a few changes here: schema changes related to the new feature, code to take advantage of the new data, and data changes which might need further additions.

I'll have a chat with other people here about how they're doing bulk data uploads like that; whitehall and EFG have a separation between schema migrations and data migrations. It feels like we might want something like that to bulk slurp up some content. I'm ambivalent about large data files being in git, but we probably want to decide how we'd like that to work.

matthewford commented 11 years ago

This is more than just a data migration: we're going back in time to add in new seed data. There aren't that many options; either we rebuild the entire thing from scratch or replace the DB with an updated snapshot which has been rebuilt for these tables.

saulius commented 11 years ago

I added a story for data migrations recently: https://www.pivotaltracker.com/story/show/43132661. I want to separate data migrations from schema changes. I have some prototype-quality code already that does this.

https://gist.github.com/54e0f522d0b54d96d2aa

saulius commented 11 years ago

I uploaded a snapshot for this (tariff_development-2013-01-30_national_quantities.sql.bz2) in case we don't find another solution. I made it locally on the development database, so a find and replace is needed.

jabley commented 11 years ago

It looks like we'll have to do this as a new DB snapshot.

Can you sketch out how you think the deployment will proceed?

I'm wondering in particular if we need to have code that will run with both the existing schema and new schema to begin with, then do the database update, then remove the code to work with the old schema.

saulius commented 11 years ago

@jabley I think this deployment doesn't have to be that complicated. National quantity assignment is not part of the transformation process, so we just need the records in TBL9 and COMM. I could simply dump those two tables (~45MB); then we'd just need to migrate the schema, import the two-table dump and check how things look. In case of a problem, we roll back.

Regarding deploys where we actually need to import a new snapshot, I lean towards the Blue-Green approach you mentioned. What's unfortunate is that MySQL still does not support database renaming in 2013. Here's what I found to be the closest thing to renaming, though I haven't tried running it yet: http://blog.shlomoid.com/2010/02/emulating-missing-rename-database.html. In that case I'd imagine the process going as follows:

  1. Create blank database.
  2. Import new snapshot to the newly created database.
  3. Shutdown unicorns and rely on Akamai.
  4. Rename the old database out of the way, then rename the new database to production (rough sketch after this list).
  5. Bring up unicorns.
  6. Bust cache and try to see if updating was successful.
  7. If it works, just drop the old DB. If it does not, shut down unicorn, drop the current database, rename the old version back to production, and bring unicorn back up.
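
Step 4 would boil down to the RENAME TABLE trick from that blog post, applied table by table, roughly like this (database names are illustrative, and both target schemas need to exist beforehand):

-- MySQL has no database rename, but RENAME TABLE can move tables between
-- schemas; repeat for every table in the database.
RENAME TABLE tariff_production.chief_comm TO tariff_old.chief_comm,
             tariff_production.chief_tbl9 TO tariff_old.chief_tbl9;
RENAME TABLE tariff_new.chief_comm TO tariff_production.chief_comm,
             tariff_new.chief_tbl9 TO tariff_production.chief_tbl9;
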
matthewford commented 11 years ago

Can we rebase this on master?

saulius commented 11 years ago

Sure, just did that.

matthewford commented 11 years ago

@sauliusg do we have a dump of just the missing tables to load into preview?

saulius commented 11 years ago

I uploaded a snapshot called tariff_development-national_quantities.sql.bz2. It contains just the inserts for those two tables. Exported with:

mysqldump -u root --no-create-db --no-create-info tariff_development chief_comm chief_tbl9 > tariff_development-national_quantities.sql
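
After loading it on preview, a quick sanity check could be something like the following (just confirming the import populated both tables; then re-run the DISTINCT commodity query from the description and compare against the 64 rows above):

-- Confirm the two-table dump actually loaded.
SELECT COUNT(*) FROM chief_comm;
SELECT COUNT(*) FROM chief_tbl9;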

Btw, there is a related PR for the frontend: https://github.com/alphagov/trade-tariff-frontend/pull/52.

matthewford commented 11 years ago

@jabley we just need to load the national quantities dump onto production for deployment now. Testing on preview.