alphagov / trade-tariff-backend

Enabling the population and distribution via API of UK Customs tariffs and duties
MIT License

National quantities #61

Closed. saulius closed this 11 years ago

saulius commented 11 years ago

This adds support for national quantities. It adds two new CHIEF tables (chief_comm and chief_tbl9): COMM is a join table between goods_nomenclatures and TBL9, and TBL9 contains the descriptions for national quantity units.
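
Roughly, the lookup chains together like this. This is a sketch only: the chief_tbl9 column names (uoq_code, uoq_description) are placeholders, assuming the join key is the national unit-of-quantity code; only cmdty_code and the uoq_code_cdu* columns on chief_comm appear in the real query below.

-- Sketch: resolve a commodity's national quantity unit description.
-- The chief_tbl9 column names here are placeholders, not the real schema.
SELECT gn.goods_nomenclature_item_id,
       comm.uoq_code_cdu2,
       tbl9.uoq_description
FROM goods_nomenclatures gn
INNER JOIN chief_comm comm ON comm.cmdty_code = gn.goods_nomenclature_item_id
INNER JOIN chief_tbl9 tbl9 ON tbl9.uoq_code = comm.uoq_code_cdu2;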

National quantities apply to national excise measures. Commodities that have national quantities (for checking after deploy):

mysql> SELECT DISTINCT(goods_nomenclatures.goods_nomenclature_item_id)
       FROM chief_comm
       INNER JOIN goods_nomenclatures ON chief_comm.cmdty_code = goods_nomenclatures.goods_nomenclature_item_id
       INNER JOIN measures ON goods_nomenclatures.goods_nomenclature_sid = measures.goods_nomenclature_sid
       WHERE uoq_code_cdu2 IS NOT NULL
         AND uoq_code_cdu3 IS NOT NULL
         AND le_tsmp IS NULL
         AND goods_nomenclatures.validity_end_date IS NULL
         AND goods_nomenclatures.validity_start_date <= NOW()
         AND measures.measure_type IN ('DAA','DAB','DAC','DAE','DAI','DBA','DBB','DBC','DBE','DBI','DCA','DCC','DCE','DCH','DDJ','DDA','DDB','DDC','DDD','DDE','DDF','DDG','DEA','DFA','DFB','DFC','DGC','DHA','DHC','DHE','EAA','EAE','EBJ','EBA','EBB','EBE','EDA','EDB','EDE','EDJ','EEA','EEF','EFJ','EFA','EGA','EGB','EGJ','EHI','EIJ','EIA','EIB','EIC','EID','EIE','FAA','FAE','FAI','FBC','FBG','LAA','LAE','LBJ','LBA','LBB','LBE','LDA','LEA','LEF','LFA','LGJ');
+----------------------------+
| goods_nomenclature_item_id |
+----------------------------+
| 2207200090                 |
| 2205909000                 |
| 2205109000                 |
| 2207100090                 |
| 2207100011                 |
| 2207200019                 |
| 2207200011                 |
| 2207100019                 |
| 2204299790                 |
| 2204299651                 |
| 2204219551                 |
| 2204219751                 |
| 2204219895                 |
| 2204299495                 |
| 2204299451                 |
| 2204109300                 |
| 2204219851                 |
| 2206005900                 |
| 2204210800                 |
| 2204219695                 |
| 2204219651                 |
| 2206003900                 |
| 2204299695                 |
| 2204210900                 |
| 2204109800                 |
| 2204219495                 |
| 2204299895                 |
| 2204219451                 |
| 2204299390                 |
| 2204210700                 |
| 2204291000                 |
| 2204210600                 |
| 2204109600                 |
| 2204299851                 |
| 2204109400                 |
| 2206008900                 |
| 2204299590                 |
| 2204219351                 |
| 2710125110                 |
| 2710124990                 |
| 2710124190                 |
| 2710124510                 |
| 2710124110                 |
| 2710124590                 |
| 2710125910                 |
| 2710125190                 |
| 2710125990                 |
| 2710124910                 |
| 2208909919                 |
| 2208909911                 |
| 2106902000                 |
| 2208909110                 |
| 2205101000                 |
| 2205901000                 |
| 2204301000                 |
| 2206001000                 |
| 2710129010                 |
| 2710121510                 |
| 2710122591                 |
| 2710121110                 |
| 2710127010                 |
| 2710122110                 |
| 3814009070                 |
| 2710123110                 |
+----------------------------+
64 rows in set (0.01 sec)

There may be fewer or more of these, since I'm not checking the measures' validity dates in this query, but it is a good reference.
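
To be stricter, the measure validity dates could be filtered too, e.g. by adding something along these lines to the WHERE clause (assuming measures carries the same validity_start_date/validity_end_date columns as goods_nomenclatures):

  AND measures.validity_start_date <= NOW()
  AND (measures.validity_end_date IS NULL OR measures.validity_end_date >= NOW())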

There are two problems, however:

jabley commented 11 years ago

Pivotal story?

saulius commented 11 years ago

Sorry, I forgot it. It's this one: https://www.pivotaltracker.com/story/show/38026757.

jabley commented 11 years ago

It feels like there are a few changes here: schema changes related to the new feature, code to take advantage of the new data, and data changes which might need further additions.

I'll have a chat with other people here about how they're doing bulk data uploads like that; whitehall and EFG have a separation between schema migrations and data migrations. It feels like we might want something like that to bulk slurp up some content. I'm ambivalent about large data files being in git, but we probably want to decide how we'd like that to work.

matthewford commented 11 years ago

This is more than just a data migration: we're going back in time to add in new seed data. There aren't that many options; either we rebuild the entire thing from scratch or replace the DB with an updated snapshot which has been rebuilt for these tables.

saulius commented 11 years ago

I added a story for data migrations recently: https://www.pivotaltracker.com/story/show/43132661. I want to separate data migrations from schema changes. I have some prototype-quality code already that does this.

https://gist.github.com/54e0f522d0b54d96d2aa

saulius commented 11 years ago

I uploaded a snapshot for this (tariff_development-2013-01-30_national_quantities.sql.bz2) in case we don't find another solution. I made it locally on the development database, so a find and replace is needed.

jabley commented 11 years ago

It looks like we'll have to do this as a new DB snapshot.

Can you sketch out how you think the deployment will proceed?

I'm wondering in particular if we need to have code that will run with both the existing schema and new schema to begin with, then do the database update, then remove the code to work with the old schema.

saulius commented 11 years ago

@jabley I think this deployment doesn't have to be that complicated. National quantity assignment is not part of the transformation process, so we just need the records in TBL9 and COMM. I could simply dump those two tables (~45MB); then we'd just need to migrate the schema, import the two-table dump and check how things look. In case of a problem, we roll back.

Regarding deploys where we actually need to import a new snapshot, I lean towards the Blue-Green approach you mentioned. What's unfortunate is that MySQL still does not support database renaming in 2013. Here's what I found to be the closest thing to renaming, though I haven't tried running it yet: http://blog.shlomoid.com/2010/02/emulating-missing-rename-database.html. In that case I'd imagine the process going as follows:

  1. Create blank database.
  2. Import new snapshot to the newly created database.
  3. Shutdown unicorns and rely on Akamai.
  4. Rename the old database out of the way, then rename the new database to production (rough sketch after this list).
  5. Bring up unicorns.
  6. Bust cache and try to see if updating was successful.
  7. If it works, just drop the old DB. If it does not, shut down unicorn, drop the current database, rename the old version back to production, and bring unicorn back up.
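
Step 4 would boil down to the RENAME TABLE trick from that blog post, applied table by table, roughly like this (database names are illustrative, and both target schemas need to exist beforehand):

-- MySQL has no database rename, but RENAME TABLE can move tables between
-- schemas; repeat for every table in the database.
RENAME TABLE tariff_production.chief_comm TO tariff_old.chief_comm,
             tariff_production.chief_tbl9 TO tariff_old.chief_tbl9;
RENAME TABLE tariff_new.chief_comm TO tariff_production.chief_comm,
             tariff_new.chief_tbl9 TO tariff_production.chief_tbl9;
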
matthewford commented 11 years ago

Can we rebase this on master?

saulius commented 11 years ago

Sure, just did that.

matthewford commented 11 years ago

@sauliusg do we have a dump of just the missing tables to load into preview?

saulius commented 11 years ago

I uploaded a snapshot called tariff_development-national_quantities.sql.bz2. It contains just the inserts for those two tables. Exported with:

mysqldump -u root --no-create-db --no-create-info tariff_development chief_comm chief_tbl9 > tariff_development-national_quantities.sql
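
After loading it on preview, a quick sanity check could be something like the following (just confirming the import populated both tables; then re-run the DISTINCT commodity query from the description and compare against the 64 rows above):

-- Confirm the two-table dump actually loaded.
SELECT COUNT(*) FROM chief_comm;
SELECT COUNT(*) FROM chief_tbl9;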

Btw, there is a related PR for the frontend: https://github.com/alphagov/trade-tariff-frontend/pull/52.

matthewford commented 11 years ago

@jabley we just need to load the national quantities dump onto production for deployment now. Testing on preview.