alphagov / trade-tariff-backend

Enabling the population and distribution via API of UK Customs tariffs and duties
MIT License

Tariff moved to Point in Time DB architecture. #74

Closed saulius closed 11 years ago

saulius commented 11 years ago

! DO NOT MERGE !

The name and idea comes from http://goo.gl/gwaWC and http://goo.gl/gc29A (Chapter 8).

Related stories:

  1. https://www.pivotaltracker.com/story/show/44373501 (main story)
  2. https://www.pivotaltracker.com/story/show/43944671
  3. https://www.pivotaltracker.com/story/show/43647015
  4. https://www.pivotaltracker.com/story/show/33493005
  5. https://www.pivotaltracker.com/story/show/43475217

This is a very noisy commit because I had to find-and-replace every line in the 22k-line CHIEF initial seed file.

Assumptions:

Previously we performed all operations in place: we would find a record by primary key, update or destroy it, and the old state was gone. Now there are no physical deletes. We treat inserts, updates and deletions as operations and write them to model_oplog tables, so all attributes are persisted at all times. Every operation has an oid (auto-incrementing operation id) column whose value is unique per table. To reconstruct the current database state we create a view for each table looking similar to:

```sql
select t1.*
  from t_oplog t1
 where t1.oid in (select max(t2.oid)
                    from t_oplog t2
                   where t1.primary_key = t2.primary_key)
   and t1.operation <> 'D'
```

That gives us the latest state of the tables, just as if we had been applying updates in place as before. So model associations, fields etc. keep working as they did.
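The reconstruction the view performs can be sketched in plain Ruby. This is a minimal illustration of the semantics only (the hash keys mirror the column names in the SQL above; the sample rows are made up):

```ruby
# For each primary key, keep only the row with the highest oid, then drop
# it if that latest operation is a delete ('D'). This mirrors the view SQL.
oplog = [
  { oid: 1, primary_key: 10, operation: 'C', name: 'original' },
  { oid: 2, primary_key: 10, operation: 'U', name: 'updated'  },
  { oid: 3, primary_key: 11, operation: 'C', name: 'doomed'   },
  { oid: 4, primary_key: 11, operation: 'D', name: 'doomed'   },
]

current_state = oplog
  .group_by { |row| row[:primary_key] }              # one group per record
  .map      { |_, rows| rows.max_by { |r| r[:oid] } } # latest operation wins
  .reject   { |row| row[:operation] == 'D' }          # deleted records vanish

# => [{ oid: 2, primary_key: 10, operation: 'U', name: 'updated' }]
```

Record 10 survives in its updated form; record 11 disappears because its latest operation is a delete, even though every version is still physically stored.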

To make this possible, all TARIC models now include the Sequel oplog plugin, which intercepts inserts/updates/deletions and creates an oplog entry instead of actually performing that operation on the view. It also dynamically defines a ModelName::Operation Sequel model class that gives us access to the oplog. This will most likely be useful for other related stories.
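The interception idea can be modelled in a few lines of plain Ruby. This is a sketch only: the real code is a Sequel plugin, and the Oplog class, the 'U'/'D' operation codes and the Measure stand-in here are illustrative, not the actual implementation:

```ruby
# Hypothetical stand-in for the oplog: appends operations instead of mutating.
class Oplog
  ENTRIES = []

  def self.record(operation, attributes)
    oid = ENTRIES.size + 1 # auto-incrementing operation id, unique per table
    ENTRIES << attributes.merge(oid: oid, operation: operation)
  end
end

# Stand-in for a TARIC model: save/destroy write oplog rows, never delete.
class Measure
  attr_reader :attributes

  def initialize(attributes)
    @attributes = attributes
  end

  def save
    Oplog.record('U', attributes)
  end

  def destroy
    Oplog.record('D', attributes)
  end
end

m = Measure.new(measure_sid: 1)
m.save
m.destroy
# Oplog::ENTRIES now holds two rows; nothing was physically deleted.
```

The view from the previous section then filters this log back down to the current state, which is why associations and queries keep working unchanged.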

Conformance validations

At the moment we have conformance validations as Sequel class validations or before_destroy hooks. Since we will now be able to roll back updates, I tried enabling them to see how it goes. The problem was that a Sequel::HookFailed exception or a failed presence validation (one out of four) gives very little information in the context of a huge TARIC update. That is why I wrote a simple validation framework meant to be used for conformance validations. In case of an error it adds error names and identifiers to model_instance#errors[:conformance], so you at least know where to look for a fix. I ported almost all the validations we had before this change, and with all the modifications they now pass. With these enabled we run them for real, so to say.

As a good side effect, validations are decoupled from the model classes and live separately in app/validators.
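The shape of such a standalone validator can be sketched as follows. The class name, the check DSL and the :ME1 error code are all illustrative, not the actual names from app/validators; the one fixed point is that failures accumulate in errors[:conformance] rather than raising:

```ruby
# Illustrative conformance validator: runs named checks against a record and
# collects failing error codes instead of raising on the first failure.
class ConformanceValidator
  def initialize
    @checks = []
  end

  def check(code, &block)
    @checks << [code, block]
  end

  def validate(record)
    @checks.each do |code, block|
      record.errors[:conformance] << code unless block.call(record)
    end
  end
end

Record = Struct.new(:validity_start_date, :validity_end_date, :errors)

validator = ConformanceValidator.new
validator.check(:ME1) { |r| !r.validity_start_date.nil? } # hypothetical rule

record = Record.new(nil, nil, Hash.new { |h, k| h[k] = [] })
validator.validate(record)
record.errors[:conformance] # => [:ME1]
```

Because the error codes are collected rather than thrown, a huge TARIC update can finish and report every conformance problem it hit in one pass.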

There's also a new task/class called TradeTariffBackend::Auditor that traverses all models (or a selected subset), applies conformance validations to every record and reports back on the results. This is needed since we will obviously be increasing our conformance suite. It is not meant to be run in CI or production: it takes 5 hours to traverse everything.

TARIC importer changes

At the moment, when performing TARIC updates we operate on Sequel datasets, which is very fast but does not instantiate model objects and therefore ignores validations. As said earlier, this has changed so that validations are performed, so updates may be a bit slower. On the plus side, less Nokogiri and more Hashes are used, so memory consumption of the import process dropped in my tests.

Rollback mechanism

This is yet to be done (see TODO section), but it is actually quite simple. If we want to go back to the 1st of January 2013, we have to traverse all Sequel models that have the oplog plugin and delete all operations that happened afterwards. Also, we have to delete all CHIEF entries (chief_mfcm, chief_tame, chief_tamf, chief_comm, chief_tbl9) that came with updates after the 1st of January 2013. And the third thing is to set all tariff_updates after that date to pending (or just remove them. Thoughts?).
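The first two steps reduce to filtering rows by date. A minimal sketch over plain arrays standing in for the oplog and CHIEF tables (the operation_date and origin field names are assumptions for illustration):

```ruby
require 'date'

cutoff = Date.new(2013, 1, 1)

oplog = [
  { oid: 1, operation_date: Date.new(2012, 12, 30) },
  { oid: 2, operation_date: Date.new(2013, 1, 5)   },
]
chief_mfcm = [
  { record: 'kept',    origin: Date.new(2012, 12, 15) },
  { record: 'dropped', origin: Date.new(2013, 2, 1)   },
]

# 1. Drop oplog operations that happened after the cutoff date.
oplog.reject! { |op| op[:operation_date] > cutoff }

# 2. Drop CHIEF rows that arrived with updates after the cutoff date.
chief_mfcm.reject! { |row| row[:origin] > cutoff }

# 3. (Open question) Reset tariff_updates after the cutoff to pending,
#    or delete them outright.
```

In the real implementation step 1 would be a delete on each model's oplog table, iterated over every model that carries the oplog plugin.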

Misc changes

I modified some migrations and column names, which is not good practice, but we will do a full reload (see assumptions) so I think it should be fine. Notably, I changed the primary keys of most tables to have unique names, because duplicate names result in an error when run on PostgreSQL (yes, I tried that too). Also, I changed some column names on Measure, adding _id suffixes to avoid conflicts with association names.

Bug fixes

TARIC records

Whenever TARIC sends an update, it does not include fields that are supposed to be blank (NULL). So if we have a Measure with validity_start_date and validity_end_date, and TARIC decides to update the dates leaving validity_end_date unbounded (blank), we won't receive validity_end_date as part of the TARIC update record. That is why we now merge default columns with nil values with the TARIC update values.
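In essence the fix is a hash merge: start from every column defaulted to nil, then overlay whatever attributes the update actually carried, so an omitted field ends up NULL instead of silently keeping its stale value. A minimal sketch:

```ruby
# Defaults: every column nil. The TARIC update omits validity_end_date,
# which TARIC intends to mean "now unbounded (NULL)".
default_columns = { validity_start_date: nil, validity_end_date: nil }
taric_update    = { validity_start_date: '2013-01-01' }

attributes = default_columns.merge(taric_update)
# => { validity_start_date: "2013-01-01", validity_end_date: nil }
```

Without the merge, an in-place update built only from taric_update would leave the old validity_end_date in the row, which is exactly the bug this fixes.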

Impact on stories

Retroactive measures #43944671

A retroactively added measure is a measure whose validity_start_date precedes the date it was added. So if we add a Measure on the 1st of January 2013 with its validity start date set to the 1st of January 2012, it was retroactively added.

We were not tracking addition dates before, so it was not possible to identify such measures. This change adds oplog tables where each entry has an operation_date, so it is now trivial to tell which Measures were added retroactively.
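The check itself is then a single date comparison between the measure's validity_start_date and its oplog entry's operation_date, as described above (the hash layout here is illustrative):

```ruby
require 'date'

measure = {
  validity_start_date: Date.new(2012, 1, 1),
  operation_date:      Date.new(2013, 1, 1), # when the oplog row was created
}

# Retroactive if validity started before the operation that added it.
retroactive = measure[:validity_start_date] < measure[:operation_date]
# => true
```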

Commodity change tracking, RSS feeds and information about changes

There is no way we can do these things with the current implementation, as we don't retain previous versions. It won't be trivial with the oplog tables either, but an implementation is possible. For example, for the RSS feed of commodity changes we would have to find all entity operations related to that commodity (UNION?), sort them by operation date and figure out how to represent each one of them.
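One possible shape for that feed, sketched in plain Ruby: gather operations from each related oplog (the UNION step), then sort by operation date. The model names and row layout are assumptions for illustration:

```ruby
require 'date'

# Operations pulled from two hypothetical per-model oplogs for one commodity.
measure_ops = [
  { model: 'Measure', operation: 'U', operation_date: Date.new(2013, 2, 2) },
]
description_ops = [
  { model: 'GoodsNomenclatureDescription', operation: 'C',
    operation_date: Date.new(2013, 2, 1) },
]

# UNION the per-model operations, then order chronologically for the feed.
changes = (measure_ops + description_ops).sort_by { |op| op[:operation_date] }

feed = changes.map { |op| "#{op[:operation_date]}: #{op[:operation]} #{op[:model]}" }
# => ["2013-02-01: C GoodsNomenclatureDescription", "2013-02-02: U Measure"]
```

The hard remaining part, as noted above, is deciding how to render each operation type for human readers, not the querying itself.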

TODO

jabley commented 11 years ago

I feel sorry for whoever will be reviewing this. Oh...

saulius commented 11 years ago

It's mostly noise in app/models. Interesting things in lib/*.

jystewart commented 11 years ago

Do we have a sense of what difference this might make to the rate of growth of the database?

jabley commented 11 years ago

@jystewart fair point, that and performance have already been discussed in chat and we'll be recording the results of our findings somewhere.

KushalP commented 11 years ago

Would it be completely unrealistic to run two versions of the app for ~1 week? You could at least compare the two databases and run similar recovery scenarios against both versions.

saulius commented 11 years ago

Running in parallel sounds great to me. If we implemented the related story "Improving the information the alert emails send out after sync" (https://www.pivotaltracker.com/story/show/43475217) we could compare changes without going through db records.

jabley commented 11 years ago

Running in parallel would be tidy. We have a story in the backlog about having more environments to deploy demo features to; having a parallel environment would presumably sit within that.

saulius commented 11 years ago

[Screenshot: database growth chart, 2013-02-26]

This is a chart of physical database growth by day on the VM. Not sure how useful it is: InnoDB flushes at intervals, if I understand it correctly, so the chart's growth is not very consistent.

KushalP commented 11 years ago

For which version of the application? Master or this branch?

saulius commented 11 years ago

This branch.

jabley commented 11 years ago

I'm trying to define the requirements for having another environment or two with Infrastructure. Hopefully this work will happen Real Soon Now™...

jabley commented 11 years ago

Intending to merge this on Monday morning, deploy on preview and rollback to test our intended deployment process.