Closed saulius closed 11 years ago
I feel sorry for whoever will be reviewing this. Oh...
It's mostly noise in app/models. Interesting things in lib/*.
Do we have a sense of what difference this might make to the rate of growth of the database?
@jystewart fair point; that and performance have already been discussed in chat, and we'll be recording the results of our findings somewhere.
Would it be completely unrealistic to run two versions of the app for ~1 week? You could at least compare the two databases and run similar recovery scenarios against both versions.
Running in parallel sounds great to me. If we implemented the related story "Improving the information the alert emails send out after sync" (https://www.pivotaltracker.com/story/show/43475217) we could compare changes without going through db records.
Running in parallel would be tidy. We have a story in the backlog about having more environments to deploy demo features to; having a parallel environment would presumably sit within that.
This is the physical database growth chart by day on the VM. Not sure how useful this is: InnoDB flushes at intervals, if I understand it correctly, so the chart growth is not very consistent.
For which version of the application? Master or this branch?
This branch.
I'm trying to define the requirements for having another environment or two with Infrastructure. Hopefully this work will happen Real Soon Now™...
Intending to merge this on Monday morning, deploy on preview and rollback to test our intended deployment process.
! DO NOT MERGE !
The name and idea comes from http://goo.gl/gwaWC and http://goo.gl/gc29A (Chapter 8).
Related stories:
This is a very noisy commit because I had to find and replace every line in the CHIEF initial seed file of 22k lines.
Assumptions:
Changes
Operation log
Currently we perform all operations in place: we find a record by primary key, update or destroy it, and it's gone. Now there are no physical deletes. We treat inserts, updates and deletions as operations and put them into `model_oplog` tables. All attributes are persisted at all times. All operations have an `oid` (auto-incrementing operation id) column, with a value unique per table. To reconstruct the database state as we have it now, we create a view for each table that selects the latest operation per record. That gives us the latest state of the tables just as if we had been applying updates as before, so model associations, fields etc. keep working as they did.
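To picture the latest-state logic those views implement, here is a rough sketch in plain Ruby over in-memory rows (the real implementation is a database view; the table and column names are illustrative):

```ruby
# Reconstruct the "current" state of a table from its oplog:
# for each primary key, keep only the newest operation (highest oid),
# and drop records whose newest operation is a destroy.
def latest_state(oplog_rows)
  oplog_rows
    .group_by { |row| row[:id] }
    .filter_map do |_id, ops|
      newest = ops.max_by { |op| op[:oid] }
      newest unless newest[:operation] == :destroy
    end
end

oplog = [
  { oid: 1, id: 10, operation: :create,  name: "Measure A" },
  { oid: 2, id: 10, operation: :update,  name: "Measure A*" },
  { oid: 3, id: 11, operation: :create,  name: "Measure B" },
  { oid: 4, id: 11, operation: :destroy, name: "Measure B" }
]

latest_state(oplog)
# => [{ oid: 2, id: 10, operation: :update, name: "Measure A*" }]
```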
To make this possible, all TARIC models now include a Sequel oplog plugin which intercepts inserts/updates/deletions and creates an oplog entry instead of actually performing that operation on the view. It also dynamically defines a `ModelName::Operation` Sequel model class that allows us to access the oplog. This will most likely be useful for other related stories.
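The interception idea can be sketched in plain Ruby like this (hypothetical: the real implementation is a Sequel model plugin, and the names here are made up for illustration):

```ruby
# Illustrative only: instead of mutating rows in place, every write
# appends an entry to an oplog. New records would get :create; this
# simplified sketch records :update for save and :destroy for destroy.
module Oplog
  def self.entries
    @entries ||= []
  end

  def save
    Oplog.entries << attributes.merge(operation: :update,
                                      oid: Oplog.entries.size + 1)
    self
  end

  def destroy
    Oplog.entries << attributes.merge(operation: :destroy,
                                      oid: Oplog.entries.size + 1)
    self
  end
end

class Measure
  include Oplog

  attr_reader :attributes

  def initialize(attributes)
    @attributes = attributes
  end
end

Measure.new(id: 1, measure_type: "103").save
Measure.new(id: 1, measure_type: "103").destroy
Oplog.entries.map { |e| e[:operation] }  # => [:update, :destroy]
```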
Conformance validations
At the moment we have conformance validations as Sequel class validations or `before_destroy` hooks. Since we will now be able to roll back updates, I tried to enable them and see how it goes. The problem was that a `Sequel::HookFailed` exception or a failed presence validation (one out of 4) gives very little info in the context of a huge TARIC update. That is why I wrote a simple validation framework that is meant to be used for conformance validations. In case of an error it adds error names and identifiers to `model_instance#errors[:conformance]`, so you at least know where to look for a fix. I ported almost all validations we had before this change and, with all the modifications, they are now passing. With these enabled we run them for real, so to say. As a good side effect, validations are decoupled from the model classes and live separately in app/validators.
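A rough sketch of the kind of framework described, in plain Ruby (all class, method and rule names here are invented; the real code in app/validators may look quite different):

```ruby
# Illustrative only: named conformance rules whose failures are
# collected into errors[:conformance] instead of raising on the first
# failed check.
class ConformanceValidator
  Rule = Struct.new(:identifier, :check)

  def self.rules
    @rules ||= []
  end

  # Register a named rule with a block returning true when conformant.
  def self.validation(identifier, &check)
    rules << Rule.new(identifier, check)
  end

  # Run every rule and record the identifiers of the ones that failed.
  def validate(record)
    failed = self.class.rules.reject { |rule| rule.check.call(record) }
    record[:errors] = { conformance: failed.map(&:identifier) }
    record
  end
end

class MeasureValidator < ConformanceValidator
  validation("ME1") { |m| !m[:validity_start_date].nil? }
  validation("ME2") do |m|
    m[:validity_end_date].nil? ||
      m[:validity_end_date] >= m[:validity_start_date]
  end
end

measure = { validity_start_date: nil, validity_end_date: nil }
MeasureValidator.new.validate(measure)[:errors]
# => { conformance: ["ME1"] }
```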
There's also a new task/class called `TradeTariffBackend::Auditor` that traverses all (or a selected subset) of the models, applies conformance validations on all the records and reports back on the results. This is needed since we will obviously be growing our conformance suite. It is not meant to be run in CI or production: it takes 5 hours to traverse everything.

TARIC importer changes
At the moment when performing TARIC updates we operate on Sequel datasets, which is very fast but does not instantiate model objects and therefore skips validations. As said earlier, this has changed so that validations run, so imports may be a bit slower. On the plus side, less Nokogiri and more plain Hashes are used, so memory consumption of the import process dropped in my tests.
Rollback mechanism
This is yet to be done (see the TODO section), but it's actually quite simple. If we want to go back to the 1st of Jan 2013 we have to traverse all Sequel models that have the oplog plugin and delete all operations that happened afterwards. Also, we have to delete all CHIEF (chief_mfcm, chief_tame, chief_tamf, chief_comm, chief_tbl9) entries that came with updates after the 1st of Jan 2013. And the third thing is to set all tariff_updates after that date to pending (or just remove them; thoughts?).
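The first step could be sketched like this, over in-memory oplog rows (the real thing would issue deletes through the Sequel models; field names are illustrative):

```ruby
require "date"

# Roll back by discarding every operation dated after the target date;
# what remains reconstructs the database as it was on that date.
def rollback(oplog_rows, to_date)
  oplog_rows.reject { |op| op[:operation_date] > to_date }
end

oplog = [
  { oid: 1, operation_date: Date.new(2012, 12, 30), operation: :create },
  { oid: 2, operation_date: Date.new(2013, 1, 5),   operation: :update },
  { oid: 3, operation_date: Date.new(2013, 1, 9),   operation: :destroy }
]

rollback(oplog, Date.new(2013, 1, 1)).map { |op| op[:oid] }
# => [1]
```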
Misc changes
I modified some migrations and column names, which is not good practice, but we will do a full reload (see assumptions) so I think it should be fine. Notably, I changed the primary keys of most tables to have unique names, because duplicate names result in an error when run on PostgreSQL (yes, I tried that too). Also, I changed some column names on Measure and added _id suffixes to avoid conflicts with association names.
Bug fixes
TARIC records
Whenever TARIC sends an update it does not include fields that are supposed to be blank (NULL). So if we have a Measure with validity_start_date and validity_end_date, and TARIC decides to update the dates leaving validity_end_date unbounded (blank), we won't receive validity_end_date as part of the TARIC update record. That is why we now merge default columns with nil values with the TARIC update values.
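The nil-merge fix amounts to something like this (column and method names are illustrative):

```ruby
# Start from every column defaulted to nil, then overlay whatever TARIC
# actually sent. Fields missing from the update end up NULL instead of
# silently keeping their old values.
MEASURE_COLUMNS = [:measure_sid, :validity_start_date, :validity_end_date]

def attributes_for_update(taric_values)
  defaults = MEASURE_COLUMNS.to_h { |column| [column, nil] }
  defaults.merge(taric_values)
end

# TARIC update omits validity_end_date, meaning "now unbounded":
attributes_for_update(measure_sid: 1, validity_start_date: "2012-01-01")
# => { measure_sid: 1, validity_start_date: "2012-01-01",
#      validity_end_date: nil }
```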
Impact on stories
Retroactive measures #43944671
A retroactively added measure is one whose validity_start_date lies before the date it was added. So if we add a Measure on the 1st of Jan 2013 and its validity start date is set to the 1st of Jan 2012, it was retroactively added.
We do not track addition dates now, so it's not possible to identify such measures. This change adds the oplog tables, and since each entry has an operation_date it becomes trivial to say which Measures were added retroactively.
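Identifying them could look roughly like this (field names follow the description above but are illustrative):

```ruby
require "date"

# A measure is retroactive when its create operation was recorded
# after its validity_start_date.
def retroactive?(create_op)
  create_op[:operation_date] > create_op[:validity_start_date]
end

ops = [
  { measure_sid: 1, operation: :create,
    operation_date: Date.new(2013, 1, 1),
    validity_start_date: Date.new(2012, 1, 1) },
  { measure_sid: 2, operation: :create,
    operation_date: Date.new(2013, 1, 1),
    validity_start_date: Date.new(2013, 2, 1) }
]

ops.select { |op| op[:operation] == :create && retroactive?(op) }
   .map { |op| op[:measure_sid] }
# => [1]
```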
Commodity change tracking, RSS feeds and information about changes
There is no way we can do these things with the current implementation as we don't retain previous versions. It won't be trivial with the oplog tables either, but an implementation is possible. For example, for the RSS feed of commodity changes we have to find all entity operations related to that commodity (a `UNION` across the oplog tables?), sort them by operation date and figure out how to represent each one of them.

TODO
Actions