alphagov / trade-tariff-backend

Enabling the population and distribution via API of UK Customs tariffs and duties

DB Locks & Database backed updates #19

Closed: saulius closed this 12 years ago

saulius commented 12 years ago

So the effort here is to make Tariff cluster-compatible and make update management easier.

There is a drawback, however: we need to increase the max_allowed_packet size in the MySQL configuration, because CHIEF updates tend to be bigger than 16MB (the current setting, AFAIK). The biggest we've seen in almost half a year is 18MB. I think 64MB would be a good choice; that leaves plenty of headroom. Bear in mind this is not frequently read data; it's not as if we'll be serving images from the DB. Ideally it will be written and read just once by background processes.
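Concretely, something like this in my.cnf (64M being the suggestion above; the exact value is open for discussion):

```
[mysqld]
max_allowed_packet = 64M
```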

Also, I think update download times and content sizes should be measured with StatsD so we get a general idea of trends, but that's for another PR.
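As a rough sketch for that future PR, assuming the statsd-ruby client and made-up metric names:

```ruby
require 'statsd'
require 'benchmark'

statsd = Statsd.new('localhost', 8125)

# hypothetical download step; the real code would live in the update workers
elapsed = Benchmark.realtime do
  @content = download_chief_update(date)
end

statsd.timing('tariff.chief.download_time', (elapsed * 1000).round)
statsd.count('tariff.chief.update_bytes', @content.bytesize)
```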

saulius commented 12 years ago

Will make another dump in 10.

saulius commented 12 years ago

For reference, the Tariff and CHIEF updates since roughly the start of June are 120MB in total.

jabley commented 12 years ago

If by dump you mean we need to drop and re-import all the data, that's not good and we need to explore other avenues. Taking down the site should not really be an option.

The CDN might be caching most of it and most people using the service might not notice it, but it's not a sustainable approach in my view.

KushalP commented 12 years ago

Going forward we should be doing migrations. We need an obvious (and automated) rollback process in case something goes wrong.

saulius commented 12 years ago

Okay, you're right, we can squeeze this into a migration instead of a dump reload. Let me modify that.

saulius commented 12 years ago

...But that means the backend server would need to have all the update files present, so that it could read them and populate the file column. Hrm.
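For illustration, roughly what I have in mind as a Sequel-style migration (table and column names are illustrative, not final):

```ruby
Sequel.migration do
  up do
    # blob column for the raw update file contents
    add_column :tariff_updates, :content, File

    # backfill from files already on disk; assumes they are all still present
    self[:tariff_updates].each do |row|
      path = File.join('data', row[:filename])
      next unless File.exist?(path)

      self[:tariff_updates].where(id: row[:id])
                           .update(content: Sequel.blob(File.read(path)))
    end
  end

  down do
    drop_column :tariff_updates, :content
  end
end
```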

matthewford commented 12 years ago

We can do this just in a migration, and deploy AFTER the CHIEF and TARIC updates for that day have already been applied. Otherwise we'll have issues where we're looking in the DB to apply an update that's on the file system.

The issue is that without doing a full restore of the DB we will not have a complete audit trail: updates applied while using the file system will be lost, and we won't have a backup from which to restore the system should something happen to the HMRC server and our backups.

matthewford commented 12 years ago

Or we can do it before the updates are applied and remove any pending db entries.

matthewford commented 12 years ago

Had a chat with James; from the point of view of data consistency it would be nice if we could restore the DB from a backup that has all the update files stored in the DB (since the seed load). This is not a high priority, but without the seed file and all subsequent update files, rebuilding the system would be impossible.

I suggest we load into a new DB and then switch over (rather than drop and create).
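Something along these lines (database and file names are illustrative):

```bash
# build the new database alongside the live one
mysql -e "CREATE DATABASE tariff_new"
mysql tariff_new < full_dump.sql

# then point the app's database config at tariff_new and restart the workers,
# keeping the old database around until we're happy with the switch
```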

jabley commented 12 years ago

Storing files in the database feels slightly mental. Systems exist for storing files.

Can we just take a step back to clarify and agree on what we're trying to achieve here?