Closed by saulius 12 years ago
Will make another dump in 10.
For reference, Tariff and CHIEF updates since roughly the start of June are 120MB in size.
If by dump you mean we need to drop and re-import all the data, that's not good and we need to explore other avenues. Taking down the site should not really be an option.
The CDN might be caching most of it and most people using the service might not notice it, but it's not a sustainable approach in my view.
Going forward we should be doing migrations. We need an obvious (and automated) rollback process in case something goes wrong.
Okay, you are right, we can squeeze this into a migration instead of a dump reload. Let me modify that.
...But that means the backend server would need all the update files present, so that it could read them and populate the file column. Hrm.
We can do this just in a migration, and deploy AFTER the CHIEF and TARIC update for that day has already been applied. Otherwise we'll have issues where we'll be looking in the DB to apply an update that's on the file system.
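Roughly what I have in mind for that migration, just a sketch: it assumes an ActiveRecord-style migration for illustration, and the table/column names (`tariff_updates`, `filename`, `file`) plus the on-disk updates directory are placeholders, not what's actually in the repo:

```ruby
# Sketch of the backfill migration. Assumed names throughout: tariff_updates
# table, filename/file columns, and the directory the update files live in.
class StoreUpdateFilesInDb < ActiveRecord::Migration
  class TariffUpdate < ActiveRecord::Base
    self.table_name = "tariff_updates"
  end

  UPDATES_DIR = "/var/data/tariff-updates" # assumption: wherever CHIEF/TARIC files sit on the box

  def up
    add_column :tariff_updates, :file, :binary, limit: 64.megabytes
    TariffUpdate.reset_column_information

    # Copy every update file that is already on disk into the new column,
    # skipping anything that hasn't been downloaded yet.
    TariffUpdate.find_each do |update|
      path = File.join(UPDATES_DIR, update.filename)
      update.update_column(:file, File.binread(path)) if File.exist?(path)
    end
  end

  def down
    remove_column :tariff_updates, :file
  end
end
```

Deploying it after that day's update has been applied (as above) means the backfill only ever reads files that are already complete on disk.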
The issue is that without doing a full restore of the db we will not have a complete audit trail: updates applied while using the file system will be lost, and we won't have a backup to restore the system from should something happen to the HMRC server and our backups.
Or we can do it before the updates are applied and remove any pending db entries.
Had a chat with James; from the point of view of data consistency it would be nice if we could restore the db from a backup that has all the update files stored in the db (since the seed load). This is not a high priority, but should we need to rebuild the system, it is impossible without the seed file and all subsequent update files.
Suggest that we load into a new db and then switch over (rather than drop and create)
Storing files in the database feels slightly mental. Systems exist for storing files.
Can we just take a step back to clarify and agree what we're trying to achieve here?
So the effort here is to make Tariff cluster-compatible and make update management easier.
There is a drawback, however: we need to increase the max_allowed_packet size in the MySQL configuration, because CHIEF updates tend to be bigger than 16MB (the current setting, AFAIK). The biggest we've seen in almost half a year is 18MB. I think 64MB would be a good choice; it leaves plenty of buffer space. Please mind, this is not frequently read data, and it's not that we will be serving images from the DB. Ideally it will be written and read just once by background processes.
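The config change itself is just bumping the limit in my.cnf (or whichever conf.d override the db boxes use) and restarting mysqld, roughly:

```ini
# my.cnf on the MySQL server the backend talks to
[mysqld]
max_allowed_packet = 64M
```

It can also be raised at runtime with SET GLOBAL max_allowed_packet, but that only applies to new connections and is lost on restart, so the config file is the change that matters.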
Also, I think that update download times and content sizes should be measured with StatsD so we get a general idea of trends, but that is for another PR.
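Something along these lines is all I mean; a sketch assuming the statsd-ruby gem, a local StatsD agent, and made-up metric names, with the download call as a stand-in for however we fetch the files today:

```ruby
# Sketch only: assumes the statsd-ruby gem, a StatsD agent on the default
# port, and invented metric names.
require "open-uri"
require "statsd"

STATSD = Statsd.new("localhost", 8125)

def download_update(url)
  content = nil
  # Record how long the download took...
  STATSD.time("tariff.updates.download_time") do
    content = open(url).read # stand-in for the real CHIEF/TARIC fetch
  end
  # ...and how big the update was, so both trends show up in graphs.
  STATSD.gauge("tariff.updates.content_size", content.bytesize)
  content
end
```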