Closed anyakhvost closed 7 years ago
Dumb question, how did this leak through the UAT environment?
UAT database didn't have nil
values for css_id so the migration worked fine. Also when the migration ran in UAT, the code at that time had email
method in Download.rb @askldjd
@aroltsch Thanks for putting this together!
Since we cannot expect UAT to mirror the prod dataset I agree with your assessment:
These should be our rules going forward when test/merging migrations into the codebase
I agree 💯 on bullet 1 with @joofsh. I think we let these migrations pile up and it caused some issues. We should be more careful about calling out, merging in, and deploying any DB migrations.
To avoid timing issues, data created by data transforms should not be actively being read by the application. We should also try our hardest to make sure data transforms are only creating or only deleting unused data. Doing both at the same time can also cause timing issues (we aren't Indiana Jones 😄)
As for bullet 2, I'm not comfortable with saying they must run against a prod dump.
If we use this migration as a case study, we wouldn't have benefited much from a run on prod copy, and it would have been a lot more work. We were only writing data to a new table, and even if the data we wrote was corrupted (which it was not, since any exception forces a db rollback of the entire migration), we could have always cleared out the table and retried.
The primary issue here was the downtime caused by trying too many migrations at the same time, and when our data transform failed, the application broke down because the database was in an unexpected state.
I believe we should use our judgement to determine the potential risk of any data transform, and test against a prod dump when the risk warrants it.
Finally, we should also be as thorough as we can in validating assumptions about the state of the database in production.
1) Any destructive migration should be handled with care, and we should not let them stack up
1) Have a way to dump and sanitize prod DB PII. (captured here https://github.com/department-of-veterans-affairs/caseflow/issues/669)
1) In the mean time, we should have multiple reviewers closely review and approve any data migration scripts.
We have experienced a few issues with the migration when deploying code to production. The migration consisted of three things:
Here is the initial migration: https://github.com/department-of-veterans-affairs/caseflow-efolder/commit/9f616f0f184c0dce459b1854b25b0beb7c9857a4#diff-e5f0fedca0cff59f67d219d6fdc8699e
Issues:
email
method on the Download object. When the migration was created,email
method existed in Download.rb; however, when the migration ran, the method was already removed by then.Here is the resulting migration: https://github.com/department-of-veterans-affairs/caseflow-efolder/blob/master/db/migrate/20161222194523_create_users.rb
It would be great to brainstorm ideas to avoid things like that in the future: