department-of-veterans-affairs / va.gov-team

Public resources for building on and in support of VA.gov. Visit complete Knowledge Hub:
https://depo-platform-documentation.scrollhelp.site/index.html

KmsEncryptionVerification to remove Lockbox #33617

Open: LindseySaari opened this issue 2 years ago

LindseySaari commented 2 years ago

Issue Description

We have updated how we store encrypted data in Vets API. Here is our progress:

Tasks

Background

Previous Attempts:

We have tried two approaches, but each ran into problems when verifying that the data was correct.

1. Decrypt each record and set a `verified_decryptable_at` column. Before Christmas, we ran the `KmsEncryptionVerification` job. With this approach, we saved a [verified_decryptable_at date](https://github.com/department-of-veterans-affairs/vets-api/blob/master/app/workers/kms_encryption_verification_job.rb#L21) on each model after it was successfully decrypted via the KMS key. This caused `SavedClaim` records to be reprocessed (Slack thread [here](https://dsva.slack.com/archives/CBU0KDSB1/p1639762159212100)). TL;DR: because old `SavedClaim::EducationBenefits` records are cleaned up via [this job](https://github.com/department-of-veterans-affairs/vets-api/blob/master/app/workers/education_form/delete_old_applications.rb), saving the `verified_decryptable_at` date triggered [this validation](https://github.com/department-of-veterans-affairs/vets-api/blob/master/app/models/saved_claim/education_benefits.rb#L25), which created an `education_benefits_claim` record that the [spool file job](https://github.com/department-of-veterans-affairs/vets-api/blob/master/app/workers/education_form/create_daily_spool_files.rb#L35-L39) then "re-processed".
2. Log corrupted records to Sentry ([see PR](https://github.com/department-of-veterans-affairs/vets-api/pull/8539/files)). I split the records evenly across 5 jobs ([see spreadsheet](https://docs.google.com/spreadsheets/d/16HQ91dEWrNr7eIHBaJ1znI0ICCDT_-Crhn7_NCpQ-ng/edit#gid=623420235)). Because these jobs process millions of records, they often timed out, and each timeout restarted the job from the beginning. That is why we then tried approach 1 above: saving a `verified_decryptable_at` date.
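The restart-from-scratch problem in approach 2 is what a resumable loop addresses. Below is a minimal plain-Ruby sketch, assuming records can be walked in ascending ID order and that the checkpoint lives outside the model (for example in Redis or a separate table). `Record`, `VerificationCheckpoint`, and `verify_batch` are hypothetical names, not vets-api code, and the `decrypt` lambda stands in for whatever actually reads the KMS-encrypted attribute.

```ruby
# Hypothetical sketch, not vets-api code: walk records in ID order, in
# batches, and persist the last verified ID outside the model so a job that
# times out can resume instead of starting over.
Record = Struct.new(:id, :ciphertext)

# Stand-in for an external checkpoint store (e.g. Redis or a separate table).
class VerificationCheckpoint
  def initialize
    @last_ids = {}
  end

  def last_id(model_name)
    @last_ids.fetch(model_name, 0)
  end

  def save(model_name, id)
    @last_ids[model_name] = id
  end
end

# Verifies records after the checkpoint and returns the IDs that failed to
# decrypt. max_records simulates a job that is killed partway through.
def verify_batch(model_name, records, checkpoint, decrypt, batch_size: 2, max_records: nil)
  failed = []
  processed = 0
  pending = records.select { |r| r.id > checkpoint.last_id(model_name) }.sort_by(&:id)

  pending.each_slice(batch_size) do |batch|
    batch.each do |record|
      begin
        decrypt.call(record.ciphertext)
      rescue StandardError
        failed << record.id # record only the ID, never the plaintext
      end
    end
    # Checkpoint after each full batch; a crash mid-batch re-verifies that batch.
    checkpoint.save(model_name, batch.last.id)
    processed += batch.size
    break if max_records && processed >= max_records
  end

  failed
end

records = (1..5).map { |i| Record.new(i, i == 4 ? 'corrupt' : "ok-#{i}") }
decrypt = lambda do |ciphertext|
  raise ArgumentError, 'cannot decrypt' if ciphertext == 'corrupt'
  ciphertext
end
checkpoint = VerificationCheckpoint.new

first_run = verify_batch('InProgressForm', records, checkpoint, decrypt, max_records: 2)
second_run = verify_batch('InProgressForm', records, checkpoint, decrypt) # resumes at ID 3
puts "first run failures: #{first_run.inspect}, second run failures: #{second_run.inspect}"
```

In a real job the failed IDs would also need to be persisted or logged (as approach 2 did with Sentry) as they are found, since a killed run loses its in-memory `failed` list.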

Why even do this?

We've seen some decryption issues. A VFS team member made us aware of two `AppealsApi::HigherLevelReview` records that failed to decrypt. These records were still encrypted via Lockbox and had not been rotated properly during the KMS rotation: they had an `encrypted_kms_key` attribute but threw errors on decryption. Our theory is that the rotation job was killed or restarted after the deploy while these records were mid-rotation, leaving the rotation incomplete. After we reverted the [PR to remove previous_version Lockbox defaults](https://github.com/department-of-veterans-affairs/vets-api/pull/8499), the records decrypted properly via Lockbox and no further recovery was needed.
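A rough illustration of why restoring the previous-version defaults helped: a record left encrypted under an old key only decrypts if that old key is still among the keys tried. The sketch uses Ruby's standard `openssl` library with AES-256-GCM and in-memory keys; it is not Lockbox's or AWS KMS's API, and `encrypt_with`, `decrypt_with`, and `decrypt_with_fallback` are hypothetical helpers.

```ruby
require 'openssl'

# Illustration only, not Lockbox or AWS KMS: AES-256-GCM with in-memory keys.
def encrypt_with(key, plaintext)
  cipher = OpenSSL::Cipher.new('aes-256-gcm')
  cipher.encrypt
  cipher.key = key
  iv = cipher.random_iv
  ciphertext = cipher.update(plaintext) + cipher.final
  { iv: iv, ciphertext: ciphertext, tag: cipher.auth_tag }
end

def decrypt_with(key, box)
  cipher = OpenSSL::Cipher.new('aes-256-gcm')
  cipher.decrypt
  cipher.key = key
  cipher.iv = box[:iv]
  cipher.auth_tag = box[:tag]
  # With the wrong key, authentication fails and #final raises CipherError.
  cipher.update(box[:ciphertext]) + cipher.final
end

# Try the current key first, then each previous key, mirroring the idea of a
# "previous versions" fallback during key rotation.
def decrypt_with_fallback(box, current_key, previous_keys)
  [current_key, *previous_keys].each do |key|
    begin
      return decrypt_with(key, box)
    rescue OpenSSL::Cipher::CipherError
      next
    end
  end
  raise OpenSSL::Cipher::CipherError, 'no configured key could decrypt this record'
end

old_key = OpenSSL::Random.random_bytes(32)
new_key = OpenSSL::Random.random_bytes(32)

# A record that was never re-encrypted under the new key (an incomplete rotation).
stale_record = encrypt_with(old_key, 'claim data')

recovered = decrypt_with_fallback(stale_record, new_key, [old_key])

without_fallback =
  begin
    decrypt_with_fallback(stale_record, new_key, [])
    :decrypted
  rescue OpenSSL::Cipher::CipherError
    :failed
  end

puts "with old key available: #{recovered.inspect}; without it: #{without_fallback}"
```

Removing the old key from the fallback list is harmless only once every record has actually been re-encrypted, which is the property the verification jobs in this issue set out to confirm.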


LindseySaari commented 2 years ago

Tables/models were split across the running jobs so that each job handles a roughly equal total number of records, as laid out in this spreadsheet.
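The same kind of split can be expressed in code. A minimal sketch, assuming per-model record counts are already known: the greedy "largest first" heuristic below assigns each model to the job with the smallest running total. The counts are made up for illustration, and this is not necessarily how the spreadsheet split was produced.

```ruby
# Hypothetical sketch: assign models to job_count jobs so each job's total
# record count is roughly equal (greedy, largest model first).
def partition_models(counts, job_count)
  jobs = Array.new(job_count) { { models: [], total: 0 } }
  counts.sort_by { |_name, count| -count }.each do |name, count|
    lightest = jobs.min_by { |job| job[:total] }
    lightest[:models] << name
    lightest[:total] += count
  end
  jobs
end

# Made-up record counts, for illustration only.
counts = {
  'InProgressForm' => 900,
  'Form526Submission' => 600,
  'SavedClaim::Ask' => 300,
  'AppealsApi::SupplementalClaim' => 300,
  'HealthQuest::QuestionnaireResponse' => 200,
  'Form1010cg::Attachment' => 100
}

jobs = partition_models(counts, 2)
jobs.each_with_index do |job, i|
  puts "job #{i + 1}: #{job[:total]} records -> #{job[:models].join(', ')}"
end
```

Greedy assignment does not guarantee a perfect balance, but with many models of varied size it usually lands close, which is all a "relatively equal" split needs.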

`SupportingEvidenceAttachment` and `CovidVaccine::V0::RegistrationSubmission` have already completed successfully. `SavedClaim::DisabilityCompensation::Form526AllClaim` is in progress.

LindseySaari commented 2 years ago

Monday update: I'm running the jobs when I can, in between other tasks. `InProgressForm` and `Form526Submission` have also passed. The following tables/models are in progress:


```ruby
# Tables/models assigned to verification job 5
job_5_models = [
  'SavedClaim::CaregiversAssistanceClaim',
  'SavedClaim::Ask',
  'VIC::SupportingDocumentationAttachment',
  'VIC::ProfilePhotoAttachment',
  'AsyncTransaction::Vet360::PermissionTransaction',
  'AsyncTransaction::VAProfile::PermissionTransaction',
  'ClaimsApi::SupportingDocument',
  'AppealsApi::SupplementalClaim',
  'DecisionReviewEvidenceAttachment',
  'HealthQuest::QuestionnaireResponse',
  'SavedClaim::DependencyVerificationClaim',
  'SavedClaim::EducationBenefits::VA0993',
  'AppealsApi::NoticeOfDisagreement',
  'Form1010cg::Attachment',
  'SavedClaim::EducationBenefits::VA1990n',
  'SavedClaim::EducationCareerCounselingClaim',
  'AsyncTransaction::Vet360::AddressTransaction',
  'AsyncTransaction::Vet360::EmailTransaction'
]
```

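Before a job walks a list like this, the name strings have to be turned into classes. A hypothetical plain-Ruby sketch using `Object.const_get`, which raises `NameError` for names that no longer resolve (for example, a model that has since been removed); in a Rails app, ActiveSupport's `safe_constantize` serves a similar purpose. `resolve_models` is an illustrative name, not an existing vets-api method, and core Ruby classes stand in for the models.

```ruby
# Hypothetical helper: resolve model-name strings to classes, collecting any
# names that no longer resolve instead of failing partway through a job.
def resolve_models(names)
  resolved = {}
  missing = []
  names.each do |name|
    begin
      resolved[name] = Object.const_get(name)
    rescue NameError
      missing << name
    end
  end
  [resolved, missing]
end

# Core Ruby classes stand in for vets-api models, which are not loaded here.
resolved, missing = resolve_models(['Integer', 'Process::Status', 'VIC::ProfilePhotoAttachment'])
puts "resolved: #{resolved.keys.inspect}, missing: #{missing.inspect}"
```

Checking all names up front means a stale entry is reported once at the start rather than surfacing hours into a long-running job.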
LindseySaari commented 2 years ago

There were some issues involving the spool file job and claims that had already been processed. Thread here. We believe that updating the `verified_decryptable_at` field may have marked the saved claims for reprocessing. We need to take a different approach to verifying decryption, one that does not save a `verified_decryptable_at` field on the `saved_claim` records.
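One possible direction: verify by reading only, and keep the results somewhere other than the claim rows, so no validations or callbacks on the claim ever fire. The sketch below is hypothetical plain Ruby; `FakeClaim` is a simplified stand-in for the save, validation, and spool-file chain described in this thread, and the `decrypt` lambda stands in for reading the encrypted attribute.

```ruby
# Hypothetical, simplified stand-in for a saved claim: calling save triggers a
# side effect that queues the claim for processing again.
class FakeClaim
  attr_reader :id, :ciphertext

  def initialize(id, ciphertext, reprocess_queue)
    @id = id
    @ciphertext = ciphertext
    @reprocess_queue = reprocess_queue
  end

  def save
    @reprocess_queue << id # the unwanted side effect
    true
  end
end

# Read-only verification: decrypt each claim and record the outcome in a
# separate hash; the claims themselves are never written or saved.
def verify_read_only(claims, decrypt)
  claims.each_with_object({}) do |claim, results|
    begin
      decrypt.call(claim.ciphertext)
      results[claim.id] = :ok
    rescue StandardError
      results[claim.id] = :failed
    end
  end
end

reprocess_queue = []
claims = [
  FakeClaim.new(1, 'ok', reprocess_queue),
  FakeClaim.new(2, 'corrupt', reprocess_queue)
]
decrypt = lambda do |ciphertext|
  raise ArgumentError, 'cannot decrypt' if ciphertext == 'corrupt'
  ciphertext
end

results = verify_read_only(claims, decrypt)
puts "results: #{results.inspect}, queued for reprocessing: #{reprocess_queue.inspect}"
```

Rails' `update_column` skips validations and callbacks, but it still writes to the claim row; keeping results in a separate store avoids touching the claims at all.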