Open 7hunderbird opened 1 month ago
The job is now built into Jenkins. It is currently a "no-op"; the change that adds the job is in this devops PR.
Here's the link: http://jenkins.vfs.va.gov/job/cms-test/job/cms-test-rollback-staging/
Last week I put the test.staging.cms.va.gov site into a broken state so that I could test the rollback button.
I didn't run the rollback at the time, so the site ended up in an even more broken state, and when I tried to reset it to its previous state I ran into problems.
00:30:40.499 msg: Failure in the post-deploy tasks
Three tasks fail: Enable Deploy Mode in CMS, Drush deploy, and Sync PROD database for downstream environments only (sync-db.sh).
These are the failures from the Jenkins log:
In the "Enable Deploy Mode in CMS" task:
00:30:39.577 TASK [Enable Deploy Mode in CMS] ***********************************************
00:30:39.577 Friday 08 November 2024 18:55:08 +0000 (0:00:00.027) 0:07:20.384 *******
00:30:40.499 fatal: [ip-10-247-35-92.us-gov-west-1.compute.internal]: FAILED! => changed=true
00:30:40.499 cmd: /bin/bash -lc 'drush va-gov-enable-deploy-mode 2>&1'
00:30:40.499 delta: '0:00:00.551960'
00:30:40.499 end: '2024-11-08 18:55:09.168307'
00:30:40.499 msg: non-zero return code
00:30:40.499 rc: 1
00:30:40.499 start: '2024-11-08 18:55:08.616347'
00:30:40.499 stderr: ''
00:30:40.499 stderr_lines: <omitted>
00:30:40.499 stdout: |2-
00:30:40.499
00:30:40.499
00:30:40.499 Command va-gov-enable-deploy-mode was not found. Drush was unable to query
00:30:40.499 the database. As a result, many commands are unavailable. Re-run your comma
00:30:40.499 nd with --debug to see relevant log messages.
00:30:40.499 stdout_lines: <omitted>
In the "Drush deploy" task:
00:30:39.577 TASK [Drush deploy] ************************************************************
00:30:39.577 Friday 08 November 2024 18:55:07 +0000 (0:00:00.029) 0:07:19.502 *******
00:30:39.577 fatal: [ip-10-247-35-92.us-gov-west-1.compute.internal]: FAILED! => changed=true
00:30:39.577 cmd: /bin/bash -lc 'drush deploy --yes 2>&1'
00:30:39.577 delta: '0:00:00.544530'
00:30:39.577 end: '2024-11-08 18:55:08.289503'
00:30:39.577 msg: non-zero return code
00:30:39.577 rc: 1
00:30:39.577 start: '2024-11-08 18:55:07.744973'
00:30:39.577 stderr: ''
00:30:39.577 stderr_lines: <omitted>
00:30:39.577 stdout: |2-
00:30:39.577
00:30:39.577 In BootstrapHook.php line 40:
00:30:39.577
00:30:39.577 Bootstrap failed. Run your command with -vvv for more information.
00:30:39.577 stdout_lines: <omitted>
In the "Sync PROD database for downstream environments only (sync-db.sh)" task:
00:30:16.632 TASK [Sync PROD database for downstream environments only (sync-db.sh)] ********
00:30:16.632 Friday 08 November 2024 18:54:43 +0000 (0:00:00.300) 0:06:55.215 *******
00:30:24.700 fatal: [ip-10-247-35-92.us-gov-west-1.compute.internal]: FAILED! => changed=true
00:30:24.700 cmd: /bin/bash -lc './scripts/sync-db.sh 2>&1'
00:30:24.700 delta: '0:00:08.719743'
00:30:24.700 end: '2024-11-08 18:54:52.174984'
00:30:24.700 msg: non-zero return code
00:30:24.700 rc: 1
00:30:24.700 start: '2024-11-08 18:54:43.455241'
00:30:24.700 stderr: ''
00:30:24.700 stderr_lines: <omitted>
00:30:24.700 stdout: |-
00:30:24.700 Downloading latest PROD database from: https://dsva-vagov-prod-cms-test-backup-sanitized.s3-us-gov-west-1.amazonaws.com/database/cms-prod-db-sanitized-latest.sql.gz
00:30:24.700 % Total % Received % Xferd Average Speed Time Time Time Current
00:30:24.700 Dload Upload Total Spent Left Speed
00:30:24.700 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0100 263 0 263 0 0 8755 0 --:--:-- --:--:-- --:--:-- 9068
00:30:24.700 Downloaded PROD Database to .dumps/cms-prod-db-sanitized-latest.sql.
00:30:24.700 Dropping existing database tables
00:30:24.700
00:30:24.700 // Do you really want to drop all tables in the database dsva_cms_staging?:
00:30:24.700 // yes.
00:30:24.700
00:30:24.700 Database tables dropped
00:30:24.700 Importing .dumps/cms-prod-db-sanitized-latest.sql
00:30:24.700 ./scripts/sync-db.sh: line 27: cms-prod-db-sanitized-latest.sql: No such file or directory
00:30:24.700 stdout_lines: <omitted>
00:30:24.700 ...ignoring
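In short, it was failing because it didn't have a "sanitized database" to set up the instance: the log shows the existing tables being dropped, the import then failing with "No such file or directory", and the later Drush tasks being unable to bootstrap or query the database.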
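One way to catch this earlier might be to validate the downloaded dump before any tables are dropped. The sketch below is an assumption about what such a guard could look like, not the contents of the real scripts/sync-db.sh: the function name `dump_looks_valid` and the 1 MiB default threshold are hypothetical, chosen only because the log shows curl receiving just 263 bytes.

```shell
#!/usr/bin/env bash
# Hypothetical guard; NOT taken from the real scripts/sync-db.sh.
# Returns 0 only if the dump file exists and is at least min_bytes long.
dump_looks_valid() {
  local dump_path="${1:-}"
  local min_bytes="${2:-1048576}"  # assumed threshold: 1 MiB

  # A missing file matches the "No such file or directory" failure in the log.
  [ -f "$dump_path" ] || return 1

  # wc -c gives the byte count; tr strips padding that some platforms add.
  local size
  size="$(wc -c < "$dump_path" | tr -d '[:space:]')"
  [ "$size" -ge "$min_bytes" ]
}

# Example use before the destructive step (left commented out on purpose):
# if ! dump_looks_valid .dumps/cms-prod-db-sanitized-latest.sql; then
#   echo "Dump missing or too small; refusing to drop tables" >&2
#   exit 1
# fi
```

Checking before the drop would leave the existing database intact when the download fails, instead of leaving the environment with no tables at all.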
I ran these jobs in this order and it's back to working:
User Story or Problem Statement
Create a job in Jenkins that does steps from the deploy troubleshooting guide.
https://raw.githubusercontent.com/department-of-veterans-affairs/va.gov-cms/main/READMES/devops/deploy-failure-troubleshooting-guide.md
Description or Additional Context
Most of the time we go in and do "2A) ROLLBACK" in the deploy-failure-troubleshooting-guide.
aws autoscaling complete-lifecycle-action \
  --region us-gov-west-1 \
  --auto-scaling-group-name "dsva-vagov-prod-cms-asg" \
  --lifecycle-hook-name launch-hook \
  --lifecycle-action-result ABANDON \
  --instance-id i-0f256f4eae72d5c87
It would be great if we could have a job in Jenkins that would just perform the steps once we've determined that we are in the 2A scenario.
An example of this would be when the deploy process has started and the Auto Scaling Group is replacing the EC2 instance but fails to obtain an IP address.
As a first step, the job will be run manually; automating it is left for future iterations.
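As a starting point for the manually run job, the 2A rollback command above could be wrapped in a small parameterized function. This is only a sketch under assumptions: the function name `abandon_launch_hook`, the `DRY_RUN` switch, and the default values (copied from the example command) are illustrative and do not come from the devops repository.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper for "2A) ROLLBACK"; names and defaults are assumptions.
# Usage: abandon_launch_hook <instance-id> [asg-name] [region]
abandon_launch_hook() {
  local instance_id="${1:-}"
  local asg_name="${2:-dsva-vagov-prod-cms-asg}"
  local region="${3:-us-gov-west-1}"

  if [ -z "$instance_id" ]; then
    echo "usage: abandon_launch_hook <instance-id> [asg-name] [region]" >&2
    return 2
  fi

  # With DRY_RUN=1 the command is printed instead of executed.
  local runner=""
  if [ "${DRY_RUN:-0}" = "1" ]; then
    runner="echo"
  fi

  $runner aws autoscaling complete-lifecycle-action \
    --region "$region" \
    --auto-scaling-group-name "$asg_name" \
    --lifecycle-hook-name launch-hook \
    --lifecycle-action-result ABANDON \
    --instance-id "$instance_id"
}
```

Running it with `DRY_RUN=1` first would let whoever triggers the job confirm the instance ID and Auto Scaling Group before the lifecycle hook is actually abandoned.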
Steps for Implementation
Acceptance Criteria