department-of-veterans-affairs / va.gov-cms

Editor-centered management for Veteran-centered content.
https://prod.cms.va.gov
GNU General Public License v2.0

BRD jenkins jobs, Sanitize database on prod.cms.va.gov and CMS every-minute tasks run on prod failed due to lack of disk space #17883

Open edmund-dunn opened 6 months ago

edmund-dunn commented 6 months ago

User Story or Problem Statement

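The following jobs failed because the Jenkins agent ran out of disk space ("No space left on device"): the BRD Jenkins jobs, the "Sanitize database on prod.cms.va.gov" job, and the CMS every-minute tasks on prod. Log excerpts for each failure follow.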

Description or Additional Context

CMS DB Sanitize error

https://dsva.slack.com/archives/CJT90C0UT/p1713362200272809

06:56:40      mysqldump: [Warning] Using a password on the command line interface can be insecure.
06:56:40      ++ ls -l --human-readable cms-prod-db-sanitized.sql
06:56:40      + echo 'File stats post-sanitize: ' -rw-rw-r-- 1 jenkins jenkins 4.0G Apr 17 13:55 cms-prod-db-sanitized.sql
06:56:40      + sed 's/\sDEFINER="[^`]*"@"[^`]*"//g' --in-place cms-prod-db-sanitized.sql
06:56:40      sed: couldn't write 1036155 items to ./sedZb2vnZ: No space left on device
06:56:40    stderr_lines: <omitted>
06:56:40    stdout: |-
06:56:40      Completed 81 Bytes/81 Bytes (1.7 KiB/s) with 1 file(s) remaining
06:56:40      download: s3://dsva-vagov-prod-cms-backup/database/latest_uri to ./latest_uri
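
The `sed --in-place` step in this log fails because GNU sed writes its output to a temporary file next to the original (`./sedZb2vnZ` here) before renaming it, so editing the 4.0G dump needs roughly another 4.0G free in the same directory. A minimal sketch of a preflight check follows. The helper name `require_free_kb` and the threshold are assumptions for illustration, not part of the existing job.

```shell
# Hypothetical helper (not in the existing job): fail fast when the filesystem
# holding <dir> has less than <min_free_kb> KiB available.
# Usage: require_free_kb <dir> <min_free_kb>
require_free_kb() {
  dir="$1"
  min_kb="$2"
  # df -P forces one line per filesystem; -k reports 1024-byte blocks.
  # Column 4 of the second line is the available space.
  avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
  if [ -z "$avail_kb" ]; then
    echo "Could not determine free space for ${dir}" >&2
    return 1
  fi
  if [ "$avail_kb" -lt "$min_kb" ]; then
    echo "Only ${avail_kb} KiB free for ${dir}; need ${min_kb} KiB" >&2
    return 1
  fi
}
```

Calling something like `require_free_kb . $((5 * 1024 * 1024))` before the `sed` step would stop the job with a clear message instead of failing partway through the rewrite. The 5 GiB figure is only a guess based on the 4.0G file size shown in the log.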

https://dsva.slack.com/archives/CJT90C0UT/p1713363471276249

07:17:51  fatal: [localhost]: FAILED! => changed=true 
07:17:51    msg: non-zero return code
07:17:51    rc: 1
07:17:51    stderr: |-
07:17:51      + MYSQL_SANITIZE_DB_NAME=sanitize
07:17:51      + MYSQL_CONNECTION_ARGS=(--host=dsva-cms-staging-db.cdqbofmbcmtd.us-gov-west-1.rds.amazonaws.com --user=master --password=[REDACTED:password])
07:17:51      ++ mktemp -d
07:17:51      mktemp: failed to create directory via template /tmp/tmp.XXXXXXXXXX: No space left on device
07:17:51      + tempdir=
07:17:51    stderr_lines: <omitted>
07:17:51    stdout: ''
07:17:51    stdout_lines: <omitted>
07:17:51  
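
Both sanitize logs show the database password being passed on the command line, which `mysqldump` warns about and which causes it to appear in `set -x` traces like the one above. One possible mitigation, sketched under assumptions: write the credentials to a private option file and pass it with `--defaults-extra-file`. The helper name `write_mysql_defaults` is hypothetical, and the existing job does not do this.

```shell
# Hypothetical helper: write MySQL client credentials to a new option file
# that only the current user can read.
# Usage: write_mysql_defaults <file> <user> <password>
# Assumption: the password contains no double quotes or backslashes; those
# would need escaping under the MySQL option-file syntax.
write_mysql_defaults() {
  (
    # umask 077 makes a newly created file mode 0600. It does not change
    # the mode of a file that already exists.
    umask 077
    printf '[client]\nuser=%s\npassword="%s"\n' "$2" "$3" > "$1"
  )
}
```

The file would then be used as `mysqldump --defaults-extra-file="$cnf" --host=... dbname`, with `--defaults-extra-file` given as the first option, and removed once the dump finishes.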

Every Minute

https://dsva.slack.com/archives/CJT90C0UT/p1713363153772829

07:12:33  Installing setuptools, pip, wheel...
07:12:33    Complete output from command /instance-storage/je...e/venv/bin/python2.7 - setuptools pip wheel:
07:12:33    Traceback (most recent call last):
07:12:33    File "<stdin>", line 12, in <module>
07:12:33  IOError: [Errno 28] No space left on device
07:12:33  ----------------------------------------
07:12:33  ...Installing setuptools, pip, wheel...done.
07:12:33  Traceback (most recent call last):
07:12:33    File "/usr/bin/virtualenv", line 11, in <module>
07:12:33      load_entry_point('virtualenv==15.1.0', 'console_scripts', 'virtualenv')()
07:12:33    File "/usr/lib/python2.7/dist-packages/virtualenv.py", line 717, in main
07:12:33      symlink=options.symlink)
07:12:33    File "/usr/lib/python2.7/dist-packages/virtualenv.py", line 949, in create_environment
07:12:33      download=download,
07:12:33    File "/usr/lib/python2.7/dist-packages/virtualenv.py", line 905, in install_wheel
07:12:33      call_subprocess(cmd, show_stdout=False, extra_env=env, stdin=SCRIPT)
07:12:33    File "/usr/lib/python2.7/dist-packages/virtualenv.py", line 801, in call_subprocess
07:12:33      % (cmd_desc, proc.returncode))
07:12:33  OSError: Command /instance-storage/je...e/venv/bin/python2.7 - setuptools pip wheel failed with error code 1

Steps for Implementation

Acceptance Criteria

edmund-dunn commented 6 months ago
04:15:22  Started by upstream project "[cms/cms-full-pipeline](http://jenkins.vfs.va.gov/job/cms/job/cms-full-pipeline/)" build number [104576](http://jenkins.vfs.va.gov/job/cms/job/cms-full-pipeline/104576)
04:15:22  originally caused by:
04:15:22   Started by timer
04:15:22  Checking out git https://github.com/department-of-veterans-affairs/devops.git into /var/lib/jenkins/workspace/cms/cms-db-backup-prod@script to read ansible/Jenkinsfiles/cms/cms-db-backup-job.groovy
04:15:22  The recommended git tool is: git
04:15:22  using credential va-bot
04:15:22   > git rev-parse --resolve-git-dir /var/lib/jenkins/workspace/cms/cms-db-backup-prod@script/.git # timeout=10
04:15:22  Fetching changes from the remote Git repository
04:15:22   > git config remote.origin.url https://github.com/department-of-veterans-affairs/devops.git # timeout=10
04:15:22  Fetching upstream changes from https://github.com/department-of-veterans-affairs/devops.git
04:15:22   > git --version # timeout=10
04:15:22   > git --version # 'git version 2.18.5'
04:15:22  using GIT_ASKPASS to set credentials Credentials for va-bot on GitHub
04:15:22   > git fetch --tags --progress -- https://github.com/department-of-veterans-affairs/devops.git +refs/heads/*:refs/remotes/origin/* # timeout=10
04:15:23   > git rev-parse refs/remotes/origin/master^{commit} # timeout=10
04:15:23  Checking out Revision 5d6d652e0bf542c14de297e623276e36caceb232 (refs/remotes/origin/master)
04:15:23   > git config core.sparsecheckout # timeout=10
04:15:23   > git checkout -f 5d6d652e0bf542c14de297e623276e36caceb232 # timeout=10
04:15:23  Commit message: "chore: SSL for LGY api gateway (#14227)"
04:15:23   > git rev-list --no-walk 2f7b0708dbb6c880e08ab9c2cbd905f02ccacc8d # timeout=10
04:15:23  Running in Durability level: MAX_SURVIVABILITY
04:15:23  Loading library va.gov-devops-jenkins-lib@master
04:15:23  Attempting to resolve master from remote references...
04:15:23   > git --version # timeout=10
04:15:23   > git --version # 'git version 2.18.5'
04:15:23  using GIT_ASKPASS to set credentials Credentials for va-vfs-bot on GitHub
04:15:23   > git ls-remote -h -- https://github.com/department-of-veterans-affairs/va.gov-devops-jenkins-lib.git # timeout=10
04:15:23  Found match: refs/heads/master revision b50da3eb1f99b65623130e24bd28f24c3e2b120d
04:15:23  The recommended git tool is: NONE
04:15:23  using credential va-vfs-bot
04:15:23   > git rev-parse --resolve-git-dir /var/lib/jenkins/workspace/cms/cms-db-backup-prod@libs/va.gov-devops-jenkins-lib/.git # timeout=10
04:15:23  Fetching changes from the remote Git repository
04:15:23   > git config remote.origin.url https://github.com/department-of-veterans-affairs/va.gov-devops-jenkins-lib.git # timeout=10
04:15:23  Fetching without tags
04:15:23  Fetching upstream changes from https://github.com/department-of-veterans-affairs/va.gov-devops-jenkins-lib.git
04:15:23   > git --version # timeout=10
04:15:23   > git --version # 'git version 2.18.5'
04:15:23  using GIT_ASKPASS to set credentials Credentials for va-vfs-bot on GitHub
04:15:23   > git fetch --no-tags --progress -- https://github.com/department-of-veterans-affairs/va.gov-devops-jenkins-lib.git +refs/heads/*:refs/remotes/origin/* # timeout=10
04:15:24  Checking out Revision b50da3eb1f99b65623130e24bd28f24c3e2b120d (master)
04:15:24   > git config core.sparsecheckout # timeout=10
04:15:24   > git checkout -f b50da3eb1f99b65623130e24bd28f24c3e2b120d # timeout=10
04:15:24  Commit message: "Make all github api requests authenticated (#20)"
04:15:24   > git rev-list --no-walk b50da3eb1f99b65623130e24bd28f24c3e2b120d # timeout=10
04:15:24  [Pipeline] Start of Pipeline
04:15:24  [Pipeline] node
04:15:24  Running on [EC2 (jenkins-cloud) - vetsgov-general-purpose-1a (sir-353yr5yj)](http://jenkins.vfs.va.gov/computer/EC2%20(jenkins-cloud)%20-%20vetsgov-general-purpose-1a%20(sir-353yr5yj)/) in /home/jenkins/workspace/cms/cms-db-backup-prod
04:15:24  [Pipeline] {
04:15:24  [Pipeline] stage
04:15:24  [Pipeline] { (Declarative: Checkout SCM)
04:15:24  [Pipeline] checkout
04:15:24  The recommended git tool is: git
04:15:24  using credential va-bot
04:15:24  Cloning the remote Git repository
04:15:24  Cloning repository https://github.com/department-of-veterans-affairs/devops.git
04:15:24   > git init /home/jenkins/workspace/cms/cms-db-backup-prod # timeout=10
04:15:24  Fetching upstream changes from https://github.com/department-of-veterans-affairs/devops.git
04:15:24   > git --version # timeout=10
04:15:24   > git --version # 'git version 2.38.4'
04:15:24  using GIT_ASKPASS to set credentials Credentials for va-bot on GitHub
04:15:24   > git fetch --tags --force --progress -- https://github.com/department-of-veterans-affairs/devops.git +refs/heads/*:refs/remotes/origin/* # timeout=10
04:15:35  Avoid second fetch
04:15:35  Checking out Revision 5d6d652e0bf542c14de297e623276e36caceb232 (refs/remotes/origin/master)
04:15:36  Commit message: "chore: SSL for LGY api gateway (#14227)"
04:15:36  [Pipeline] }
04:15:36  [Pipeline] // stage
04:15:36  [Pipeline] withEnv
04:15:36  [Pipeline] {
04:15:36  [Pipeline] ansiColor
04:15:36  [Pipeline] {
04:15:36  
04:15:36  [Pipeline] lock
04:15:36  Trying to acquire lock on [{deploys/job/cms-vagov-prod/ Block cms/job/cms-db-backup-prod/}]
04:15:36  Resource [deploys/job/cms-vagov-prod/ Block cms/job/cms-db-backup-prod/] did not exist. Created.
04:15:36  Lock acquired on [{deploys/job/cms-vagov-prod/ Block cms/job/cms-db-backup-prod/}]
04:15:36  [Pipeline] {
04:15:36  [Pipeline] stage
04:15:36  [Pipeline] { (Obtain virtualenv packages)
04:15:36  [Pipeline] dir
04:15:36  Running in /home/jenkins/workspace/cms/cms-db-backup-prod/ansible
04:15:36  [Pipeline] {
04:15:36  [Pipeline] copyArtifacts
04:15:35   > git config remote.origin.url https://github.com/department-of-veterans-affairs/devops.git # timeout=10
04:15:35   > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
04:15:35   > git rev-parse refs/remotes/origin/master^{commit} # timeout=10
04:15:35   > git config core.sparsecheckout # timeout=10
04:15:35   > git checkout -f 5d6d652e0bf542c14de297e623276e36caceb232 # timeout=10
04:15:37  Copied 1 artifact from "[Testing » DevOps » master](http://jenkins.vfs.va.gov/job/testing/job/devops/job/master/)" build number [11294](http://jenkins.vfs.va.gov/job/testing/job/devops/job/master/11294/)
04:15:37  [Pipeline] sh
04:15:37  + tar zxf venv.tgz
04:15:38  [Pipeline] sh
04:15:38  + rm venv.tgz
04:15:38  [Pipeline] sh
04:15:38  + virtualenv venv
04:15:38  New python executable in /instance-storage/jenkins/workspace/cms/cms-db-backup-prod/ansible/venv/bin/python2.7
04:15:38  Not overwriting existing python script /instance-storage/jenkins/workspace/cms/cms-db-backup-prod/ansible/venv/bin/python (you must use /instance-storage/jenkins/workspace/cms/cms-db-backup-prod/ansible/venv/bin/python2.7)
04:15:41  Installing setuptools, pip, wheel...done.
04:15:41  Overwriting /instance-storage/jenkins/workspace/cms/cms-db-backup-prod/ansible/venv/bin/activate with new content
04:15:41  Overwriting /instance-storage/jenkins/workspace/cms/cms-db-backup-prod/ansible/venv/bin/activate.fish with new content
04:15:41  Overwriting /instance-storage/jenkins/workspace/cms/cms-db-backup-prod/ansible/venv/bin/activate.csh with new content
04:15:41  Overwriting /instance-storage/jenkins/workspace/cms/cms-db-backup-prod/ansible/venv/bin/python-config with new content
04:15:41  [Pipeline] }
04:15:41  [Pipeline] // dir
04:15:41  [Pipeline] }
04:15:41  [Pipeline] // stage
04:15:41  [Pipeline] stage
04:15:41  [Pipeline] { (Run DB backup via Ansible)
04:15:41  [Pipeline] dir
04:15:41  Running in /home/jenkins/workspace/cms/cms-db-backup-prod/ansible
04:15:41  [Pipeline] {
04:15:41  [Pipeline] sh
04:15:41  + bash -c 'source venv/bin/activate && ansible-playbook cms/cms-db-backup.yml'
04:15:42  /instance-storage/jenkins/workspace/cms/cms-db-backup-prod/ansible/venv/lib/python2.7/site-packages/ansible/parsing/vault/__init__.py:41: CryptographyDeprecationWarning: Python 2 is no longer supported by the Python core team. Support for it is now deprecated in cryptography, and will be removed in the next release.
04:15:42    from cryptography.exceptions import InvalidSignature
04:15:44  [DEPRECATION WARNING]: The TRANSFORM_INVALID_GROUP_CHARS settings is set to 
04:15:44  allow bad characters in group names by default, this will change, but still be 
04:15:44  user configurable on deprecation. This feature will be removed in version 2.10.
04:15:44   Deprecation warnings can be disabled by setting deprecation_warnings=False in 
04:15:44  ansible.cfg.
04:15:44  [WARNING]: Invalid characters were found in group names but not replaced, use
04:15:44  -vvvv to see details
04:15:44  
04:15:44  /instance-storage/jenkins/workspace/cms/cms-db-backup-prod/ansible/venv/lib/python2.7/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.18) or chardet (3.0.4) doesn't match a supported version!
04:15:44    RequestsDependencyWarning)
04:15:45  
04:15:45  PLAY [Create RDS MysqlDump backup in AWS] **************************************
04:15:45  
04:15:45  TASK [Gathering Facts] *********************************************************
04:15:45  Thursday 18 April 2024  11:15:45 +0000 (0:00:00.162)       0:00:00.162 ******** 
04:15:46  ok: [localhost]
04:15:46  
04:15:46  TASK [include_role : cms-db-backup] ********************************************
04:15:46  Thursday 18 April 2024  11:15:46 +0000 (0:00:01.153)       0:00:01.315 ******** 
04:15:46  
04:15:46  TASK [cms-db-backup : set timestamp variable] **********************************
04:15:46  Thursday 18 April 2024  11:15:46 +0000 (0:00:00.125)       0:00:01.441 ******** 
04:15:46  ok: [localhost]
04:15:46  
04:15:46  TASK [cms-db-backup : debug] ***************************************************
04:15:46  Thursday 18 April 2024  11:15:46 +0000 (0:00:00.039)       0:00:01.480 ******** 
04:15:46  ok: [localhost] => 
04:15:46    msg:
04:15:46      ansible_facts:
04:15:46        timestamp: 2024-04-18-11-15
04:15:46      changed: false
04:15:46      failed: false
04:15:46  
04:15:46  TASK [cms-db-backup : Start RDS backup SQL dump and copy to S3] ****************
04:15:46  Thursday 18 April 2024  11:15:46 +0000 (0:00:00.029)       0:00:01.510 ******** 
04:15:46  fatal: [localhost]: FAILED! => changed=true 
04:15:46    cmd: |-
04:15:46      # Exit immediately if a command fails with a non-zero status code
04:15:46      set -e
04:15:46    
04:15:46      tempdir=$(mktemp -d)
04:15:46      cd $tempdir
04:15:46    
04:15:46      # UTF8MB4 is so that emoji make it downstream!
04:15:46      mysqldump --default-character-set=utf8mb4 --single-transaction -h dsva-cms-prod-db.cdqbofmbcmtd.us-gov-west-1.rds.amazonaws.com -u master -p[REDACTED:password] dsva_cms_prod > drupal8-db-prod-2024-04-18-11-15.sql
04:15:46    
04:15:46      gzip drupal8-db-prod-2024-04-18-11-15.sql
04:15:46    
04:15:46      echo "https://s3-us-gov-west-1.amazonaws.com/dsva-vagov-prod-cms-backup/database/drupal8-db-prod-2024-04-18-11-15.sql.gz" > latest_url
04:15:46      echo "s3://dsva-vagov-prod-cms-backup/database/drupal8-db-prod-2024-04-18-11-15.sql.gz" > latest_uri
04:15:46    
04:15:46      aws s3 cp drupal8-db-prod-2024-04-18-11-15.sql.gz s3://dsva-vagov-prod-cms-backup/database/drupal8-db-prod-2024-04-18-11-15.sql.gz --region us-gov-west-1
04:15:46    
04:15:46      aws s3api put-object-tagging --bucket dsva-vagov-prod-cms-backup --key database/drupal8-db-prod-2024-04-18-11-15.sql.gz --region us-gov-west-1 --tagging 'TagSet=[{Key=Env,Value=prod}]'
04:15:46    
04:15:46      aws s3 cp latest_url s3://dsva-vagov-prod-cms-backup/database/latest_url --region us-gov-west-1
04:15:46      aws s3 cp latest_uri s3://dsva-vagov-prod-cms-backup/database/latest_uri --region us-gov-west-1
04:15:46    
04:15:46      cd
04:15:46      rm -rf ${tempdir}
04:15:46    delta: '0:00:00.028880'
04:15:46    end: '2024-04-18 11:15:46.852230'
04:15:46    msg: non-zero return code
04:15:46    rc: 1
04:15:46    start: '2024-04-18 11:15:46.823350'
04:15:46    stderr: 'mktemp: failed to create directory via template ‘/tmp/tmp.XXXXXXXXXX’: No space left on device'
04:15:46    stderr_lines: <omitted>
04:15:46    stdout: ''
04:15:46    stdout_lines: <omitted>
04:15:46  
04:15:46  PLAY RECAP *********************************************************************
04:15:46  localhost                  : ok=3    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0   
04:15:46  
04:15:46  Thursday 18 April 2024  11:15:46 +0000 (0:00:00.446)       0:00:01.957 ******** 
04:15:46  =============================================================================== 
04:15:46  Gathering Facts --------------------------------------------------------- 1.15s
04:15:46  cms-db-backup : Start RDS backup SQL dump and copy to S3 ---------------- 0.45s
04:15:46  include_role : cms-db-backup -------------------------------------------- 0.13s
04:15:46  cms-db-backup : set timestamp variable ---------------------------------- 0.04s
04:15:46  cms-db-backup : debug --------------------------------------------------- 0.03s
04:15:47  [Pipeline] }
04:15:47  [Pipeline] // dir
04:15:47  [Pipeline] }
04:15:47  [Pipeline] // stage
04:15:47  [Pipeline] stage
04:15:47  [Pipeline] { (Declarative: Post Actions)
04:15:47  [Pipeline] cleanWs
04:15:47  [WS-CLEANUP] Deleting project workspace...
04:15:47  [WS-CLEANUP] Deferred wipeout is used...
04:15:47  [WS-CLEANUP] done
04:15:47  [Pipeline] slackSend
04:15:47  Slack Send Pipeline step running, values are - baseUrl: <empty>, teamDomain: dsva, channel: #cms-notifications, color: danger, botUser: false, tokenCredentialId: slack-token, notifyCommitters: false, iconEmoji: <empty>, username: <empty>, timestamp: <empty>
04:15:47  [Pipeline] }
04:15:47  [Pipeline] // stage
04:15:47  [Pipeline] }
04:15:47  Lock released on resource [{deploys/job/cms-vagov-prod/ Block cms/job/cms-db-backup-prod/}]
04:15:47  [Pipeline] // lock
04:15:47  [Pipeline] }
04:15:47  
04:15:47  [Pipeline] // ansiColor
04:15:47  [Pipeline] }
04:15:47  [Pipeline] // withEnv
04:15:47  [Pipeline] }
04:15:47  [Pipeline] // node
04:15:47  [Pipeline] End of Pipeline
04:15:47  ERROR: script returned exit code 2
04:15:47  Finished: FAILURE
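
One observation about the inline backup script in the failed task: it removes `${tempdir}` only on its last line, so under `set -e` any earlier failure (for example in `mysqldump` or `aws s3 cp`) would leave the dump behind in `/tmp`. In this particular run `mktemp` itself failed, and nothing in this issue establishes that leftover temp directories caused the full disk. A minimal sketch of cleanup that runs on both success and failure follows. `run_in_tempdir` and `WORK_ROOT` are hypothetical names introduced for illustration.

```shell
# Hypothetical wrapper: run a command inside a fresh temp directory that is
# removed whether the command succeeds or fails.
# WORK_ROOT is an assumed variable for choosing a larger volume than /tmp;
# it falls back to TMPDIR, then /tmp.
run_in_tempdir() {
  tempdir=$(mktemp -d "${WORK_ROOT:-${TMPDIR:-/tmp}}/cms-backup.XXXXXXXXXX") || return 1
  (
    # The EXIT trap fires when this subshell ends for any reason,
    # so the directory is cleaned up even if the command fails.
    trap 'rm -rf "$tempdir"' EXIT
    cd "$tempdir" || exit 1
    "$@"
  )
}
```

The dump, gzip, and upload steps could be moved into a function and invoked as `run_in_tempdir do_backup`. The wrapper returns the command's exit status, so a calling script using `set -e` still stops on failure.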