BCDevOps / platform-services

Collection of platform related tools and configurations
Apache License 2.0

Statuspage deployment pipeline race condition #572

Closed stewartshea closed 4 years ago

stewartshea commented 4 years ago

Occasionally the update_notifications playbook fails in one of the environment deployments (most recently seen in test), and rerunning the update_notifications.sh script cleans everything up. We need to investigate the race condition and apply appropriate changes to stabilise it (or at least send a notification when it fails).
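Since a manual rerun currently clears the failure, one stopgap is to wrap the script in a bounded retry that reports each failed attempt. The sketch below is only an illustration under assumptions: `run_with_retries`, its parameters, and the `notify` callback are hypothetical names, not part of the existing pipeline, and it masks the race rather than fixing it.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_with_retries(
    command: Sequence[str],
    attempts: int = 3,
    delay_seconds: float = 30.0,
    notify: Callable[[str], None] = print,
    runner: Callable[[Sequence[str]], int] = lambda cmd: subprocess.run(cmd).returncode,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run `command` until it exits 0 or `attempts` runs have failed.

    `runner` and `sleep` are injectable so the logic can be exercised
    without actually launching the script or waiting.
    """
    for attempt in range(1, attempts + 1):
        exit_code = runner(command)
        if exit_code == 0:
            return True
        # Report every failed attempt so failures are never silent.
        notify(f"attempt {attempt}/{attempts} failed with exit code {exit_code}")
        if attempt < attempts:
            sleep(delay_seconds)
    notify(f"giving up on {' '.join(command)} after {attempts} attempts")
    return False
```

A call such as `run_with_retries(["/opt/update_notifications.sh"])` would rerun the script up to three times; the real script takes arguments (visible in the log below) that are omitted here.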

stewartshea commented 4 years ago

[webhook] 2020/01/15 20:15:59 [351dab] executing /opt/update_notifications.sh (/opt/update_notifications.sh) with arguments ["/opt/update_notifications.sh" "https://github.com/BCDevOps/platform-services-status-page-notifications" "" "41" "" ""] and environment [] using /opt as cwd
  | [webhook] 2020/01/15 20:17:20 [351dab] command output: From https://github.com/BCDevOps/platform-services
  | 90d37c1..bd16763  master     -> origin/master
  | 6f0be90..8a7183f  feature/sysdig-teams-operator -> origin/feature/sysdig-teams-operator
  | * [new branch]      latency-test -> origin/latency-test
  | Updating 90d37c1..bd16763
  | Fast-forward
  | security/aporeto/tools/latency-test/README.md      | 40 ++++++++++++++++++
  | security/aporeto/tools/latency-test/analize.js     | 49 ++++++++++++++++++++++
  | .../tools/latency-test/test/curl-format.txt        |  9 ++++
  | security/aporeto/tools/latency-test/test/run.sh    |  8 ++++
  | 4 files changed, 106 insertions(+)
  | create mode 100644 security/aporeto/tools/latency-test/README.md
  | create mode 100644 security/aporeto/tools/latency-test/analize.js
  | create mode 100644 security/aporeto/tools/latency-test/test/curl-format.txt
  | create mode 100755 security/aporeto/tools/latency-test/test/run.sh
  |  
  | PLAY [localhost] ***************************************************************
  |  
  | TASK [python_requirements_facts] ***********************************************
  | ok: [localhost]
  |  
  | TASK [debug] *******************************************************************
  | ok: [localhost] => {
  | "groupvars[inventory_hostname]": "VARIABLE IS NOT DEFINED!"
  | }
  |  
  | TASK [include_tasks] ***********************************************************
  | included: /opt/platform-services/apps/statuspage/ansible/tasks/set_env.yml for localhost
  |  
  | TASK [set_fact] ****************************************************************
  | skipping: [localhost]
  |  
  | TASK [set_fact] ****************************************************************
  | skipping: [localhost]
  |  
  | TASK [set_fact] ****************************************************************
  | skipping: [localhost]
  |  
  | TASK [set_fact] ****************************************************************
  | ok: [localhost]
  |  
  | TASK [set_fact] ****************************************************************
  | skipping: [localhost]
  |  
  | TASK [set_fact] ****************************************************************
  | skipping: [localhost]
  |  
  | TASK [include_tasks] ***********************************************************
  | skipping: [localhost]
  |  
  | TASK [include_tasks] ***********************************************************
  | [DEPRECATION WARNING]: k8s_raw is kept for backwards compatibility but usage is
  | discouraged. The module documentation details page may explain more about this
  | rationale.. This feature will be removed in a future release. Deprecation
  | warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.
  | included: /opt/platform-services/apps/statuspage/ansible/tasks/configure.yml for localhost
  |  
  | TASK [Wait for Grafana to be ready...] *****************************************
  | changed: [localhost]
  |  
  | TASK [Get Route] ***************************************************************
  | changed: [localhost]
  |  
  | TASK [Get existing admin password] *********************************************
  | changed: [localhost]
  |  
  | TASK [Generate admin password] *************************************************
  | skipping: [localhost]
  |  
  | TASK [Set password variable] ***************************************************
  | ok: [localhost]
  |  
  | TASK [Insert delay] ************************************************************
  | changed: [localhost]
  |  
  | TASK [Reset admin password] ****************************************************
  | skipping: [localhost]
  |  
  | TASK [Add grafana admin password to deployment] ********************************
  | skipping: [localhost]
  |  
  | TASK [Set light theme] *********************************************************
  | changed: [localhost]
  |  
  | TASK [Get existing public viewer API key] **************************************
  | ok: [localhost]
  |  
  | TASK [set_fact] ****************************************************************
  | ok: [localhost]
  |  
  | TASK [set_fact] ****************************************************************
  | ok: [localhost]
  |  
  | TASK [set_fact] ****************************************************************
  | ok: [localhost]
  |  
  | TASK [set_fact] ****************************************************************
  | ok: [localhost]
  |  
  | TASK [set_fact] ****************************************************************
  | ok: [localhost]
  |  
  | TASK [Delete existing public viewer API key] ***********************************
  | ok: [localhost]
  |  
  | TASK [Delete existing admin viewer API key] ************************************
  | ok: [localhost]
  |  
  | TASK [Create new public viewer API key] ****************************************
  | ok: [localhost]
  |  
  | TASK [Create new admin API key] ************************************************
  | ok: [localhost]
  |  
  | TASK [Add api key to envoy ConfigMap] ******************************************
  | changed: [localhost]
  |  
  | TASK [Reapply envoy configmap] *************************************************
  | changed: [localhost]
  |  
  | TASK [Redeploy Envoy] **********************************************************
  | changed: [localhost]
  |  
  | TASK [Configure Grafana Datasource] ********************************************
  | changed: [localhost]
  |  
  | TASK [Generate and apply dashboard] ********************************************
  | included: /opt/platform-services/apps/statuspage/ansible/tasks/generate_dashboard.yml for localhost
  |  
  | TASK [set fact from yaml file] *************************************************
  | ok: [localhost]
  |  
  | TASK [Fetch notifications repo content] ****************************************
  | changed: [localhost]
  |  
  | TASK [Set notifications fact path] *********************************************
  | ok: [localhost]
  |  
  | TASK [Get file list] ***********************************************************
  | changed: [localhost]
  |  
  | TASK [render template] *********************************************************
  | changed: [localhost]
  |  
  | TASK [Validate dashboard] ******************************************************
  | fatal: [localhost]: FAILED! => {"changed": true, "cmd": ["bin/grafana-dashboard", "--debug", "--grafana-url", "http://grafana-c81e6h-dev.pathfinder.gov.bc.ca", "--grafana-apikey", "0=", "validate", "templates/statuspage_grafyaml.yml"], "delta": "0:00:02.363862", "end": "2020-01-15 20:17:20.741649", "msg": "non-zero return code", "rc": 1, "start": "2020-01-15 20:17:18.377787", "stderr": "DEBUG:grafana_dashboards.cmd:Grafana URL override: http://grafana-c81e6h-dev.pathfinder.gov.bc.ca\nDEBUG:grafana_dashboards.cmd:Grafana APIKey overridden\nINFO:grafana_dashboards.cmd:Validating schema in templates/statuspage_grafyaml.yml", "stderr_lines": ["DEBUG:grafana_dashboards.cmd:Grafana URL override: http://grafana-c81e6h-dev.pathfinder.gov.bc.ca", "DEBUG:grafana_dashboards.cmd:Grafana APIKey overridden", "INFO:grafana_dashboards.cmd:Validating schema in templates/statuspage_grafyaml.yml"], "stdout": "templates/statuspage_grafyaml.yml: ERROR: expected str for dictionary value @ data['dashboard']['rows'][4]['panels']['content']", "stdout_lines": ["templates/statuspage_grafyaml.yml: ERROR: expected str for dictionary value @ data['dashboard']['rows'][4]['panels']['content']"]}
  | to retry, use: --limit @/opt/platform-services/apps/statuspage/ansible/statuspage.retry
  |  
  | PLAY RECAP *********************************************************************
  | localhost                  : ok=31   changed=12   unreachable=0    failed=1
  |  
  | Failed to update dev
  |  
  | [webhook] 2020/01/15 20:17:20 [351dab] finished handling update_notifications

stewartshea commented 4 years ago

Checks have been added to wait for the dashboard API endpoint to become available before validating the new dashboard.
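The actual checks are not shown in this thread. As a rough illustration of the idea, the sketch below polls an endpoint until it answers successfully or a retry budget runs out. The names `http_ok` and `wait_until_ready`, the retry counts, and the choice of URL are all assumptions made for the example, not the change that was applied.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if `url` answers with a 2xx status, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts, and HTTP error statuses all
        # count as "not ready yet".
        return False


def wait_until_ready(
    probe: Callable[[], bool],
    retries: int = 30,
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `probe` up to `retries` times, pausing between attempts."""
    for attempt in range(retries):
        if probe():
            return True
        if attempt < retries - 1:
            sleep(delay_seconds)
    return False
```

For example, `wait_until_ready(lambda: http_ok("http://grafana-c81e6h-dev.pathfinder.gov.bc.ca/api/health"))` would gate the validation step on the endpoint responding. The host is taken from the log above, while the `/api/health` path is an assumed stand-in for whichever endpoint the real check targets.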