ansible / workshops

Training Course for Ansible Automation Platform
MIT License

Ansible Workshops and Demos failure on RHPDS. #1320

Closed: Chetan-07 closed this issue 3 years ago

Chetan-07 commented 3 years ago

Problem Summary

The following demos and workshops are currently affected: Ansible Compliance Demos, Ansible Developer Demos, Ansible F5 Demos, Ansible F5 Automation Workshop, Ansible Automation Workshop (all T-shirt sizes), Ansible Network Demos, Ansible RHEL 90 Tower Workshop, Ansible Security Automation, Ansible Security Demos, Ansible Windows Demos, Ansible Windows Workshop.

Issue Type

Bug

Extra vars file

ERROR: service provision request 30000000168323 for RHPDS-DEM-julin-redhat.com-PROD_ANSIBLE_WORKSHOPS-cb03 catalog item Ansible Network Automation Workshop (T) failed when provisioning. The problem is happening in step checkSoftwareDeploy

    region = na_gpte
    catalogItemName = Ansible Network Automation Workshop (T)
    User Email = julin@redhat.com
    guid = cb03


Ansible Playbook Output

    TASK [/tmp/ansible-workshops-cb03/ansible_agnostic_deployer/ansible/workdir/ansible-workshops/provisioner/../roles/code_server : issue cert]
    FAILED - RETRYING: issue cert (5 retries left).
    FAILED - RETRYING: issue cert (4 retries left).
    FAILED - RETRYING: issue cert (3 retries left).
    FAILED - RETRYING: issue cert (2 retries left).
    FAILED - RETRYING: issue cert (1 retries left).
    fatal: [cb03-student1-ansible-1]: FAILED! => {"attempts": 5, "changed": true, "cmd": "certbot certonly --no-bootstrap --standalone -d student1-code.cb03.example.opentlc.com --email ansible-network@redhat.com --noninteractive --agree-tos", "delta": "0:00:00.264958", "end": "2021-09-27 02:55:12.325243", "msg": "non-zero return code", "rc": 1, "start": "2021-09-27 02:55:12.060285", "stdout": "", "stdout_lines": []}
    ...ignoring

    stderr:
    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 583, in _build_master
        ws.require(__requires__)
      File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 900, in require
        needed = self.resolve(parse_requirements(requirements))
      File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 791, in resolve
        raise VersionConflict(dist, req).with_context(dependent_req)
    pkg_resources.ContextualVersionConflict: (requests 2.6.0 (/usr/local/lib/python3.6/site-packages), Requirement.parse('requests>=2.14.2'), {'acme'})

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/bin/certbot", line 6, in <module>
        from pkg_resources import load_entry_point
      File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3251, in <module>
        @_call_aside
      File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3235, in _call_aside
        f(*args, **kwargs)
      File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3264, in _initialize_master_working_set
        working_set = WorkingSet._build_master()
      File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 585, in _build_master
        return cls._build_from_requirements(__requires__)
      File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 598, in _build_from_requirements
        dists = ws.resolve(reqs, Environment())
      File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 786, in resolve
        raise DistributionNotFound(req, requirers)
    pkg_resources.DistributionNotFound: The 'requests>=2.14.2' distribution was not found and is required by acme

Ansible Version

3.7.3

Ansible Configuration

The demo failed during deployment.

Ansible Execution Node

CLI Ansible (Ansible Core)

Operating System

RHEL

IPvSean commented 3 years ago

Hey @Chetan-07, is this still an issue? I saw some reports this morning about EPEL and was curious whether this is a separate issue or related.

Also, the extra_vars section of the ticket is being filled out incorrectly. Here are example files of what I am looking for: https://github.com/ansible/workshops/tree/devel/provisioner/sample_workshops

Chetan-07 commented 3 years ago

@IPvSean The issue still exists. It has impacted a lot of workshops and demos. Could you please look into it?

By the way, thanks for the reference link.

IPvSean commented 3 years ago

@Chetan-07 I did a very bad thing... (direct commit to master) but can you try again?

Something must have been revved in dnf and/or RHEL 8 so that it no longer has a correct version of certbot, so I added a task:

https://github.com/ansible/workshops/blob/master/roles/issue_cert/tasks/main.yml#L19

Chetan-07 commented 3 years ago

I have ordered a few demos for testing.

IPvSean commented 3 years ago

Hey @Chetan-07, sounds great. If it fails on a new task, let me know and I can troubleshoot.

Chetan-07 commented 3 years ago

Hey @IPvSean, the Ansible Linux Automation Workshop (S) failed on the task below:

    TASK [/tmp/ansible-workshops-3702/ansible_agnostic_deployer/ansible/workdir/ansible-workshops/provisioner/../roles/populate_tower : add tower credential into ansible tower] ***
    fatal: [3702-student1-ansible-1]: FAILED! => {"changed": false, "msg": "value of kind must be one of: aws, controller, gce, azure_rm, openstack, satellite6, rhv, vmware, aim, conjur, hashivault_kv, hashivault_ssh, azure_kv, insights, kubernetes_bearer_token, net, scm, ssh, github_token, gitlab_token, vault, got: tower"}

    PLAY RECAP *****
    3702-student1-ansible-1    : ok=80   changed=58   unreachable=0    failed=1    skipped=6    rescued=0    ignored=0
    3702-student1-node1        : ok=8    changed=6    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
    3702-student1-node2        : ok=8    changed=6    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
    3702-student1-node3        : ok=8    changed=6    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
    attendance-host            : ok=26   changed=22   unreachable=0    failed=0    skipped=2    rescued=0    ignored=0
    localhost                  : ok=94   changed=31   unreachable=0    failed=0    skipped=34   rescued=0    ignored=0

    FAIL ansible-workshops-3702 ansible return code: 2

Chetan-07 commented 3 years ago

The Ansible Windows Workshop also failed on the same task.

IPvSean commented 3 years ago

🤔 What version of Ansible are you running? It looks like someone revved the version of the awx collection... I've got an idea though...
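One way to answer the version question from inside a provisioning run is to have the playbook print which awx.awx collection the control node actually resolves. This is a minimal diagnostic sketch, assuming ansible-galaxy 2.10 or later (which accepts a collection name after "collection list"); the task names and the awx_collection_list variable are hypothetical and not part of the workshops roles.

```yaml
# Diagnostic sketch: report which awx.awx collection(s) the control node sees.
# Assumes ansible-galaxy >= 2.10 on localhost; `awx_collection_list` is a
# hypothetical variable name.
- name: List installed awx.awx collection versions
  ansible.builtin.command: ansible-galaxy collection list awx.awx
  register: awx_collection_list
  changed_when: false
  failed_when: false   # do not abort provisioning just because the lookup fails
  delegate_to: localhost

- name: Show the awx.awx versions found on the collections path
  ansible.builtin.debug:
    var: awx_collection_list.stdout_lines
```

If more than one collections path is configured, the output groups versions per path, which would also help answer whether the collections on the deployer are shared globally.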

IPvSean commented 3 years ago

retry now @Chetan-07

Chetan-07 commented 3 years ago

Testing

Chetan-07 commented 3 years ago

@IPvSean, the Ansible Windows Workshop and Ansible Linux Automation Workshop (S) failed on the following task:

    TASK [/tmp/ansible-workshops-dbf6/ansible_agnostic_deployer/ansible/workdir/ansible-workshops/provisioner/../roles/populate_tower : add tower credential into ansible tower] ***
    fatal: [dbf6-student1-ansible-1]: FAILED! => {"changed": false, "msg": "value of kind must be one of: aws, controller, gce, azure_rm, openstack, satellite6, rhv, vmware, aim, conjur, hashivault_kv, hashivault_ssh, azure_kv, insights, kubernetes_bearer_token, net, scm, ssh, github_token, gitlab_token, vault, got: tower"}

    PLAY RECAP *********************************************************************
    attendance-host            : ok=26   changed=22   unreachable=0    failed=0    skipped=2    rescued=0    ignored=0
    dbf6-student1-ansible-1    : ok=80   changed=58   unreachable=0    failed=1    skipped=6    rescued=0    ignored=0
    dbf6-student1-node1        : ok=8    changed=6    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
    dbf6-student1-node2        : ok=8    changed=6    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
    dbf6-student1-node3        : ok=8    changed=6    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
    localhost                  : ok=94   changed=31   unreachable=0    failed=0    skipped=34   rescued=0    ignored=0

    FAIL ansible-workshops-dbf6 ansible return code: 2

The Ansible Network Automation Workshop (T) failed on the same task:

    TASK [/tmp/ansible-workshops-013d/ansible_agnostic_deployer/ansible/workdir/ansible-workshops/provisioner/../roles/populate_tower : add tower credential into ansible tower] ***
    fatal: [013d-student1-ansible-1]: FAILED! => {"changed": false, "msg": "value of kind must be one of: aws, controller, gce, azure_rm, openstack, satellite6, rhv, vmware, aim, conjur, hashivault_kv, hashivault_ssh, azure_kv, insights, kubernetes_bearer_token, net, scm, ssh, github_token, gitlab_token, vault, got: tower"}

    PLAY RECAP *********************************************************************
    013d-student1-ansible-1    : ok=82   changed=61   unreachable=0    failed=1    skipped=6    rescued=0    ignored=1
    attendance-host            : ok=26   changed=22   unreachable=0    failed=0    skipped=2    rescued=0    ignored=0
    localhost                  : ok=116  changed=39   unreachable=0    failed=0    skipped=24   rescued=0    ignored=0

IPvSean commented 3 years ago

Yup, same error. Let me investigate...

IPvSean commented 3 years ago

testing a fix....

IPvSean commented 3 years ago

ok @Chetan-07 try it now....

Chetan-07 commented 3 years ago

@IPvSean Got 3 successful deployments:

1. Ansible Linux Automation Workshop (S) -- PROD_ANSIBLE_WORKSHOPS-c508_COMPLETED
2. Ansible Linux Automation Workshop (S) -- PROD_ANSIBLE_WORKSHOPS-f46d_COMPLETED
3. Ansible Network Automation Workshop (T) -- PROD_ANSIBLE_WORKSHOPS-29e0_COMPLETED

Chetan-07 commented 3 years ago

@IPvSean Thanks for looking into this. The CIs have been enabled. Can you summarize what caused the failures? It will be helpful for the post-mortem report.

IPvSean commented 3 years ago

@Chetan-07 For this particular issue (#1320) there were two problems:

  1. The task TASK [/tmp/ansible-workshops-cb03/ansible_agnostic_deployer/ansible/workdir/ansible-workshops/provisioner/../roles/code_server : issue cert] was failing because upstream RHEL 8 and/or certbot and/or Python had some sort of disagreement, and the installed version of python-requests was not correct: what RHEL 8.4 installed by default was older than 2.14.2, which is what certbot needs to work correctly. The traceback shows requests 2.6.0 installed, while certbot's acme dependency requires requests>=2.14.2.
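A quick way to confirm this diagnosis on a student node is to read the requests version that python3 imports and compare it with the requests>=2.14.2 requirement from the traceback. This is a minimal sketch, assuming python3 on the node is the same Python 3.6 interpreter whose /usr/local/lib/python3.6/site-packages appears in the traceback; the requests_version variable and the task names are hypothetical.

```yaml
# Diagnostic sketch: check the requests version that certbot's acme dependency sees.
# Assumes `python3` is the interpreter certbot runs under (python3.6 in the log).
# `requests_version` is a hypothetical variable name.
- name: Read the installed requests version
  ansible.builtin.command: python3 -c "import requests; print(requests.__version__)"
  register: requests_version
  changed_when: false

- name: Fail early if requests is older than acme requires
  ansible.builtin.assert:
    that:
      - requests_version.stdout is version('2.14.2', '>=')
    fail_msg: "requests {{ requests_version.stdout }} is older than 2.14.2"
```

The version test compares release segments numerically, so 2.6.0 correctly sorts below 2.14.2 (a plain string comparison would get this wrong).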

This was fixed by adding the following task to the ansible.workshops.issue_cert role:

    - name: Install requests python package
      pip:
        name: requests>=2.14.2
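A slightly more defensive variant of that fix might pin the pip executable and then check that certbot can start before the retrying "issue cert" task runs. This is a sketch, not the task that was committed: executable: pip3 assumes pip3 installs into the python3.6 site-packages shown in the traceback, and certbot --version is used only because it goes through the same pkg_resources load_entry_point import that failed.

```yaml
# Sketch of a hardened version of the fix above (not the committed task).
- name: Install requests python package
  ansible.builtin.pip:
    name: requests>=2.14.2
    executable: pip3  # assumption: pip3 targets /usr/local/lib/python3.6/site-packages

# /bin/certbot imports pkg_resources before parsing any arguments, so a broken
# requests/acme combination fails here once, with a clear error, instead of
# inside the five retries of "issue cert".
- name: Confirm certbot starts cleanly
  ansible.builtin.command: certbot --version
  changed_when: false
```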

This particular problem was not caused by anything anyone at the Ansible BU or GPTE did; it was an upstream change.

  2. The second issue is that engineers on the GPTE side are adding new catalog items for AAP 2, which require the automation controller collection. The automation controller collection is not compatible with Ansible Tower. Apparently (I am discovering this now) the collections installed on GPTE Babylon are global? @tonykay, can you fact-check this? So I removed the tasks that were no longer compatible (1 task in ansible/product-demos and 1 task here in ansible/workshops). This is not a great fix, because one network automation exercise will not work now: https://github.com/ansible/workshops/tree/devel/exercises/ansible_network/6-controller-job-template

I am brainstorming some ideas to keep this working. Most students will figure this out, but based on my own experience a lot will get stuck. This is already fixed in devel, but devel is all AAP 2, with the newest collection.
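One possible path, sketched here only as an assumption and not what this thread settled on, is to stop depending on globally installed collections: give each Tower-era workshop its own requirements file pinned to a Tower-compatible awx.awx release and install it into a run-local directory. The pinned version below is a hypothetical placeholder; the last release that still accepts kind: tower would have to be confirmed against the awx.awx changelog.

```yaml
# requirements.yml (sketch): pin awx.awx for Tower-era workshops.
# Install into a run-local directory instead of the shared/global path, e.g.:
#   ansible-galaxy collection install -r requirements.yml -p ./collections
# and point Ansible at it in ansible.cfg:
#   [defaults]
#   collections_paths = ./collections
collections:
  - name: awx.awx
    version: "17.1.0"  # hypothetical pin; verify against the awx.awx changelog
```

Keeping the pin per workshop means an AAP 2 catalog item can require the newer collection without changing what a Tower catalog item resolves.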

The fix was to remove this task:

    - name: add controller credential into automation controller
      awx.awx.credential:
        name: "Controller Credential"
        credential_type: Red Hat Ansible Automation Platform
        organization: Default
        controller_username: admin
        controller_password: "{{ admin_password }}"
        controller_host: "https://{{ ansible_host }}"
        validate_certs: false
        inputs:
          host: "{{ username }}.{{ ec2_name_prefix }}.{{ workshop_dns_zone }}"
          username: admin
          password: "{{ admin_password }}"

The problem is that Ansible Tower has been renamed to automation controller, and all the corresponding credential types have changed with it (the kind list in the error no longer accepts tower, only controller), but the module itself is not backwards compatible. I have a couple of paths forward, but they are not trivial, so it will take some more time.
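A lower-risk middle ground between deleting the task and rewriting it could be to keep the controller-style credential task but run it only for catalog items that deploy AAP 2. This is a minimal sketch: deploy_aap2 is a hypothetical extra var that the provisioner does not define today, and the task body is the one that was removed.

```yaml
# Sketch: keep the AAP 2 credential task, but skip it on Tower-based workshops.
# `deploy_aap2` is a hypothetical extra var; it defaults to false, so existing
# Tower catalog items behave as if the task had been removed.
- name: add controller credential into automation controller
  awx.awx.credential:
    name: "Controller Credential"
    credential_type: Red Hat Ansible Automation Platform
    organization: Default
    controller_username: admin
    controller_password: "{{ admin_password }}"
    controller_host: "https://{{ ansible_host }}"
    validate_certs: false
    inputs:
      host: "{{ username }}.{{ ec2_name_prefix }}.{{ workshop_dns_zone }}"
      username: admin
      password: "{{ admin_password }}"
  when: deploy_aap2 | default(false) | bool
```

This does not restore the network exercise on Tower; it only keeps the AAP 2 path intact until a Tower-compatible credential task is reintroduced.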