f5devcentral / ansible-role-f5_atc_deploy_declaration

Ansible role used to deploy declaratives to F5 Automated Tool Chain services: AS3, DO, and TS
Apache License 2.0

Support token timeout handling #16

Open tkam8 opened 4 years ago

tkam8 commented 4 years ago
ISSUE TYPE
COMPONENT NAME

ansible-role-f5_atc_deploy_declaration

ANSIBLE VERSION
root@ip-10-1-1-5:/# ansible --version
ansible 2.9.4
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/dist-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.17 (default, Nov  7 2019, 10:07:09) [GCC 9.2.1 20191008]
PYTHON VERSION
root@ip-10-1-1-5:/# python -V
Python 2.7.17
BIGIP VERSION
admin@(localhost)(cfg-sync Standalone)(NO LICENSE)(/Common)(tmos)# show sys version

Sys::Version
Main Package
  Product     BIG-IP
  Version     14.1.2.3
  Build       0.0.5
  Edition     Point Release 3
  Date        Tue Dec 17 19:48:55 PST 2019
OS / ENVIRONMENT

Ansible run from AWX 9.1.0.0; BIG-IQ 7.1; BIG-IP spun up in GCP with a startup script that installs AS3, DO, and TS.

SUMMARY

I need the ability to extend the BIG-IQ token timeout to something like 1200 s, to account for delays while BIG-IP provisions modules such as ASM and while BIG-IQ runs onboarding, discovery, and import tasks. The default is 300 s; I need at least 600 s.

    - name: ATC POST
      include_role:
        name: f5devcentral.atc_deploy
      vars: 
        provider: "{{ provider_atc }}"
        atc_service: Device
        atc_method: POST
        atc_declaration_file: "files/onboard_bigip_do_{{ item.atc_declaration_file }}.json"
        atc_delay: 15
        atc_retries: 40
      when: requested_modules == item.when
      loop:
        - { atc_declaration_file: "ltm", when: "ltm" }
        - { atc_declaration_file: "asm", when: "asm" }

DO declaration: https://github.com/tkam8/bigiq-ansible/blob/master/lab/tower/templates/onboard_bigip_do_asm.j2

atc_timeout only sets the socket timeout, so it does not change the token lifetime.

The BIG-IQ access token timeout is 300 s, as the login response shows:

{
    "redirected": false,
    "url": "https://10.1.1.4:443/mgmt/shared/authn/login",
    "status": 200,
  ----snip----
    "json": {
        "username": "admin",
        "loginReference": {
            "link": "https://localhost/mgmt/cm/system/authn/providers/local/login"
        },
        "loginProviderName": "local",
        "token": {
            "token": ----snip----
            "userName": "admin",
            "authProviderName": "local",
            "user": {
                "link": "https://localhost/mgmt/shared/authz/users/admin"
            },
            "groupReferences": [],
            "timeout": 300,  <----------!!!
   ----snip----

Ansible log after 5 minutes, even though the delay and retries are set to cover 10 minutes:

----snip----
"content": "{\"code\":401,\"message\":\"Invalid registered claims.\",\"referer\":\"10.1.1.5\",\"restOperationId\":65549650,\"errorStack\":[],\"kind\":\":resterrorresponse\"}",
    "redirected": false,
    "url": "https://10.1.1.4:443/mgmt/shared/declarative-onboarding/task/7c020a7d-7129-4fda-a7aa-5ae6676b8d6b",
    "status": 401,
----snip----
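
The 401 is consistent with the numbers above: atc_delay (15 s) times atc_retries (40) gives a 600 s polling window, twice the 300 s token timeout. A minimal sketch of a pre-flight warning follows. It assumes a hypothetical variable `bigiq_token_timeout` that is not part of the role and is simply set to the value reported in the login response.

```yaml
# Sketch only: bigiq_token_timeout is a hypothetical variable, set here to the
# 300 s value reported in the login response.
- name: Warn when polling can outlast the BIG-IQ token
  debug:
    msg: >-
      Polling window is {{ (atc_delay | int) * (atc_retries | int) }} s
      ({{ atc_delay }} s x {{ atc_retries }} retries), but the token expires
      after {{ bigiq_token_timeout }} s.
  vars:
    bigiq_token_timeout: 300
  # With atc_delay: 15 and atc_retries: 40 the condition is 600 > 300
  when: (atc_delay | int) * (atc_retries | int) > (bigiq_token_timeout | int)
```
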
tkam8 commented 4 years ago

Considering something like this to rescue from timeouts:

- name: Handle BIG-IQ token timeouts during BIG-IP onboarding
  block:
    - name: include atc_task_check.yaml
      include_tasks: atc_task_check.yaml
      when:
        - atc_service == "AS3" or atc_service == "Device"
  rescue:
    - debug:
        msg: "caught error: {{ atc_DO_status.json.message }}"
    # The message returned by BIG-IQ ends with a period
    - name: Re authenticate to BIG-IQ
      include_tasks: authentication.yaml
      when: atc_DO_status.json.message == "Invalid registered claims."
    - name: Redo atc_task_check.yaml
      include_tasks: atc_task_check.yaml
      when:
        - atc_service == "AS3" or atc_service == "Device"
rjouhann commented 4 years ago

Currently auth tokens have a hardcoded maximum lifetime of 5 minutes on BIG-IQ. This might change in the future. However, we should be able to refresh the token.
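
One way to get a fresh token is simply to log in again against the same `/mgmt/shared/authn/login` endpoint shown in the response above. The following is a rough sketch, not the role's actual authentication.yaml: `provider.user`, `provider.password`, and the register name `bigiq_login` are assumptions, and `local` is taken from the `loginProviderName` in the login response.

```yaml
# Sketch only: provider.user, provider.password and bigiq_login are assumed
# names; adjust them to whatever authentication.yaml really uses.
- name: Request a fresh BIG-IQ auth token
  uri:
    url: "https://{{ provider.server }}:{{ provider.server_port }}/mgmt/shared/authn/login"
    method: POST
    body_format: json
    body:
      username: "{{ provider.user }}"
      password: "{{ provider.password }}"
      loginProviderName: local
    validate_certs: "{{ provider.validate_certs }}"
    status_code: 200
  register: bigiq_login
  no_log: true  # keep credentials and the token out of the job output
  delegate_to: localhost

- name: Store the new token for subsequent requests
  set_fact:
    f5_auth_token: "{{ bigiq_login.json.token.token }}"
```

Because later tasks read `f5_auth_token` for the `X-F5-Auth-Token` header, overwriting that fact should be enough for the next poll to use the new token.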

tkam8 commented 4 years ago

Updated to loop the tasks a set number of times, since DO can take longer than 10 minutes.

I think I have something worth considering:

In atc_task_check.yaml, a 401 caused by token expiration sends execution into the rescue block, up to 3 times (this limit could be a variable). The rescue block runs the authentication tasks and then includes atc_task_check.yaml again to redo the checks.

- name: Wait for DO Task to complete (with retry)
  block:
    - name: Set the retry count
      set_fact:
        retry_count: "{{ 0 if retry_count is undefined else retry_count|int + 1 }}"

    - name: Run check
      uri:
        url: "https://{{ provider.server }}:{{ provider.server_port }}{{ atc_url }}/task/{{ atc_DO_result.json.id }}"
        method: GET
        headers:
          X-F5-Auth-Token: "{{ f5_auth_token }}"
        return_content: true
        validate_certs: "{{ provider.validate_certs }}"
        status_code: 200
      register: atc_DO_status
      # Stop retrying on a 401 so the failed task drops into the rescue block
      until: "atc_DO_status is success or atc_DO_status.status == 401"
      retries: "{{ atc_retries }}"
      delay: "{{ atc_delay }}"
      delegate_to: localhost
      when:
        - atc_service == "Device"
        - atc_method == "POST"
  # Rescue block for handling BIG-IQ token timeouts
  rescue:
    - fail:
        msg: Ended after 3 retries
      when: retry_count|int == 3

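    # Guard against failures that return no JSON body (e.g. connection errors)
    - debug:
        msg: "caught error: {{ (atc_DO_status.json | default({})).message | default('no JSON error message') }}"

    - name: Re authenticate to BIG-IQ
      include_tasks: authentication.yaml
      when: (atc_DO_status.json | default({})).message | default('') == "Invalid registered claims."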

    - name: Redo check
      include_tasks: atc_task_check.yaml
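
Two small follow-ups to the block above, sketched under assumptions. First, the limit of 3 could come from a variable; `atc_reauth_retries` is a hypothetical name, not an existing role variable. Second, facts set with set_fact persist for the host, so if the role is included more than once in a play, the counter may need resetting before the first check.

```yaml
# Sketch only: atc_reauth_retries is a hypothetical variable name.
# In the rescue block, replacing the hardcoded limit:
- fail:
    msg: "Ended after {{ atc_reauth_retries | default(3) }} retries"
  when: (retry_count | int) >= (atc_reauth_retries | default(3) | int)

# In the calling task file, before atc_task_check.yaml is first included.
# Starting from -1 means the first pass through the block sets retry_count
# to 0, exactly as if the fact had been undefined.
- name: Reset the retry counter
  set_fact:
    retry_count: -1
```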