F5Networks / f5-ansible

Imperative Ansible modules for F5 BIG-IP products
GNU General Public License v3.0

Ansible bigip_ucs_fetch timeout after 2 minutes #2349

Open gchambard opened 1 year ago

gchambard commented 1 year ago
COMPONENT NAME

ansible_collections/f5networks/f5_modules/plugins/modules/bigip_ucs_fetch.py

Environment

ANSIBLE VERSION
ansible [core 2.12.10]
  config file = /home/sa-automation/.ansible.cfg
  configured module search path = ['/home/sa-automation/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /home/sa-automation/.local/lib/python3.9/site-packages/ansible
  ansible collection location = /var/lib/ansible/collections:/usr/share/ansible/collections
  executable location = /home/sa-automation/.local/bin/ansible
  python version = 3.9.2 (default, Feb 28 2021, 17:03:44) [GCC 10.2.1 20210110]
  jinja version = 3.0.3
  libyaml = True
BIGIP VERSION
Sys::Version
Main Package
  Product     BIG-IP
  Version     15.1.8.2
  Build       0.0.17
  Edition     Point Release 2
  Date        Thu Mar 16 03:44:11 PDT 2023
CONFIGURATION

[defaults]
host_key_checking = False
collections_paths = /var/lib/ansible/collections:/usr/share/ansible/collections
interpreter_python=auto
vault_password_file = ~/.vault_password
timeout=40
jinja2_extensions = jinja2.ext.do,jinja2.ext.loopcontrols

OS / ENVIRONMENT

Linux 5.10.0-22-amd64 #1 SMP Debian 5.10.178-3 (2023-04-22) x86_64 GNU/Linux

SUMMARY

We save our UCS archives with Ansible and the f5networks modules. This works for all our clusters except the one with the largest configuration (nodes running LTM, APM, ASM, etc.). The job fails with "An exception occurred during task execution. To see the full traceback, use -vvv. The error was: socket.timeout: The read operation timed out". Our UCS file is large, about 1 GB. Generating the UCS file from the GUI or the CLI works fine and takes about 4 minutes. We can see a Java timeout in the restjavad log after 2 minutes.

Together with F5 support, we tried increasing several timeout options (async_timeout in the playbook, ansible.cfg, and icrd and restjavad on the BIG-IP), without success.

With our DevOps team we found the following, and F5 support asked us to raise the question directly here:

In bigip_ucs_fetch.py, the "async_wait" function (line 505) uses the URL "https://{0}:{1}/mgmt/tm/task/sys/ucs/{2}/result" to check the result of the task: https://github.com/F5Networks/f5-ansible/blob/devel/ansible_collections/f5networks/f5_modules/plugins/modules/bigip_ucs_fetch.py. This URL is used on line 511 through the "api" property (resp = self.client.api.get(uri)). That property comes from the "iControlRestSession" class defined in icontrol.py, line 130: https://github.com/F5Networks/f5-ansible/blob/devel/ansible_collections/f5networks/f5_modules/plugins/module_utils/icontrol.py. This class sets a timeout of 120 in its __init__ method.

If we increase this value, the ansible playbook works normally.

Our analysis: line 511 (resp = self.client.api.get(uri)) blocks until it receives a response and does not return without one, so the surrounding for loop is useless. With a large UCS file, the save task takes more than 120 seconds, and the result check times out before the task finishes. This timeout causes the Ansible error.
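
One way the wait could be made robust is sketched below, under stated assumptions: each status request gets a short timeout of its own, a timed-out request is treated as "not finished yet", and a single overall deadline bounds the whole wait. The names poll_until_done and fetch_status are hypothetical and are not part of f5_modules; fetch_status stands in for one GET on the task result URL, and the state strings are illustrative.

```python
import socket
import time


def poll_until_done(fetch_status, overall_timeout, interval=1.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll fetch_status() until it reports completion or the deadline passes.

    fetch_status is a hypothetical callable standing in for one status GET.
    It returns a state string, or raises socket.timeout when that single
    request exceeds its own (short) timeout.
    """
    deadline = clock() + overall_timeout
    while clock() < deadline:
        try:
            state = fetch_status()
        except socket.timeout:
            # A slow single request is not a failure of the whole task.
            state = None
        if state == "COMPLETED":
            return True
        if state == "FAILED":
            raise RuntimeError("task reported failure")
        sleep(interval)
    raise TimeoutError("task did not finish within %s seconds" % overall_timeout)


# Simulated task: two timed-out requests, one in-progress answer, then success.
responses = iter([socket.timeout(), socket.timeout(), "VALIDATING", "COMPLETED"])


def fake_fetch_status():
    item = next(responses)
    if isinstance(item, Exception):
        raise item
    return item


result = poll_until_done(fake_fetch_status, overall_timeout=5, interval=0)
print(result)
```

With this shape, the overall deadline plays the role that async_timeout seems intended to play, independently of the per-request timeout.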

STEPS TO REPRODUCE

Try to generate a large UCS file with an Ansible playbook and bigip_ucs_fetch, or decrease the timeout in icontrol.py.

Example playbook task:

    - name: "Create a new UCS file from {{ hostname }}"
      f5networks.f5_modules.bigip_ucs_fetch:
        src: "{{ ucs_name }}"
        only_create_file: true
        async_timeout: 1200
        provider: "{{ bigip_provider }}"
      delegate_to: localhost
EXPECTED RESULTS

The playbook generates the UCS file without error, even when the task takes longer than 2 minutes.

ACTUAL RESULTS
The playbook crashes after 2 minutes with the error: "An exception occurred during task execution. To see the full traceback, use -vvv. The error was: socket.timeout: The read operation timed out"
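
For reference, socket.timeout is the exception Python raises when a blocking socket read exceeds its configured timeout. The sketch below illustrates that failure mode only; it uses a local socket pair instead of a real BIG-IP connection, and a fraction of a second as a scaled-down stand-in for the 120-second session timeout.

```python
import socket

# Two connected sockets stand in for the client and the BIG-IP REST endpoint.
client, server = socket.socketpair()
client.settimeout(0.2)  # scaled-down stand-in for the 120-second timeout

timed_out = False
try:
    # The "server" never answers, like a task that is still running.
    client.recv(1024)
except socket.timeout:
    timed_out = True
finally:
    client.close()
    server.close()

print("read timed out:", timed_out)
```
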
pgouband commented 1 year ago

Hi @gchambard,

Have you tried using bigip-ucs-fetch from declarative collection? https://clouddocs.f5.com/products/orchestration/ansible/devel/f5_bigip/modules_2_0/bigip_ucs_fetch_module.html#bigip-ucs-fetch-module-2

There is a timeout parameter.

pooyesh-jpg commented 1 year ago

We have the exact same issue! I cannot run "ansible-galaxy collection list" on the installed version of Ansible, which is 2.9.27. How can I check whether the declarative collection is installed? I tried to change the playbook as described in the link above, but it does not accept the timeout parameter.

pgouband commented 1 year ago

Hi @pooyesh-jpg,

Here is the procedure to install declarative collection: https://clouddocs.f5.com/products/orchestration/ansible/devel/f5_bigip/install_f5_bigip.html

pooyesh-jpg commented 1 year ago

I have installed the declarative collection without problems, but when running the playbook I receive the error below: "msg": "unable to load API plugin for network_os f5networks.f5_bigip.bigip"

gchambard commented 1 year ago

Hi,

Sorry for my late reply, I've been working on another project.

I've tested the declarative collection and it doesn't seem to honor the "timeout" parameter. The playbook returns a command timeout error after 30 seconds, although I've set timeout to 300 in my playbook.

- name: Create a new UCS
  f5networks.f5_bigip.bigip_ucs_fetch:
    src: "{{ ucs_name }}"
    only_create_file: true
    timeout: 300
  register: task

- name: Check for task completion and download created UCS
  f5networks.f5_bigip.bigip_ucs_fetch:
    dest: "{{ path + bigip + '/' + ucs_name }}"
    src: "{{ task.src }}"
    task_id: "{{ task.task_id }}"
    timeout: 300

If possible, I would prefer to stay with the imperative collection. It's easier to set up the provider information.

gchambard commented 1 year ago

Hi,

After more tests, I managed to get the playbook to work with the declarative collection, and I reproduced the issue. To get that far, I needed to increase "ansible_command_timeout".

My playbook:

- name: Set connection variables
  ansible.builtin.set_fact:
    ansible_host: "{{ bigip_provider.server }}"
    ansible_user: "{{ bigip_provider.user }}"
    ansible_httpapi_password: "{{ bigip_provider.password }}"
    ansible_network_os: f5networks.f5_bigip.bigip
    ansible_httpapi_use_ssl: true
    ansible_httpapi_validate_certs: false
    ansible_command_timeout: 1000
  delegate_to: localhost

- name: Create a new UCS
  f5networks.f5_bigip.bigip_ucs_fetch:
    dest: "{{ path + bigip + '/' + ucs_name }}"
  register: task

- name: Check for task completion and download created UCS
  f5networks.f5_bigip.bigip_ucs_fetch:
    dest: "{{ path + bigip + '/' + ucs_name }}"
    src: "{{ task.src }}"
    task_id: "{{ task.task_id }}"
    timeout: 1000

After ~300 seconds I get this error in the "Create a new UCS" task:

fatal: [bigip]: FAILED! => {
    "changed": false,
    "module_stderr": "Expecting value: line 1 column 1 (char 0)",
    "module_stdout": "",
    "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error"
}
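
The module_stderr text matches the message Python's json module produces when it is asked to parse an empty string (or a body that does not start with valid JSON). That suggests, as an assumption not confirmed in this thread, that the module received an empty or non-JSON response where it expected JSON. A minimal illustration:

```python
import json

try:
    json.loads("")  # an empty body is not valid JSON
except json.JSONDecodeError as exc:
    message = str(exc)

print(message)
```
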
pgouband commented 1 year ago

Hi @gchambard,

Thanks for the feedback. Can we close this issue?

gchambard commented 1 year ago

Hi, why close it? All I've managed to do is reproduce the problem with the declarative collection.

Can you help me?

pgouband commented 1 year ago

Hi @gchambard,

I misread your previous message.

I tested the following playbook without any issue using imperative module and async_timeout option.

- hosts: all
  collections:
    - f5networks.f5_modules
  connection: local

  vars:
    provider:
      server: "X.X.X.X"
      user: "admin"
      password: "mysecretpassword"
      server_port: 443
      validate_certs: no

  tasks:
     - name: Only create new UCS in Big-IP device, no download
       f5networks.f5_modules.bigip_ucs_fetch:
         provider: "{{ provider }}"
         src: f5-backup.ucs
         only_create_file: yes
         async_timeout: 600
       delegate_to: localhost
gchambard commented 1 year ago

Hi @pgouband

No problem. Have you used this playbook with a heavy BIG-IP configuration? The problem only appears with large backups that take more than 2 minutes.

pgouband commented 1 year ago

Hi @gchambard,

Not yet, but I'll try when I have some time. I'll keep you updated.

pgouband commented 1 year ago

Hi @gchambard,

I tested with LTM and AWAF. The UCS size is 58 MB. The tmsh command to generate the UCS runs for more than 4 minutes, and so does the Ansible playbook.

Can we close the issue?

gchambard commented 1 year ago

Hi @pgouband

I see a difference in your test: the UCS file I have the problem with is much larger, over 1 GB. I understand that you are trying to reproduce my error.

What do you think of my initial analysis, that the timeout value in "iControlRestSession" in icontrol.py needs to be increased? It doesn't look like a problem on my side.

pgouband commented 1 year ago

Hi @gchambard,

Have you checked why your UCS is over 1 GB? Are there any files you can delete, such as tcpdump captures?

An article explaining how to identify large files in a UCS: https://my.f5.com/manage/s/article/K86995543

gchambard commented 1 year ago

@pgouband

I think it's the APM module:

[admin@a-eb4-bigip-mkt02:Standby:In Sync] ucs # tar -tvf test-gch.ucs | awk -v size="$size" '$3 >= size {print $3" "$6}' | sort -t' ' -k1,1nr | head -30
265555968 var/tmp/filestore_temp/files_d/Common_d/epsec_package_d/:Common:epsec-1.0.0-1372.0.iso_77070_1
247531520 var/tmp/filestore_temp/files_d/Common_d/epsec_package_d/:Common:epsec-1.0.0-1156.0.iso_71400_1
160106496 var/tmp/filestore_temp/files_d/Common_d/epsec_package_d/:Common:epsec-1.0.0-928.0.iso_70002_1
142321664 var/tmp/filestore_temp/files_d/Common_d/epsec_package_d/:Common:epsec-1.0.0-862.0.iso_67618_1
142000128 var/tmp/filestore_temp/files_d/Common_d/epsec_package_d/:Common:epsec-1.0.0-810.0.iso_63982_1
139745280 var/tmp/filestore_temp/files_d/Common_d/epsec_package_d/:Common:epsec-1.0.0-787.0.iso_64067_1
133836800 var/tmp/filestore_temp/files_d/Common_d/epsec_package_d/:Common:epsec-1.0.0-769.0.iso_64187_1
129515520 var/tmp/filestore_temp/files_d/Common_d/epsec_package_d/:Common:epsec-1.0.0-753.0.iso_64076_1
6383347 var/tmp/storage_temp/_f1ml.fdt
4411717 var/tmp/ts_db.save_dir_3663.cstmp/ts_db.data.PLC.NEGSIG_SIGNATURES.cstmp
4137083 var/tmp/filestore_temp/files_d/Common_d/certificate_d/:Common:ca-bundle.crt_20013_1
4137083 var/tmp/filestore_temp/files_d/Common_d/certificate_d/:Common:ca-bundle.crt_20057_1
4137083 var/tmp/filestore_temp/files_d/Common_d/certificate_d/:Common:ca-bundle.crt_20101_1
4137083 var/tmp/filestore_temp/files_d/Common_d/certificate_d/:Common:ca-bundle.crt_20122_1
4137083 var/tmp/filestore_temp/files_d/Common_d/certificate_d/:Common:ca-bundle.crt_20150_1
4137083 var/tmp/filestore_temp/files_d/Common_d/certificate_d/:Common:ca-bundle.crt_23430_1
4137083 var/tmp/filestore_temp/files_d/Common_d/certificate_d/:Common:ca-bundle.crt_23448_1
4137083 var/tmp/filestore_temp/files_d/Common_d/certificate_d/:Common:ca-bundle.crt_28221_1
4137083 var/tmp/filestore_temp/files_d/Common_d/certificate_d/:Common:ca-bundle.crt_28234_1
4137083 var/tmp/filestore_temp/files_d/Common_d/certificate_d/:Common:ca-bundle.crt_28348_1
4137083 var/tmp/filestore_temp/files_d/Common_d/certificate_d/:Common:ca-bundle.crt_28351_1
4137083 var/tmp/filestore_temp/files_d/Common_d/certificate_d/:Common:ca-bundle.crt_28354_1
4137083 var/tmp/filestore_temp/files_d/Common_d/certificate_d/:Common:ca-bundle.crt_28357_1
4137083 var/tmp/filestore_temp/files_d/Common_d/certificate_d/:Common:ca-bundle.crt_28359_1
4137083 var/tmp/filestore_temp/files_d/Common_d/certificate_d/:Common:ca-bundle.crt_28363_1
4137083 var/tmp/filestore_temp/files_d/Common_d/certificate_d/:Common:ca-bundle.crt_29573_1
4137083 var/tmp/filestore_temp/files_d/Common_d/certificate_d/:Common:ca-bundle.crt_29622_1
4137083 var/tmp/filestore_temp/files_d/Common_d/certificate_d/:Common:ca-bundle.crt_30971_1
3970543 var/tmp/cert_temp/ssl/ssl.crt/ca-bundle.crt
3622045 var/tmp/filestore_temp/files_d/Common_d/datasync_update_file_d/:Common:datasync-global:update-file-clntcap_update_v15.1.0__19700101_000000__20230316_091647__b3af76fd01121aadc7d9938c2c449bda_77223_1
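
A similar listing can be produced off-box with Python's tarfile module, assuming the UCS file has been copied from the device and is a plain or compressed tar archive (as the tar command above indicates). The function name largest_members and the demo archive are hypothetical, for illustration only.

```python
import io
import os
import tarfile
import tempfile


def largest_members(archive_path, limit=30):
    """Return (size, name) pairs for the largest regular files in a tar archive."""
    # "r:*" lets tarfile auto-detect the compression, if any.
    with tarfile.open(archive_path, "r:*") as archive:
        sizes = [(m.size, m.name) for m in archive.getmembers() if m.isfile()]
    return sorted(sizes, reverse=True)[:limit]


# Demo on a small throwaway archive instead of a real UCS file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.ucs")
    with tarfile.open(path, "w:gz") as archive:
        for name, size in [("small.txt", 10), ("big.iso", 1000), ("medium.crt", 100)]:
            info = tarfile.TarInfo(name)
            info.size = size
            archive.addfile(info, io.BytesIO(b"x" * size))
    top_two = largest_members(path, limit=2)

print(top_two)
```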

I've checked, all ISO files are required.

[admin@a-eb4-bigip-mkt02:Standby:In Sync] ucs # grep epsec /config/bigip.conf
apm epsec epsec-package /Common/epsec-1.0.0-753.0.iso {
    cache-path /config/filestore/files_d/Common_d/epsec_package_d/:Common:epsec-1.0.0-753.0.iso_64076_1
apm epsec epsec-package /Common/epsec-1.0.0-769.0.iso {
    cache-path /config/filestore/files_d/Common_d/epsec_package_d/:Common:epsec-1.0.0-769.0.iso_64187_1
apm epsec epsec-package /Common/epsec-1.0.0-787.0.iso {
    cache-path /config/filestore/files_d/Common_d/epsec_package_d/:Common:epsec-1.0.0-787.0.iso_64067_1
apm epsec epsec-package /Common/epsec-1.0.0-810.0.iso {
    cache-path /config/filestore/files_d/Common_d/epsec_package_d/:Common:epsec-1.0.0-810.0.iso_63982_1
apm epsec epsec-package /Common/epsec-1.0.0-862.0.iso {
    cache-path /config/filestore/files_d/Common_d/epsec_package_d/:Common:epsec-1.0.0-862.0.iso_67618_1
apm epsec epsec-package /Common/epsec-1.0.0-928.0.iso {
    cache-path /config/filestore/files_d/Common_d/epsec_package_d/:Common:epsec-1.0.0-928.0.iso_70002_1
apm epsec epsec-package /Common/epsec-1.0.0-1156.0.iso {
    cache-path /config/filestore/files_d/Common_d/epsec_package_d/:Common:epsec-1.0.0-1156.0.iso_71400_1
apm epsec epsec-package /Common/epsec-1.0.0-1372.0.iso {
    cache-path /config/filestore/files_d/Common_d/epsec_package_d/:Common:epsec-1.0.0-1372.0.iso_77070_1
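
Adding up the eight EPSEC ISO sizes from the tar listing above shows how much of the archive they account for. The figures are copied from that listing; the sum is of the sizes as stored in the archive.

```python
# Sizes in bytes of the eight EPSEC ISO entries, copied from the tar listing.
epsec_sizes = [
    265555968, 247531520, 160106496, 142321664,
    142000128, 139745280, 133836800, 129515520,
]

total_bytes = sum(epsec_sizes)
print(total_bytes, "bytes, about", round(total_bytes / 1e9, 2), "GB")
```

These packages alone come to roughly 1.36 GB, which is consistent with the APM module being the main contributor to the archive size.
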
pgouband commented 1 year ago

Hi @gchambard,

I suggest you open a ticket via https://myf5.com to check if you can use the following articles safely to determine the active version and remove unnecessary EPSEC packages.

Determining the active OPSWAT version https://my.f5.com/manage/s/article/K14207

Removing unnecessary OPSWAT EPSEC packages from the BIG-IP APM system https://my.f5.com/manage/s/article/K21175584