F5Networks / f5-appsvcs-extension

F5 BIG-IP Application Services 3 Extension
Apache License 2.0

Service Discovery Causes Massive timeout issues with UCS Creation #816

Open VDI-Tech-Guy opened 7 months ago

VDI-Tech-Guy commented 7 months ago

Environment

Summary

When creating a UCS backup of a BIG-IP, the time it takes to create the UCS increases dramatically, as described in https://cdn.f5.com/product/bugtracker/ID985329.html

This stems from the following errors logged in /var/log/ltm:

Mar 28 12:23:32 bigip.f5demo.net err iAppsLX_save_pre[14953]: Failed task: /shared/iapp/build-package/bf9e1f3e-7826-4728-b8f0-67a2fb6b4b40: rpmbuild command failed: com.f5.rest.workers.shell.CommandExecuteException: Command execution process killed
Mar 28 12:23:32 bigip.f5demo.net err iAppsLX_save_pre[14953]: Failed to get getRPM build response within timeout for f5-service-discovery
Mar 28 12:23:32 bigip.f5demo.net info iAppsLX_save_pre[14953]: Exporting: f5-appsvcs - /var/config/rest/iapps/f5-appsvcs
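To check whether a given BIG-IP is hitting this same timeout, something like the following could scan the LTM log for the `getRPM build response within timeout` message quoted above. This is a minimal sketch, not an F5 tool: the function name `packages_with_rpm_timeouts` is hypothetical, and the regex assumes the message format matches the log lines in this issue exactly.

```python
import re
from typing import List

# Hypothetical pattern built from the log line quoted in this issue, e.g.
# "iAppsLX_save_pre[14953]: Failed to get getRPM build response within timeout for f5-service-discovery"
TIMEOUT_RE = re.compile(
    r"iAppsLX_save_pre\[\d+\]: Failed to get getRPM build response "
    r"within timeout for (\S+)"
)


def packages_with_rpm_timeouts(log_text: str) -> List[str]:
    """Return the iApps LX package names whose RPM build timed out during
    UCS save, deduplicated and in the order they first appear."""
    seen: List[str] = []
    for match in TIMEOUT_RE.finditer(log_text):
        name = match.group(1)
        if name not in seen:
            seen.append(name)
    return seen


def scan_log_file(path: str = "/var/log/ltm") -> List[str]:
    """Read a log file (default assumed to be /var/log/ltm) and scan it."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return packages_with_rpm_timeouts(fh.read())
```

Calling `scan_log_file()` right after a slow UCS save would be expected to list `f5-service-discovery` if the behavior described here is occurring.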

With AS3 and Service Discovery removed, a UCS backup on a clean BIG-IP (no configuration at all) takes ~20 seconds; with them installed, it can take 2-5 minutes because of this timeout. I have also seen cases where the UCS is never created.

I talked with Mark Dittmer about this, and he suggested opening an issue ticket.

Steps To Reproduce

Steps to reproduce the behavior:

  1. Start with a fresh BIG-IP (or use my Ansible 101/201 UDF Blueprint).
  2. Install AS3 (the UDF Blueprint already has a version installed).
  3. In my use case (using UDF), I ran the backup and restore steps as per the documentation.
  4. An error typically occurs saying to increase the timeout value (~3-5 minutes).
  5. Use the Web Shell on the BIG-IP to access /var/log/ltm and see the error shown above.

To fix the behavior in the lab, log in to TMUI, remove AS3/DO (including Service Discovery) via the iApps LX section, and re-run the playbook above; it then completes in ~20 seconds to a minute with no issues.
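To put numbers on the with/without comparison, the UCS save could be timed directly rather than through the playbook. This is a rough sketch under several assumptions: the helper names are hypothetical, it needs Python 3.7+ (for `capture_output`), and it assumes it runs somewhere `tmsh` is on the PATH (for example the BIG-IP's bash shell). The UCS file name is illustrative only.

```python
import subprocess
import time
from typing import List, Tuple


def ucs_save_cmd(name: str) -> List[str]:
    """Build the tmsh command that saves a UCS archive with the given name."""
    return ["tmsh", "save", "sys", "ucs", name]


def time_command(cmd: List[str], timeout_s: float = 600.0) -> Tuple[int, float]:
    """Run cmd and return (exit_code, elapsed_seconds).

    Raises subprocess.TimeoutExpired if the command runs longer than
    timeout_s, which would correspond to the "never completes" case.
    """
    start = time.monotonic()
    completed = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    return completed.returncode, time.monotonic() - start
```

Running `time_command(ucs_save_cmd("timing-test"))` once with AS3/Service Discovery installed and once after removing them would give directly comparable elapsed times for this report.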

Expected Behavior

This error shouldn't occur and cause a massive delay in creating UCS files; backing up a UCS with Ansible should take approximately the same amount of time with or without AS3.

Actual Behavior

Backups slow down to the point that Ansible fails with timeouts or the tasks never complete.

VDI-Tech-Guy commented 5 months ago

Adding a note here that restore times for UCS files are also affected: sometimes the restore commands hang, never fully run, or take 10+ minutes. When AS3/Service Discovery is removed, everything runs quickly.