F5Networks / f5-bigip-runtime-init

Apache License 2.0

AS3/FAST RPM installation fails in GCP cloud deployment intermittent #46

Open RavinderReddyF5 opened 2 years ago

RavinderReddyF5 commented 2 years ago

error snapshot:

Wed, 17 Aug 2022 19:55:35 GMT - severe: FAST Worker: Failed to save config: Error: AS3 Driver failed to GET declaration: Request failed with status code 404
""
    at AS3Driver._handleAS3Error (/var/config/rest/iapps/f5-appsvcs-templates/lib/drivers.js:597:19)
    at /var/config/rest/iapps/f5-appsvcs-templates/lib/drivers.js:570:24
    at callReaction (/var/config/rest/iapps/f5-appsvcs-templates/node_modules/core-js/modules/es.promise.constructor.js:75:18)
I am able to replicate the error above.

This error occurs because AS3 is in a bad state; interestingly, AS3 was alive at some point after installation:

2022-08-17T19:30:47.381Z [25812]: info: Validating - as3 extension is available.
2022-08-17T19:30:47.382Z [25812]: silly: Making request: GET http://localhost:8100/mgmt/shared/appsvcs/info verifyTls: true
2022-08-17T19:30:48.674Z [25812]: silly: Request response: 404 {"code":404,"message":"","referer":"Unknown","errorStack":[]}
2022-08-17T19:30:48.675Z [25812]: silly: Error: Is available check failed 404
2022-08-17T19:30:51.675Z [25812]: silly: Retrying... Attempts left: 9
2022-08-17T19:30:51.677Z [25812]: silly: Making request: GET http://localhost:8100/mgmt/shared/appsvcs/info verifyTls: true
2022-08-17T19:30:53.469Z [25812]: silly: Request response: 200 {"version":"3.38.0","release":"4","schemaCurrent":"3.38.0","schemaMinimum":"3.0.0"}
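
The retry behavior visible in this log can be sketched roughly as follows. This is only an illustration, not Runtime Init's actual implementation: `http_status`, `wait_until_available`, and their parameters are hypothetical names. Only the endpoint URL, the roughly 3-second spacing, and the attempt count are read off the log lines above.

```python
import time
import urllib.error
import urllib.request

# Endpoint taken from the log above.
AS3_INFO_URL = "http://localhost:8100/mgmt/shared/appsvcs/info"


def http_status(url, timeout=10.0):
    """Return the HTTP status of a GET request, or None if the host is unreachable.

    Hypothetical helper; authentication is left out for brevity.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 404 while the extension endpoint is not registered
    except urllib.error.URLError:
        return None  # e.g. connection refused while restnoded is restarting


def wait_until_available(fetch_status, url=AS3_INFO_URL, attempts=10,
                         delay_s=3.0, sleep=time.sleep):
    """Poll `url` until it answers 200 or the attempts run out.

    `fetch_status` is any callable mapping a URL to a status code, which
    keeps the loop testable without a BIG-IP. Returns True on success.
    """
    for attempt in range(1, attempts + 1):
        if fetch_status(url) == 200:
            return True
        if attempt < attempts:
            sleep(delay_s)  # the log shows about 3 s between attempts
    return False
```

On a BIG-IP this could be called as `wait_until_available(http_status)`. Note that a single 200 only shows the endpoint was registered at that moment; as the rest of this report shows, it can disappear again after a later restnoded restart, so repeating the check once all extensions are installed may be worthwhile.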

but then FAST's checks for AS3 availability start failing:

Wed, 17 Aug 2022 19:36:43 GMT - info: FAST Worker [0]: Entering Fetching AS3 info
Wed, 17 Aug 2022 19:36:43 GMT - finest: socket 342 opened
Wed, 17 Aug 2022 19:36:43 GMT - severe: [RestOperationDispatcher] 'shared/fast/info' not found.
Wed, 17 Aug 2022 19:36:43 GMT - severe: [ErrorHandlingModule] RestOperation failed: "/shared/fast/info". {"code":404,"message":"","referer":"Unknown","originalRequestBody":"","errorStack":[]}
Wed, 17 Aug 2022 19:36:44 GMT - info: FAST Worker [0]: Exiting Fetching AS3 info

and right now, AS3 is not listed under installed applications:
# curl -s -u admin: http://localhost:8100/mgmt/shared/iapp/global-installed-packages | jq .items[].appName
"f5-service-discovery"
"f5-declarative-onboarding"
"f5-cloud-failover"
"f5-telemetry"
"f5-appsvcs-templates"

Somehow, AS3 gets uninstalled, or it fails to start after the restnoded restarts caused by the other extension installations.
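
The installed-packages check above can be automated with a small script. This is a sketch under assumptions: `missing_packages` and `EXPECTED_PACKAGES` are hypothetical names, and the AS3 `appName` value `f5-appsvcs` is inferred from the AS3 RPM name reported later in this thread, not confirmed by the output above.

```python
import json

# Assumed appName values. "f5-appsvcs" (AS3) is inferred from the RPM name
# f5-appsvcs-3.40.0-5.noarch; the others appear in the curl output above.
EXPECTED_PACKAGES = {
    "f5-service-discovery",
    "f5-declarative-onboarding",
    "f5-appsvcs",
    "f5-cloud-failover",
    "f5-telemetry",
    "f5-appsvcs-templates",
}


def missing_packages(response_text, expected=EXPECTED_PACKAGES):
    """Return the expected package names absent from a
    global-installed-packages response body, sorted alphabetically."""
    payload = json.loads(response_text)
    installed = {item.get("appName") for item in payload.get("items", [])}
    return sorted(set(expected) - installed)
```

Feeding it the body returned by the `curl` command above (without the `jq` filter) would, for the state shown here, be expected to report AS3 as the only missing package.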

In restnoded.log, I found the following message around the failure time:
Wed, 17 Aug 2022 19:36:26 GMT - warning: [appsvcs] {"message":"AS3 version: 3.38.0","level":"warning"}
OK, I tried re-running the installation without the DO config, just the extensions, and everything went through (but only on the 3rd attempt; a couple of times I got a null RPM file from GitHub, which caused the installation to fail).

I re-ran the declaration with just the DO config (after the extensions had already been installed) and everything seems to work without problems.

How consistent is this issue? As I said before, it seems that AS3 gets into a bad state after restnoded restarts (restarts caused by extension installations). If this is a consistent issue, it requires a ticket/JIRA for RuntimeInit as a first step.

Ugh, I am not able to re-run the installation from the same host due to this error:
RPM installation failed: Package f5-telemetry version 1.30.0-1 has status null
This appears to happen because GitHub returns a null file; potentially, this can be improved on the RuntimeInit side as well.

Perhaps GitHub throttles requests from this IP because of too many requests.
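
One possible shape for such an improvement is to treat an empty download as a retryable failure and back off between attempts. The sketch below is an assumption about how that might look, not what Runtime Init does: `download_with_retry`, the injected `fetch` callable, and the delay values are all hypothetical.

```python
import time


def download_with_retry(fetch, url, attempts=4, base_delay_s=2.0, sleep=time.sleep):
    """Call `fetch(url)` until it returns non-empty bytes.

    `fetch` is any callable returning the response body (bytes) or None.
    The wait doubles after each failed attempt (2 s, 4 s, 8 s, ...), which
    may help if the server is throttling requests from this IP.
    """
    for attempt in range(attempts):
        body = fetch(url)
        if body:  # None and b"" both count as a failed download
            return body
        if attempt < attempts - 1:
            sleep(base_delay_s * (2 ** attempt))
    raise RuntimeError(f"empty response from {url} after {attempts} attempts")
```

Failing with an explicit error before the RPM is handed to the installer would also make the cause clearer than a "has status null" message.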

Anyhow, if this is a consistent issue, I would advise reporting a bug (contact Shyaw Karim or Krithika Chidambaram; they will file a JIRA story) for Runtime Init to investigate why AS3 is in a bad state after installing all the other extensions.

RavinderReddyF5 commented 2 years ago

@shyawnkarim this issue is reported based on comments from @andreykashcheev. He is aware of the issue.

f5-applebaum commented 2 years ago

Hi,

Noticed that the repo is defaulting to: n1-standard-4 https://github.com/F5Networks/terraform-gcp-bigip-module/blob/main/variables.tf#L28

When installing many extensions, we have noticed this type of symptom with smaller instance types. Can you try bumping up to:

n1-standard-8: ex. https://github.com/F5Networks/f5-google-gdm-templates-v2/blob/main/examples/quickstart/sample_quickstart.yaml#L36

to see if that helps resolve it. If that doesn't work, possibly increasing the delay between installs:

https://github.com/F5Networks/f5-bigip-runtime-init#controls extensionInstallDelayInMs: 15000

shyawnkarim commented 2 years ago

@RavinderReddyF5, did @f5-applebaum's advice solve your issue?

JeffGiroux commented 2 years ago

getting similar failures in GCP. Tried template as-is with tag 2.4.0.0 and also latest 2.6.0.0. Both deployments of quickstart result in same failure.

instance = n1-standard-8

snippet...
2022-11-03T23:31:53.888Z [3042]: info: Validating - fast extension is available after restnoded restart.
2022-11-03T23:32:53.601Z [3042]: error: Is available check failed 404

full...

cat /var/log/cloud/startup-script-post-swap-nic.log
2022-11-03T23:29:57.153Z [3042]: info: Configuration file: /config/cloud/runtime-init-conf.yaml
2022-11-03T23:29:57.176Z [3042]: info: Processing controls parameters
2022-11-03T23:29:57.180Z [3042]: info: Validating provided declaration
2022-11-03T23:29:57.289Z [3042]: info: Successfully validated declaration
2022-11-03T23:29:57.377Z [3042]: info: Resolving parameters
2022-11-03T23:29:58.428Z [3042]: info: Executing install operations.
2022-11-03T23:29:58.439Z [3042]: info: Installing - do 1.33.0
2022-11-03T23:30:02.182Z [3042]: info: Validating - do extension is available.
2022-11-03T23:30:15.223Z [3042]: info: Installing - as3 3.40.0
2022-11-03T23:30:19.567Z [3042]: info: Validating - as3 extension is available.
2022-11-03T23:30:54.873Z [3042]: info: Installing - ts 1.32.0
2022-11-03T23:31:01.831Z [3042]: info: Validating - ts extension is available.
2022-11-03T23:31:11.851Z [3042]: info: Installing - fast 1.21.0
2022-11-03T23:31:14.675Z [3042]: info: Validating - fast extension is available.
2022-11-03T23:31:44.750Z [3042]: info: fast extension  is not available. Attempt to restart restnoded.
2022-11-03T23:31:53.888Z [3042]: info: Validating - fast extension  is available after restnoded restart.
2022-11-03T23:32:53.601Z [3042]: error: Is available check failed 404
2022-11-03T23:32:53.601Z [3042]: info: Sending F5 Teem report for failure case.
2022-11-03T23:32:58.754Z [3042]: info: {"id":"defc7c6b-c4ad-878d-65b645131685","product":"BIG-IP","cpuCount":8,"diskSize":83968,"memoryInMb":30160,"version":"16.1.3.2","nicCount":3,"regKey":"OOUIC-GSJFA-JMOZI-BVXJR-YAFXINC","platformId":"Z100","hostname":"bigip1","management":"10.0.0.2/32","provisionedModules":{"ltm":"nominal"},"installedPackages":{"f5-service-discovery-1.10.15-1.noarch":"1.10.15","f5-declarative-onboarding-1.33.0-7.noarch":"1.33.0","f5-appsvcs-3.40.0-5.noarch":"3.40.0","f5-telemetry-1.32.0-2.noarch":"1.32.0","f5-appsvcs-templates-1.21.0-1.noarch":"1.21.0"},"environment":{"pythonVersion":"Python 2.7.5","pythonVersionDetailed":"2.7.5 (default, Sep 14 2022, 06:56:50) \n[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)]","nodeVersion":"v6.9.1","libraries":{"ssh":"OpenSSH_7.4p1, OpenSSL 1.0.2u-fips  20 Dec 2019"}}}
2022-11-03T23:33:05.777Z [3042]: info: F5 Teem report was successfully sent for failure case.
2022-11-03T23:33:05.778Z [3042]: info: Is available check failed 404
[admin@localhost:Active:Standalone] ~ # cat /var/log/cloud/startup-script-post-swap-nic.log 
shyawnkarim commented 1 year ago

This issue, internal ID ESECLDTPLT-3219, has already been completed and will be available with the next release.