Closed: thepowercoders closed this issue 1 year ago
Can you try deploying with instance type, Standard_D8s_v4, and let us know if you are still getting these errors?
When installing extensions, more vCPUs and memory are needed. For example, this deployment uses three extensions and deploys the larger instance type by default, as seen here.
Hi - okay, I tested with D8s. The first time it failed, then it passed a couple of times, then it failed again. It seems a little better but is still too unreliable to be of use. I've checked CPU and memory using top on the running system and there does not appear to be a shortage of either. I've also done a lot of testing by removing the packages and then re-running the init script to reinstall them. Again, there is sometimes a failure depending on whether the F5 is 'busy' with other tasks, but generally it works every time on re-run.
I think the underlying issue is that the RPM files are getting bigger with each release, but the wait time for install is fixed at 20 seconds. The Telemetry Streaming RPM has grown from 8.5MB in v1.1 to 20MB in the current version. Would it be possible to allow a configuration value for the package wait/retry time, to cope with transient delays, as there is with the HTTP_RETRY parameter?
Hi - just an update on this. I've noticed a particular issue with a delay after AS3 is loaded. I also see that once loaded, AS3 then automatically requests and downloads Service Discovery RPM. I therefore switched my script so that telemetry was being downloaded first and AS3 last, and this seems to have improved the script processing - I am getting no failures so far.
I would also recommend you look into the logic of the bigip-runtime-init script. As you can see in the logs, it is timing out even though the task is returning a "STARTED" status, which indicates that the BIG-IP has started processing the task. I don't really see why the script would time out in this state so quickly. Surely it should wait until the task gives another status such as "FAILED" before bailing out, or until the task itself times out in restnoded (which I presume it does eventually).
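To illustrate the behaviour suggested above, here is a minimal shell sketch of a polling loop that treats "STARTED" as still in progress and only gives up on a terminal status or a caller-supplied deadline. It is not taken from bigip-runtime-init: the function name `wait_for_task`, its arguments, the default timeout and interval, and the "FINISHED" success status are all assumptions made for illustration.

```shell
# Hypothetical helper, not part of bigip-runtime-init.
# $1: command that prints the current task status on stdout (assumed to exist)
# $2: overall timeout in seconds (default 300, an arbitrary example value)
# $3: delay between polls in seconds (default 5, an arbitrary example value)
# Returns 0 on FINISHED, 1 on FAILED, 2 if the deadline passes first.
wait_for_task() {
    local get_status_cmd="$1"
    local timeout_secs="${2:-300}"
    local interval_secs="${3:-5}"
    local deadline status

    deadline=$(( $(date +%s) + timeout_secs ))

    while :; do
        status=$($get_status_cmd)
        case "$status" in
            FINISHED) return 0 ;;   # assumed name of the success status
            FAILED)   return 1 ;;   # terminal failure reported by the task
        esac
        # Any other status (for example STARTED) means "still in progress".
        if [ "$(date +%s)" -ge "$deadline" ]; then
            return 2
        fi
        sleep "$interval_secs"
    done
}
```

A caller would pass a command of its own that queries the install task and prints its status, for example `wait_for_task my_status_query 600 10`, where `my_status_query` is a placeholder. Making the timeout an argument is the point of the sketch: a larger RPM or a busy device only needs a larger value, not a code change.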
Thanks for these additional details. A ticket, internal ID EC-147, has been created for engineering to look into.
Hi @antonywm, assuming you aren't using our ARM templates to deploy, can you share the user data script you're using to download, install, and run runtime init? Also, which version of BIG-IP are you using?
The extension installation delay can be configured in the controls block, or using an environment variable: https://github.com/F5Networks/f5-bigip-runtime-init#controls
Also, the iControl REST services used by AT packages have a separate bucket of memory and timeout values that usually need to be tweaked using db variables before mcpd starts up (which is why you wouldn't necessarily see an impact using top). You can see how they are used in our templates here: https://github.com/F5Networks/f5-azure-arm-templates-v2/blob/d5082fac3322d1c24e2112613f2b6cc042f0573f/examples/modules/bigip-standalone/bigip.json#L254
Please add these to your user data if not already present, and let us know how it goes.
/usr/bin/setdb provision.extramb 1000
/usr/bin/setdb restjavad.useextramb true
/usr/bin/setdb iapplxrpm.timeout 300
/usr/bin/setdb icrd.timeout 180
/usr/bin/setdb restjavad.timeout 180
/usr/bin/setdb restnoded.timeout 180
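For anyone who wants to keep these settings in one place in their user data, the commands above could be wrapped roughly as follows. This is only a sketch: the `apply_db_vars` function and the overridable `SETDB` variable are hypothetical conveniences introduced here, while the variable names and values are the ones listed above.

```shell
# SETDB can be overridden (e.g. SETDB=echo) for a dry run; this override is an
# assumption of the sketch, the default path is the one used in this thread.
SETDB=${SETDB:-/usr/bin/setdb}

# Hypothetical helper: apply each "name value" pair from the list below.
apply_db_vars() {
    while read -r name value; do
        # Skip blank lines in the list.
        [ -n "$name" ] || continue
        # Report a failed setting on stderr but carry on with the rest.
        "$SETDB" "$name" "$value" || echo "failed to set $name" >&2
    done <<'EOF'
provision.extramb 1000
restjavad.useextramb true
iapplxrpm.timeout 300
icrd.timeout 180
restjavad.timeout 180
restnoded.timeout 180
EOF
}
```

Calling `apply_db_vars` in the user data script then applies all six values, and running it with `SETDB=echo` prints the pairs without changing anything on the device.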
(Issue EC-147 mentioned above is for adding these variables to the runtime init examples/documentation)
Hi @mikeshimkus - my onboard script is here. It's based on your example one, just modified a bit as we pull the RPMs and DO scripts from Azure storage. BIG-IP version is 16.1.3.3.
I did see the extension delay parameter, but this is just the delay BETWEEN installing the extensions, not the delay waiting for the install task to complete. I also found the parameter "HTTP_RETRY" detailed here, which I thought would fix the issue, but it didn't seem to do anything when I tested it.
VM is provisioned through Terraform so there is some parameterization in my template too. I run 'provision.extramb 500' already so let me expand that to 1000 and also add the timeout extensions.
@thepowercoders We just released a new version of Runtime Init: https://github.com/F5Networks/f5-bigip-runtime-init/releases/tag/1.6.1
It adds the db variables and fixes a few other issues you may have run into. Let me know if this helps.
This seems to be more stable now and I am getting successful runs of the script. Thank you!
DETAILS:
Cloud: Microsoft Azure
Offer: f5-big-ip-byol
Plan: f5-big-ltm-1slot-byol
Image: f5-networks:f5-big-ip-byol:f5-big-ltm-1slot-byol:latest
SKU: Standard DS2 v2 (2 vcpus, 7 GiB memory)
runtime version: f5-bigip-runtime-init-1.6.0-1.gz.run
ISSUE:
Intermittent failure of the script when trying to load the TS RPM.
Script has the following configuration in extension_packages / install_operations:
The DO and AS3 RPMs load fine. However, TS errors out due to a timeout in bigip-runtime-init while waiting for the RPM to be installed. Logs are shown below - timings are as follows:
09:20:22 - installs package and gets a 202 reply with status "INSTALL" (and running task ID)
If I manually load TS in the GUI, it takes about 15 seconds to load onto the device, so I'm guessing the time allocated here (20 seconds) is not enough. I'm using the following Azure SKU: Standard DS2 v2 (2 vcpus, 7 GiB memory), which is an approved SKU for the LTM/DNS image (Good) that I am using (ref: https://clouddocs.f5.com/cloud/public/v1/matrix.html#microsoft-azure)
LOGS: