F5Networks / f5-aws-cloudformation

CloudFormation Templates for quickly deploying BIG-IP services in Amazon Web Services EC2

Template completes before BIG-IP is actually ready #74

Closed mikeoleary closed 4 years ago

mikeoleary commented 5 years ago

Do you already have an issue opened with F5 support?

No.

Description

The CFT reports complete before the deployed BIG-IPs are ready. If I try to access a BIG-IP immediately after the CFT is reported as completed, it is unresponsive. About 60 seconds after completion, I can access the box via SSH, but it is not licensed. About 2 minutes after completion, the box appears licensed but not clustered. About 4-5 minutes after the CFT completes, the box is accessible, licensed, clustered, and ready to accept configuration commands.

In my case, this is a problem with nested CFTs - another template will deploy after this, which relies on the BIG-IP being completely provisioned.

Template

I have experienced this with both standalone and clustered templates:
https://github.com/F5Networks/f5-aws-cloudformation/tree/master/supported/failover/same-net/via-api/2nic/existing-stack/byol
https://github.com/F5Networks/f5-aws-cloudformation/tree/master/supported/standalone/2nic/existing-stack/byol

Severity Level

For bugs, enter the bug severity level. Do not set any labels.

Severity: 5

Severity level definitions:

  1. Severity 1 (Critical) : Defect is causing systems to be offline and/or nonfunctional. Immediate attention is required.
  2. Severity 2 (High) : Defect is causing major obstruction of system operations.
  3. Severity 3 (Medium) : Defect is causing intermittent errors in system operations.
  4. Severity 4 (Low) : Defect is causing infrequent interruptions in system operations.
  5. Severity 5 (Trivial) : Defect is not causing any interruptions to system operations, but is nonetheless a bug.

kreynoldsf5 commented 5 years ago

Perhaps a CreationPolicy attribute and a cfn-signal helper might be useful.

What does it actually mean for the stack to be ready?

That the BIG-IPs are ready to accept traffic configuration? That all onboarding tasks for the BIG-IP have been completed (i.e. the cloudlibs work)? Something else?

bertvandevoorde commented 5 years ago

I actually experience this problem as well. During the stack creation, quite soon after the instances are commissioned, the stack creation will say "complete". More often than not, if I connect to the boxes using SSH, they will say NO LICENSE. In my case, everything seems to stop at that point. I'm trying to investigate what goes wrong.

hparr commented 5 years ago

With regard to NO LICENSE: please ensure that there is outbound internet access from Eth0 in a single-NIC deployment, and from Eth0 and Eth1 in a multi-NIC deployment. The path can be EIP or NAT. The step that fails should be in the logs at /var/log/cloud. If you don't have that directory and data, the failure probably relates to an outbound problem while installing the libraries.

amolari commented 4 years ago

No update yet? Please add signaling so that the autoscale instances only get the status "InService" after the BIG-IP instances are up and running (all installation/DO scripts successfully processed). This is important in an IaC deployment (configuration after the provisioning).

C0missar commented 4 years ago

I too would like an update on this issue. My short-term hack was to build in a wait of 15 minutes after CFT completion. Our Pro Svcs consultant (Keith Fuller) suggested checking for the existence of /credentials/master.json in the S3 bucket before continuing (so we could use REST to get in to the box), and then checking 'tmsh show sys ready field-fmt' for the following:

tmsh show sys ready field-fmt
sys bigip-ready {
    config-ready yes
    license-ready yes
    provision-ready yes
}
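
The check above could be scripted roughly as follows. This is only a sketch: `bigip_is_ready` is a hypothetical helper name, and it assumes the three field names shown in the output above are the ones that matter. It reads the command output on stdin, so it works whether the output arrives on one line or several.

```shell
# Hypothetical helper (not part of the templates): succeeds only when all
# three readiness fields from `tmsh show sys ready field-fmt` read "yes".
# Reads the command output on stdin.
bigip_is_ready() {
    local status field
    status=$(cat)
    for field in config-ready license-ready provision-ready; do
        # Require "<field> yes" as whole words, separated by whitespace.
        if ! printf '%s\n' "$status" |
            grep -Eq "(^|[[:space:]])${field}[[:space:]]+yes([[:space:]]|\$)"; then
            return 1
        fi
    done
    return 0
}
```

On the BIG-IP itself this would be used as `tmsh show sys ready field-fmt | bigip_is_ready && echo "ready"`.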

As far as what "ready" means, I think that it should mean the following:

- Licensed
- Config loaded
- REST interface up
- Config UI running

The REST interface is the most important part, because no other provisioning utility (direct REST calls, Ansible, etc.) can talk to the box until then. If I can't manage it, it's not "Ready" in any useful sense of the term.

One could argue that "licensed" shouldn't be a requirement, because if there is a licensing problem, you couldn't connect to the device to find out what was happening otherwise, and licensing technically could be done later. But without REST calls, further automated activity will just error out (at random points depending on timing), leaving you with a pair of devices in an unknown state that has already consumed licenses that must then be manually revoked.

f5-applebaum commented 4 years ago

NOTE: Good RFE. cfn-signal is on the BIG-IP and, as a temporary workaround, can be used in the customization section, depending on the definition of READY.

ex. '/opt/aws/apitools/cfn-init/bin/cfn-signal -e 0 --stack '

C0missar commented 4 years ago

I'm afraid I have no idea what the above advice means or how to implement it. And which definition of READY is it assuming?

f5-applebaum commented 4 years ago

Sorry, for example, take the template mentioned above. https://github.com/F5Networks/f5-aws-cloudformation/tree/master/supported/failover/same-net/via-api/2nic/existing-stack/byol

To do this, you would just add a "CreationPolicy" attribute on the "instance" resource: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-signal.html

ex.

"Resources": {
  "Bigip1Instance": {
    "CreationPolicy": {
      "ResourceSignal": {
        "Timeout": "PT10M"
      }
    },

ex. https://gist.github.com/f5-applebaum/d9209d02acaffbcf0d97954956a1fbd6#file-example-cfn-signal-L810

And then add cfn-signal command in the "CUSTOM CONFIGURATION" section of the template. ex.

           "  echo \"Custom config was not a URL, continuing...\"\n",
           "fi\n",
           "### ADD WHATEVER SUCCESS CRITERIA DESIRED ##### \n",
           "## if [[ success_condition == XXXXX ]]; then \n",
           "/opt/aws/bin/cfn-signal -e 0 ",
           "         --stack ",
           {
              "Ref": "AWS::StackName"
           },
           "         --resource Bigip1Instance ",
           "         --region ",
           {
               "Ref": "AWS::Region"
           },
           "\n",
           "## fi \n",
           "### END CUSTOM CONFIGURATION"

ex. https://gist.github.com/f5-applebaum/d9209d02acaffbcf0d97954956a1fbd6#file-example-cfn-signal-L1126

CloudFormation will delay reporting CREATE_COMPLETE on the instance until it receives that signal (e.g. you will see "Received SUCCESS signal with UniqueId i-086d813ba8cded867" in the event log).

Disclaimer: In the above example, I just send a generic 0 (the bash return code for SUCCESS), which at least delays CloudFormation reporting CREATE_COMPLETE until that last customization section (custom-config.sh) runs. A more complete definition of success can be defined (e.g. the generic onboarding the template provides, or something user-defined in the custom section).
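
As one possible "more complete definition of success", the customization section could poll the box before signaling and pass the result to cfn-signal as the exit code. The sketch below is not what the templates ship. `wait_until`, `probe_ready`, `signal_when_ready` and the `CFN_SIGNAL` override are hypothetical names, and the success criterion (all three fields of `tmsh show sys ready field-fmt` reading "yes") is an assumption taken from the earlier comment in this thread. The cfn-signal path and the resource name `Bigip1Instance` are the ones used in the example above.

```shell
# Hypothetical helper: run a probe command up to <attempts> times, sleeping
# <interval> seconds between tries. Returns 0 on the first success.
wait_until() {
    local attempts=$1 interval=$2 i=1
    shift 2
    while ! "$@"; do
        [ "$i" -ge "$attempts" ] && return 1
        i=$((i + 1))
        sleep "$interval"
    done
    return 0
}

# Assumed success criterion: config, license and provisioning all report "yes".
probe_ready() {
    local status field
    status=$(tmsh show sys ready field-fmt 2>/dev/null) || return 1
    for field in config-ready license-ready provision-ready; do
        printf '%s\n' "$status" | grep -Eq "${field}[[:space:]]+yes" || return 1
    done
}

# Signal CloudFormation with 0 when the probe passes, or 1 when it never does.
# Arguments: <stack name> <region>.
signal_when_ready() {
    local stack=$1 region=$2 rc=0
    # 54 tries x 10 s is about 9 minutes of sleeping, under the PT10M timeout.
    wait_until 54 10 probe_ready || rc=1
    "${CFN_SIGNAL:-/opt/aws/bin/cfn-signal}" -e "$rc" \
        --stack "$stack" --resource Bigip1Instance --region "$region"
}
```

Sending a non-zero exit code makes the resource fail as soon as the probe gives up, rather than leaving CloudFormation to wait out the full timeout. In the template, the stack name and region would come from the `AWS::StackName` and `AWS::Region` references, as in the example above.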

shyawnkarim commented 4 years ago

This enhancement request is being tracked internally with ID ESECLDTPLT-1938.

shyawnkarim commented 4 years ago

Closing.

This enhancement was included with Release 5.7.0.