bitovi / bitops

Automate the provisioning and configuration of cloud infrastructure with BitOps docker image
https://bitops.sh

Cloudformation hanging during deployment stage #140

Open PhillypHenning opened 2 years ago

PhillypHenning commented 2 years ago

I've noticed a bit of an ongoing issue with the cloudformation deployment.

until echo "$STATUS" | grep -Eq 'CREATE_COMPLETE|UPDATE_COMPLETE|COMPLETE|FAILED|DELETE_IN_PROGRESS'
do
  # DEPLOYMENT STAGE 1: list events whose status contains CREATE_IN_PROGRESS
  aws cloudformation describe-stack-events --stack-name "${CFN_STACK_NAME}" --query 'StackEvents[?contains(ResourceStatus,`CREATE_IN_PROGRESS`)].[LogicalResourceId, ResourceStatus, ResourceType, ResourceStatusReason]'

  # DEPLOYMENT STAGE 2: list events whose status contains FAILED
  aws cloudformation describe-stack-events --stack-name "${CFN_STACK_NAME}" --query 'StackEvents[?contains(ResourceStatus,`FAILED`)].[LogicalResourceId, ResourceStatus, ResourceType, ResourceStatusReason]'

  sleep 10
  # DEPLOYMENT STAGE 3: refresh the overall stack status
  STATUS=$(aws cloudformation describe-stacks --stack-name "${CFN_STACK_NAME}" --query "Stacks[0].StackStatus" --output text)
done

It seems to me that during STAGE 1, the query can hang and become non-responsive. I've noticed this occurs if the stack is in any state that isn't CREATE_IN_PROGRESS.
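As a rough sketch of how the loop could be bounded so that a stalled call cannot block the deployment forever: the version below caps the number of polls and sets per-call timeouts. It is not the BitOps implementation, and the cause of the hang is not confirmed in this thread. `wait_for_stack`, `is_terminal_status`, `MAX_ATTEMPTS`, `POLL_INTERVAL` and `CALL_TIMEOUT` are hypothetical names. `AWS_PAGER`, `--cli-connect-timeout` and `--cli-read-timeout` are existing AWS CLI options; disabling the pager only matters on AWS CLI v2.

```shell
# Sketch only: bounded polling for a CloudFormation stack.
# MAX_ATTEMPTS, POLL_INTERVAL and CALL_TIMEOUT are hypothetical knobs, not BitOps settings.

# Disable the AWS CLI v2 client-side pager so output is never handed to a pager.
export AWS_PAGER=""

# Succeed when the status is one the original loop would stop on.
is_terminal_status() {
  case "$1" in
    *COMPLETE*|*FAILED*|*DELETE_IN_PROGRESS*) return 0 ;;
    *) return 1 ;;
  esac
}

wait_for_stack() {
  local stack_name="$1"
  local max_attempts="${MAX_ATTEMPTS:-60}"
  local interval="${POLL_INTERVAL:-10}"
  local call_timeout="${CALL_TIMEOUT:-30}"
  local stack_status=""
  local attempt=0

  until is_terminal_status "$stack_status"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Timed out waiting for stack ${stack_name}" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    # Per-call socket timeouts keep one slow request from stalling the loop.
    stack_status=$(aws cloudformation describe-stacks \
      --cli-connect-timeout "$call_timeout" \
      --cli-read-timeout "$call_timeout" \
      --stack-name "$stack_name" \
      --query "Stacks[0].StackStatus" --output text) || stack_status=""
    echo "STATUS: [${stack_status}]"
    # Only sleep when another poll is needed.
    is_terminal_status "$stack_status" || sleep "$interval"
  done
  return 0
}
```

It would be called as `wait_for_stack "${CFN_STACK_NAME}"`. A failed call is treated as "no status yet" and retried until the attempt limit is reached.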

PhillypHenning commented 2 years ago

A few things that should be noted.

  1. The AWS_DEFAULT_REGION needs to be the same as the region that the stack-name exists in.
  2. The logs are being queried at the "most recent" state. This means we are checking whether an event with the given status (in this case CREATE_IN_PROGRESS) exists anywhere in the stack events, not whether it is the most recent event.
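
Both points can be illustrated with a small sketch, assuming a hypothetical helper named `latest_stack_event`. It passes the region explicitly with the CLI's `--region` option, and it reads only the first element of `StackEvents`, relying on `describe-stack-events` returning events in reverse chronological order. It is an illustration, not the fix planned for BitOps.

```shell
# Sketch only: print the single most recent event for a stack in an explicit region.
# latest_stack_event is a hypothetical helper name.
latest_stack_event() {
  local stack_name="$1"
  local region="$2"
  # Events come back newest first, so StackEvents[0] is the latest event.
  aws cloudformation describe-stack-events \
    --region "$region" \
    --stack-name "$stack_name" \
    --query 'StackEvents[0].[LogicalResourceId, ResourceStatus, ResourceType, ResourceStatusReason]' \
    --output text
}
```

For example, `latest_stack_event "${CFN_STACK_NAME}" "${AWS_DEFAULT_REGION}"` would report the newest event instead of every event that ever had a matching status.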

PhillypHenning commented 2 years ago

This will be fixed with the multi-regional deployment change

PhillypHenning commented 2 years ago

Snippet of logging output:

STATUS: [CREATE_IN_PROGRESS]
{
    "StackEvents": [
        {
            "StackId": "arn:aws:cloudformation:us-east-1:186513196687:stack/test-mr-clearwater-ecs-infra/a08ce2a0-74ab-11ec-a9db-12e3f0953fd1",
            "EventId": "SecurityGroups-CREATE_IN_PROGRESS-2022-01-13T20:01:50.487Z",
            "StackName": "test-mr-clearwater-ecs-infra",
            "LogicalResourceId": "SecurityGroups",
            "PhysicalResourceId": "arn:aws:cloudformation:us-east-1:186513196687:stack/test-mr-clearwater-ecs-infra-SecurityGroups-XRU6YSRBSI9/a5255680-74ab-11ec-96fc-0eb6db23da45",
            "ResourceType": "AWS::CloudFormation::Stack",
            "Timestamp": "2022-01-13T20:01:50.487000+00:00",
            "ResourceStatus": "CREATE_IN_PROGRESS",
            "ResourceStatusReason": "Resource creation Initiated",
            "ResourceProperties": "{\"TemplateURL\":\"https://s3.amazonaws.com/clearwater-bitops-deployments/multiregion-deployment/templates/security-groups.yaml\",\"Parameters\":{\"VpcId\":\"vpc-0bc30f5a3a12d11e0\",\"SourceSecurityGroup\":\"sg-0edf34e8738a4688c\",\"Region\":\"us-east-1\"}}"
        },
        {
            "StackId": "arn:aws:cloudformation:us-east-1:186513196687:stack/test-mr-clearwater-ecs-infra/a08ce2a0-74ab-11ec-a9db-12e3f0953fd1",
            "EventId": "SecurityGroups-CREATE_IN_PROGRESS-2022-01-13T20:01:49.433Z",
            "StackName": "test-mr-clearwater-ecs-infra",
            "LogicalResourceId": "SecurityGroups",
            "PhysicalResourceId": "",
            "ResourceType": "AWS::CloudFormation::Stack",
            "Timestamp": "2022-01-13T20:01:49.433000+00:00",
            "ResourceStatus": "CREATE_IN_PROGRESS",
            "ResourceProperties": "{\"TemplateURL\":\"https://s3.amazonaws.com/clearwater-bitops-deployments/multiregion-deployment/templates/security-groups.yaml\",\"Parameters\":{\"VpcId\":\"vpc-0bc30f5a3a12d11e0\",\"SourceSecurityGroup\":\"sg-0edf34e8738a4688c\",\"Region\":\"us-east-1\"}}"
        },
        {
            "StackId": "arn:aws:cloudformation:us-east-1:186513196687:stack/test-mr-clearwater-ecs-infra/a08ce2a0-74ab-11ec-a9db-12e3f0953fd1",
            "EventId": "a08f7ab0-74ab-11ec-a9db-12e3f0953fd1",
            "StackName": "test-mr-clearwater-ecs-infra",
            "LogicalResourceId": "test-mr-clearwater-ecs-infra",
            "PhysicalResourceId": "arn:aws:cloudformation:us-east-1:186513196687:stack/test-mr-clearwater-ecs-infra/a08ce2a0-74ab-11ec-a9db-12e3f0953fd1",
            "ResourceType": "AWS::CloudFormation::Stack",
            "Timestamp": "2022-01-13T20:01:42.494000+00:00",
            "ResourceStatus": "CREATE_IN_PROGRESS",
            "ResourceStatusReason": "User Initiated"
        }
    ]
}
STATUS: [CREATE_IN_PROGRESS]
{
    "StackEvents": [
        {
            "StackId": "arn:aws:cloudformation:us-east-1:186513196687:stack/test-mr-clearwater-ecs-infra/a08ce2a0-74ab-11ec-a9db-12e3f0953fd1",
            "EventId": "SecurityGroups-CREATE_IN_PROGRESS-2022-01-13T20:01:50.487Z",
            "StackName": "test-mr-clearwater-ecs-infra",
            "LogicalResourceId": "SecurityGroups",
            "PhysicalResourceId": "arn:aws:cloudformation:us-east-1:186513196687:stack/test-mr-clearwater-ecs-infra-SecurityGroups-XRU6YSRBSI9/a5255680-74ab-11ec-96fc-0eb6db23da45",
            "ResourceType": "AWS::CloudFormation::Stack",
            "Timestamp": "2022-01-13T20:01:50.487000+00:00",
            "ResourceStatus": "CREATE_IN_PROGRESS",
            "ResourceStatusReason": "Resource creation Initiated",
            "ResourceProperties": "{\"TemplateURL\":\"https://s3.amazonaws.com/clearwater-bitops-deployments/multiregion-deployment/templates/security-groups.yaml\",\"Parameters\":{\"VpcId\":\"vpc-0bc30f5a3a12d11e0\",\"SourceSecurityGroup\":\"sg-0edf34e8738a4688c\",\"Region\":\"us-east-1\"}}"
        },
        {
            "StackId": "arn:aws:cloudformation:us-east-1:186513196687:stack/test-mr-clearwater-ecs-infra/a08ce2a0-74ab-11ec-a9db-12e3f0953fd1",
            "EventId": "SecurityGroups-CREATE_IN_PROGRESS-2022-01-13T20:01:49.433Z",
            "StackName": "test-mr-clearwater-ecs-infra",
            "LogicalResourceId": "SecurityGroups",
            "PhysicalResourceId": "",
            "ResourceType": "AWS::CloudFormation::Stack",
            "Timestamp": "2022-01-13T20:01:49.433000+00:00",
            "ResourceStatus": "CREATE_IN_PROGRESS",
            "ResourceProperties": "{\"TemplateURL\":\"https://s3.amazonaws.com/clearwater-bitops-deployments/multiregion-deployment/templates/security-groups.yaml\",\"Parameters\":{\"VpcId\":\"vpc-0bc30f5a3a12d11e0\",\"SourceSecurityGroup\":\"sg-0edf34e8738a4688c\",\"Region\":\"us-east-1\"}}"
        },
        {
            "StackId": "arn:aws:cloudformation:us-east-1:186513196687:stack/test-mr-clearwater-ecs-infra/a08ce2a0-74ab-11ec-a9db-12e3f0953fd1",
            "EventId": "a08f7ab0-74ab-11ec-a9db-12e3f0953fd1",
            "StackName": "test-mr-clearwater-ecs-infra",
            "LogicalResourceId": "test-mr-clearwater-ecs-infra",
            "PhysicalResourceId": "arn:aws:cloudformation:us-east-1:186513196687:stack/test-mr-clearwater-ecs-infra/a08ce2a0-74ab-11ec-a9db-12e3f0953fd1",
            "ResourceType": "AWS::CloudFormation::Stack",
            "Timestamp": "2022-01-13T20:01:42.494000+00:00",
            "ResourceStatus": "CREATE_IN_PROGRESS",
            "ResourceStatusReason": "User Initiated"
        }
    ]
}

PhillypHenning commented 2 years ago

@mickmcgrath13 / @ConnorGraham What are your thoughts on having a verbose logging flag for CloudFormation?

Verbose would look like the above, and non-verbose would strip it down to:

STATUS: [CREATE_IN_PROGRESS]
STATUS: [CREATE_IN_PROGRESS]
STATUS: [CREATE_COMPLETE]

ConnorGraham commented 2 years ago

@PhillypHenning there are already checks throughout BitOps for the DEBUG env var. Would this suffice, or do you want something more specific to CF?

PhillypHenning commented 2 years ago

I was thinking this would be specifically for CF logs, though if you think it should be encompassed in the DEBUG flag I don't have any objections.
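
A minimal sketch of how either option could gate the event dump is shown below. `CFN_VERBOSE` is a hypothetical CloudFormation-specific variable that falls back to `DEBUG`. The accepted truthy values are an assumption, since the thread does not say how BitOps interprets `DEBUG`. The function names are also hypothetical.

```shell
# Sketch only: print the status line always, and the event JSON only when verbose.
# CFN_VERBOSE is hypothetical; it falls back to the DEBUG variable mentioned above.
cfn_verbose_enabled() {
  case "${CFN_VERBOSE:-${DEBUG:-}}" in
    1|true|TRUE|True) return 0 ;;
    *) return 1 ;;
  esac
}

log_stack_progress() {
  local stack_status="$1"
  local events_json="$2"
  echo "STATUS: [${stack_status}]"
  if cfn_verbose_enabled; then
    echo "$events_json"
  fi
}
```

With neither variable set, `log_stack_progress "$STATUS" "$EVENTS"` prints only the `STATUS: [...]` line. Setting `CFN_VERBOSE=true` (or `DEBUG=true`) also prints the event JSON.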