DataBiosphere / azul

Metadata indexer and query service used for AnVIL, HCA, LungMAP, and CGP
Apache License 2.0

Alarm `api_unauthorized` for HeadBucket from AWS Config #6134

Open achave11-ucsc opened 5 months ago

achave11-ucsc commented 5 months ago
{
    "eventVersion": "1.09",
    "userIdentity": {
        "type": "AWSService",
        "invokedBy": "config.amazonaws.com"
    },
    "eventTime": "2024-04-05T06:59:27Z",
    "eventSource": "s3.amazonaws.com",
    "eventName": "HeadBucket",
    "awsRegion": "us-east-1",
    "sourceIPAddress": "config.amazonaws.com",
    "userAgent": "config.amazonaws.com",
    "errorCode": "AccessDenied",
    "errorMessage": "Access Denied",
    "requestParameters": {
        "bucketName": "edu-ucsc-gi-platform-hca-dev-awsconfig.us-east-1",
        "Host": "edu-ucsc-gi-platform-hca-dev-awsconfig.us-east-1.s3.amazonaws.com"
    },
    "responseElements": null,
    "additionalEventData": {
        "SignatureVersion": "SigV4",
        "CipherSuite": "TLS_AES_128_GCM_SHA256",
        "bytesTransferredIn": 0,
        "AuthenticationMethod": "AuthHeader",
        "x-amz-id-2": "gLfjs2So1SoGuTApoVO+10z4SbdQTcwl7C/wd8Evvxtg7yEA47Pu0RzPG30qqmzay026CIQQkeF5rhQmqIYLo62lqI7OKcki",
        "bytesTransferredOut": 263
    },
    "requestID": "QRVD73A912P7FRQT",
    "eventID": "a6dce604-fc55-43da-83a4-f75db4334790",
    "readOnly": true,
    "resources": [
        {
            "type": "AWS::S3::Object",
            "ARNPrefix": "arn:aws:s3:::edu-ucsc-gi-platform-hca-dev-awsconfig.us-east-1/"
        },
        {
            "accountId": "122796619775",
            "type": "AWS::S3::Bucket",
            "ARN": "arn:aws:s3:::edu-ucsc-gi-platform-hca-dev-awsconfig.us-east-1"
        }
    ],
    "eventType": "AwsApiCall",
    "managementEvent": false,
    "recipientAccountId": "122796619775",
    "sharedEventID": "ec6c1c26-78fd-447e-9a5d-ac1f58dc3b0c",
    "vpcEndpointId": "vpce-00dc1369",
    "eventCategory": "Data"
}
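A minimal sketch, not taken from Azul, of how a record like the one above could be recognized programmatically. The function name `is_config_head_bucket_denial` is hypothetical; the field names are the ones that appear in the CloudTrail record above.

```python
import json


def is_config_head_bucket_denial(event: dict) -> bool:
    """Return True if the CloudTrail record is an AccessDenied HeadBucket
    request that was made by the AWS Config service."""
    identity = event.get("userIdentity", {})
    return (
        event.get("eventSource") == "s3.amazonaws.com"
        and event.get("eventName") == "HeadBucket"
        and event.get("errorCode") == "AccessDenied"
        and identity.get("type") == "AWSService"
        and identity.get("invokedBy") == "config.amazonaws.com"
    )


# Abbreviated copy of the record above, keeping only the fields being tested
record = json.loads("""
{
    "userIdentity": {"type": "AWSService", "invokedBy": "config.amazonaws.com"},
    "eventSource": "s3.amazonaws.com",
    "eventName": "HeadBucket",
    "errorCode": "AccessDenied"
}
""")

print(is_config_head_bucket_denial(record))
```

A predicate like this only identifies the event; it says nothing about why S3 denied the request.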
dsotirho-ucsc commented 5 months ago

Assignee to provide description.

dsotirho-ucsc commented 5 months ago

@hannes-ucsc: "Assignee to add s3:HeadBucket to https://github.com/DataBiosphere/azul/blob/e8138f9948f391c8c502d69b8747071eab795781/terraform/shared/shared.tf.json.template.py#L237"

hannes-ucsc commented 5 months ago

~For demo, show absence of trail events for one week after this lands in a main deployment.~ https://github.com/DataBiosphere/azul/issues/6134#issuecomment-2266376442

dsotirho-ucsc commented 5 months ago

Assignee to summarize the difficulties and our plan ahead as discussed on Slack in a comment in this ticket.

dsotirho-ucsc commented 5 months ago

The problem reported by this issue is an AccessDenied error for config.amazonaws.com attempting a HeadBucket action on the awsconfig bucket (edu-ucsc-gi-platform-hca-dev-awsconfig.us-east-1).

PR #6150 attempted to fix this issue by adding the s3:HeadBucket permission to the bucket's policy. However, this failed to deploy because there is no s3:HeadBucket permission.

According to the documentation, the HeadBucket operation requires the s3:ListBucket permission…

General purpose bucket permissions - To use this operation, you must have permissions to perform the s3:ListBucket action.

…however this is already configured in the bucket's current policy.

        {
            "Effect": "Allow",
            "Principal": {
                "Service": "config.amazonaws.com"
            },
            "Action": [
                "s3:GetBucketAcl",
                "s3:ListBucket"
            ],
            "Resource": "arn:aws:s3:::edu-ucsc-gi-platform-hca-dev-awsconfig.us-east-1",
            "Condition": {
                "StringEquals": {
                    "AWS:SourceAccount": "122796619775"
                }
            }
        },
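To make the observation above concrete, here is a deliberately simplified sketch that checks whether a single policy statement names a given service principal and action. It is an illustration only: the helper `statement_allows` is hypothetical, and it ignores wildcards, `Condition` blocks, `NotAction`, and explicit `Deny` statements, all of which real IAM evaluation takes into account.

```python
def statement_allows(statement: dict, service: str, action: str) -> bool:
    """Simplified check: does this Allow statement literally list the
    service principal and the action? Conditions are NOT evaluated."""
    if statement.get("Effect") != "Allow":
        return False
    principal = statement.get("Principal", {})
    if not isinstance(principal, dict):
        return False
    services = principal.get("Service", [])
    if isinstance(services, str):
        services = [services]
    actions = statement.get("Action", [])
    if isinstance(actions, str):
        actions = [actions]
    return service in services and action in actions


# The statement quoted above
statement = {
    "Effect": "Allow",
    "Principal": {"Service": "config.amazonaws.com"},
    "Action": ["s3:GetBucketAcl", "s3:ListBucket"],
    "Resource": "arn:aws:s3:::edu-ucsc-gi-platform-hca-dev-awsconfig.us-east-1",
    "Condition": {"StringEquals": {"AWS:SourceAccount": "122796619775"}},
}

# s3:ListBucket, the permission HeadBucket requires, is listed
print(statement_allows(statement, "config.amazonaws.com", "s3:ListBucket"))
# s3:HeadBucket is not listed (and is not a valid permission, per PR #6150)
print(statement_allows(statement, "config.amazonaws.com", "s3:HeadBucket"))
```

Because the sketch skips the `Condition` block, it cannot rule out that the `AWS:SourceAccount` condition plays a role in the denial; that remains an open question in this ticket.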

Assignee to wait until blocking ticket #6152 is merged, then spike to contact AWS Support regarding this issue.

We'll decide afterwards what to do with the yet to be merged PR #6150.

dsotirho-ucsc commented 5 months ago

AWS Support case 171355455900988 (opened on platform-hca-dev account): https://support.console.aws.amazon.com/support/home?region=us-east-1#/case/?displayId=171355455900988&language=en

dsotirho-ucsc commented 4 months ago

Close wontfix when blocker is resolved.

achave11-ucsc commented 3 months ago

Assignee to implement workaround suggested by AWS Support.

With regards to the frequency at which AWS Config checks if the S3 bucket "edu-ucsc-gi-platform-hca-dev-awsconfig.us-east-1" exists before delivering the logs to the S3 bucket, the internal team has also noted, when reviewing the configuration in place for the delivery channel "azul-awsconfig-dev", that the delivery frequency set for the delivery channel is 6 hours. This means that around every 6 hours, AWS Config will first check if the S3 bucket exists before delivering the logs to the S3 bucket. This would also explain why you are seeing the "AccessDenied" errors for the "HeadBucket" operation 4 times each day. The same can be seen when running the AWS CLI command "describe-delivery-channels"[1] and reviewing the "deliveryFrequency" value in the "configSnapshotDeliveryProperties" property in the response output of the above command.

Now, it would be possible to decrease the frequency at which AWS Config checks if the S3 bucket exists by updating[2] the delivery frequency of the delivery channel to the maximum time interval of 24 hours. This will result in AWS Config performing the S3 bucket check once every 24 hours. You can update the delivery frequency of the delivery channel using either the API "PutDeliveryChannel"[3] or the AWS CLI command "put-delivery-channel"[4], where you would set the value of the delivery frequency to "TwentyFour_Hours". Below is an example "put-delivery-channel" command for achieving the same:

$ aws configservice put-delivery-channel --delivery-channel name=azul-awsconfig-dev,s3BucketName=edu-ucsc-gi-platform-hca-dev-awsconfig.us-east-1,configSnapshotDeliveryProperties={deliveryFrequency=TwentyFour_Hours}

============References================

[1] https://docs.aws.amazon.com/cli/latest/reference/configservice/describe-delivery-channels.html

[2] Managing the Delivery Channel - Updating the Delivery Channel - https://docs.aws.amazon.com/config/latest/developerguide/manage-delivery-channel.html#update-dc-console

[3] PutDeliveryChannel - https://docs.aws.amazon.com/config/latest/APIReference/API_PutDeliveryChannel.html

[4] https://awscli.amazonaws.com/v2/documentation/api/latest/reference/configservice/put-delivery-channel.html

======================================
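The arithmetic behind the suggestion above can be sketched as follows, assuming (as AWS Support states) one bucket check per snapshot delivery. The helper names are hypothetical. The dictionary built by `delivery_channel` mirrors the fields of the CLI command above and should match the shape of the `DeliveryChannel` parameter of the PutDeliveryChannel API, but the sketch does not call AWS.

```python
# Delivery frequency values accepted by AWS Config, mapped to hours
FREQUENCY_HOURS = {
    "One_Hour": 1,
    "Three_Hours": 3,
    "Six_Hours": 6,
    "Twelve_Hours": 12,
    "TwentyFour_Hours": 24,
}


def expected_checks_per_day(frequency: str) -> int:
    """Bucket checks per day, assuming one check per snapshot delivery."""
    return 24 // FREQUENCY_HOURS[frequency]


def delivery_channel(name: str, bucket: str, frequency: str) -> dict:
    """Build the delivery channel description used in the CLI command above."""
    if frequency not in FREQUENCY_HOURS:
        raise ValueError(f"Unknown delivery frequency: {frequency!r}")
    return {
        "name": name,
        "s3BucketName": bucket,
        "configSnapshotDeliveryProperties": {"deliveryFrequency": frequency},
    }


print(expected_checks_per_day("Six_Hours"))         # previous setting
print(expected_checks_per_day("TwentyFour_Hours"))  # proposed setting
print(delivery_channel(
    "azul-awsconfig-dev",
    "edu-ucsc-gi-platform-hca-dev-awsconfig.us-east-1",
    "TwentyFour_Hours",
))
```

Under that assumption the change would take the expected number of denied HeadBucket requests from four per day to one.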

Assignee to also adjust the alarm threshold accordingly.

achave11-ucsc commented 3 months ago

@hannes-ucsc: "In order to establish the correct threshold we need to determine what exactly causes SSM to make a request that is being rejected. Assignee to confirm/deny the hypothesis that SSM requests are triggered by running make plan with the main component of a main deployment selected. Spike to also test the hypothesis that they are triggered by either deploying the GL component with a plan that replaces the instance or by simply rebooting the instance."

achave11-ucsc commented 3 months ago

@hannes-ucsc: "Assignee to also ensure that we use the right comparison operator, as we seem to currently be using the wrong one (GreaterThanOrEqual)."
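A small sketch of why the choice of comparison operator matters. The operator names are CloudWatch's; the threshold of 4 is purely illustrative, since the alarm's actual threshold is not stated in this ticket, and the `breaches` helper is hypothetical.

```python
import operator

# CloudWatch comparison operators mapped to their Python equivalents
COMPARISONS = {
    "GreaterThanOrEqualToThreshold": operator.ge,
    "GreaterThanThreshold": operator.gt,
    "LessThanThreshold": operator.lt,
    "LessThanOrEqualToThreshold": operator.le,
}


def breaches(count: int, threshold: int, comparison: str) -> bool:
    """Return True if the metric value would put the alarm in ALARM state."""
    return COMPARISONS[comparison](count, threshold)


threshold = 4  # hypothetical: tolerate up to four expected denials per day
for comparison in ("GreaterThanOrEqualToThreshold", "GreaterThanThreshold"):
    print(comparison, breaches(4, threshold, comparison))
```

With a threshold meant to express "this many events are expected and tolerated", the greater-than-or-equal operator fires on exactly the tolerated count, while the strict operator fires only when the count exceeds it.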

achave11-ucsc commented 3 months ago

~@hannes-ucsc: "From the spike experiments it appears that AccessDenied requests occur in the following cases: When a newly created instance first starts up (two AccessDenied requests), when a new version of the SSM agent is available and automatically installed by the agent (new versions are checked twice daily, the uninstallation of an old version incurs two AccessDenied requests, the installation of the new version incurs another two). This means we could get eight AccessDenied requests per day from the auto-update plus another two for each instance recreation.~

~Assignee to try s3:* in the IAM policy."~

Moved to https://github.com/DataBiosphere/azul/issues/6141#issuecomment-2192551319

achave11-ucsc commented 3 months ago

~Assignee to draft a AWS support request on Google Docs.~

Moved to https://github.com/DataBiosphere/azul/issues/6141#issuecomment-2192566595

achave11-ucsc commented 2 months ago

This ticket is currently about two things: 1) reducing the number of alarms caused by AWS Config's own requests, by following the recommendation from AWS Support, and 2) (#6141) the SSM agent causing AccessDenied events in S3 when attempting to download an artifact.

hannes-ucsc commented 2 months ago

We should be holding off on any threshold adjustments until

dsotirho-ucsc commented 2 months ago

Assignee to monitor AWS Support ticket and follow up if necessary.

dsotirho-ucsc commented 2 months ago

@hannes-ucsc: "After several weeks of inaction on the AWS Support ticket, we closed it. Looks like we have to live with this false alarm for the time being."

dsotirho-ucsc commented 2 months ago

Assignee to move forward with the alleviation proposed by AWS Support.

nadove-ucsc commented 1 month ago

For demo, show reduction in false positive alarms two weeks after this lands.

achave11-ucsc commented 1 month ago

Assignee to move forward with the alleviation proposed by AWS Support.

The alleviation proposed by AWS Support was ineffective at reducing the daily frequency of this type of alarm.

It's been 24+ hours since these changes were enabled in the dev deployment, and there is the same number of unauthorized alarms associated with the awsconfig bucket as before.

(Screenshot: unauthdev) Nine occurrences in the most recent 24-hour period, with the suggested workaround enabled.

[
    {
        "errorCode": "AccessDenied",
        "eventSource": "s3.amazonaws.com",
        "eventName": "HeadBucket",
        "source": "config.amazonaws.com",
        "sessionIssuer": "",
        "principalId": "",
        "requestBucket": "edu-ucsc-gi-platform-hca-dev-awsconfig.us-east-1",
        "manifest": "",
        "count(*)": "9"
    }
]

The configuration change seems to be valid, albeit ineffective.

❯ aws configservice describe-delivery-channels
{
    "DeliveryChannels": [
        {
            "name": "azul-awsconfig-dev",
            "s3BucketName": "edu-ucsc-gi-platform-hca-dev-awsconfig.us-east-1",
            "configSnapshotDeliveryProperties": {
                "deliveryFrequency": "TwentyFour_Hours"
            }
        }
    ]
}
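As a rough sketch, the check above can be automated by parsing the CLI output and comparing the observed count against what AWS Support's explanation predicts. The expected value of one per day is an assumption derived from that explanation (one bucket check per delivery), not a documented guarantee; the observed count of nine is the one reported above.

```python
import json

# Output of `aws configservice describe-delivery-channels`, as shown above
cli_output = """
{
    "DeliveryChannels": [
        {
            "name": "azul-awsconfig-dev",
            "s3BucketName": "edu-ucsc-gi-platform-hca-dev-awsconfig.us-east-1",
            "configSnapshotDeliveryProperties": {
                "deliveryFrequency": "TwentyFour_Hours"
            }
        }
    ]
}
"""

channels = json.loads(cli_output)["DeliveryChannels"]
# Map each channel name to its configured delivery frequency
frequencies = {
    channel["name"]: channel["configSnapshotDeliveryProperties"]["deliveryFrequency"]
    for channel in channels
}

observed_per_day = 9  # count(*) from the query result above
expected_per_day = 1  # assumption: one check per 24-hour delivery

print(frequencies)
print("workaround effective:", observed_per_day <= expected_per_day)
```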
achave11-ucsc commented 1 month ago

@hannes-ucsc: "I think there's the possibility of AWS Config applying the delivery frequency decrease after a delay. My guess is that we should wait for one week before deciding that the fix is actually ineffective. Assignee to file a partial PR that only decreases the delivery frequency to 24 hours. We will then observe the effect of that change before deciding whether to leave it as is, to revert it, or to add the change that lowers the alarm threshold."

dsotirho-ucsc commented 3 weeks ago

@hannes-ucsc: "The reduction of the delivery frequency from 6 to 24 hours did not have the desired effect of reducing the number of unauthorized requests made by AWS Config. Assignee to (re)open issue with Amazon Support, asking why the proposed workaround did not work as predicted."

achave11-ucsc commented 3 weeks ago

A new case has been opened: https://support.console.aws.amazon.com/support/home?region=us-east-1#/case/?displayId=172488195200090&language=en

Assignee to (re)open issue with

Unfortunately, that wasn't an option: a case that has been closed after 14 days of inactivity can't be reopened. (Screenshot: reopen-nogo)

achave11-ucsc commented 1 week ago

AWS Support has replied requesting further details:

According to the internal team, they want to understand if your S3 bucket uses KMS encryption. If so, can you share the S3 bucket permissions and the KMS key, if available?

From their end, they could see that the Delivery channel is not expecting a KMS key to access the bucket so in case you have set that up, the delivery channel will not have adequate perms to deliver data.

Once done so, internal team would be able to investigate the issue further.

achave11-ucsc commented 1 week ago

Assignee to share screenshots of all the tabs in AWS S3 bucket edu-ucsc-gi-platform-hca-dev-awsconfig.us-east-1 except for the Metrics tab.

achave11-ucsc commented 1 week ago

The screenshots have been submitted in a reply to the AWS Support team.

achave11-ucsc commented 6 days ago

AWS Support responded earlier today with more questions:

According to the internal team, they want to re-confirm the asks and few other things here to help root cause:

  • Is the ask just to root cause and eliminate the Access denied exceptions seen on CloudTrail?
  • Do you use either of Config Snapshots or config history features? If so, is that impacted here? if impact is seen, what is the impact?
  • You had earlier responded saying that you changed the delivery frequency of the delivery channel and that "it did not have the intended effects". Has the number of errors seen daily reduced from 4 a day? If so, what is the number of errors seen now?

Once you provide the above information, internal team would be able to investigate the issue further.

achave11-ucsc commented 6 days ago

Assignee to respond as follows:

  1. Yes
  2. No (but confirm)
  3. State again what the current sustained frequency of these CloudTrail events per day is.
achave11-ucsc commented 5 days ago

Replied with the requested details. Support ticket.

dsotirho-ucsc commented 3 days ago

Warm Greetings from AWS Premium Support. Thank you for updating the case notes.

From the updated case notes I understand that the ask is to find the root cause and eliminate the Access denied exceptions seen on CloudTrail

Please let me know if I happen to misunderstand your query and I will align myself accordingly.

Thank you for providing us with the details. I have reached back to the internal team to get more insights regarding this.

Once I have an update from them, I will reach back to you.

Meanwhile, I will be putting the case in "Pending Amazon Action".

I hope the above information was helpful and if you have any other issues or queries, please feel free to update the case and I will be happy to assist you further.

dsotirho-ucsc commented 3 days ago

Assignee to monitor support ticket.