achave11-ucsc opened this issue 5 months ago
Assignee to provide description.
@hannes-ucsc: "Assignee to add s3:HeadBucket to https://github.com/DataBiosphere/azul/blob/e8138f9948f391c8c502d69b8747071eab795781/terraform/shared/shared.tf.json.template.py#L237"
~For demo, show absence of trail events for one week after this lands in a main deployment.~ https://github.com/DataBiosphere/azul/issues/6134#issuecomment-2266376442
Assignee to summarize the difficulties and our plan ahead as discussed on Slack in a comment in this ticket.
The problem reported by this issue is an `AccessDenied` error for `config.amazonaws.com` attempting a `HeadBucket` action on the awsconfig bucket (`edu-ucsc-gi-platform-hca-dev-awsconfig.us-east-1`).
PR #6150 attempted to fix this issue by adding the `s3:HeadBucket` permission to the bucket's policy; however, this failed to deploy because no `s3:HeadBucket` permission exists in IAM.
According to the documentation, the `HeadBucket` operation requires the `s3:ListBucket` permission:

> General purpose bucket permissions - To use this operation, you must have permissions to perform the `s3:ListBucket` action.

…however, this is already configured in the bucket's current policy:
```json
{
    "Effect": "Allow",
    "Principal": {
        "Service": "config.amazonaws.com"
    },
    "Action": [
        "s3:GetBucketAcl",
        "s3:ListBucket"
    ],
    "Resource": "arn:aws:s3:::edu-ucsc-gi-platform-hca-dev-awsconfig.us-east-1",
    "Condition": {
        "StringEquals": {
            "AWS:SourceAccount": "122796619775"
        }
    }
},
```
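As a sanity check, the point can be verified programmatically. The sketch below inlines the policy statement from above and uses a hypothetical helper (not part of Azul) to show that `s3:ListBucket` is already granted to `config.amazonaws.com` while `s3:HeadBucket` is not among the granted actions:

```python
import json

# The relevant statement from the bucket policy above, inlined for illustration
statement = json.loads("""
{
    "Effect": "Allow",
    "Principal": {"Service": "config.amazonaws.com"},
    "Action": ["s3:GetBucketAcl", "s3:ListBucket"],
    "Resource": "arn:aws:s3:::edu-ucsc-gi-platform-hca-dev-awsconfig.us-east-1",
    "Condition": {"StringEquals": {"AWS:SourceAccount": "122796619775"}}
}
""")

def allows(statement: dict, service: str, action: str) -> bool:
    """True if this Allow statement grants `action` to the given service principal."""
    return (statement["Effect"] == "Allow"
            and statement["Principal"].get("Service") == service
            and action in statement["Action"])

# s3:ListBucket, which HeadBucket requires, is already granted …
print(allows(statement, "config.amazonaws.com", "s3:ListBucket"))   # True
# … while s3:HeadBucket is absent (and is not a real IAM action anyway)
print(allows(statement, "config.amazonaws.com", "s3:HeadBucket"))   # False
```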
Assignee to wait until blocking ticket #6152 is merged, then spike to contact AWS Support regarding this issue.
We'll decide afterwards what to do with the yet-to-be-merged PR #6150.
AWS Support case 171355455900988 (opened on platform-hca-dev account): https://support.console.aws.amazon.com/support/home?region=us-east-1#/case/?displayId=171355455900988&language=en
Close `wontfix` when the blocker is resolved.
Assignee to implement the workaround suggested by AWS Support.
With regards to the frequency at which AWS Config checks if the S3 bucket "edu-ucsc-gi-platform-hca-dev-awsconfig.us-east-1" exists before delivering the logs to it, the internal team has also noted, when reviewing the configuration in place for the delivery channel "azul-awsconfig-dev", that the delivery frequency set for the channel is 6 hours. This means that roughly every 6 hours, AWS Config will first check if the S3 bucket exists before delivering the logs to it. This would also explain why you are seeing the "AccessDenied" errors for the "HeadBucket" operation 4 times each day. The same can be seen when running the AWS CLI command "describe-delivery-channels"[1] and reviewing the "deliveryFrequency" value in the "configSnapshotDeliveryProperties" property of the response output.
Now, it would be possible to decrease the frequency at which AWS Config checks if the S3 bucket exists by updating[2] the delivery frequency of the delivery channel to the maximum interval of 24 hours. This will cause AWS Config to perform the S3 bucket check once every 24 hours. You can update the delivery frequency of the delivery channel using either the API "PutDeliveryChannel"[3] or the AWS CLI command "put-delivery-channel"[4], setting the delivery frequency to "TwentyFour_Hours". Below is an example "put-delivery-channel" command for achieving this:
```shell
$ aws configservice put-delivery-channel --delivery-channel name=azul-awsconfig-dev,s3BucketName=edu-ucsc-gi-platform-hca-dev-awsconfig.us-east-1,configSnapshotDeliveryProperties={deliveryFrequency=TwentyFour_Hours}
```
References:
[1] describe-delivery-channels - https://docs.aws.amazon.com/cli/latest/reference/configservice/describe-delivery-channels.html
[2] Managing the Delivery Channel - Updating the Delivery Channel - https://docs.aws.amazon.com/config/latest/developerguide/manage-delivery-channel.html#update-dc-console
[3] PutDeliveryChannel - https://docs.aws.amazon.com/config/latest/APIReference/API_PutDeliveryChannel.html
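The arithmetic behind the support team's explanation can be sketched as follows, assuming (per their explanation) exactly one bucket existence check per snapshot delivery:

```python
from datetime import timedelta

def head_bucket_checks_per_day(delivery_frequency: timedelta) -> int:
    # AWS Config performs one bucket existence check (the HeadBucket call)
    # before each configuration snapshot delivery.
    return int(timedelta(days=1) / delivery_frequency)

print(head_bucket_checks_per_day(timedelta(hours=6)))   # 4, matching the observed daily error count
print(head_bucket_checks_per_day(timedelta(hours=24)))  # 1, the rate predicted after the workaround
```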
Assignee to also adjust the alarm threshold accordingly.
@hannes-ucsc: "In order to establish the correct threshold we need to determine what exactly causes SSM to make a request that is being rejected. Assignee to confirm/deny the hypothesis that SSM requests are triggered by running `make plan` with the main component of a main deployment selected. Spike to also test the hypothesis that they are triggered by either deploying the GL component with a plan that replaces the instance or by simply rebooting the instance."
@hannes-ucsc: "Assignee to also ensure that we use the right comparison operator, as we seem to currently be using the wrong one (GreaterThanOrEqual)."
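The distinction between the two operators matters exactly at the boundary, as this minimal sketch shows (the threshold value here is hypothetical, chosen to equal an expected count of benign events):

```python
def alarm_fires(datapoint: float, threshold: float, operator: str) -> bool:
    # Two of the CloudWatch alarm comparison operators
    if operator == "GreaterThanThreshold":
        return datapoint > threshold
    if operator == "GreaterThanOrEqualToThreshold":
        return datapoint >= threshold
    raise ValueError(operator)

# Suppose the threshold is set to the expected count of benign events per period:
print(alarm_fires(4, 4, "GreaterThanOrEqualToThreshold"))  # True: fires on the expected count
print(alarm_fires(4, 4, "GreaterThanThreshold"))           # False: fires only above it
```

With `GreaterThanOrEqualToThreshold`, the expected background count itself triggers the alarm, which is one way a threshold that was meant to tolerate benign events ends up firing daily.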
~@hannes-ucsc: "From the spike experiments it appears that AccessDenied requests occur in the following cases: When a newly created instance first starts up (two AccessDenied requests), when a new version of the SSM agent is available and automatically installed by the agent (new versions are checked twice daily, the uninstallation of an old version incurs two AccessDenied requests, the installation of the new version incurs another two). This means we could get eight AccessDenied requests per day from the auto-update plus another two for each instance recreation.~
~Assignee to try `s3:*` in the IAM policy."~
Moved to https://github.com/DataBiosphere/azul/issues/6141#issuecomment-2192551319
~Assignee to draft a AWS support request on Google Docs.~
Moved to https://github.com/DataBiosphere/azul/issues/6141#issuecomment-2192566595
This ticket is currently about two things: 1) reducing the AWS Config type of alarms due to AWS-related pings, via the AWS Support recommendation, and 2) (#6141) the SSM agent causing AccessDenied events in S3 when attempting to download an artifact.
We should hold off on any threshold adjustments until:

- [ ] we hear back from AWS Support on the support request about the AccessDenied caused by AWS Config that @dsotirho-ucsc created earlier
- [x] @achave11-ucsc finishes the draft support request about the SSM Agent causing frequent AccessDenied alarms. That draft is better handled as part of #6141, which we will reopen now
Assignee to monitor AWS Support ticket and follow up if necessary.
@hannes-ucsc: "After several weeks of inaction on the AWS Support ticket, we closed it. Looks like we have to live with this false alarm for the time being."
Assignee to move forward with the alleviation proposed by AWS Support.
For demo, show reduction in false positive alarms two weeks after this lands.
The alleviation proposed by AWS Support was ineffective at reducing the daily frequency of this type of alarm.
It's been 24+ hours since these changes were enabled in the `dev` deployment, and there's the same number of unauthorized-type alarms associated with the awsconfig bucket as before: nine occurrences in the most recent 24-hour period, with the suggested workaround enabled.
```json
[
    {
        "errorCode": "AccessDenied",
        "eventSource": "s3.amazonaws.com",
        "eventName": "HeadBucket",
        "source": "config.amazonaws.com",
        "sessionIssuer": "",
        "principalId": "",
        "requestBucket": "edu-ucsc-gi-platform-hca-dev-awsconfig.us-east-1",
        "manifest": "",
        "count(*)": "9"
    }
]
```
The configuration change seems to be valid, albeit ineffective.
```console
❯ aws configservice describe-delivery-channels
{
    "DeliveryChannels": [
        {
            "name": "azul-awsconfig-dev",
            "s3BucketName": "edu-ucsc-gi-platform-hca-dev-awsconfig.us-east-1",
            "configSnapshotDeliveryProperties": {
                "deliveryFrequency": "TwentyFour_Hours"
            }
        }
    ]
}
```
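The same verification can be done programmatically; a small sketch that parses the CLI output (inlined here from above) and confirms the configured frequency:

```python
import json

# Output of `aws configservice describe-delivery-channels`, inlined from above
cli_output = json.loads("""
{
    "DeliveryChannels": [
        {
            "name": "azul-awsconfig-dev",
            "s3BucketName": "edu-ucsc-gi-platform-hca-dev-awsconfig.us-east-1",
            "configSnapshotDeliveryProperties": {
                "deliveryFrequency": "TwentyFour_Hours"
            }
        }
    ]
}
""")

channel, = cli_output["DeliveryChannels"]  # expect exactly one delivery channel
frequency = channel["configSnapshotDeliveryProperties"]["deliveryFrequency"]
print(frequency)  # TwentyFour_Hours — the change took effect, yet the errors persist
```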
@hannes-ucsc: "I think there's the possibility of AWS Config applying the delivery frequency decrease after a delay. My guess is that we should wait for one week before deciding that the fix is actually ineffective. Assignee to file a partial PR that only decreases the delivery frequency to 24 hours. We will then observe the effect of that change before deciding whether to leave it as is, to revert it, or to add the change that lowers the alarm threshold."
@hannes-ucsc: "The reduction of the delivery frequency from 6 to 24 hours did not have the desired effect of reducing the number of unauthorized requests made by AWS Config. Assignee to (re)open issue with Amazon Support, asking why the proposed workaround did not work as predicted."
A new case has been opened: https://support.console.aws.amazon.com/support/home?region=us-east-1#/case/?displayId=172488195200090&language=en.
Assignee to (re)open issue with Amazon Support.

Unfortunately, that wasn't an option; a case cannot be re-opened once it has been closed after 14 days of inactivity:
AWS Support has replied requesting further details:
According to the internal team, they want to understand whether your S3 bucket uses KMS encryption. If so, can you share the S3 bucket permissions, and the KMS key if available?
From their end, they can see that the delivery channel is not expecting a KMS key to access the bucket, so in case you have set that up, the delivery channel will not have adequate permissions to deliver data.
Once done, the internal team will be able to investigate the issue further.
Assignee to share screenshots of all the tabs in the AWS S3 bucket `edu-ucsc-gi-platform-hca-dev-awsconfig.us-east-1`, except for the `Metrics` tab.
The screenshots have been submitted in a reply to the AWS Support team.
AWS Support responded earlier today with more questions:
According to the internal team, they want to re-confirm the asks and a few other things here to help root-cause:

- Is the ask just to root-cause and eliminate the AccessDenied exceptions seen in CloudTrail?
- Do you use either the Config snapshots or the Config history features? If so, are they impacted here? If impact is seen, what is the impact?
- You had earlier responded saying that you changed the delivery frequency of the delivery channel and that "it did not have the intended effects". Has the number of errors seen daily been reduced from 4 a day? If so, what is the number of errors seen now?

Once you provide the above information, the internal team will be able to investigate the issue further.
Assignee to respond as follows:
Replied with the requested details. Support ticket.
Warm Greetings from AWS Premium Support. Thank you for updating the case notes.
From the updated case notes I understand that the ask is to find the root cause of, and eliminate, the AccessDenied exceptions seen in CloudTrail.
Please let me know if I happen to have misunderstood your query and I will align myself accordingly.
Thank you for providing us with the details. I have reached back to the internal team to get more insights regarding this.
Once I have an update from them, I will reach back to you.
Meanwhile, I will be putting the case in "Pending Amazon Action".
I hope the above information was helpful and if you have any other issues or queries, please feel free to update the case and I will be happy to assist you further.
Assignee to monitor support ticket.