Fraudmarc / fraudmarc-ce

Fraudmarc Community Edition: Open-source DMARC report analysis designed for government agencies
Apache License 2.0

Error Deploying - race condition between SES receipt rules and S3 bucket policy #79

Open ohookins opened 2 weeks ago

ohookins commented 2 weeks ago

I'm not 100% sure of the solution here, although I'm reasonably certain about most of the cause.

I'm running into an issue deploying the app stack, which gets to about 90/98 resources in the stack before it fails with an error like:

11:08:46 am | CREATE_FAILED        | AWS::SES::ReceiptRule                           | FraudmarcCERule39C12C10
Could not write to bucket: fraudmarc-ce-app-ruad173f6f7-vns0tzccogsd (Service: null; Status Code: 400; Error Code: InvalidS3Configuration; Request ID: 7a098382-ef83-455d-b0f0-9124ae82632d; Proxy: null)

What I believe is happening is that the receipt rule is triggering an attempt by SES to write to the RUA bucket, in order to test that the S3 permissions are correct. The S3 bucket at this point in time doesn't have a bucket policy to allow it, nor does the receipt rule use an IAM role that grants those permissions. It fails, and the receipt rule therefore doesn't create properly and the whole stack is rolled back.

This assumes that there is some automatic S3 write test going on in SES, but I can't imagine any other reason for this. It's worth mentioning that the call to cdk.aws_ses_actions.S3 should actually add an appropriate bucket policy for the access, as per this code in CDK.

My thoughts around why this doesn't work:

  1. Since the policy statements are not resources unto themselves, and neither is the bucket policy overall, there's no dependency between them and the receipt rule, and the policy is created too late for the SES test write action.
  2. The SES test write happens at the root of the bucket rather than the object prefix that is provided, and since the policy doesn't match up, it gets denied.
  3. There's some other action being performed on the bucket entirely by SES (e.g. list) and so the bucket policy doesn't match up.
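To make hypotheses 2 and 3 concrete, here is a minimal sketch of the kind of bucket policy statement involved. It is plain TypeScript with no CDK dependency, and the helper name, bucket name, account ID, and `Sid` are all hypothetical. The exact statement CDK generates (including its condition key) may differ; this only shows how a prefix-scoped `Resource` would not match a write at the bucket root.

```typescript
// Hypothetical helper: builds a bucket policy statement that lets SES write
// objects into a bucket. Names and values are illustrative, not taken from CDK.
interface PolicyStatement {
  Sid: string;
  Effect: "Allow" | "Deny";
  Principal: { Service: string };
  Action: string;
  Resource: string;
  Condition: Record<string, Record<string, string>>;
}

function sesPutObjectStatement(
  bucketName: string,
  accountId: string,
  objectKeyPrefix = "",
): PolicyStatement {
  return {
    Sid: "AllowSESPuts",
    Effect: "Allow",
    Principal: { Service: "ses.amazonaws.com" },
    // Only PutObject is granted; a list or other action would not match (hypothesis 3).
    Action: "s3:PutObject",
    // With a prefix, only keys under that prefix match (hypothesis 2).
    // With an empty prefix, the pattern covers every key in the bucket.
    Resource: `arn:aws:s3:::${bucketName}/${objectKeyPrefix}*`,
    Condition: { StringEquals: { "AWS:SourceAccount": accountId } },
  };
}

const scoped = sesPutObjectStatement("example-rua-bucket", "111122223333", "rua/");
const wide = sesPutObjectStatement("example-rua-bucket", "111122223333");

console.log(scoped.Resource); // arn:aws:s3:::example-rua-bucket/rua/*
console.log(wide.Resource);   // arn:aws:s3:::example-rua-bucket/*
```

If the test write really does land outside the prefix, temporarily widening the resource as in `wide` would be one way to confirm or rule out hypothesis 2.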

I've tried to add extra policy statements, but because of the dependency ordering I don't think they are created early enough. Incidentally, I noticed some "fixes" in CDK that relate to this functionality:

  1. https://github.com/aws/aws-cdk/pull/29833
  2. https://github.com/aws/aws-cdk/pull/30375

The second of these is a revert of the first PR. The first PR implements the "correct" SES IAM policy for the bucket with enough specificity, but ironically would create even more of a circular dependency - you need to know the receipt rule ARN in order to create the bucket policy to give it access, but if the creation of the receipt rule tests S3 write access, you have already failed at that point.
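The circular dependency described above can be illustrated with a small sketch. This is not how CloudFormation or CDK resolve ordering internally; it is just a depth-first cycle check over a hand-written dependency map, and the resource names are hypothetical.

```typescript
// Each key depends on the resources listed in its array.
type Graph = Record<string, string[]>;

// Returns the first dependency cycle found, or null if the graph is acyclic.
function findCycle(graph: Graph): string[] | null {
  const state: Record<string, "visiting" | "done"> = {};
  const path: string[] = [];

  function visit(node: string): string[] | null {
    if (state[node] === "done") return null;
    if (state[node] === "visiting") {
      // We came back to a node that is still on the current path: a cycle.
      return [...path.slice(path.indexOf(node)), node];
    }
    state[node] = "visiting";
    path.push(node);
    for (const dep of graph[node] ?? []) {
      const cycle = visit(dep);
      if (cycle) return cycle;
    }
    path.pop();
    state[node] = "done";
    return null;
  }

  for (const node of Object.keys(graph)) {
    const cycle = visit(node);
    if (cycle) return cycle;
  }
  return null;
}

// Policy scoped to the rule ARN: the policy needs the rule, the rule needs the policy.
const withSourceArn: Graph = {
  ReceiptRule: ["BucketPolicy"],
  BucketPolicy: ["Bucket", "ReceiptRule"],
  Bucket: [],
};

// Policy scoped to the account only: no reference back to the rule.
const withAccountOnly: Graph = {
  ReceiptRule: ["BucketPolicy"],
  BucketPolicy: ["Bucket"],
  Bucket: [],
};

console.log(findCycle(withSourceArn));   // [ 'ReceiptRule', 'BucketPolicy', 'ReceiptRule' ]
console.log(findCycle(withAccountOnly)); // null
```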

I also notice there is a custom resource in place to activate and deactivate the ruleset? Maybe a similar approach can be used to enable/disable the receipt rule? I will attempt to diagnose further.

ohookins commented 2 weeks ago

The only workaround I have found so far that allows the install to complete is this:

The need for this is contingent on there being undocumented SES behaviour that tests the writability of the destination S3 bucket when the receipt rule is being created. The only way I can see around this, without resorting to a multi-stage deploy as I've used above, would be to add an IAM role to the SES configuration.

This would not allow us to use an IAM assume role policy with a principal calculated from the other resources. We'd need to form the template string ourselves without making reference to the receipt rule, since such a reference would introduce a resource ordering dependency. It's a bit icky but would likely work around this problem.
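As a rough sketch of what "forming the template string ourselves" could look like: the ARN is assembled from names known ahead of time rather than from a reference to the receipt rule resource. The function names, rule set name, and rule name below are hypothetical, and the receipt rule ARN layout and condition keys are my assumption and should be checked against the SES documentation.

```typescript
// Assumed ARN layout for an SES receipt rule; verify against the SES docs.
function receiptRuleArn(
  region: string,
  accountId: string,
  ruleSetName: string,
  ruleName: string,
): string {
  return `arn:aws:ses:${region}:${accountId}:receipt-rule-set/${ruleSetName}:receipt-rule/${ruleName}`;
}

// Trust policy letting SES assume the role, restricted to one account and one rule.
function sesAssumeRolePolicy(accountId: string, sourceArn: string) {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        Principal: { Service: "ses.amazonaws.com" },
        Action: "sts:AssumeRole",
        Condition: {
          StringEquals: {
            "AWS:SourceAccount": accountId,
            "AWS:SourceArn": sourceArn,
          },
        },
      },
    ],
  };
}

// The names are plain strings, so nothing here refers to the rule resource itself.
const ruleArn = receiptRuleArn("eu-west-1", "111122223333", "example-rule-set", "example-rua-rule");
const trustPolicy = sesAssumeRolePolicy("111122223333", ruleArn);

console.log(JSON.stringify(trustPolicy, null, 2));
```

The trade-off is that the rule set and rule names must then be fixed in the stack rather than generated, so that the string and the real resource stay in agreement.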

ohookins commented 1 week ago

I got this response from AWS support:

Yes, The SES service is performing a test operation to the bucket by sending an AMAZON_SES_SETUP_NOTIFICATION message. Usually when creating a receiving rule set in SES, the SES service sends the initial setup notification message to your S3 bucket with the below details to confirm if it can write to it.

So I think the ReceiptRule CDK resource fundamentally introduces a race condition that cannot be escaped. I think the only workaround would be to create an IAM role for SES to use, which grants the required S3 permissions and is created and configured before the ReceiptRule is set up.
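A minimal sketch of that ordering, written as a CloudFormation-style fragment in a plain TypeScript object rather than as CDK constructs. The logical IDs, bucket name, and rule names are hypothetical, the trust policy is shown without conditions for brevity, and the `IamRoleArn` property on the S3 action is my assumption about the resource schema; I have not deployed this.

```typescript
const templateFragment = {
  Resources: {
    // The role carries its S3 permissions inline, so they exist as soon as the role does.
    SesWriteRole: {
      Type: "AWS::IAM::Role",
      Properties: {
        AssumeRolePolicyDocument: {
          Version: "2012-10-17",
          Statement: [
            {
              Effect: "Allow",
              Principal: { Service: "ses.amazonaws.com" },
              Action: "sts:AssumeRole",
            },
          ],
        },
        Policies: [
          {
            PolicyName: "WriteRuaReports",
            PolicyDocument: {
              Version: "2012-10-17",
              Statement: [
                {
                  Effect: "Allow",
                  Action: "s3:PutObject",
                  Resource: "arn:aws:s3:::example-rua-bucket/*",
                },
              ],
            },
          },
        ],
      },
    },
    // The rule is created only after the role, so the SES setup write should find permissions in place.
    RuaReceiptRule: {
      Type: "AWS::SES::ReceiptRule",
      DependsOn: ["SesWriteRole"],
      Properties: {
        RuleSetName: "example-rule-set",
        Rule: {
          Name: "example-rua-rule",
          Enabled: true,
          Actions: [
            {
              S3Action: {
                BucketName: "example-rua-bucket",
                ObjectKeyPrefix: "rua/",
                // Assumed property name; the Fn::GetAtt also implies the dependency.
                IamRoleArn: { "Fn::GetAtt": ["SesWriteRole", "Arn"] },
              },
            },
          ],
        },
      },
    },
  },
};

console.log(JSON.stringify(templateFragment, null, 2));
```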