awsdocs / aws-organizations-docs

The open source version of the AWS Organizations documentation. We welcome and encourage your feedback. You can submit feedback and requests for changes by submitting issues in this repo or by making proposed changes and submitting a pull request.

Help Wanted -- Troubleshooting Tips, Tricks, and Strategies #8

Closed carlasp closed 4 years ago

carlasp commented 6 years ago

As you know, troubleshooting problems with an organization and its many moving parts can be a challenge. What issues have you run into and solved? Can you share your solution with your peers? Please share only problems for which you've found a solution. If you're still struggling with a problem, please contact Customer Support or post your issue on the AWS Forums.

Share the following information with me as a response to this issue, and I'll consider adding it to the troubleshooting section of the AWS Organizations documentation.

On behalf of your colleagues who run into the same problems, we thank you for your participation.

Carla Spoon Senior Technical Writer AWS Organizations

https://docs.aws.amazon.com/organizations/latest/userguide/

brainstorm commented 5 years ago

@carlasp We found a worrying side effect while implementing a variation of the region-lock SCP example:

https://docs.aws.amazon.com/organizations/latest/userguide/orgs_manage_policies_example-scps.html#example-scp-deny-region

Namely, we want non-Australian regions to be completely disabled:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyAllOutsideAU",
            "Effect": "Deny",
            "Action": [
                "*"
            ],
            "Resource": [
                "*"
            ],
            "Condition": {
                "StringNotEquals": {
                    "aws:RequestedRegion": [
                        "ap-southeast-2"
                    ]
                }
            }
        }
    ]
}

According to the docs, "core"/internal AWS services in us-east-1 should continue working regardless of this lock.

Unfortunately, we ran into basic cross-account IAM functions failing: for instance, AssumeRole between AWS accounts and through our Google SAML setup, IIRC.

Could you pass this through your internal AWS channels/support and provide a solution or example for this use case: AU-only accounts with working SAML and cross-account AssumeRole?
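One way to see why this happens is to model the statement by hand. The sketch below is a deliberately simplified stand-in for how a single Deny statement is evaluated (the function name `scp_denies` is hypothetical, and real SCP evaluation involves more than this). It assumes, as discussed later in this thread, that calls to global services such as the STS global end-point are treated as us-east-1 requests.

```python
def scp_denies(action, requested_region, allowed_regions):
    """Simplified model of a Deny statement with Action "*" and a
    StringNotEquals condition on aws:RequestedRegion."""
    # Action "*" matches every action, so the action itself never saves
    # the request; only the region condition matters.
    # StringNotEquals with several values is true only when the key
    # matches none of them.
    return requested_region not in allowed_regions


# Assumption: the global STS end-point is seen as us-east-1.
print(scp_denies("sts:AssumeRole", "us-east-1", ["ap-southeast-2"]))
print(scp_denies("ec2:RunInstances", "ap-southeast-2", ["ap-southeast-2"]))
```

Under that assumption the first call is denied and the second is not, which would match the AssumeRole failures described here.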

We would love to apply that useful SCP and have it running in production :)

/cc @reisingerf

carlasp commented 5 years ago

Thank you for letting me know. I will start looking into this.

carlasp commented 5 years ago

At first glance, it looks like you need to have a NotAction element and add your "core" global services in it so that they can bypass the region restriction. I verified this with my SCP and policy expert.

If you look at the example in the docs, it does include a NotAction block and the example description mentions the NotAction element. Where did the docs confuse you? I'd like to make this as clear as possible.
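As a rough illustration of the NotAction approach, the sketch below generates a region-deny policy from a list of allowed regions and a list of exempted actions. The helper name `build_region_deny_scp`, the default Sid, and the three exempted services are illustrative assumptions, not a recommended list.

```python
import json


def build_region_deny_scp(allowed_regions, exempt_actions,
                          sid="DenyAllOutsideAllowedRegions"):
    """Return a region-deny SCP as a dict.

    Actions listed in NotAction are never matched by the Deny statement,
    so they keep working in every region.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": sid,
                "Effect": "Deny",
                # De-duplicate and sort so diffs between versions stay small.
                "NotAction": sorted(set(exempt_actions)),
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {
                        "aws:RequestedRegion": list(allowed_regions)
                    }
                },
            }
        ],
    }


# Hypothetical starting point; customize the exempted actions for your org.
policy = build_region_deny_scp(
    ["ap-southeast-2"], ["iam:*", "organizations:*", "sts:*"]
)
print(json.dumps(policy, indent=4))
```

Generating the document instead of hand-editing it makes it easier to review which services are exempt each time the list changes.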

brainstorm commented 5 years ago

Aha, I see now. The docs confused me in the NotAction section because those permissions seem too lax. Would users from other regions be able to perform iam:* actions? That is not desirable, since those should be blocked too.

In other words, can those fields in NotAction be tightened even more without affecting core functionality/services on my allowed region?

/cc @reisingerf

carlasp commented 5 years ago

Yes. The example is intended as a starting point that you can customize. You can even list iam:* in your SCP and then deny the specific actions in a different policy.

brainstorm commented 5 years ago

Thanks @carlasp, but still that does not address the issue at hand:

If I want to lock down a particular region (in my case, AU), what is the minimal list of Global services (most of them running on us-east-1, I guess?) that I need to include/exclude in the SCP to just have my region operational?

/cc @reisingerf

carlasp commented 5 years ago

The region is operational regardless of what services you list in your SCP. The issue is that you want to control access to it. The example policy is intended as a starting point, and it includes commonly used global services that you may or may not use. It doesn't include every possible global service or action. You'll need to customize your SCP for your own needs.

brainstorm commented 5 years ago

Thanks much @carlasp for the attention. We got it sorted out with AWS support by listing the services used over the last 3 months as a starting point, namely:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyAllOutsideAU",
            "Effect": "Deny",
            "NotAction": [
               "iam:*",
               "organizations:*",
               "route53:*",
               "acm:*",
               "cloudtrail:*",
               "cloudwatch:*",
               "s3:*",
               "ce:*",
               "kms:*",
               "guardduty:*",
               "lambda:*",
               "budgets:*",
               "waf:*",
               "ds:*",
               "cur:*",
               "dlm:*",
               "pricing:*",
               "config:*",
               "cloudfront:*",
               "cloudformation:*",
               "events:*",
               "ec2messages:*",
               "globalaccelerator:*",
               "importexport:*",
               "support:*",
               "logs:*",
               "sns:*",
               "sts:*",
               "tag:*",
               "trustedadvisor:*",
               "wellarchitected:*",
               "resource-groups:*"
            ],
            "Resource": "*",
            "Condition": {
                "StringNotEquals": {
                    "aws:RequestedRegion": [
                        "ap-southeast-2"
                    ]
                }
            }
        }
    ]
}

I totally get your point that it's entirely up to the org needs and services used. Thanks again!

(safe to close this now ;)
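Long hand-maintained NotAction lists are easy to get subtly wrong, for example an entry such as `ec2messages` with no `:*` suffix, or the same entry listed twice. A minimal lint sketch follows; the function name `lint_actions` and the regular expression are assumptions about what a well-formed entry looks like, not an official grammar.

```python
import re
from collections import Counter

# Assumed shape of a valid entry: "<service-prefix>:<action or wildcard>".
ACTION_RE = re.compile(r"^[a-z0-9-]+:[A-Za-z0-9*?]+$")


def lint_actions(actions):
    """Return (malformed, duplicates) for a list of action strings."""
    # A bare "*" is a valid way to match every action, so allow it.
    malformed = [a for a in actions if a != "*" and not ACTION_RE.match(a)]
    duplicates = sorted(a for a, n in Counter(actions).items() if n > 1)
    return malformed, duplicates


malformed, duplicates = lint_actions(
    ["iam:*", "budgets:*", "ec2messages", "s3:GetAccountPublic*", "budgets:*"]
)
print(malformed)   # ['ec2messages']
print(duplicates)  # ['budgets:*']
```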

0xdabbad00 commented 5 years ago

@brainstorm That might not do what you want. It allows you to create a Lambda function or a KMS key, for example, in any region.
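A quick way to spot such entries is to compare the NotAction list against the service prefixes you believe are global. In the sketch below, `GLOBAL_PREFIXES` is an assumption assembled from lists posted in this thread, not an authoritative AWS list, and `regional_exemptions` is a hypothetical helper name.

```python
# Assumption: prefixes treated as global, taken from this thread.
GLOBAL_PREFIXES = {
    "a4b", "budgets", "ce", "chime", "cloudfront", "cur",
    "globalaccelerator", "health", "iam", "importexport",
    "mobileanalytics", "organizations", "route53", "route53domains",
    "shield", "support", "trustedadvisor", "waf", "wellarchitected",
}


def regional_exemptions(not_actions, global_prefixes=GLOBAL_PREFIXES):
    """Return exempted actions whose service prefix is not known-global.

    Each of these stays usable in every region, which defeats the lock
    for that service.
    """
    return sorted(
        a for a in not_actions
        if a.split(":", 1)[0] not in global_prefixes
    )


print(regional_exemptions(["iam:*", "lambda:*", "kms:*", "route53:*"]))
# ['kms:*', 'lambda:*']
```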

brainstorm commented 5 years ago

Yeah, @0xdabbad00, that's what I suspected too. It is indeed too lax to be useful, even as a starting draft.

That example, btw, came from AWS Premium support, verbatim, after a few case exchanges and explanation.

So I'm all ears for better alternatives that lock me down to the AU/Sydney region (ap-southeast-2) while keeping core AWS services (e.g., IAM) up and running, without too much fiddling with the action groupings above.

I'm quite astonished that this is not so easy to accomplish on AWS right now. IMHO, regions should be able to have rather clear-cut isolation for regulatory purposes and other quite common (I thought) use cases.

/cc @reisingerf @ohofmann

0xdabbad00 commented 5 years ago

@brainstorm Let me start off this response by saying I'm not an AWS employee, and work as a consultant for AWS clients.

To understand the policy, know that some services are global, so you can't disable them in other regions: they don't exist in the "other" regions, but exist globally. You can see this in the SDKs for a global service such as route53: https://github.com/boto/botocore/blob/develop/botocore/data/endpoints.json#L2026

The global services use us-east-1 as their end-point, but again, it's not really a regional thing; they just have to have an end-point somewhere for your calls to go to.

Some services, though, such as route53domains (which gets called behind the scenes for some route53 calls), aren't listed as being global but only have an end-point in us-east-1, so for all practical purposes they too are global. (To be pedantic, this example doesn't really matter to you, because the IAM service route53 controls the privileges to access both the route53 and route53domains end-points; AWS didn't make a 1:1 ratio of privileges to end-points, which is not sane, but they probably have their reasons.)

If you git clone botocore (the Python SDK), you can run the following query to find the global services and those in us-east-1. This is a little ugly because some services that are only in one region aren't just us-east-1 or aws-global, such as devicefarm which is only in us-west-2:

$ jq -r '.partitions[0].services | keys[] as $k | .[$k] | select(.endpoints | length == 1) | select(.endpoints."us-east-1" != null or .endpoints."aws-global" != null)| $k' botocore/data/endpoints.json
a4b
budgets
ce
chime
cloudfront
cur
entitlement.marketplace
health
iam
importexport
marketplacecommerceanalytics
mobileanalytics
organizations
route53
route53domains
shield
support
waf
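For anyone without jq, the same filter can be sketched in Python. The function name `single_endpoint_global_services` is hypothetical, and the `sample` dictionary is a tiny hand-made stand-in with the same shape as endpoints.json, not real botocore data.

```python
import json


def single_endpoint_global_services(endpoints):
    """Mirror the jq query: services in the first partition that have
    exactly one end-point, and that end-point is us-east-1 or aws-global."""
    services = endpoints["partitions"][0]["services"]
    result = []
    for name in sorted(services):
        eps = services[name].get("endpoints") or {}
        if len(eps) != 1:
            continue
        # An empty object still counts as present, as it does in jq.
        if eps.get("us-east-1") is not None or eps.get("aws-global") is not None:
            result.append(name)
    return result


# Hand-made sample in the shape of endpoints.json (illustrative only).
sample = {
    "partitions": [
        {
            "services": {
                "iam": {"endpoints": {"aws-global": {}}},
                "devicefarm": {"endpoints": {"us-west-2": {}}},
                "ec2": {"endpoints": {"us-east-1": {}, "ap-southeast-2": {}}},
                "support": {"endpoints": {"us-east-1": {}}},
            }
        }
    ]
}
print(single_endpoint_global_services(sample))  # ['iam', 'support']

# Against a botocore checkout, load the real file instead:
# with open("botocore/data/endpoints.json") as f:
#     print(single_endpoint_global_services(json.load(f)))
```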

Now as mentioned earlier, the relation between privilege names and APIs called is a mess, and some privileges don't have APIs (despite what AWS will tell you). So we need to add in:

globalaccelerator
trustedadvisor
wellarchitected

Also, I don't know the privileges that the marketplacecommerceanalytics and entitlement.marketplace tie back to, so I'm removing those.

So that all said, I think this policy is complete:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyAllOutsideAU",
            "Effect": "Deny",
            "NotAction": [
                "a4b:*",
                "budgets:*",
                "ce:*",
                "chime:*",
                "cloudfront:*",
                "cur:*",
                "globalaccelerator:*",
                "health:*",
                "iam:*",
                "importexport:*",
                "mobileanalytics:*",
                "organizations:*",
                "route53:*",
                "shield:*",
                "support:*",
                "trustedadvisor:*",
                "waf:*",
                "wellarchitected:*"
            ],
            "Resource": "*",
            "Condition": {
                "StringNotEquals": {
                    "aws:RequestedRegion": [
                        "ap-southeast-2"
                    ]
                }
            }
        }
    ]
}

brainstorm commented 5 years ago

This is a brilliant response in all regards, thanks so much @0xdabbad00, I'll give it a go right now ;)

brainstorm commented 5 years ago

@0xdabbad00 It worked well except for the missing sts:*: we use temporary cross-account tokens, and terraform was failing to assume role(s).

brainstorm commented 5 years ago

@0xdabbad00 After a few days of testing this region lock, it works well, except for some hiccups like this one seen in Route53:

[Screenshot: route53-aws-organizations-region-lock]

Seems to have more cross-region tentacles than simply route53:* :-S

/cc @reisingerf @jayai2014

brainstorm commented 5 years ago

This SCP definition is still problematic. Things like ACM fail even when "excluded" in the SCP's NotAction (since the certificates are issued in us-east-1 and not ap-southeast-2, as in our case), so we have to temporarily disable the SCP for some deployments that involve ACM certificate confirmations:

https://github.com/umccr/infrastructure/blob/master/terraform/stacks/umccr_data_portal/main.tf#L166

:/

brainstorm commented 5 years ago

@0xdabbad00 The policy you suggested above was missing route53domains:*; that's why the route53 panel and ACM were misbehaving. Thanks to Or Straze from AWS for looking into this!
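Since the exemption list keeps growing as gaps like these turn up, it can help to patch the policy programmatically instead of editing the JSON by hand. A minimal sketch, assuming a single-statement policy like the ones in this thread; `add_exemptions` is a hypothetical helper and the starting policy is abbreviated.

```python
import copy


def add_exemptions(policy, extra_actions):
    """Return a copy of the policy with extra actions merged into the
    first statement's NotAction list (de-duplicated and sorted)."""
    updated = copy.deepcopy(policy)
    stmt = updated["Statement"][0]
    stmt["NotAction"] = sorted(set(stmt["NotAction"]) | set(extra_actions))
    return updated


# Abbreviated policy for illustration; the real list is much longer.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyAllOutsideAU",
            "Effect": "Deny",
            "NotAction": ["iam:*", "route53:*"],
            "Resource": "*",
            "Condition": {
                "StringNotEquals": {"aws:RequestedRegion": ["ap-southeast-2"]}
            },
        }
    ],
}

patched = add_exemptions(policy, ["route53domains:*", "sts:*", "iam:*"])
print(patched["Statement"][0]["NotAction"])
# ['iam:*', 'route53:*', 'route53domains:*', 'sts:*']
```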

phils commented 4 years ago

@0xdabbad00 Additional missing global services from our experience so far:

            "aws-portal:*",
            "aws-marketplace:*",
            "route53domains:*",
            "savingsplans:*",

some other things we have to deal with as well:

            "sts:*", (we found awscli was hitting global endpoint by default)
            "cloudtrail:*", (maybe - if you're trying to look at IAM actions, for example)
            "config:*", (maybe - if you're tracking IAM, for example)
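On the STS point, one possible alternative to exempting sts:* is to make the client use the regional STS end-point. The sketch below only prepares the settings. `AWS_STS_REGIONAL_ENDPOINTS` is a setting honored by the AWS CLI and SDKs, but whether it removes the need for the exemption in your setup is an assumption to verify. The helper names are hypothetical, and the URL pattern applies to the standard `aws` partition.

```python
import os


def regional_sts_settings(region):
    """Environment variables asking the AWS CLI/SDK to use the regional
    STS end-point instead of the global one."""
    return {
        "AWS_STS_REGIONAL_ENDPOINTS": "regional",
        "AWS_DEFAULT_REGION": region,
    }


def regional_sts_endpoint(region):
    """Regional STS end-point URL for the standard aws partition."""
    return "https://sts.{}.amazonaws.com".format(region)


# Build an environment for a child process without mutating our own.
env = {**os.environ, **regional_sts_settings("ap-southeast-2")}
print(env["AWS_STS_REGIONAL_ENDPOINTS"])
print(regional_sts_endpoint("ap-southeast-2"))
```

The resulting `env` could be passed to `subprocess.run(..., env=env)` when invoking the CLI or terraform.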

brainstorm commented 4 years ago

We have since given up on this SCP since it causes more disruption than benefit (although it's pretty critical to have for regulatory purposes). I really wish AWS would fix this one for good... Are you running it in production, @phils?

0xdabbad00 commented 4 years ago

AWS just this week finally updated their SCP example for regions after requests from me, although it is the one I provided earlier so it sounds like you may still have issues. https://docs.aws.amazon.com/organizations/latest/userguide/orgs_manage_policies_example-scps.html#example-scp-deny-region

Now that they are agreeable to updating it, you might try requesting they look further at it. I had made the request through aws-security@amazon.com which is probably not the ideal communication path, but it's the one I'm most familiar with.

phils commented 4 years ago

@brainstorm Not in prod yet.. not even widely in nonprod (but soon - so I suspect will find some more breakage). What sort of issues did you run into mainly?

brainstorm commented 4 years ago

@0xdabbad00 Thanks for the insight, I'll ping our local APAC AWS representative on this matter and see if he can move it within his AWS hive mind. @phils Our issues are mainly deploys failing on route53-related terraform or CDK stacks/resources.

/cc @victorskl @reisingerf

cam8001 commented 3 years ago

Here's an updated SCP to restrict services to ap-southeast-2 when using Control Tower:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyAllOutsideAustralia",
            "Effect": "Deny",
            "NotAction": [
                "a4b:*",
                "acm:*",
                "aws-marketplace-management:*",
                "aws-marketplace:*",
                "aws-portal:*",
                "awsbillingconsole:*",
                "budgets:*",
                "ce:*",
                "chime:*",
                "cloudfront:*",
                "config:*",
                "cur:*",
                "directconnect:*",
                "ec2:DescribeRegions",
                "ec2:DescribeTransitGateways",
                "ec2:DescribeVpnGateways",
                "fms:*",
                "globalaccelerator:*",
                "health:*",
                "iam:*",
                "importexport:*",
                "kms:*",
                "mobileanalytics:*",
                "networkmanager:*",
                "organizations:*",
                "pricing:*",
                "route53:*",
                "route53domains:*",
                "s3:GetAccountPublic*",
                "s3:ListAllMyBuckets",
                "s3:ListBuckets",
                "s3:PutAccountPublic*",
                "shield:*",
                "sts:*",
                "support:*",
                "trustedadvisor:*",
                "waf-regional:*",
                "waf:*",
                "wafv2:*",
                "wellarchitected:*"
            ],
            "Resource": "*",
            "Condition": {
                "StringNotEquals": {
                    "aws:RequestedRegion": [
                        "ap-southeast-2"
                    ]
                },
                "ArnNotLike": {
                    "aws:PrincipalARN": [
                        "arn:aws:iam::*:role/AWSControlTowerAdmin",
                        "arn:aws:iam::*:role/AWSControlTowerCloudTrailRole",
                        "arn:aws:iam::*:role/AWSControlTowerStackSetRole",
                        "arn:aws:iam::*:role/*ControlTower*",
                        "arn:aws:iam::*:role/*controltower*"
                    ]
                }
            }
        }
    ]
}
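The Condition block in this policy has two operators, and both must hold for the Deny to apply: the region must be outside the allowed list and the caller must not be one of the exempted roles. The sketch below models only that Condition block (not NotAction). It approximates ArnNotLike with `fnmatch`, which is an assumption and not the exact IAM matching rule. The account ID and the `Developer` role are made up.

```python
from fnmatch import fnmatchcase

ALLOWED_REGIONS = ["ap-southeast-2"]
EXEMPT_PRINCIPALS = [
    "arn:aws:iam::*:role/AWSControlTowerAdmin",
    "arn:aws:iam::*:role/AWSControlTowerCloudTrailRole",
    "arn:aws:iam::*:role/AWSControlTowerStackSetRole",
    "arn:aws:iam::*:role/*ControlTower*",
    "arn:aws:iam::*:role/*controltower*",
]


def condition_matches(requested_region, principal_arn):
    """True when the Deny statement's Condition block is satisfied."""
    # StringNotEquals: the region matches none of the listed values.
    region_outside = requested_region not in ALLOWED_REGIONS
    # ArnNotLike (approximated): the principal matches none of the patterns.
    principal_not_exempt = not any(
        fnmatchcase(principal_arn, pattern) for pattern in EXEMPT_PRINCIPALS
    )
    # Multiple condition operators are combined with a logical AND.
    return region_outside and principal_not_exempt


admin = "arn:aws:iam::111122223333:role/AWSControlTowerAdmin"
dev = "arn:aws:iam::111122223333:role/Developer"
print(condition_matches("us-east-1", admin))     # False: exempt role
print(condition_matches("us-east-1", dev))       # True: deny applies
print(condition_matches("ap-southeast-2", dev))  # False: allowed region
```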

brainstorm commented 1 year ago

Fast forward to 2023: we were just having some Databricks data lake integration issues, which we resolved via s3:* under NotAction. Do folks in this thread see a major threat (model) in doing so, versus the strategy above of allowing only certain S3 ops?

"if we misconfigure a bucket the data can be egressed to Estonia irrespective of some SCP policy" /cc @andrewpatto

Other than that, I'd like to add that this particular region lockdown SCP has been one of the most frequent sources of problems and frustrations w.r.t. permission handling in our team, especially with newly announced services :-S

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyAllOutsideAustralia",
      "Effect": "Deny",
      "NotAction": [
        "a4b:*",
        "acm:*",
        "aws-marketplace-management:*",
        "aws-marketplace:*",
        "aws-portal:*",
        "awsbillingconsole:*",
        "budgets:*",
        "ce:*",
        "chime:*",
        "cloudfront:*",
        "config:*",
        "cur:*",
        "directconnect:*",
        "ec2:DescribeRegions",
        "ec2:DescribeTransitGateways",
        "ec2:DescribeVpnGateways",
        "fms:*",
        "globalaccelerator:*",
        "health:*",
        "iam:*",
        "importexport:*",
        "kms:*",
        "mobileanalytics:*",
        "networkmanager:*",
        "organizations:*",
        "pricing:*",
        "route53:*",
        "route53domains:*",
        "s3:*",
        "shield:*",
        "sts:*",
        "support:*",
        "trustedadvisor:*",
        "waf-regional:*",
        "waf:*",
        "wafv2:*",
        "wellarchitected:*",
        "chatbot:*",
        "xray:*",
        "omics:*"
      ],
      "Resource": [
        "*"
      ],
      "Condition": {
        "StringNotEquals": {
          "aws:RequestedRegion": [
            "ap-southeast-2"
          ]
        },
        "ArnNotLike": {
          "aws:PrincipalARN": [
            "arn:aws:iam::*:role/AWSControlTowerAdmin",
            "arn:aws:iam::*:role/AWSControlTowerCloudTrailRole",
            "arn:aws:iam::*:role/AWSControlTowerStackSetRole",
            "arn:aws:iam::*:role/*ControlTower*",
            "arn:aws:iam::*:role/*controltower*"
          ]
        }
      }
    }
  ]
}

/cc @alexiswl @0xdabbad00