iainelder opened 1 year ago
@alencar, it sounds like you are describing different issues:
I'm not familiar with the third issue so you may need to give me more information before I can help.
botocove assumes that the target regions are accessible in all accounts, and so it calls the function once in each target region in each target account. It works this way because that was good enough to solve my problems back in #28.
There were times when I wished I had more control over exactly which regions were accessed per account, such as when I needed to remediate the resources of just a few account-regions across a large organization. It would have been faster to skip the account-regions needing no remediation. So I'm open to the idea of making it more flexible, but I want to understand your use case first, because there are many ways to work around it before making changes to botocove.
Given the way that botocove works today, the only sure way to access all the enabled regions of all the accounts in a single pass is to target all the regions that are known to be enabled in at least one account.
Unless all your accounts enable the whole set of target regions, some security token exceptions will occur, as you showed: `ClientError: An error occurred (UnrecognizedClientException) when calling the ListTrails operation: The security token included in the request is invalid`. For example, in my account where the region `eu-south-2` is not enabled, I can generate that error like this:

```python
from boto3 import Session

Session().client("cloudtrail", region_name="eu-south-2").list_trails()["Trails"]
```
One way around that is to just ignore the exceptions in the cove output for the account-regions that you know are disabled.
In the same example account, I run this to generate one good result and one exception:
```python
from botocove import cove
from botocore.exceptions import ClientError


@cove(
    rolename="AWSControlTowerExecution",
    regions=["eu-central-1", "eu-south-2"],
    target_ids=["111111111111"],
)
def test_caller_identity(session):
    if session.client("sts").get_caller_identity():
        return "OK"


cove_output = test_caller_identity()
```
You could post-process the `cove_output` object to remove any results in the `Exceptions` list whose `ExceptionDetails` is a `ClientError` with the error message "The security token included in the request is invalid".
```
{'Results': [{'Id': '111111111111',
              'RoleName': 'AWSControlTowerExecution',
              'RoleSessionName': 'AWSControlTowerExecution',
              'AssumeRoleSuccess': True,
              'Region': 'eu-central-1',
              'Partition': 'aws',
              'Name': 'Log Archive',
              'Arn': 'arn:aws:organizations::222222222222:account/o-aaaaaaaaaa/111111111111',
              'Email': '...',
              'Status': 'ACTIVE',
              'Result': 'OK'}],
 'Exceptions': [{'Id': '111111111111',
                 'RoleName': 'AWSControlTowerExecution',
                 'RoleSessionName': 'AWSControlTowerExecution',
                 'AssumeRoleSuccess': True,
                 'Region': 'eu-south-2',
                 'Partition': 'aws',
                 'ExceptionDetails': botocore.exceptions.ClientError('An error occurred (InvalidClientTokenId) when calling the GetCallerIdentity operation: The security token included in the request is invalid'),
                 'Name': 'Log Archive',
                 'Arn': 'arn:aws:organizations::222222222222:account/o-aaaaaaaaaa/111111111111',
                 'Email': '...',
                 'Status': 'ACTIVE'}],
 'FailedAssumeRole': []}
```
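A minimal sketch of that post-processing step, assuming the output shape shown above. `drop_disabled_region_exceptions` is a hypothetical helper, not part of botocove. It recognizes the error by duck typing on the `.response` attribute, where botocore's `ClientError` keeps the parsed error, so the filter itself needs no botocore import; it also returns the dropped account-regions so you can see which ones were probably disabled.

```python
from typing import Any, Dict, List, Tuple

TOKEN_INVALID_MESSAGE = "The security token included in the request is invalid"


def _is_invalid_token_error(exc: BaseException) -> bool:
    """True if exc looks like a botocore ClientError for an invalid security token."""
    # botocore's ClientError stores the parsed error response in `.response`.
    response = getattr(exc, "response", None)
    if not isinstance(response, dict):
        return False
    return response.get("Error", {}).get("Message") == TOKEN_INVALID_MESSAGE


def drop_disabled_region_exceptions(
    cove_output: Dict[str, Any],
) -> Tuple[Dict[str, Any], List[Tuple[str, str]]]:
    """Return a copy of cove_output without invalid-token exceptions.

    Also returns the (account Id, region) pairs that were dropped; these are
    probably account-regions where the region is not enabled.
    """
    kept: List[Dict[str, Any]] = []
    dropped: List[Tuple[str, str]] = []
    for item in cove_output["Exceptions"]:
        if _is_invalid_token_error(item["ExceptionDetails"]):
            dropped.append((item["Id"], item["Region"]))
        else:
            kept.append(item)
    # Shallow copy: Results and FailedAssumeRole are passed through unchanged.
    cleaned = {**cove_output, "Exceptions": kept}
    return cleaned, dropped
```

For example, `cleaned, dropped = drop_disabled_region_exceptions(test_caller_identity())` would leave only the `eu-central-1` result and report `eu-south-2` as dropped. Other exceptions, such as `AccessDenied` from an SCP, are kept because their message differs.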
Would that work for you? Or were you looking for something else?
If I understand the problem statement here ("What if my org has inconsistent regions enabled across many accounts?"; it'd be useful to restate it as a user story: as an X I want to Y, today only Z), I don't think this is something we can solve in-library without a large burden of API calls.
Roughly two routes I can think of initially:
I don't think we can address this preflight without assuming into each account and then calling its get-regions API. I'm not super familiar with that API in general, and that approach would hit the same constraints as the difficult-to-solve in-band validation problem.
@iainelder / @connelldave, this is indeed a related but different issue from #55.
A high-level example is to go across each account and region and retrieve the EC2 instances. `AllRegions=True` returns every valid AWS region, whether or not it is opted in for the account providing the initial credentials (i.e. the management account). Watch out for any exception that can indicate the region is disabled, and handle it as appropriate for the end user's use case.
```python
import boto3
from botocove import cove
from botocore.exceptions import ClientError

...

# Get all regions, including ones not opted in.
# AllRegions=True is a parameter of EC2 DescribeRegions (the Account API's
# ListRegions has no such parameter).
ALL_REGIONS = [
    r["RegionName"]
    for r in boto3.client("ec2").describe_regions(AllRegions=True)["Regions"]
]


@cove(regions=ALL_REGIONS)
def example(session):
    ec2 = session.client("ec2")
    response = ec2.get_paginator("describe_instances").paginate().build_full_result()
    return response

...
...

results = example()

# Check results["Exceptions"] for any UnrecognizedClientException, as it may
# indicate an opt-in region that's disabled in this account OR that the global
# STS endpoint is not configured to issue v2Token.
# See IAM.Client.set_security_token_service_preferences().
# Regions that are disabled by other means, like an SCP, return ClientError/AccessDenied.
...
```
If the regions were loaded by the botocove session factory, the issues above related to disabled/not-opted-in regions would be addressed.
Issues caused by the global STS token version (v1Token, v2Token) can already be addressed if the session is constructed using a regional STS endpoint.
A workaround would be to call `IAM.Client.get_account_summary()` and look at the value of `SummaryMap["GlobalEndpointTokenVersion"]` to decide whether to use the global STS endpoint or a regional STS endpoint to obtain sessions.
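A minimal sketch of that workaround, assuming the `aws` partition (so regional STS endpoints look like `https://sts.<region>.amazonaws.com`) and assuming the token-version setting that matters is the one in the account whose credentials make the STS call. `regional_sts_endpoint` and `make_sts_client` are hypothetical helpers, not botocove functions; `session` is expected to be a boto3 `Session` whose identity may call `iam:GetAccountSummary`.

```python
from typing import Any, Optional


def regional_sts_endpoint(region: str, global_token_version: Optional[int]) -> Optional[str]:
    """Return a regional STS endpoint URL, or None when the global endpoint is fine."""
    # GlobalEndpointTokenVersion 2 means tokens from the global endpoint are
    # valid in all regions, including opt-in ones.
    if global_token_version == 2:
        return None
    # Version 1 (or unknown): use the regional endpoint ("aws" partition assumed).
    return f"https://sts.{region}.amazonaws.com"


def make_sts_client(session: Any, region: str) -> Any:
    """Create an STS client from a boto3 Session, honouring the token version."""
    summary = session.client("iam").get_account_summary()["SummaryMap"]
    endpoint = regional_sts_endpoint(region, summary.get("GlobalEndpointTokenVersion"))
    if endpoint is None:
        return session.client("sts")
    return session.client("sts", region_name=region, endpoint_url=endpoint)
```

A configuration-only alternative is setting `AWS_STS_REGIONAL_ENDPOINTS=regional` in the environment, which makes botocore use regional STS endpoints without checking the token version at all.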
At a high level: from the management account, use the Account client, potentially from the place where the sessions are set up? Sorry, I don't know my way around the botocove code :(
```python
import boto3

...

# Get the Account client from the region that cannot be disabled, and find out
# which regions are enabled using the management account credentials.
account = boto3.client("account", region_name="us-east-1")

# Iterate over each organization member account (adds 1 API call per member account):
# - Get the list of regions of each account
# - Update the _decorated_ session with the list of enabled regions
member_account_ids = ...  # the organization's member account IDs (elided)
for account_id in member_account_ids:
    response = account.list_regions(
        AccountId=account_id,
        RegionOptStatusContains=[
            "ENABLED",
            "ENABLED_BY_DEFAULT",
        ],
    )
    # Update the _decorated_ session to have the list of regions as default
    ...
...
```
@connelldave, for all the reasons you give, I don't think we need to make botocove aware of the opt-in status of a region.
If the aim here is to avoid disabled account-regions, both accessing them and referencing them in the output, then the use case is similar to mine when I wanted to access only the account regions needing remediation.
We can support both with a new parameter to `cove` that describes the set of account-regions to be accessed.
```python
@cove(
    regions_per_account={
        "111111111111": ["rr-aaaa-1", "rr-bbbb-1", "rr-cccc-1"],
        "222222222222": ["rr-aaaa-1", "rr-bbbb-1"],
        "333333333333": ["rr-aaaa-1", "rr-bbbb-1", "rr-cccc-1", "rr-dddd-1"],
    }
)
def example(session):
    ...
```
The `regions_per_account` parameter would override the `target_ids`, `ignored_ids`, and `regions` parameters.
So configured, botocove would call `example` in these account-regions:

- 111111111111, rr-aaaa-1
- 111111111111, rr-bbbb-1
- 111111111111, rr-cccc-1
- 222222222222, rr-aaaa-1
- 222222222222, rr-bbbb-1
- 333333333333, rr-aaaa-1
- 333333333333, rr-bbbb-1
- 333333333333, rr-cccc-1
- 333333333333, rr-dddd-1
And the output object would have only results or exceptions for those combinations.
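The proposed semantics can be sketched as a pure function that expands a `regions_per_account` mapping into the account-region pairs botocove would visit. This only illustrates the proposal; `expand_account_regions` is a hypothetical name, and the de-duplication of repeated regions is an assumption about how the parameter might behave.

```python
from typing import Dict, List, Tuple


def expand_account_regions(
    regions_per_account: Dict[str, List[str]],
) -> List[Tuple[str, str]]:
    """Flatten {account_id: [region, ...]} into (account_id, region) pairs.

    Order follows the mapping's insertion order; a region repeated for the
    same account is visited only once.
    """
    pairs: List[Tuple[str, str]] = []
    for account_id, regions in regions_per_account.items():
        # dict.fromkeys drops duplicates while keeping first-seen order.
        for region in dict.fromkeys(regions):
            pairs.append((account_id, region))
    return pairs
```

Applied to the mapping above, it yields the nine pairs listed, starting with `("111111111111", "rr-aaaa-1")` and ending with `("333333333333", "rr-dddd-1")`.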
In my use case, I would find the correct value for `regions_per_account` by doing a first pass with botocove over all account-regions to identify regions that need to be remediated. The first pass would run some listing and describing APIs. After studying the output of the first pass, I would pass a description of where to remediate to `regions_per_account` and a description of how to remediate as a new decorated function that runs some create/update/delete APIs.
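A minimal sketch of turning the first-pass output into that description of where to remediate. It assumes a hypothetical predicate `needs_remediation` that judges each first-pass `Result`, and reads only the `Id`, `Region`, and `Result` keys shown in the cove output earlier in this thread; `plan_remediation` is likewise a hypothetical name.

```python
from typing import Any, Callable, Dict, List


def plan_remediation(
    cove_output: Dict[str, Any],
    needs_remediation: Callable[[Any], bool],
) -> Dict[str, List[str]]:
    """Build a regions_per_account-style mapping from a first-pass cove output."""
    plan: Dict[str, List[str]] = {}
    for item in cove_output["Results"]:
        if not needs_remediation(item["Result"]):
            continue
        regions = plan.setdefault(item["Id"], [])
        if item["Region"] not in regions:
            regions.append(item["Region"])
    return plan
```

Accounts with nothing to remediate never appear in the mapping, so the second pass would skip them entirely. Exceptions from the first pass are deliberately ignored here and would need a separate look.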
In @alencar's use case, the client code would call a function like this before passing the return value to `regions_per_account`.
```python
def get_active_account_regions(session):
    org_client = session.client("organizations")
    account_client = session.client("account")

    mgmt_account_id = org_client.describe_organization()["Organization"]["MasterAccountId"]

    pages = org_client.get_paginator("list_accounts").paginate()
    active_member_accounts = [
        account
        for page in pages
        for account in page["Accounts"]
        if account["Status"] == "ACTIVE" and account["Id"] != mgmt_account_id
    ]

    # boto3 has no paginator for ListRegions. MaxResults allows up to 50 regions
    # in one response. In June 2023 there are 31 launched regions [1].
    # [1]: https://aws.amazon.com/about-aws/global-infrastructure/
    active_account_regions = {}
    for account in active_member_accounts:
        active_regions = account_client.list_regions(
            AccountId=account["Id"],
            MaxResults=50,
            RegionOptStatusContains=["ENABLED", "ENABLED_BY_DEFAULT"],
        )["Regions"]
        active_account_regions[account["Id"]] = [r["RegionName"] for r in active_regions]

    return active_account_regions
```
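The function above relies on one ListRegions response holding every region. If you'd rather not depend on the region count staying under 50, a sketch that follows `NextToken` by hand (ListRegions returns a `NextToken` when more results remain and accepts it on the next request) could replace the inner call. `list_active_regions` is a hypothetical helper; `account_client` is assumed to be a boto3 `account` client.

```python
from typing import Any, Dict, List


def list_active_regions(account_client: Any, account_id: str) -> List[str]:
    """Return the enabled region names of a member account, following NextToken."""
    kwargs: Dict[str, Any] = {
        "AccountId": account_id,
        "MaxResults": 50,
        "RegionOptStatusContains": ["ENABLED", "ENABLED_BY_DEFAULT"],
    }
    names: List[str] = []
    while True:
        response = account_client.list_regions(**kwargs)
        names.extend(r["RegionName"] for r in response["Regions"])
        token = response.get("NextToken")
        if not token:
            return names
        # Request the next page with the same filters.
        kwargs["NextToken"] = token
```

Inside the loop of `get_active_account_regions`, the body would then become `active_account_regions[account["Id"]] = list_active_regions(account_client, account["Id"])`.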
To make the function work in my test account, I needed to enable trusted access for AWS Account Management like this:
```shell
aws organizations enable-aws-service-access \
  --service-principal account.amazonaws.com
```

```
$ aws organizations list-aws-service-access-for-organization
{
    "EnabledServicePrincipals": [
        {
            "ServicePrincipal": "account.amazonaws.com",
            "DateEnabled": "2023-06-07T11:00:49.362000+02:00"
        },
        {
            "ServicePrincipal": "cloudtrail.amazonaws.com",
            "DateEnabled": "2023-05-19T12:20:06.578000+02:00"
        },
        {
            "ServicePrincipal": "config.amazonaws.com",
            "DateEnabled": "2023-05-19T12:27:58.513000+02:00"
        },
        {
            "ServicePrincipal": "controltower.amazonaws.com",
            "DateEnabled": "2023-05-19T12:20:05.228000+02:00"
        },
        {
            "ServicePrincipal": "sso.amazonaws.com",
            "DateEnabled": "2023-05-19T12:20:44.899000+02:00"
        }
    ]
}
```
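The same check can be made from Python before calling `get_active_account_regions`. This is a sketch assuming `org_client` is a boto3 `organizations` client; it follows `NextToken` by hand rather than relying on a paginator, and `has_trusted_access` is a hypothetical name.

```python
from typing import Any, Dict


def has_trusted_access(org_client: Any, service_principal: str = "account.amazonaws.com") -> bool:
    """Return True if the service principal has trusted access in the organization."""
    kwargs: Dict[str, Any] = {}
    while True:
        response = org_client.list_aws_service_access_for_organization(**kwargs)
        for entry in response["EnabledServicePrincipals"]:
            if entry["ServicePrincipal"] == service_principal:
                return True
        token = response.get("NextToken")
        if not token:
            return False
        kwargs["NextToken"] = token
```

If it returns `False`, `org_client.enable_aws_service_access(ServicePrincipal="account.amazonaws.com")` is the boto3 counterpart of the CLI command above.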
Without trusted access, using the `AccountId` parameter of ListRegions causes this error:

```
AccessDeniedException: An error occurred (AccessDeniedException) when calling the ListRegions operation: User: arn:aws:sts::111111111111:assumed-role/AWSReservedSSO_AWSAdministratorAccess_aaaaaaaaaaaaaaaa/... is not authorized to perform: account:ListRegions (Your organization must first enable trusted access with AWS Account Management.)
```
@alencar, would something like that work better for you than post-processing the botocove output?
To be clear, I'm not suggesting that we add `get_active_account_regions` to botocove. When the cove host account isn't an organization management account or delegated administrator, the function wouldn't make sense. Instead, that function would be in your client code that calls `cove`.
> Issues caused by the Global STS Token Version (v1Token, v2Token) can be already addressed if the Session is constructed from the Regional Endpoint.
Can you show an example of the problem the STS token version causes? I've read about the SetSecurityTokenServicePreferences and GetAccountSummary APIs for setting and checking the version, but I don't yet understand how it interacts with botocove.
There is one path in the code that uses the boto3 default session. I wonder whether it would matter there.
@alencar, did you find a solution to the problem?
@iainelder, applying what is discussed in https://github.com/connelldave/botocove/issues/74#issuecomment-1580351228 seems like a good solution. Adding the `regions_per_account` parameter for user-controlled account-region combinations would be great.
@iainelder, the STS global/regional endpoints only affect calls to STS [1], basically wherever you call `sts.assume_role(...)`
like
and perhaps indirectly
[1] https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sts.html
@alencar, thanks. When I have a moment I'll try to add support for `regions_per_account` to Botocove. You're also welcome to try it yourself if you don't want to wait for me. I doubt that this week I will get around to it.
Thanks also for the references to the code where the STS global/region endpoint setting matters. I'll experiment in my own environment to see whether I can break Botocove with a certain configuration of regions. If we can reproduce any errors, then we can fix that as well.
@alencar, did you find a solution to your problem?
I've started working in an environment with a "ragged regional" setup. I need to take an inventory of trails from a region in a member account that is disabled in the management account. I get the same error we discussed before: `ClientError('An error occurred (UnrecognizedClientException) when calling the ListTrails operation: The security token included in the request is invalid')`.
I would like to fix this so that I can complete my inventory checking using botocove.
This issue has gotten a bit muddled, so I'll create a new one to track that specific issue when I have a simple repro.
@iainelder, I have parsed the results' `Exceptions` with jq to identify disabled regions.
> @alencar , thanks. When I have a moment I'll try to add support for `regions_per_account` to Botocove. You're also welcome to try it yourself if you don't want to wait for me. I doubt that this week I will get around to it.
I'm supportive of this, as well as of shipping a helper function for `get_active_account_regions`, although I'd suggest it be called `get_organization_active_account_regions`, since it depends on there being an organization (just to differentiate it from the non-org use case of taking a list of accounts that trust another account; I doubt that is very common).
Would it be possible to use the decorator session within the `@cove()` annotation to load the regions that are relevant for the session's account? It is commonplace to have regions disabled in the account configuration (opt-in regions), and using the regions of the management account (or any other account) as the list often results in `An error occurred (UnrecognizedClientException) when calling the ListTrails operation: The security token included in the request is invalid` exceptions. These happen because the region is disabled/not opted in under Account -> AWS Regions AND/OR because tokens issued by the global STS endpoint are only valid in regions enabled by default, unless the user explicitly changes this under IAM -> Security Token Service (STS) -> Global endpoint.
Another side effect of not using each account's enabled regions is that you can miss regions that are not enabled/opted in in the account whose region list you use, i.e. the management account.
There are currently ~10 regions that require opt-in: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_enable-regions.html?icmpid=docs_iam_console#id_credentials_region-endpoints
Originally posted by @alencar in https://github.com/connelldave/botocove/issues/55#issuecomment-1577091152