data-dot-all / dataall

A modern data marketplace that makes collaboration among diverse users (like business, analysts and engineers) easier, increasing efficiency and agility in data projects on AWS.
https://data-dot-all.github.io/dataall/
Apache License 2.0

IAM Policy enhancements - Split policy statements in chunks #884

Open anushka-singh opened 1 year ago

anushka-singh commented 1 year ago

Is your idea related to a problem? Please describe. The limit for managed IAM policies is 6,144 characters. Based on a rough approximation, each team can request access to around 35 buckets before the policy statements hit that character limit.

Describe the solution you'd like In v2.1.0 we added some utilities in dataall/base/utils/iam_policy_utils.py. We can reuse them to limit the resources per policy and to split policies.
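As a hedged sketch of the idea behind such a utility (a hypothetical helper, not the actual code in iam_policy_utils): pack resource ARNs into statements so that each statement's serialized JSON stays under a character budget.

```python
import json

def split_resources_into_statements(sid_prefix, actions, resources, char_budget=6144):
    """Pack resource ARNs into as few statements as possible while keeping
    each statement's JSON under char_budget characters (hypothetical sketch;
    6144 is the managed-policy character limit)."""
    statements, current = [], []
    for arn in resources:
        candidate = {
            'Sid': f'{sid_prefix}{len(statements)}',
            'Effect': 'Allow',
            'Action': actions,
            'Resource': current + [arn],
        }
        if current and len(json.dumps(candidate)) > char_budget:
            # Flush the statement built so far and start a new one with this ARN
            statements.append({**candidate, 'Resource': current})
            current = [arn]
        else:
            current = current + [arn]
    if current:
        statements.append({
            'Sid': f'{sid_prefix}{len(statements)}',
            'Effect': 'Allow',
            'Action': actions,
            'Resource': current,
        })
    return statements
```

In practice the budget would have to account for the policy document's own overhead (Version key, other statements), so the real utility would pass a smaller budget per statement.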

dlpzx commented 12 months ago

Thanks for opening the issue for visibility @anushka-singh. We will work together on introducing this feature because, as you noticed, it includes some scenarios beyond the ones considered in iam_policy_utils.

zsaltys commented 4 months ago

@dlpzx @SofiaSazonova @anmolsgandhi we ran into this issue on IAM policies when a role accumulated 38+ shares. Would be nice to prioritize this one.

petrkalos commented 2 months ago

I started an investigation into it and here are my findings so far...

When using data.all generated IAM roles we already use all the policies (max 10 per role) for services, so we cannot add any more. We can assume that not all deployments will have every feature enabled, in which case fewer than 10 are used and there is some space left, but I don't consider relying on that a good solution.

When using consumption roles we don't have those service policies, so in theory there is space for additional managed policies; but keep in mind that those roles are not data.all managed, so users might have already attached their own policies.

Overall, to tackle this issue we need to come up with a smarter (which probably means dynamic) policy manager which will maximise the capacity within the limitations above.

TejasRGitHub commented 2 months ago

Hi @petrkalos ,

We are seeing the IAM policy limit getting exceeded on managed roles. I think an initial version where, if the policy limit is exceeded, another policy is created and attached would be helpful. I think data.all would somehow ( maybe ) have to remember the policy names it created and try to modify the policy which still has space left.

FYI, if needed the limit can be raised to 20 managed policies per role - https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_iam-quotas.html#reference_iam-quotas-entity-length:~:text=Managed%20policies%20per%20role . Please correct me if I am wrong.

TejasRGitHub commented 2 months ago

[WIP]

Current IAM Roles and Policy management in data.all

Data.all creates managed roles for data.all constructs like Environment, Dataset. These managed roles contain managed and inline policies.

  1. Environment - Contains Data Policy ( inline ), multiple service policies ( customer managed ), share policy [Optional]
  2. Dataset - Contains one inline policy
  3. Teams attached to an environment - Contains Data Policy ( inline ), multiple service policies ( customer managed ) [Optional], share policy [Optional]
  4. Consumption Roles - Irrespective of whether the role is data.all managed or not, consumption roles contain one customer managed policy
  5. PivotRole - Contains one inline policy, multiple customer managed policies ( these policies are split into chunks to maximize space usage )

Problem with managing Customer Managed Service policies:

For environments and teams where roles are created by data.all and customer managed service policies are attached, these service policies are split into chunks ( splitting 10 statements into one managed policy ). Usually these service policies are split into managed policies as shown below,

( image: service policies split into indexed managed policies, e.g. service-policy-0, service-policy-1, ... )

Now if there is some modification in, say, the policy with index 1, that leaves that managed policy with some empty space which could be filled with other statements. Data.all would have to rearrange all those policies so that there is a tight fit ( i.e. minimal empty space across all the policies ).

Thus the problem is: given statements [s1, s2, s3, ...] with sizes ( weights ) [size1, size2, size3, ...], place the statements into policies ( service-policy-0, service-policy-1, service-policy-2, ... ) such that the empty space left is minimised.

This problem exists because the service policies are diverse and dynamically added, which prevents us from splitting them into chunks as efficiently as functions like split_policy_with_resources_in_statements do. This problem is not present for the managed policies used in dataset sharing, however; there, optimizations can be used to split the statements into chunks and create indexed managed policies.
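The "tight fit" problem above is a form of bin packing. A common heuristic is first-fit decreasing; a minimal sketch, with sizes in characters and capacity being the 6,144-character managed-policy limit (hypothetical helper, not dataall code):

```python
def pack_statements(statement_sizes, capacity=6144):
    """First-fit-decreasing bin packing: place each statement (by size) into
    the first policy that still has room, opening a new policy when none fits.
    Returns one list of statement sizes per managed policy."""
    policies = []  # each entry: [remaining_capacity, [sizes placed so far]]
    for size in sorted(statement_sizes, reverse=True):
        for policy in policies:
            if policy[0] >= size:
                policy[0] -= size
                policy[1].append(size)
                break
        else:
            policies.append([capacity - size, [size]])
    return [placed for _, placed in policies]
```

First-fit decreasing is not optimal in general, but it keeps the policy count close to the minimum, which is what matters for staying under the per-role policy quota.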

Proposal

Stage 1:

Since the problem of managed policies filling up and producing a "limit exceeded" error most likely happens for roles involved in sharing, this update can first be targeted at consumption roles and environment team roles ( which are involved in sharing ). Currently data.all maintains just one managed policy per role, with the naming convention <dataall-env>-<URI>-share-policy-<name of the role>.

Policy updates with bucket sharing in data.all -

The following resources are added to the requestor's IAM role:

SID for S3 - BucketStatementS3

S3 permissions
    s3_target_resources = [f'arn:aws:s3:::{self.bucket_name}', f'arn:aws:s3:::{self.bucket_name}/*']

SID for KMS - BucketStatementKMS

KMS permissions ( if KMS is used to encrypt the bucket )

    kms_target_resources = [f'arn:aws:kms:{self.bucket_region}:{self.source_account_id}:key/{kms_key_id}']
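Putting the two SIDs above together, the statements added for a bucket share would look roughly like this (the ARNs follow the description above; the action lists are placeholders, not the exact permissions data.all grants):

```python
# Illustrative values standing in for the real share attributes
bucket_name = 'example-bucket'
bucket_region, source_account_id, kms_key_id = 'eu-west-1', '111122223333', 'abcd-1234'

bucket_share_statements = [
    {
        'Sid': 'BucketStatementS3',
        'Effect': 'Allow',
        'Action': ['s3:List*', 's3:GetObject'],  # placeholder actions
        'Resource': [f'arn:aws:s3:::{bucket_name}', f'arn:aws:s3:::{bucket_name}/*'],
    },
    {
        'Sid': 'BucketStatementKMS',
        'Effect': 'Allow',
        'Action': ['kms:Decrypt'],  # placeholder actions
        'Resource': [f'arn:aws:kms:{bucket_region}:{source_account_id}:key/{kms_key_id}'],
    },
]
```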

Policy updates with folder sharing in data.all -

The following resources are added to the requestor's IAM role:

SID - AccessPointsStatementS3

S3 permissions
    s3_target_resources = [
        f'arn:aws:s3:::{self.bucket_name}',
        f'arn:aws:s3:::{self.bucket_name}/*',
        f'arn:aws:s3:{self.dataset_region}:{self.dataset_account_id}:accesspoint/{self.access_point_name}',
        f'arn:aws:s3:{self.dataset_region}:{self.dataset_account_id}:accesspoint/{self.access_point_name}/*',
    ]

SID - AccessPointsStatementKMS

KMS permissions ( if KMS is used to encrypt the bucket )

    kms_target_resources = [f'arn:aws:kms:{self.dataset_region}:{self.dataset_account_id}:key/{kms_key_id}']

Policy updates with table sharing in data.all -

Here the requestor's IAM role is not updated.


Design to handle splitting of policies

When a share is getting processed ( either a bucket or a folder share ):

  1. Check if the share has an old-format policy.

    • If yes, then create a policy with the new naming convention and index 0.
  2. Check if the share already has a "share-policy"; if so, create a new one with an index, copy all statements from the old policy to the new one, and delete the old policy.

  3. Read the statements from all the policies and split them, similar to the function split_policy_with_resources_in_statements.

    a. This function can be extended / modified so that, depending on whether it is a bucket share or an access point share, it picks up all the statements belonging to an SID pattern ( AccessPointsStatement or BucketStatement ). [UPDATED]
    b. Then the statements will again be split with split_policy_with_resources_in_statements.
    c. Fetch all policy statements from all the managed policies.
    d. Combine the statements from (b) and (c).
    e. Then split the statements into statement chunks with split_policy_statements_in_chunks.

  4. Then add the statement chunks into the managed policies, like here:

        for index, chunk in enumerate(statements_chunks):
            policies.append(
                iam.ManagedPolicy(
                    self.stack,
                    f'PivotRolePolicy-{index + 1}',
                    managed_policy_name=f'{self.env_resource_prefix}-pivot-role-cdk-policy-{self.region}-{index + 1}',
                    statements=chunk,
                )
            )

When a share is revoked:

  5. Check if the share has an old-format policy.

    • If yes, then create a policy with the new naming convention and index 0.
  6. Check if the share already has a "share-policy"; if so, create a new one with an index, copy all statements from the old policy to the new one, and delete the old policy.

  7. Fetch all the policy statements, remove the revoked resources from the S3 and KMS SIDs, and then call the split function split_policy_with_resources_in_statements again.

  8. Add the statements chunks into the managed policies like in Step 4 above.

  9. Delete all the managed policies which are not used ( empty ).
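Steps 7 and 9 of the revoke flow can be sketched as follows (a hypothetical helper, not dataall code): drop the revoked resource ARNs from every statement and discard statements that end up empty, after which the remaining statements would be re-split as in step 4.

```python
def remove_revoked_resources(statements, revoked_arns):
    """Remove revoked resource ARNs from each statement; statements left with
    no resources are dropped so their (now empty) policies can be deleted."""
    revoked = set(revoked_arns)
    cleaned = []
    for stmt in statements:
        kept = [arn for arn in stmt['Resource'] if arn not in revoked]
        if kept:
            cleaned.append({**stmt, 'Resource': kept})
    return cleaned
```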

Migrating to managed policies with indexes.

Since we already have a managed policy, it will be converted to an index-based managed policy with the name format <dataall-env>-<URI>-share-policy-<name of the role>-<index> ( e.g. 0, 1, 2, ... ).

While attaching, all the policies matching the above format will be picked up and attached to the role.
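The indexed naming convention could be handled with small helpers like these (names and regex are assumptions based on the format proposed above, not data.all's actual implementation):

```python
import re

def indexed_policy_name(env, uri, role, index):
    """Build an indexed share-policy name following the proposed convention."""
    return f'{env}-{uri}-share-policy-{role}-{index}'

def parse_policy_index(policy_name):
    """Return the trailing index of an indexed share policy, or None for the
    old single-policy format without an index suffix."""
    match = re.search(r'-share-policy-.+-(\d+)$', policy_name)
    return int(match.group(1)) if match else None
```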


Questions

Q1. What happens when a role is not data.all managed ( in the case of consumption roles ) ?

If the role is not managed by data.all, then all the managed policies created as part of the share will be surfaced behind the copy-to-clipboard button; the policies will be presented as an array of policies when copied via the clipboard icon.

Q2. Can we search for all the managed policies belonging to that consumption role ( any role which is involved in dataset share ) with boto3 calls ?

It seems the policies can be filtered by PathPrefix - https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/iam/client/list_policies.html
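A hedged sketch for Q2: with boto3 one could paginate `iam.get_paginator('list_policies').paginate(Scope='Local', PathPrefix=...)`, which only helps if data.all creates the policies under a dedicated path. The pure helper below instead filters a list_policies-style response by the share-policy naming convention, so it also works for policies created at the default path `/` (the pattern is an assumption based on the naming proposed above):

```python
import re

def filter_share_policies(policy_pages, role_name):
    """Collect names of share policies for a role from list_policies response
    pages, matching both the old un-indexed and the new indexed format."""
    pattern = re.compile(rf'share-policy-{re.escape(role_name)}(-\d+)?$')
    matched = []
    for page in policy_pages:
        for policy in page.get('Policies', []):
            if pattern.search(policy['PolicyName']):
                matched.append(policy['PolicyName'])
    return matched
```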

Q3. What if we make separate managed policies for access point and bucket share ?

This will involve migrating the existing policy into two managed policies like "<dataall-env>-<URI>-access-share-policy-<name of the role>-<index>" and "<dataall-env>-<URI>-bucket-share-policy-<name of the role>-<index>"

Pros:

  1. Easy to handle splitting and splitting will happen specific to the share type ( access and bucket )
  2. Easy for user to view the managed policies and navigate to view statements

Cons

  1. If the role already has 9 policies attached, splitting into two policy families would be a problem ( since 10 is the default maximum number of managed policies a role can have )

Q4. What are the policies which are added by Redshift sharing ?

@dlpzx , any insights on this one ?


Previous GH issues related to policy - https://github.com/data-dot-all/dataall/issues/922

petrkalos commented 2 months ago

Thanks for the detailed analysis @TejasRGitHub, I agree with the proposed solution; let's tackle Consumption Roles (and groups, since it's very similar code) and think about the rest later. Also, the indexing solution is already utilised and proven to work fine.

dlpzx commented 2 months ago

Hi @TejasRGitHub, very clear design. Answering Q4: luckily for us, we do not rely much on Redshift IAM permissions in the data.all integration. The only role that gets additional Redshift IAM permissions is the PivotRole; the rest get access to Redshift directly through Redshift connections and Redshift permissions in the cluster (outside of IAM, managed by database admins).