Closed TejasRGitHub closed 7 months ago
I don't think we need to create this role as part of pivot role auto create because we use dataset role to register LF locations now. We could potentially remove this piece of code but first need to investigate if any unforeseen errors could come up with with pre-existing environments using this role in data.all or with setting admins in lakeformation using this role.
If we need to still create role and just check if it exists before can use .from_role_arn()
(or .from_role_name()
) similar to how we do with importing pivot role in env stack:
self.pivot_role = iam.Role.from_role_arn(
self,
f'PivotRole{self._environment.environmentUri}',
pivot_role_stack.pivot_role.role_arn,
)
FOLLOW UP - from discussion with @dlpzx When we create the data.all Environment, we register the pivot role as a data lake administrator, without the existence of the LF service role this step would fail. At the time data.all was first developed we had to create this LF service role. Alternatively, customers could manually open Lake Formation in the AWS Console and the service role was created automatically. I think we should:
.from_role_arn()
or .from_role_name()
. Careful with the update stack behaviour@noah-paige , I wasn't able to find any problems with dataset import, dataset sharing without having the LF servicelinked role.
As pointed out by @dlpzx , removing the role is tricky and can cause problems. As this role is AWS managed service linked role and any aws account would have this created and they could be using this role, for existing data.all environments, ( which use the auto created pivot role ) and have this role in the cdk generated CF templates , if this role is removed, then the role will be deleted from the AWS accounts. This could cause issue with environment which use this role to do any operations on S3 buckets through lake formation.
In order to successfully remove this role, without letting this role get deleted, one way would be to apply the removal policy with RemovalPolicy.RETAIN.
Ways of doing this -
This service role deletion could be done over versions of data.all. For example, in v2.3, we add the code to apply the removal policy. Then in v2.4 , we could change the code to not include this role in pivot role stack. This will remove the role from CF of the pivot role in the update but won't delete the actual role.
Other options - Somehow access the CF template which is deployed for the pivot role and check if the role is present in the Pivot Role CF template. If the role is present, then add the RemovalPolicy.RETAIN policy. In the next update ( whenever the update-stacker runs ) , check if the CF template has the role and also has the removal policy, if yes then remove the role.
class PivotRole(NestedStack):
..
..
template = CFTemplate.getTemplate(id: "")
if template contains LakeFormationSLR and not RemovalPolicy statment:
self.lf_service_role = iam.CfnServiceLinkedRole(
self, 'LakeFormationSLR', aws_service_name='lakeformation.amazonaws.com'
)
self.lf_service_role.apply_removal_policy(RemovalPolicy.RETAIN)
I don't know if the above is possible, but would be a potential option if that works.
PR for fix - https://github.com/data-dot-all/dataall/pull/999
Would be good to add a warning that the LF Service Role will be released on the upcomming release
Can re-use this issue to track maintenance UI view https://github.com/data-dot-all/dataall/issues/998 - out of scope for this PR but a good to have for upcoming release to include Notice
Describe the bug
When migrating or bootstrapping environment with auto created pivot role ( dataallPivotRole-cdk ), the Pivot Role nested stack fails due to an already present lake formation service-linked role (
AWSServiceRoleForLakeFormationDataAccess
). This role is created when a S3 bucket is registered in Lake formation ( https://docs.aws.amazon.com/lake-formation/latest/dg/service-linked-roles.html ).When boostrapping an AWS account into data.all it is certainly possible that this role is being used already.
How to Reproduce
Register a S3 bucket and mention this role for registering the S3 bucket into Lake formation Create a data.all environment with that AWS account with auto create pivot role setting ON. The environment creation should fail due to dataallPivotRole stack not able to create the
AWSServiceRoleForLakeFormationDataAccess
role as it is already present.Expected behavior
The environment bootstrap should complete properly with the dataallPivotRole-cdk properly created.
Your project
No response
Screenshots
No response
OS
Mac
Python version
3.9
AWS data.all version
2.2
Additional context
No response