A modern data marketplace that makes collaboration among diverse users (such as business users, analysts, and engineers) easier, increasing efficiency and agility in data projects on AWS.
For the past few days, the CloudFormation stacks for dataset creation and dataset import have been failing and are always rolled back. The issue started recently, and our organization has not changed its data.all version in the past 5 months, so we have not been able to find the root cause.
The stack fails while creating the Glue crawler, with this error message:
S3 bucket dataall-<> does not exist. (Service: AWSGlue; Status Code: 400; Error Code: InvalidInputException; Request ID: dedb0351-c13a-4f6e-b84d-dea3cf0db051; Proxy: null)
We are using CDK version 14.
Error in CloudFormation while importing a dataset:
Failure in creation of dataallDatasetDatabase
CREATE_FAILED
Received response status [FAILED] from custom resource. Message returned: Error: Could not create Glue Database dataall_<> in aws://<>/<>, received An error occurred (AccessDeniedException) when calling the CreateDatabase operation: Insufficient Lake Formation permission(s) on s3://<>/ Logs: /aws/lambda/dataall-gluedb-handler-m6up1tqu
    at invokeUserFunction (/var/task/framework.js:2:6)
    at processTicksAndRejections (internal/process/task_queues.js:97:5)
    at async onEvent (/var/task/framework.js:1:302)
    at async Runtime.handler (/var/task/cfn-response.js:1:1474) (RequestId: 58ef82a8-80a8-41f3-b633-28c71896598c)
How to Reproduce
Bootstrap an AWS account as an environment in data.all.
Create a dataset in the environment.
Stack creation fails and the stack ends in the ROLLBACK_COMPLETE state.
Import an existing bucket into the environment.
Stack creation fails and the stack ends in the ROLLBACK_FAILED state.
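To see which resource fails first before these rollbacks, a small sketch could page through the stack events and return the earliest failure. It assumes a boto3 CloudFormation client, e.g. boto3.client("cloudformation"), created in the environment account and region; the function name first_failure is hypothetical.

```python
# Sketch: find the earliest *_FAILED event of a CloudFormation stack.
# Assumes `cfn_client` is a boto3 CloudFormation client; `first_failure`
# is a hypothetical helper name, not part of data.all.


def first_failure(cfn_client, stack_name):
    """Return (LogicalResourceId, ResourceStatus, reason) of the earliest failure, or None."""
    events = []
    kwargs = {"StackName": stack_name}
    while True:  # describe_stack_events is paginated via NextToken
        page = cfn_client.describe_stack_events(**kwargs)
        events.extend(page.get("StackEvents", []))
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token
    # CREATE_FAILED, DELETE_FAILED, ROLLBACK_FAILED, ... all end in "_FAILED".
    failed = [e for e in events if e.get("ResourceStatus", "").endswith("_FAILED")]
    if not failed:
        return None
    # Events come newest first, so pick the minimum Timestamp explicitly.
    earliest = min(failed, key=lambda e: e["Timestamp"])
    return (
        earliest["LogicalResourceId"],
        earliest["ResourceStatus"],
        earliest.get("ResourceStatusReason", ""),
    )
```

For a dataset stack that ends in ROLLBACK_COMPLETE, the result would typically name the resource that triggered the rollback (for example the crawler or dataallDatasetDatabase) together with its status reason.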
Hi @mvidhu :) Thanks for opening the issue. If I understand correctly, you actually have 2 issues:
Creating a dataset fails because the S3 bucket is not found. We also encountered this issue a couple of months ago, caused by a change in Glue crawler creation. We fixed it in #385 by adding a dependency clause in the stack.
Importing a dataset fails with insufficient Lake Formation permissions during Glue database creation. I tried to reproduce your issue in the latest version and the error does not appear, so I would suggest upgrading to a newer version, which should fix both issues.
If you want to debug the issue, here are some details on the failure. As part of dataset creation and import, we create a Glue database and grant Lake Formation permissions on it using a CloudFormation custom resource (a Lambda function that runs when the stack is created). I suggest checking the following in Lake Formation:
1) In the Lake Formation console, check the data lake administrators and make sure that the dataallPivotRole is one of them.
2) Check the data lake locations: is the imported S3 bucket already registered, and which role does Lake Formation use for it?
3) Has the pivotRole been deleted and re-created at any point? If so, remove it from the Lake Formation data lake administrators and add it again. Lake Formation points at the unique identifier of an IAM role: when a role is deleted and re-created, it looks like the same role in the console, but under the hood Lake Formation treats them as 2 different roles, which causes issues.
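The first two Lake Formation checks can also be scripted. This is a minimal read-only sketch, assuming you pass in a boto3 Lake Formation client, e.g. boto3.client("lakeformation"), created in the environment account and region with permission to call GetDataLakeSettings and ListResources. The function names and the account ID in any pivot role ARN you pass are placeholders; the third check (removing and re-adding the pivot role) is left as a manual step.

```python
# Read-only sketch of Lake Formation checks 1) and 2). Assumes `lf_client` is a
# boto3 Lake Formation client; the function names are hypothetical.


def lf_admin_arns(lf_client):
    """Return the principal identifiers of the Lake Formation data lake admins."""
    settings = lf_client.get_data_lake_settings()["DataLakeSettings"]
    return [a["DataLakePrincipalIdentifier"] for a in settings.get("DataLakeAdmins", [])]


def pivot_role_is_admin(lf_client, pivot_role_arn):
    """Check 1): is the pivot role one of the data lake administrators?"""
    return pivot_role_arn in lf_admin_arns(lf_client)


def registered_locations_for_bucket(lf_client, bucket_name):
    """Check 2): registered data lake locations for the bucket or a prefix in it.

    Each returned entry comes from ListResources and carries a "RoleArn",
    the role Lake Formation uses for that location. An empty list means
    nothing in this bucket is registered.
    """
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    matches = []
    kwargs = {}
    while True:  # follow NextToken until all pages are read
        page = lf_client.list_resources(**kwargs)
        for res in page.get("ResourceInfoList", []):
            arn = res["ResourceArn"]
            # Match the bucket itself or a prefix inside it, not "bucket-other".
            if arn == bucket_arn or arn.startswith(bucket_arn + "/"):
                matches.append(res)
        token = page.get("NextToken")
        if not token:
            return matches
        kwargs = {"NextToken": token}
```

For example, pivot_role_is_admin(client, "arn:aws:iam::<account>:role/dataallPivotRole") should return True, and registered_locations_for_bucket(client, "<imported bucket>") shows through each entry's RoleArn which role Lake Formation uses for the imported bucket.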
I hope this helps; please comment here if you still face issues :)
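Regarding the first issue, one plausible reading of the "S3 bucket does not exist" error is that CloudFormation created the crawler and the bucket in parallel. CloudFormation only infers ordering from intrinsic references such as Ref or Fn::GetAtt, so a crawler that names the bucket in a plain string does not wait for it; an explicit DependsOn clause does. The sketch below is a simplified model, not data.all's template and not the actual change in #385: a hypothetical two-resource template (logical IDs, bucket name, and role ARN are placeholders) and a creation order computed with Python's graphlib that honors DependsOn only. In CDK, a construct-level dependency such as crawler.node.add_dependency(bucket) (hypothetical variable names) is rendered as DependsOn in the synthesized template.

```python
from graphlib import TopologicalSorter

# Hypothetical, trimmed-down template fragment; names and ARN are placeholders.
TEMPLATE = {
    "Resources": {
        "DatasetBucket": {
            "Type": "AWS::S3::Bucket",
            "Properties": {"BucketName": "example-dataset-bucket"},
        },
        "DatasetCrawler": {
            "Type": "AWS::Glue::Crawler",
            # The S3 path below is a plain string, which implies no ordering,
            # so DependsOn makes the crawler wait for the bucket explicitly.
            "DependsOn": ["DatasetBucket"],
            "Properties": {
                "Role": "arn:aws:iam::111122223333:role/example-crawler-role",
                "DatabaseName": "example_db",
                "Targets": {"S3Targets": [{"Path": "s3://example-dataset-bucket/"}]},
            },
        },
    }
}


def creation_order(template):
    """Return one creation order honoring explicit DependsOn clauses only."""
    graph = {}
    for logical_id, resource in template["Resources"].items():
        deps = resource.get("DependsOn", [])
        if isinstance(deps, str):  # DependsOn may be a single string or a list
            deps = [deps]
        graph[logical_id] = set(deps)
    return list(TopologicalSorter(graph).static_order())


print(creation_order(TEMPLATE))  # ['DatasetBucket', 'DatasetCrawler']
```

Without the DependsOn entry, both resources would be ready at the same time in this model, which mirrors CloudFormation being free to create them concurrently.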
Expected behavior
Dataset creation and import should succeed.
OS
Mac
Python version
3.11
AWS data.all version
0.5.0