zsaltys closed 6 months ago
I had a successful share, then I deleted the consumption role. Verify shares marked the share as unhealthy with the following errors: When I hit re-apply shares, it does not work and fails with error:
Thanks for opening an issue and being so clear. I think there are some methods in the consumption roles code that we could re-use to implement sharing guardrails. Do you have the bandwidth to take this item?
I think the most logical way to handle this is:
@zsaltys @dlpzx @anushka-singh what do you think?
Also, @petrkalos suggested a scheme similar to the one proposed here.
Above I described the reaction to the problem, but this approach will help us to detect and prevent it earlier.
Completed in #1161
We've run into multiple situations where our users delete a role that is actively used in data.all. We can detect this (with custom scripts), and data.all will let you revoke the S3 share via the UI, but it fails when trying to revoke a LakeFormation share, with an error like:
Failed to get role ROLE_NAME due to: An error occurred (NoSuchEntity) when calling the GetRole operation: The role with name ROLE_NAME cannot be found.
This can only be fixed by manually updating the share to FAILED in RDS, after which the share can be deleted via the UI. This is very inconvenient because only the data.all engineering team can fix such issues, rather than it being the responsibility of the share owner.
I suspect that this issue also affects the "share validator" released in 2.3 and that it fails to fix a share when a role is deleted. I propose the following enhancement for this problem:
When attempting to revoke a share, detect that the role has been deleted or is missing and simply put the share into FAILED mode so that the share can be deleted via the UI. This should also happen when the share fixer is trying to fix the share, either via a manual request or when it runs in the background.
This fix is related to this issue: https://github.com/data-dot-all/dataall/issues/1029. The above error was observed on the 2.2 release.
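To make the proposal concrete, here is a minimal sketch of the check I have in mind. It is not data.all's actual code: `ShareStatus`, `role_exists`, `revoke_share` and the `do_revoke` callback are hypothetical names used for illustration only. The one real call is boto3's IAM `get_role`, which raises a `ClientError` whose `response["Error"]["Code"]` is `NoSuchEntity` when the role is gone (the error quoted above).

```python
from enum import Enum


class ShareStatus(Enum):
    """Hypothetical share states; data.all's real status names may differ."""
    REVOKED = "Revoked"
    FAILED = "Failed"


def role_exists(iam_client, role_name):
    """Return False only when IAM reports NoSuchEntity; re-raise anything else."""
    try:
        iam_client.get_role(RoleName=role_name)
        return True
    except Exception as exc:
        # botocore's ClientError exposes the AWS error code via exc.response.
        # Reading it with getattr keeps this sketch free of a botocore import.
        code = getattr(exc, "response", {}).get("Error", {}).get("Code")
        if code == "NoSuchEntity":
            return False
        raise


def revoke_share(iam_client, role_name, do_revoke):
    """Revoke a share, or mark it FAILED if the consumer role no longer exists.

    do_revoke is a hypothetical callback standing in for the real
    LakeFormation/S3 revoke logic.
    """
    if not role_exists(iam_client, role_name):
        # Role was deleted outside data.all: skip the revoke calls that would
        # fail and leave the share in a state the owner can delete from the UI.
        return ShareStatus.FAILED
    do_revoke()
    return ShareStatus.REVOKED
```

With a real client this would be called as `revoke_share(boto3.client("iam"), role_name, do_revoke)`. Only `NoSuchEntity` is treated as "role deleted"; other errors (for example access denied) are re-raised so they are not silently turned into a FAILED share.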