Open ricky-escobar opened 1 year ago
I tried the work around and got cyclic dependency errors.
Same here
Is this because each user creates its own Lambda that runs concurrently as opposed to all of the users being creates/assigned to a same lambda that calls a BatchExecuteStatement
call to the Redshift Data API?
For example, here is where the table.grant
method is defined. It calls the user.addTablePrivileges method, which in turn calls the UserTablePrivileges custom construct.
It looks like every call to table.grant
will actually create a separate construct and execute the GRANT
statement concurrently, which is what's causing the error. Ideally all of the changes associated with a particular construct (a table) should be part of the same transaction.
Can we control the concurrency limit of the lambda function itself? What if we set it up to one concurrent execution maximum?
This is probably not the ideal solution, but I think it's way easier than figuring out a way to accumulate all the statements and then execute them all in a single call to the the API
I am not sure if that would help. I think the simple solution is to run all these statements in a single transaction block; that way we avoid concurrent attempts to get a hold of the mutex.
But I am not sure if this is trivial. For example, if through CDK we generate 100 statements (100 different users getting read access to the same table) then this can be put into the same transaction. However, if there are 100 statements, where 50 give 50 users read access to table A
and the other 50 statements give the same users read access to table B
we would have to collect the statements per resource, i.e. table. So that in the latter case two transactions are sent to the Redshift Data API: 1 for table A
and one for table B
.
Sorry but I am not sure how the lazy eval would help here? It seems the main issue with the current approach is due to concurrently attempting to change the same Redshift resource (my guess is it's the table) in different transactions. I think that this will keep happening as long as we have concurrent executions of the same Lambda trying to change the same object.
Describe the bug
With Redshift, concurrency errors are possible when running multiple
GRANT
statements for different users on the same table at the same time; for example,GRANT SELECT ON Table TO User1
andGRANT SELECT ON Table TO User2
. However, the RedshiftTable
/User
CDK constructs do not account for this. During CloudFormation deployment, multipleTablePrivileges
resources can create/update at the same time, which can cause concurrency errors and stack update failures.Expected Behavior
When creating/updating multiple users with privileges for common tables, the CloudFormation stack update always succeeds with no errors.
Current Behavior
When creating/updating multiple users with privileges for common tables, the CloudFormation stack update sometimes fails, with the following error for a
TablePrivileges
resource:This is sensitive to the exact times that the concurrent custom resource Lambdas execute the statements. So the failure does not always. The failure is more likely to happen the more users and tables are involved. Reproducing this was very inconsistent on our end. We got 10 failures in a row when doing deploying to our production account, but struggled to reproduce it in our test accounts. We have 6 tables and 2 users.
Reproduction Steps
Possible Solution
We worked around this on our end by introducing an arbitrary dependency between the
User
s, forcing them to be updated sequentially rather than concurrently. Something similar could potentially be done internally. In case this is difficult to fix, the documentation can be updated to clearly call out this known issue and suggest this workaround to the user.Additional Information/Context
No response
CDK CLI Version
1.125.0 (build 67b4921)
Framework Version
No response
Node.js Version
12.x
OS
Amazon Linux 2
Language
Typescript
Language Version
No response
Other information
No response