berenddeboer / cdk-rds-sql

A CDK construct that allows creating roles and databases an on Aurora Serverless Postgresql cluster.
Apache License 2.0
23 stars 11 forks source link

Role creation sometimes errors with Tuple concurrently updated #22

Closed ww-daniel-mora closed 5 months ago

ww-daniel-mora commented 5 months ago

Role creation sometimes fails with the Tuple concurrently updated. A retry mechanism might be sufficient.

2024-06-11T20:51:53.067Z    f21a5213-3ca7-469d-aef9-43cbd7f60bda    DEBUG   Executing SQL [
  'start transaction',
  "create role anondata_rw with login password '<the password>'",
  'DO $$BEGIN\n' +
    `IF EXISTS (select from pg_database where datname = 'anonymizedData' and datistemplate = false) THEN grant connect on database "anonymizedData" to anondata_rw; END IF;\n` +
    'END$$;',
  'commit'
]
2024-06-11T20:51:53.067Z f21a5213-3ca7-469d-aef9-43cbd7f60bda DEBUG Executing SQL [ 'start transaction', "create role anondata_rw with login password '<the password>'", 'DO $$BEGIN\n' + `IF EXISTS (select from pg_database where datname = 'anonymizedData' and datistemplate = false) THEN grant connect on database "anonymizedData" to anondata_rw; END IF;\n` + 'END$$;', 'commit' ]
2024-06-11T20:51:55.928Z
2024-06-11T20:51:55.928Z    f21a5213-3ca7-469d-aef9-43cbd7f60bda    ERROR   Invoke Error    
{
    "errorType": "error",
    "errorMessage": "tuple concurrently updated",
    "code": "XX000",
    "length": 225,
    "name": "error",
    "severity": "ERROR",
    "where": "SQL statement \"grant connect on database \"anonymizedData\" to anondata_rw\"\nPL/pgSQL function inline_code_block line 2 at SQL statement",
    "file": "heapam.c",
    "line": "4665",
    "routine": "simple_heap_update",
    "stack": [
        "error: tuple concurrently updated",
        "    at <anonymous> (/node_modules/cdk-rds-sql/node_modules/pg/lib/client.js:526:17)",
        "    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)",
        "    at <anonymous> (/node_modules/cdk-rds-sql/src/handler.ts:313:13)",
        "    at async Promise.all (index 2)",
        "    at Runtime.exports.handler (/node_modules/cdk-rds-sql/src/handler.ts:311:9)"
    ]
}
ww-daniel-mora commented 5 months ago

I see this error is already handled with backoff. I wonder if we could just lock the roles table for each role modification.

// Custom error filter, mainly to retry role creation.
// Frequently see "tuple concurrently updated", and adding
// dependencies is very hard to make work.
const errorFilter = (error: any) => {
  // Retry only if the error message contains "tuple concurrently updated"
  return error.message.includes("tuple concurrently updated")
}
berenddeboer commented 5 months ago

You can try to add some dependencies, although I myself haven't had much luck with those, as I easily get circular dependencies.

And not sure what tables you need to lock honestly, we use a postgresql command, I don't think it's documented.

Did you try this with version 3.2? I just released that as I had the same issue. There are some other options too:

  1. Retry more often.
  2. Add jitter.

Or maybe we I need to rethink the whole approach, and role creation must be sequential, i.e. done at once instead of in parallel as right now.

ww-daniel-mora commented 5 months ago

I'll updated to the latest.

I did a bit of looking around and found this discussion on reliably dropping a role which I attempted to implement in #23 .

I also came across a concurrent delete error so expanded the error filter and added logging and doubled the retry. I see the pr check is failing so I will investigate that as well.

ww-daniel-mora commented 5 months ago

Will marked this as resolved with the addition of rollback. That should enable users to undo permissions changes.