Fix for `Error: could not execute revoke query: pq: tuple concurrently updated`

cyrilgdn / terraform-provider-postgresql

Terraform PostgreSQL provider

https://www.terraform.io/docs/providers/postgresql/

Mozilla Public License 2.0

356 stars 181 forks source link

Fix for `Error: could not execute revoke query: pq: tuple concurrently updated` #352

Open kylejohnson opened 9 months ago

kylejohnson commented 9 months ago

This should be a fix for #178.

From what I can tell, the locks created in pgLockRole and pgLockDatabase were not sufficient to cover cases when we're messing with a schema, such as REVOKE ALL PRIVILEGES ON SCHEMA %s FROM %s.

I created an example in examples/issues/178 in which I was able to reproduce the error in nearly 50% of terraform apply or terraform destroy operations. With this change, the error hasn't been reproduced in over 30 such operations.

kylejohnson commented 9 months ago

I'm working on adding test coverage for this issue / fix. The goal is to run the terraform code in examples/issues/178 sequentially some number of times, to consider the test "passed" if all runs complete without any errors, and "failed" if there are any errors.

kylejohnson commented 9 months ago

Tests are failing because examples/issues/178/main.tf specifies

    postgresql = {
      source  = "cyrilgdn/postgresql"
      version = "1.21"
    }

and because TestConcurrentPostgresqlGrant is using terratest, which does things quite differently than the way the rest of the tests are structured.

When developing locally, I build the provider and copy it to the correct location locally on disk, before running examples/issues/178/test.sh, so I need to find a similar way to do this within the go testing environment. Ideas?

kylejohnson commented 8 months ago

@cyrilgdn Could you review this please?

kylejohnson commented 8 months ago

I've made a beta release targeting this branch, and in my case, the tuple concurrently updated error has been completely fixed. version = "1.21.1-beta.1"

philip-harvey commented 7 months ago

Hi @kylejohnson, thanks for looking at this issue. I'm testing out the beta and am now getting this error: │ Error: could not get advisory lock for members of role rolenamehere: pq: deadlock detected

This error happens on both postgresql_grant and postgresql_default_privileges

kylejohnson commented 7 months ago

Hi @kylejohnson, thanks for looking at this issue. I'm testing out the beta and am now getting this error: │ Error: could not get advisory lock for members of role rolenamehere: pq: deadlock detected

This error happens on both postgresql_grant and postgresql_default_privileges

Could you provide an example of code that recreates your error?

philip-harvey commented 7 months ago

It's somewhat difficult to recreate since it's timing dependent, but something like this should do it.

resource "postgresql_grant" "grant" { for_each = local.grants_map role = each.value.rolename database = local.database_name schema = each.value.schema object_type = each.value.object_type objects = each.value.objects privileges = each.value.privileges }

resource "postgresql_default_privileges" "tables" { for_each = local.default_table_grants_map database = local.database_name role = each.value.rolename owner = var.db_owner schema = each.value.schema object_type = "table" privileges = each.value.privileges }

I suspect that the deadlock occurs between the default privileges and the grants since they both apply a ton of table grants

I ended up setting max concurrency back to 1 and this issue went away again.

pagalba-com commented 6 months ago

I have tested 1.21.1-beta.1 and concurrency issues are not fixed.

The fix provided by giner works. This one not. Ref: https://github.com/giner/terraform-provider-postgresql/commits/fix_getting_stuck_on_refreshing_postgresql_grant

When terraform is refreshing state, it continuously slows down until gets stuck. Never completes.

May be good to understand would be, that I use provider inside a terraform module I built for quite standard function access.

Then every single function, has own terraform file, which uses module, to create standard function user and access for it.

Even if you run terraform with concurrency 1, and setup database connection limit to 1, still it creates like 20 processes, each for single function, and tries to refresh state for 20 modules.

Please make that giner commit into code, as it looks it works perfectly (build and tested it locally).