cyrilgdn / terraform-provider-postgresql

Terraform PostgreSQL provider
https://www.terraform.io/docs/providers/postgresql/
Mozilla Public License 2.0
395 stars 200 forks source link

"tuple concurrently updated" error on concurrent GRANT statements #178

Open Bhashit opened 2 years ago

Bhashit commented 2 years ago

Terraform Version

1.1.1

Affected Resource(s)

Terraform Configuration Files


provider "postgresql" {
  host             = var.postgres_host
  port             = var.postgres_port
  username         = var.root_user_name
  password         = var.root_user_password
  expected_version = "12.3"
  superuser        = false
}

resource "postgresql_grant" "connect_db" {
  database    = postgresql_database.db.name
  object_type = "database"
  privileges  = ["CREATE", "CONNECT"]
  role        = postgresql_role.svc_admin.name
}

resource "postgresql_grant" "use_schema" {
  database    = postgresql_database.db.name
  object_type = "schema"
  privileges  = ["CREATE", "USAGE"]
  role        = postgresql_role.svc_admin.name
  schema      = "public"
}

Panic Output

╷
│ Error: could not execute revoke query: pq: tuple concurrently updated
│ 
│   with module.svc.postgresql_grant.use_schema,
│   on .terraform/modules/svc/main.tf line 118, in resource "postgresql_grant" "use_schema":
│  118: resource "postgresql_grant" "use_schema" {
│ 
╵

Expected Behavior

Multiple GRANT statements should get executed correctly.

Actual Behavior

terraform apply fails intermittently when multiple GRANT statements are involved.

Steps to Reproduce

  1. terraform apply with multiple grant statements. You can also try a large number of statements with a for_each to make it more likely that the error will happen.

Important Factoids

Found this threads on postgres/terraform mailing lists:

  1. https://www.postgresql.org/message-id/967138.1634322291@sss.pgh.pa.us
  2. https://www.postgresql.org/message-id/20171228063004.GB6181@paquier.xyz
  3. https://discuss.hashicorp.com/t/for-each-support-sequential-operation/34680

The "solution" seems to be to run things sequentially. However, ideally, we should be able to handle this at the provider level. For ex. by either locking the table appropriately, or by retrying after a backoff period perhaps before failing.

One interesting thing that happened was that with my terraform apply, when TF exited, it didn't save the state. So, it created some resources, but they weren't tracked in the state. That could be a Terraform bug, but I thought I should at least mention it here.

JamesTimms commented 2 years ago

I think this is a similar problem to the issue/fix in #169

fabiopaiva commented 2 years ago

I'm having the same issue, even after upgrading to 1.15 and configuring parallelism=0

phclark commented 2 years ago

We were able to fix the issue by setting TF_CLI_ARGS_apply="-parallelism=1" (on provider version 1.15.0), but this certainly isn't ideal. I would love a fix as described above.

iskarbnik commented 2 years ago

same issue, using provider version 1.15.0

ConorWyse commented 2 years ago

We are experiencing the same problem even after upgrading to v1.15.0. We loop through a list of 3 items, create a Postgres user for each, and grant connect rights to those users. It fails consistently in the grant section.

postgresql_role.user["example"]: Creating...
postgresql_role.user["example_read"]: Creating...
postgresql_role.user["example_emergency"]: Creating...
postgresql_role.user["example_read"]: Creation complete after 1s [id=example_read]
postgresql_role.user["example"]: Creation complete after 1s [id=example]
postgresql_role.user["example_emergency"]: Creation complete after 1s [id=example_emergency]
postgresql_grant.user_db_connect_grants["example_emergency"]: Creating...
postgresql_grant.user_db_connect_grants["example"]: Creating...
postgresql_grant.user_db_connect_grants["example_read"]: Creating...
postgresql_grant.user_db_connect_grants["example"]: Creation complete after 1s [id=example_example_database]
postgresql_grant.user_db_connect_grants["example_read"]: Creation complete after 2s [id=example_read_example_database]
│ 
│ Error: could not execute revoke query: pq: tuple concurrently updated
│ 
│   with postgresql_grant.user_db_connect_grants["example_emergency"],
│   on resources.tf line 106, in resource "postgresql_grant" "user_db_connect_grants":
│  106: resource "postgresql_grant" "user_db_connect_grants" {

Terraform version 1.1.8

Update: Using the argument -parallelism=1 seems to have solved the problem.

red8888 commented 2 years ago

bump. I have many resources in additional to postgres grants in the same workspace. I dont want to have to set -parallelism=1

philip-harvey commented 2 years ago

This really needs to be fixed, -parallelism=1 makes Terraform runs take hours....

marcneander commented 2 years ago

Would love to see this fixed. A retry approach for this specific error would be sweet!

dpolivaev commented 2 years ago

Using postgresql provider parameter max_connections = 1 seems to help as a workaround. But a real fix would be highly appreciated.

philip-harvey commented 1 year ago

The "fix" in #224 has broken the max_connections = 1 setting, so I'm having to pin to provider version 1.1.7

debu99 commented 1 year ago

still have this issue in 1.18

philip-harvey commented 1 year ago

Any chance the change in #224 can be rolled back? In the past we could set max_connections = 1 and it would work, now it's just totally broken for versions > 1.1.7

kylejohnson commented 1 year ago

Hey everyone - I think I have a working fix in #352.

reddragond commented 1 year ago

As a workaround you can use -parallelism=1 in your terraform apply/destroy commands. It will run very slowly but it avoids this issue until a fix can be implemented. I've been using it with 1.21.0 for a while now with no issues.

kylejohnson commented 1 year ago

Hi everyone,

I went ahead and released a beta version which targets the tuple-concurrently-updated branch used in #352.

Using this beta release, the tuple concurrently updated error has been fixed 100% of the time in my internal usage at $dayjob.

I'd appreciate it if anyone else could test and confirm these results as well.

terraform {
  required_providers {
    postgresql = {
      source  = "cyrilgdn/postgresql"
      version = "1.21.1-beta.1"
    }
  }
}
kylejohnson commented 1 year ago

Oops, didn't mean to close.

rudl002 commented 12 months ago

One datapoint: our internal TF module, which creates a DB and executes several GRANTs did just run perfectly on the first try with the version "1.21.1-beta.1" 👍

Previously, the first run was almost guaranteed to trigger the "tuple concurrently updated" condition. We always needed at least 2 (sometimes 3) attempts to run to completion.

Thanks a lot for fixing this!

debu99 commented 12 months ago

Really appreciate the work👍

joaocc commented 11 months ago

Hi. We still face this issue with 1.21.1-beta.1. Not only that, but when we try to consolidate postgresql_grant and postgresql_grant_role resources into 2 big ones (merging dicts), we get something different:

Error: could not get advisory lock for members of role ROLE_NAME: pq: deadlock detected
...
...
...
Error: pq: grant options cannot be granted back to your own grantor
...
resource "postgresql_grant" "all" {
rmhw commented 8 months ago

I'm also facing this issue in 1.22.0. In case it's relevant, I'm creating grants using a for_each (as below).

resource "postgresql_grant" "grant_all_on_tables" {
  for_each    = toset(var.full_access_users)

  database    = postgresql_database.main.name
  role        = each.value
  schema      = "public"
  object_type = "table"
  privileges  = ["DELETE","INSERT","REFERENCES","SELECT","TRIGGER","TRUNCATE","UPDATE"]
}
nabutbul commented 5 months ago

Maybe the owner can merge the branch "1.21.1-beta.1" to master? It fix the issue with tuple concurrently update and it's not part of 1.22.0

m-barthelemy commented 3 months ago

It seems to have fixed our previous issues as well, we've been using this 1.21.1-beta.1 version for the past 3 months.