MagicStack / asyncpg

A fast PostgreSQL Database Client Library for Python/asyncio.
Apache License 2.0
6.99k stars 404 forks source link

Inconsistent permission denied error on remote postgres server #1097

Closed alamTillit closed 1 year ago

alamTillit commented 1 year ago

Extremely weird issue. Even though all the necessary privileges has been granted to a service account ( we are using iam auth ), a select query still would fail inconsistently due to permission denied. I tried to use the same user to run the same query against the same remote db server and it works fine.

The error message shown in our service:

sqlalchemy.exc.ProgrammingError: (sqlalchemy.dialects.postgresql.asyncpg.ProgrammingError) <class 'asyncpg.exceptions.InsufficientPrivilegeError'>: permission denied for table a_table

The outline of the query is basically SELECT a, b FROM "a_schema"."a_table" WHERE ... so it doesn't contain relation to other tables and it's only query in "a_schema"."a_table"

What I have tried already:

  1. Bump the version of asyncpg to the latest
  2. Bump the version of sqlalchemy[asyncio] to the latest 1.4.X
  3. Run the same query on local machine with the same service account and it works fine
  4. Checked cloud sql proxy logs and no error logs are found
  5. Checked cloud sql logs and no same pattern of error logs are found
  6. Grant all privileges again to the service account on "a_schema"."a_table"
  7. Alter default privileges to the service account on "a_schema"
  8. Grant all privileges to the service account on table "a_table"
  9. Checked the service account contain all necessary privileges.

I could see there is the same issue happened 3 years ago on stackoverflow. I suspect it might be an existing rare bug. https://stackoverflow.com/questions/63674284/inconsistent-permission-error-on-postgres-schema

Sorry, I understand it's very hard to debug on your side. I created an issue here to just see what else I could do or test in order to debug this issue. I run out of any ideas to further track down this issue....

alamTillit commented 1 year ago

Further investigation on this issue. Seems like, every time, only two pods out of many pods ( yes, we are running services on k8s ) will have this issue. I suspect these two pods might interfere each other and cause this issue.

alamTillit commented 1 year ago

Just for record, so anyone don't have to waste time on investigating this if they have the same case as I do. For my case, we forgot to put correct namespace filter and therefore we got the wrong logs pointing to a non-production namespace. The problematic pods were always in another namespace instead of the production one. In that namespace, those pods are indeed lacking of the required permission to do the query. I think this remind us again that the database is probably always right.