jordanlewis opened this issue 7 years ago
should this be closed?
No - we have no-op stubs for these functions, but as Ben points out, we probably shouldn't just pretend that they work correctly.
@jordanlewis does this need to be fixed for 1.0?
No, I don't think so.
I think it's unlikely we'll ever implement the pg_advisory_lock functions because I don't think we can provide their semantics - locks tied to sessions, blocking semantics - in a distributed system such as ours. When do you release a lock if the node hosting the session that acquired it dies? The best you can do is lease semantics, which is not what these functions want.
A number of people have requested this; I think we should close this issue and provide an FAQ instead explaining why we don't support it.
cc @jseldess, @bdarnell
An FAQ sgtm, pending @bdarnell's response.
@andreimatei I'm not convinced we couldn't provide a close-enough approximation. We don't need perfect blocking semantics (polling would be fine), and it's not totally clear to me whether advisory locks need to be completely robust in the face of node failure. Perhaps we could use a (session id, node id, lock id) triple as the lock id and lazily invalidate locks if a node id corresponds to one that's been detected as down? I'm not familiar with how dead node detection works so perhaps this is misguided.
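To make that concrete, a minimal sketch of the bookkeeping, assuming a hypothetical advisory_locks table rather than anything that exists in CockroachDB today:

```sql
-- Hypothetical table: lock ownership is recorded explicitly so that a lock
-- whose owning node has been detected as dead can be lazily invalidated.
CREATE TABLE advisory_locks (
    lock_id    INT8 PRIMARY KEY,
    session_id UUID NOT NULL,
    node_id    INT8 NOT NULL,
    acquired   TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Acquire: a conflict means another session holds the lock. If that session's
-- node_id has since been marked dead, the row can be deleted and the insert
-- retried (the lazy-invalidation step described above).
INSERT INTO advisory_locks (lock_id, session_id, node_id)
VALUES (42, 'a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11', 3);
```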
But what about partitioned nodes? How do you tell a client that its lock is gone when the node it was talking to is partitioned away from... the quorum of the locks table?
I think we have enough flexibility in the semantics that we can implement something that will work.
pg_advisory_xact_lock would be easier than pg_advisory_lock, I think, since we could be sure to push the transaction when we're breaking its lock. But it seems that the session-scoped variants are more commonly requested/used.
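For reference, the transaction-scoped variant in Postgres is released implicitly when the transaction ends, which is what would let us break the lock by pushing or aborting the transaction:

```sql
BEGIN;
-- Blocks until the lock for key 42 is free, then holds it for the
-- remainder of this transaction.
SELECT pg_advisory_xact_lock(42);
-- ... work protected by the lock ...
COMMIT;  -- the lock is released automatically; there is no explicit unlock
```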
How do you tell a client that its lock is gone when the node it was talking to is partitioned away from... the quorum of the locks table?
The same way postgres does - we don't. In the event of a partition you're not guaranteed to be notified that you've lost your lock before someone else gets it.
What do you mean by "the same way postgres does"? How does a single-node Postgres have the same issue? If the client is partitioned away from the server, and the session loses the lock, I assume you can't do anything with that session anymore (probably because some TCP connection timed out somewhere). So you may not be notified of anything necessarily, but at least you won't be able to commit stuff while you believe you have the lock.
Anyway, I was thinking that we could also abruptly close the client connection when we fail to heartbeat some session lease...
You can't do anything else on that session, and maybe that's enough. But I've seen people do all kinds of crazy things with the advisory locks, using them to control stuff done outside the session (I think all of these schemes are basically unsound). But my point is that since these locks are used to govern things that are essentially non-transactional like DDL statements (because regular transactions do their own locking and don't need another layer of artificial locks), and the lock may be broken by node/network failure, you still have to account for various kinds of inconsistencies. These locks cannot be used to protect invariants in the way that regular mutexes are.
@ajwerner can this be closed?
No, we didn't do anything here. We prototyped something during the hackathon but it was far from something we'd merge.
Here are two data points / examples on what these locks can be used for:
Que is a high-performance job queue that improves the reliability of your application by protecting your jobs with the same ACID guarantees as the rest of your data. It uses PostgreSQL's advisory locks, which gives it several advantages over other RDBMS-backed queues.
+1
Simple.Migrations also uses pg_advisory_unlock for migrations: https://github.com/canton7/Simple.Migrations/blob/master/src/Simple.Migrations/DatabaseProvider/PostgresqlDatabaseProvider.cs#L65
Prisma Migrate uses this too: https://github.com/prisma/migrate/issues/426
Here is another popular Rails ActiveJob (async job) adapter using Advisory Locks: https://github.com/bensheldon/good_job
The sqlx Rust library also uses pg_advisory_lock(): https://github.com/launchbadge/sqlx/blob/f0d0dce8e25c40dffd1431776b6a38510a70bde0/sqlx-core/src/postgres/migrate.rs#L184
I stumbled upon this issue as well. Could we at least document missing features like this in https://www.cockroachlabs.com/docs/stable/postgresql-compatibility.html?
The lease pattern is an alternative to locks in the distributed systems world.
Would it be possible to expose application-level leases in a CockroachDB-specific SQL dialect?
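Even without new SQL surface area, something lease-like can be approximated with a plain table. A minimal sketch in CockroachDB-flavored SQL, assuming a hypothetical leases table and clients that re-run the statement before expiry to renew:

```sql
CREATE TABLE IF NOT EXISTS leases (
    name       STRING PRIMARY KEY,
    holder     UUID NOT NULL,
    expires_at TIMESTAMPTZ NOT NULL
);

-- Acquire or renew the lease named 'migration-runner'. The holder value is
-- the caller's own id (a literal here; normally a bind parameter). The row is
-- written only if the lease is free, expired, or already ours; if RETURNING
-- yields no row, another client holds a live lease.
INSERT INTO leases (name, holder, expires_at)
VALUES ('migration-runner', 'a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11',
        now() + INTERVAL '30 seconds')
ON CONFLICT (name) DO UPDATE
    SET holder = excluded.holder, expires_at = excluded.expires_at
    WHERE leases.holder = excluded.holder OR leases.expires_at < now()
RETURNING holder;
```

The caveat raised earlier in the thread still applies: an expired lease can be taken over while the previous holder still believes it owns it, so whatever the lease protects has to tolerate that.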
Please implement this soon 😄
How are migration tools supposed to prevent multiple instances from trying to run migrations at the same time? If you have multiple instances of a high-availability application with embedded migrations coming online at the same time, they're all going to see that the database needs to be migrated and attempt to apply migrations. If the migrations themselves aren't written with this scenario in mind, they may fail or cause data loss.
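For illustration, here is one table-based mutex that migration tools sometimes fall back to when advisory locks are unavailable. This is only a sketch; it assumes a CockroachDB version where SELECT ... FOR UPDATE acquires a row lock, and that the pending-migration check and the migrations themselves can tolerate running under that transaction:

```sql
-- One-time setup: a single-row table that acts as the migration mutex.
CREATE TABLE IF NOT EXISTS schema_lock (id INT PRIMARY KEY CHECK (id = 1));
INSERT INTO schema_lock (id) VALUES (1) ON CONFLICT (id) DO NOTHING;

-- Each migrator instance does its work while holding the row lock; concurrent
-- instances block (or retry on contention) until the first one commits.
BEGIN;
SELECT id FROM schema_lock WHERE id = 1 FOR UPDATE;
-- ... check the applied-migrations table and run anything still pending ...
COMMIT;
```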
@abonander, I've personally decided not to run migrations from application servers (though they can still verify at startup that the already-applied migrations are valid).
Instead, I run migrations in a dedicated db manager service at the beginning of the deployment process. It connects as a db user that can change the database schema, while application services connect as another user that can change data but not the schema, so this approach also improves security.
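A rough sketch of that privilege split; the role names are made up and the exact grant syntax differs a bit across Postgres and CockroachDB versions:

```sql
-- 'migrator' is used only by the deploy-time db manager service.
CREATE USER migrator;
GRANT CREATE ON DATABASE mydb TO migrator;

-- 'app_service' is what the application connects as: data access, no schema changes.
CREATE USER app_service;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO app_service;
```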
Hey folks, would appreciate your advice on an infrastructure as code issue. I'm looking into this bug report on pulumi-postgresql:
That user reports an error when creating multiple grants in Postgres via Pulumi, like so:
error: could not execute revoke query: pq: tuple concurrently updated
This was supposedly fixed upstream of our bridged provider in this PR, which added pg_advisory_lock calls to limit concurrency:
However, that reportedly breaks compatibility with CockroachDB:
That chain led me here. I think what would help us unblock managing role grants in IaC is advice we can pass along to the repository owner (@cyrilgdn).
I know my way around a SQL query and some Postgres internals, but I don't know the right way to handle grants here.
I'd love to unblock upgrading our Postgres provider and ensure our users have a great experience using it on Cockroach. Can you lend us some advice?
As @bdarnell points out in #13429, we should provide an actual implementation of these builtins. I suppose it won't be too difficult to implement by writing some lock keys in a special system range.
Drupal also uses this; see #22211.
Epic CRDB-12077
Jira issue: CRDB-6112