cockroachdb / cockroach

CockroachDB - the open source, cloud-native distributed SQL database.
https://www.cockroachlabs.com

sql: fill out pg_advisory_lock stubs #13546

Open jordanlewis opened 7 years ago

jordanlewis commented 7 years ago

As @bdarnell points out in #13429,

> it seems bad to lie about locking like this. Even if ActiveRecord is (currently) OK with us stubbing out this lock, implementing this function in a silently broken way could cause problems for other frameworks in the future.

We should provide an actual implementation of these builtins. I suppose it won't be too difficult to implement by writing some lock keys in a special system range.

Drupal also uses this; see #22211.

Epic CRDB-12077

Jira issue: CRDB-6112

cuongdo commented 7 years ago

should this be closed?

jordanlewis commented 7 years ago

No - we have no-op stubs for these functions, but as Ben points out, we probably shouldn't just pretend that they work correctly.

cuongdo commented 7 years ago

@jordanlewis does this need to be fixed for 1.0?

jordanlewis commented 7 years ago

No, I don't think so.

andreimatei commented 7 years ago

I think it's unlikely we'll ever implement the pg_advisory_lock functions, because I don't think we can provide their semantics (locks tied to sessions, blocking behavior) in a distributed system such as ours. When do you release a lock if the node hosting the session that acquired it dies? The best you can do is lease semantics, which is not what these functions want. A number of people have requested this; I think we should close this issue and instead provide an FAQ entry explaining why we don't support it. cc @jseldess, @bdarnell

jseldess commented 7 years ago

An FAQ sgtm, pending @bdarnell's response.

jordanlewis commented 7 years ago

@andreimatei I'm not convinced we couldn't provide a close-enough approximation. We don't need perfect blocking semantics (polling would be fine), and it's not totally clear to me whether advisory locks need to be completely robust in the face of node failure. Perhaps we could use a (session id, node id, lock id) triple as the lock id and lazily invalidate locks if a node id corresponds to one that's been detected as down? I'm not familiar with how dead node detection works so perhaps this is misguided.

andreimatei commented 7 years ago

But what about partitioned nodes? How do you tell a client that its lock is gone when the node it was talking to is partitioned away from... the quorum of the locks table?

bdarnell commented 7 years ago

I think we have enough flexibility in the semantics that we can implement something that will work.

pg_advisory_xact_lock would be easier than pg_advisory_lock, I think, since we could be sure to push the transaction when we're breaking its lock. But it seems that the session-scoped variants are more commonly requested/used.

> How do you tell a client that its lock is gone when the node it was talking to is partitioned away from... the quorum of the locks table?

The same way postgres does - we don't. In the event of a partition you're not guaranteed to be notified that you've lost your lock before someone else gets it.

andreimatei commented 7 years ago

> > How do you tell a client that its lock is gone when the node it was talking to is partitioned away from... the quorum of the locks table?
>
> The same way postgres does - we don't. In the event of a partition you're not guaranteed to be notified that you've lost your lock before someone else gets it.

What do you mean by "the same way postgres does"? How does a single-node Postgres have the same issue? If the client is partitioned away from the server, and the session loses the lock, I assume you can't do anything with that session anymore (probably because some TCP connection timed out somewhere). So you may not be notified of anything necessarily, but at least you won't be able to commit stuff while you believe you have the lock.

Anyway, I was thinking that we could also abruptly close the client connection when we fail to heartbeat some session lease...

bdarnell commented 7 years ago

You can't do anything else on that session, and maybe that's enough. But I've seen people do all kinds of crazy things with advisory locks, using them to control stuff done outside the session (I think all of these schemes are basically unsound).

My point, though, is that these locks are used to govern things that are essentially non-transactional, like DDL statements (regular transactions do their own locking and don't need another layer of artificial locks), and the lock may be broken by node or network failure, so you still have to account for various kinds of inconsistencies. These locks cannot be used to protect invariants the way regular mutexes are.

asubiotto commented 4 years ago

@ajwerner can this be closed?

ajwerner commented 4 years ago

> @ajwerner can this be closed?

No, we didn't do anything here. We prototyped something during the hackathon but it was far from something we'd merge.

sandstrom commented 3 years ago

Here are two data points / examples of what these locks can be used for:

1. An async job queue implemented on top of Postgres:

> Que is a high-performance job queue that improves the reliability of your application by protecting your jobs with the same ACID guarantees as the rest of your data. It uses PostgreSQL's advisory locks, which gives it several advantages over other RDBMS-backed queues.

https://github.com/que-rb/que

2. Migration locks in Rails 5:

https://github.com/rails/rails/blob/8de10f9bb6edaf10bde21cea73db3da2aac6f6de/activerecord/lib/active_record/migration.rb#L151

nekinie commented 3 years ago

  1. Migration locks in Tern: https://github.com/jackc/tern/blob/51868440dc9525767fda3fcd96757d5ebb2bcebc/migrate/migrate.go#L274

Nican commented 3 years ago

+1 Simple.Migrations also uses pg_advisory_unlock for migrations: https://github.com/canton7/Simple.Migrations/blob/master/src/Simple.Migrations/DatabaseProvider/PostgresqlDatabaseProvider.cs#L65

rafiss commented 3 years ago

Prisma Migrate uses this too https://github.com/prisma/migrate/issues/426

sandstrom commented 3 years ago

Here is another popular Rails ActiveJob (async job) adapter using Advisory Locks: https://github.com/bensheldon/good_job

itmecho commented 2 years ago

The sqlx Rust library also uses pg_advisory_lock(): https://github.com/launchbadge/sqlx/blob/f0d0dce8e25c40dffd1431776b6a38510a70bde0/sqlx-core/src/postgres/migrate.rs#L184

nrueckmann commented 2 years ago

I stumbled upon this issue as well. Could we at least document missing features like this in https://www.cockroachlabs.com/docs/stable/postgresql-compatibility.html?

DXist commented 1 year ago

The lease pattern is an alternative to locks in the distributed systems world.

Would it be possible to expose application-level leases in a CockroachDB-specific SQL dialect?

sbward commented 1 year ago

Please implement this soon 😄

abonander commented 1 year ago

How are migration tools supposed to prevent multiple instances from trying to run migrations at the same time? If you have multiple instances of a high-availability application with embedded migrations coming online at the same time, they're all going to see that the database needs to be migrated and attempt to apply migrations. If the migrations themselves aren't written with this scenario in mind, they may fail or cause data loss.

DXist commented 1 year ago

@abonander, I've personally decided not to run migrations from application servers (though they can still validate at startup that the already-applied migrations are valid).

Instead, I run migrations in a dedicated db manager service at the beginning of the deployment process. It runs its queries as a db user that can change the database schema, while application services run as another user that can change data but not the schema. This approach also improves security.

AaronFriel commented 1 year ago

Hey folks, would appreciate your advice on an infrastructure as code issue. I'm looking into this bug report on pulumi-postgresql:

That user reports an error when creating multiple grants in Postgres via Pulumi with an error like so:

error: could not execute revoke query: pq: tuple concurrently updated

This was supposedly fixed upstream of our bridged provider in this PR, which adds pg_advisory_lock calls to limit concurrency:

However, that reportedly breaks compatibility with CockroachDB:

That chain led me here, and I think what would unblock this issue with managing role grants in IaC is advice we can pass along to the repository owner (@cyrilgdn).

I know my way around a SQL query and some Postgres internals, but I don't know the right answer here to address grants.

I'd love to unblock upgrading our Postgres provider and ensure our users have a great experience using it on Cockroach. Can you lend us some advice?