brianc / node-postgres

PostgreSQL client for node.js.
https://node-postgres.com
MIT License

Support passing AbortSignal in query() #2774

Open mattbishop opened 2 years ago

mattbishop commented 2 years ago

Node offers an AbortSignal API that gives async operations a way to cancel their progress if aborted by the caller. Common use cases are web apps that see their request closed before completion, or in a local app, the user hits Ctrl-C to kill a running program. node-postgres's query() command could accept an AbortSignal instance and listen to it for abort events. When an abort occurs, it would close the Postgres query immediately.
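For illustration, the caller-facing side could look something like this (a rough sketch only - the `signal` option does not exist in pg today, it's shown purely to sketch the idea):

import { Client } from 'pg'

const client = new Client()
await client.connect()

// e.g. tie the query's lifetime to an incoming HTTP request, or to Ctrl-C
const controller = new AbortController()
process.on('SIGINT', () => controller.abort())

try {
  // `signal` is the proposed (hypothetical) option
  await client.query({ text: 'SELECT pg_sleep(600)', signal: controller.signal } as any)
} catch (err) {
  if (controller.signal.aborted) {
    // the query was cancelled because the caller aborted
  } else {
    throw err
  }
}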

A decent writeup can be found in this logrocket blog: https://blog.logrocket.com/complete-guide-abortcontroller-node-js/

Node Stream offers an option to pass an AbortSignal instance to close streams when an abort occurs. https://nodejs.org/dist/latest-v18.x/docs/api/all.html#all_stream_streamaddabortsignalsignal-stream
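For comparison, this is what that already looks like with streams today (shown with pg-query-stream; note that destroying the stream only tears things down locally - the backend keeps running the query):

import { addAbortSignal } from 'node:stream'
import { Client } from 'pg'
import QueryStream from 'pg-query-stream'

const client = new Client()
await client.connect()

const controller = new AbortController()
const stream = client.query(new QueryStream('SELECT * FROM generate_series(0, 100000000)'))
addAbortSignal(controller.signal, stream)

stream.on('error', (err) => {
  // an AbortError once controller.abort() is called
})
controller.abort() // destroys the stream locally; the backend query itself keeps running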

brianc commented 2 years ago

That would be a pretty nice way to cancel a query. Canceling a query with postgres is kind of a gross operation under the hood, which is why support isn't great in this library. You have to create a new connection and then tell postgres to cancel a query from another connection - the active connection running the query is "unresponsive" from our app's perspective while it's waiting for a query to execute, and the backend doesn't read any network data once it receives a command to execute a query. It doesn't even check the socket to make sure it's still connected (which is why, if your process dies while you're running a 30 min query, it will still run in postgres).

So while the DX semantics of wanting to kill a stream and have it tell postgres to cancel the query are really nice, what would need to happen under the hood is actually spinning up an entirely new connection to the backend, going through the connection handshake, and then telling the backend to cancel the other connection's query. Then the existing query will throw an error saying 'terminated by backend.' It's a whole bunch of chaos. I do want to get better support for canceling queries in the near future though, because yeah, killing your program and having a rogue query running for 30 min in the background, with no way to stop it outside of going to psql and finding and killing it, is annoying.
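For the curious, the "spin up a new connection just to cancel" part is tiny at the protocol level. Roughly this (an untested sketch based on the CancelRequest message in the protocol docs; it assumes you've kept the process ID and secret key the backend sent during startup):

import net from 'node:net'

// Untested sketch: open a fresh TCP connection, write the 16-byte CancelRequest
// frame, and close. The server never replies on this connection.
function sendCancelRequest(host: string, port: number, processID: number, secretKey: number): Promise<void> {
  return new Promise((resolve, reject) => {
    const socket = net.connect(port, host, () => {
      const frame = Buffer.alloc(16)
      frame.writeInt32BE(16, 0)         // message length, including itself
      frame.writeInt32BE(80877102, 4)   // CancelRequest code: (1234 << 16) | 5678
      frame.writeInt32BE(processID, 8)  // backend PID of the connection to cancel
      frame.writeInt32BE(secretKey, 12) // per-connection secret from BackendKeyData
      socket.end(frame)
    })
    socket.once('error', reject)
    socket.once('close', () => resolve())
  })
}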

mattbishop commented 2 years ago

Wow I had no idea it was so complicated!

I guess you can’t just kill the tcp connection on abort?

nicholaswmin commented 5 months ago

@brianc Hi Brian,

I assume what you're talking about is pg_terminate_backend or something of the sort? If that's the case, it is indeed a pile of garbage to reason with; but I think it should be documented somewhere. This is a recurring problem for people using streams; some, I would bet, are not even aware they have runaway queries that need to be cancelled.

That being said, it would be quite nice to have such a capability baked into the library.

brianc commented 5 months ago

pg_terminate_backend

Yeah, the cancel command in the protocol actually works the same as pg_cancel_backend(pid) - it cancels the active query. pg_terminate_backend actually (attempts to) terminate the process (postgres uses a one-process-per-connection model), which ends up closing the client entirely. The problem here is that if node-postgres were to have a cancel function, it's hard to figure out the right API for it while maintaining the principle of least surprise. The easiest way honestly would be to export a top-level cancel function, since canceling a query requires a client, and specifically a client that's not currently executing an existing query. This actually works:

import pg from 'pg'

async function cancel(client: pg.Client): Promise<void> {
  // note: need to use the same exact connection config to connect to the same backend
  const newClient = new pg.Client({
    host: client.host,
    user: client.user,
    // copy every single optional connection config for ssl and so-forth
  })
  await newClient.connect()
  // note: this may or may-not cancel your query...you don't really have total control over that
  // https://www.postgresql.org/docs/current/functions-admin.html#FUNCTIONS-ADMIN-SIGNAL
  await newClient.query('SELECT PG_CANCEL_BACKEND($1)', [client.processID])
  await newClient.end()
}

Putting it on client.cancel is gross...it isn't using the current client to cancel, it's secretly opening up a new client. So yeah...I think about the API for this sometimes and always leave feeling a bit incomplete about it. There is technically a client.cancel method already, too, though it's undocumented & I'm going to deprecate it because the API is some insanity I designed 10 years ago; it's confusing to use and tries to be too clever. (read: wouldn't recommend using it) It should probably be something like

import { cancelQuery, Client } from 'pg'

const client = new Client()
client.query('SELECT PG_SLEEP(10000)')

await cancelQuery(client)

I'd need to test whether the newly created client doing the canceling needs to connect over SSL - it doesn't look like it does according to the protocol docs, so all I need from the original client is the host and port (plus the processID and secret).

Anyways...yeah the top level cancelQuery function best reflects how the operation is actually performed...you can read more about it here:

https://www.postgresql.org/docs/7.3/protocol-protocol.html#:~:text=To%20issue%20a%20cancel%20request,and%20then%20close%20the%20connection.
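So the whole thing could reduce to something like this (untested; processID and secretKey are runtime properties on Client rather than part of the documented typings, hence the cast, and sendCancelRequest is the raw-socket sketch from earlier in the thread):

import { Client } from 'pg'

export async function cancelQuery(client: Client): Promise<void> {
  // host/port identify the backend; processID/secretKey identify the connection to cancel
  const { host, port, processID, secretKey } = client as any
  // sendCancelRequest: see the raw-socket sketch earlier in the thread
  await sendCancelRequest(host, port, processID, secretKey)
}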

nicholaswmin commented 5 months ago

The easiest way honestly would be to export a top-level cancel function, since canceling a query requires a client, and specifically a client that's not currently executing an existing query. This actually works:

That's exactly what I would propose.

The other issue I see is strange error propagation: OK, so we're cancelling a query on one side, but now the client that actually initiated the query errors out with a "client terminated" error or something. But is that really an error now? We are the ones who decided to cancel it, after all.

nicholaswmin commented 5 months ago

Maybe: await cancelQuery(client, { suppressClientErrors: true })

Yuck.. but at least it .. doesn't surprise I suppose..

brianc commented 5 months ago

That's exactly what I would propose.

Definitely open to suggestions on a different api!

boromisp commented 5 months ago

Another issue with cancellation is that it can only target the connection, not a specific query (there is nothing like a query ID in the protocol).

The cancellation signal might or might not have any effect — for example, if it arrives after the backend has finished processing the query, then it will have no effect. If the cancellation is effective, it results in the current command being terminated early with an error message.

The upshot of all this is that for reasons of both security and efficiency, the frontend has no direct way to tell whether a cancel request has succeeded. It must continue to wait for the backend to respond to the query. Issuing a cancel simply improves the odds that the current query will finish soon, and improves the odds that it will fail with an error message instead of succeeding.

This issue has been discussed on the different postgres mailing lists for over 20 years.

Here are some relevant threads from 2003 to 2020:

2020: https://postgrespro.com/list/thread-id/2515803
2016: https://www.postgresql.org/message-id/flat/CADT4RqDh1CEgz7QgKwYSLT9TMCk7O%3DncauUaSQKVt_nPNTE9wQ%40mail.gmail.com#00832c72f4b93d57d6b0ac59de8eca85
2015: https://www.postgresql.org/message-id/flat/CADT4RqAk0E10%3D9bA8V%2Buu0Dq9tR%2BPn8x%2BptQBXfC1FBiVH3MFA%40mail.gmail.com
2005: https://www.postgresql.org/message-id/flat/27126.1126649920%40sss.pgh.pa.us#75364d0966758fccad56cd6c71547771
https://www.postgresql.org/message-id/flat/438F2DE5.EE98.0025.0%40wicourts.gov
https://www.postgresql.org/message-id/flat/s326eceb.025%40gwmta.wicourts.gov
2003: https://www.postgresql.org/message-id/20030916075258.GA11240@opencloud.com

What I got from them is that you should lock down the client until the cancel request connection is closed by the server.

And if you already have multiple queries in-flight (query pipelining?) it's not predictable which - if any - query will be canceled.

Once all the in-flight queries have resolved and the cancel request connection has been closed by the server you can be sure there won't be any surprise errors caused by race conditions.
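In code, that discipline could look something like this (untested sketch, reusing the sendCancelRequest idea from earlier in the thread; the point is only that the client isn't handed back until both sides have settled):

import { Client, QueryResult } from 'pg'

async function cancelAndWait(client: Client, inFlight: Promise<QueryResult>): Promise<void> {
  // processID/secretKey are runtime properties on Client, not in the public typings
  const { host, port, processID, secretKey } = client as any
  await Promise.allSettled([
    sendCancelRequest(host, port, processID, secretKey), // resolves once the server closes the cancel connection
    inFlight, // resolves normally, or rejects (often with 57014) if the cancel landed in time
  ])
  // only now is it safe to reuse the client or release it back to a pool
}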

brianc commented 5 months ago

Thanks @boromisp!

nicholaswmin commented 5 months ago

An other issue with cancellation is that it can only target the connection, not a specific query (there is nothing like a query ID in the protocol).

@boromisp That's only true for pg_terminate_backend; pg_cancel_backend keeps the connection alive. But that's nitpicking from my side; what people mean when they say "cancel a query" is cancel the entire request, so we're talking about pg_terminate_backend here.

I think there should be some pragmatism here. I'm not sure if @brianc would like to bloat the library with utility functions like this one. And I also agree with you: this is complicated and full of gotchas. Thank you for that list as well.

On the other hand, I am perplexed that so few people (relative to how many use this lib) ask for this. It's critical to have, especially for those few times you messed up the architecture/design and have to compensate with long-running queries.

The cancellation signal might or might not have any effect — for example, if it arrives after the backend has finished processing the query, then it will have no effect

Yes - it is full of gotchas. But someone with a need to cancel queries would likely be in a position where the vast majority of the cancellations hit a running query. You don't go looking for ways to cancel a query unless you're talking about long, really, really long queries. That's an anecdote from me, sure.

Btw language here is important; you say cancel but I'm not sure if you mean cancel or kill? Because they are different.

Errant long queries have bitten us multiple times in prod: a user streams something, it takes more than 5 seconds, they refresh because... because user, and the query becomes errant and a potential blocker of subsequent queries. Just a tiny bit of a traffic spike and it eventually snowballs into a cascading failure. Issues like that masquerade themselves - you're lucky, in a sense, if you get one to fail catastrophically, because then you know it's there; the worse case is where you've paid tens of thousands more per year for a database you never really needed, just to bring p95/p99 times down.

And there's other use cases, people who want to allow aborting requests as a product feature of some sort.

I think it should at least be documented, in the style of: "look, we won't do this in the library, but here's a recipe for it and a couple of gotchas." There are, of course, other solutions, such as a connection-level statement_timeout.
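For completeness, the statement_timeout route is short - something like this (I believe pg accepts statement_timeout as a per-connection config option in milliseconds; the plain-SQL form works per session either way):

import { Client, Pool } from 'pg'

// per-connection: every statement on connections from this pool is limited to 5 seconds
const pool = new Pool({ statement_timeout: 5000 })

// or per-session, via plain SQL on a single client
const client = new Client()
await client.connect()
await client.query(`SET statement_timeout = '5s'`)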

boromisp commented 5 months ago

I'm not against it; I have the same issues with runaway queries. My current workaround simply closes the connections from the client side if the request has been aborted, and that's not ideal.
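Roughly like this, for reference (a sketch of that workaround, not a pg API - runAbortable is a made-up helper; the backend may keep executing the query for a while after the socket drops):

import { Pool, QueryResult } from 'pg'

async function runAbortable(pool: Pool, sql: string, signal: AbortSignal): Promise<QueryResult> {
  const client = await pool.connect()
  let destroyed = false
  const onAbort = () => {
    destroyed = true
    client.release(true) // true = destroy the connection instead of returning it to the pool
  }
  signal.addEventListener('abort', onAbort, { once: true })
  try {
    return await client.query(sql) // rejects with "Connection terminated" if aborted mid-query
  } finally {
    signal.removeEventListener('abort', onAbort)
    if (!destroyed) client.release()
  }
}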

More edge cases to review:

- PgBouncer does some interesting things with distributing cancel requests between peers that it knows about: https://www.pgbouncer.org/config.html#section-peers
- While SSL might be optional for the cancel request, for example, neon probably requires it for the SNI (https://neon.tech/docs/connect/connect-from-any-app).
- With "dumb" proxies (e.g., haproxy in front of multiple read replicas), you would need to spread the cancel request to all upstream servers. While this is technically possible, it requires support from the proxy.

Some of this might be solvable by code (target the same IP, use the same SSL options), and the rest just needs to be called out in documentation to reduce confusion.

boromisp commented 5 months ago

Maybe: await cancelQuery(client, { suppressClientErrors: true })

Yuck.. but at least it .. doesn't surprise I suppose..

There is a specific 57014 query_canceled error code (https://www.postgresql.org/docs/current/errcodes-appendix.html) you could check for in your exception handler. I'm not sure how suppressing it would work with a promise-based API, however.

I might even introduce a new error for queries aborted before reaching the server, since cancel(client) should probably abort all queries in the client's query queue.

And maybe add something like an isCancelError(err) utility to catch both kinds of errors.
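In application code, that could look something like this (a sketch - isCancelError and the QueryAbortedError name are made up here to illustrate the two cases):

// 57014 = query_canceled: raised by statement_timeout and by explicit cancel requests.
// "QueryAbortedError" is hypothetical - a stand-in for the proposed "aborted before
// reaching the server" error.
function isCancelError(err: any): boolean {
  return err?.code === '57014' || err?.name === 'QueryAbortedError'
}

try {
  await client.query('SELECT pg_sleep(10000)')
} catch (err) {
  if (!isCancelError(err)) throw err
  // cancelled on purpose: treat as an expected outcome rather than a failure
}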


To be honest, a simple, low-level tryCancel(client) API could be a good first step. It would need to extract the relevant connection details from a client, send the cancel request, and wait for that connection to be closed by the server.

And possibly another new client API: client.abortPendingQueries(reason). Something like this, without touching the activeQuery: https://github.com/brianc/node-postgres/blob/0096856e2ea5c362438543958cb2fab0a379450a/packages/pg/lib/client.js#L72
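Roughly along these lines (untested, and it leans on internals like queryQueue and query.handleError, so illustration only):

// Hypothetical sketch of client.abortPendingQueries(reason), written as a free
// function here; it mirrors what _errorAllQueries does but leaves activeQuery alone.
function abortPendingQueries(client: any, reason: string): void {
  const error = new Error(reason)
  while (client.queryQueue.length) {
    const query = client.queryQueue.shift()
    // reject each queued (not yet started) query
    process.nextTick(() => query.handleError(error, client.connection))
  }
}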

The application would have to be careful not to release the client back to the pool while the cancellation is pending.

All the other fluff could be added later if needed.

nicholaswmin commented 5 months ago

Those are some excellent resources and pointers. There are edge cases I wasn't aware of.

(Not supported by pg right now, so not an issue?)

I would say so as well, right off the bat

My current workaround simply closes the connections from the client side if the request has been aborted, and that's not ideal.

Yes, that's a must; I have to go a step further and set a statement_timeout as well. Do you use streams by any chance?

PgBouncer does some interesting things with distributing cancel requests between peers that it knows about: https://www.pgbouncer.org/config.html#section-peers

I don't use PgBouncer, but I'm aware it's a de facto standard that a lot of people use; so yes, certainly any implementation should support it.

While SSL might be optional for the cancel request, for example, neon probably requires it for the SNI (https://neon.tech/docs/connect/connect-from-any-app).

I wouldn't be picky about vendor compatibility, especially if they are outliers.

With "dumb" proxies (eg., haproxy in front of multiple read replicas), you would need to spread the cancel request to all upstream servers. While this is technically possible, it requires support from the proxy.

No comment here, I only have 1 read replica and I point at it directly.

nicholaswmin commented 5 months ago

There is a specific 57014 query_canceled error code (https://www.postgresql.org/docs/current/errcodes-appendix.html) you could check for in your exception handler. I'm not sure, how supressing it would work on a promise based API however.

By far, the most peculiar and painful thing to handle has to be streams. I'm on my 3rd day of evaluating different strategies for handling streams that error: how to clean up correctly and produce some OK response when they error out mid-stream. Just putting this out there.

I wouldn't expect it to trigger errors when you're cancelling a query. That would be very surprising and perplexing. It might be a misalignment with the PG concept of query termination, but the whole point of a cancelQuery at the driver level would be an abstraction that takes care of that. I personally don't see a need for such a mechanism unless it covers that case.

boromisp commented 5 months ago

@nicholaswmin By streaming, do you mean something like pg-query-stream? And you want to turn errors optionally into EOF?

I don't have a lot of experience with the node stream API, but a configurable error handler might work:

https://github.com/brianc/node-postgres/blob/0096856e2ea5c362438543958cb2fab0a379450a/packages/pg-query-stream/src/index.ts#L54

public _read(size: number) {
  this.cursor.read(size, (err: Error, rows: any[]) => {
    if (err) {
      // optionally treat certain errors (e.g. query_canceled) as a normal end-of-stream
      if (this.errorIsEOF?.(err)) this.push(null)
      else this.destroy(err)
    } else {
      for (const row of rows) this.push(row)
      if (rows.length < size) this.push(null)
    }
  })
}

new QueryStream('SELECT ...', [], {
  /* query_canceled, e.g. statement_timeout or user request */
  errorIsEOF: (err) => err?.code === '57014'
})

I think it would be a lot more difficult to suppress errors in the core pg library without breaking things. That would involve emitting fake postgres protocol messages to simulate the end of a successful response and not confuse the likes of pg-cursor.

nicholaswmin commented 5 months ago

@nicholaswmin By streaming, do you mean something like pg-query-stream? And you want to turn errors optionally into EOF?

Ah man, thanks again for the pointers, so much appreciated - but I'll open another issue for that, as I don't want to divert this conversation to my own issues. It's not about ending the DB stream but rather about how to properly handle the errors downstream to res (to the client) if the query errors out mid-stream.

I think it would be a lot more difficult to suppress errors in the core pg library without breaking things. That would involve emitting fake postgres protocol messages to simulate the end of a successful response and not confuse the likes of pg-cursor.

I see. In that case, how would you handle the error that will occur in the initial query? Conditionally? By sniffing its error code and ignoring it as an error?

And is there an accurate picture of the use cases that this is requested for?

Or maybe that's enough bikeshedding; I can jump in next week and give it a go?