brianc / node-postgres

PostgreSQL client for node.js.
https://node-postgres.com
MIT License
12.32k stars 1.23k forks source link

[META] Timing issues when using parameterized expressions in non-native mode #2454

Closed amotl closed 3 years ago

amotl commented 3 years ago

Dear Brian and node-postgres contributors,

first things first: thanks a stack for conceiving and maintaining this excellent module.

While improving our test harness over at CrateDB on behalf of https://github.com/crate/crate-qa/pull/180, we believe we discovered a nasty issue with respect to async/timing of node-postgres where we humbly want to ask about your assistance.

I've created two concise repros within gists in order to outline what we've discovered. I am sharing those within the downstream issues #2455 and #2456. I hope they will also do their job properly on your end and that they will be good companions on mitigating this issue. While we specifically tested on CrateDB, I believe the problems might also manifest when talking to other PostgreSQL-compatible databases.

As the title says, the problems only happen when using parameterized queries and when using the pure-JS driver variant. When running the code examples on pg-native by providing the --native command line option, I have not been able to reproduce them. Also, when not using parameterized expressions by omitting the --parameterized option, everything works flawlessly.

Due to the intrinsic details about the asynchronous behavior of Node.js, the errors don't happen on every occasion, that's why I added documentation to the repros about how to run them in a reasonably tight loop by using the watch utility.

Also, for #2455, I've added documentation about how to run the connection through toxiproxy in order to add a slight amount of latency, which will increase the chance to trigger the situation tremendously. Please note that this does not apply to #2456, where the origin of the flaw is not related to slowness of the client but rather about its speediness.

For running the repros on my workstation, I used pg@8.5.1 and pg-native@3.0.0 on Node.js v14.15.1. While I am not sure on this, things like #2163, #2327 and #2355 might be related.

Thank you for listening so far. We would be really grateful if you could have a look into this.

With kind regards, Andreas.

brianc commented 3 years ago

Wow thanks so much for the intro and details! This kinda communication really helps me find and fix things fast! I am running some errands today but I will jump on this and work a fix as soon as I can this afternoon. Ill be in touch in a few hours!

On Wed, Jan 27, 2021 at 10:13 AM Andreas Motl notifications@github.com wrote:

Dear Brian and node-postgres contributors,

first things first: thanks a stack for conceiving and maintaining this excellent module.

While improving our test harness over at CrateDB on behalf of crate/crate-qa#180 https://github.com/crate/crate-qa/pull/180, we believe we discovered a nasty issue with respect to async/timing of node-postgres where we humbly want to ask about your assistance.

I've created two concise repros within gists in order to outline what we've discovered. I am sharing those within the downstream issues X and Y. I hope they will also do their job properly on your end and that they will be good companions on mitigating this issue. While we specifically tested on CrateDB, I believe the problems might also manifest when talking to other PostgreSQL-compatible databases.

As the title says, the problems only happen when using parameterized queries and when using the pure-JS driver variant. When running the code examples on pg-native by providing the --native command line option, I have not been able to reproduce them. Also, when not using parameterized expressions by omitting the --parameterized option, everything works flawlessly.

Due to the intrinsic details about the asynchronous behavior of Node.js, the errors don't happen on every occasion, that's why I added documentation to the repros about how to run them in a reasonably tight loop by using the watch utility.

Also, for X, I've added documentation about how to run the connection through toxiproxy https://github.com/Shopify/toxiproxy in order to add a slight amount of latency, which will increase the chance to trigger the situation tremendously. Please note that this does not apply to Y, where the origin of the flaw is not related to slowness of the client but rather about its speediness.

For running the repros on my workstation, I used pg@8.5.1 and pg-native@3.0.0 on Node.js v14.15.1. While I am not sure on this, things like #2163 https://github.com/brianc/node-postgres/issues/2163, #2327 https://github.com/brianc/node-postgres/issues/2327 and #2355 https://github.com/brianc/node-postgres/issues/2355 might be related.

Thank you for listening so far. We would be really grateful if you could have a look into this.

With kind regards, Andreas.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/brianc/node-postgres/issues/2454, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAMHIPDTLX7S6GRHQUGUMDS4A3R3ANCNFSM4WVQYLPA .

brianc commented 3 years ago

Alrighty, should be starting on this in 2-2.5 hours, will letcha know when i find something

amotl commented 3 years ago

Hi Brian,

thanks already. And please take your time.

With kind regards, Andreas.

amotl commented 3 years ago

Dear Brian,

thank you very much for your prompt response and your investigations at https://github.com/brianc/node-postgres/issues/2455#issuecomment-769345077 which revealed a problem to us, where CrateDB was not responding correctly to PostgreSQL wire protocol requests. Sorry for the noise.

@mfussenegger (also thanks a stack!) added a mitigation on behalf of https://github.com/crate/crate/pull/10979:

If the flush triggers the deferred executions, then a follow up sync() returned a completed future. This led the pg-write layer to immediately send a ReadyForQuery message to the client, even if the operations triggered by the flush weren't finished.

This patch improves the logic so that flush sets the activeExecution property, which will then be used by sync. This ensures that the ReadyForQuery message is sent after the operations have completed.

I have been able to confirm that the repros now work flawlessly using the most recent nightly release of CrateDB. The fix will be part of the upcoming GA release 4.4.1.

With kind regards, Andreas.