TechEmpower / FrameworkBenchmarks

Source for the TechEmpower Framework Benchmarks project
https://www.techempower.com/benchmarks/

Database commands coalescing #7019

Closed sebastienros closed 2 years ago

sebastienros commented 2 years ago

The Npgsql (PostgreSQL driver for .NET) contributors have been looking into other drivers' implementations to understand where improvements can be made.

Here is a current picture for the multi-queries scenario:

image

There are several frameworks clustered around 30-34K (not all displayed here), and then a jump to ~60K.

After analyzing the network traffic between the Drogon framework and the Postgres server, represented in the following screenshot and validated with other specific payloads, it appears that queries issued from multiple HTTP requests are coalesced into the same network packets and executed with a single "Sync" command, which tells Postgres to execute them "as a batch".

image

Bundling together multiple commands in a single Sync batch has important consequences: if any command errors (e.g. unique constraint violation), all later commands are skipped until the Sync. Even more importantly, all commands are implicitly part of the same transaction.
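
To make the consequences concrete, here is a minimal sketch of such a single-Sync batch through the upstream libpq pipeline API (PostgreSQL 14+; the patched batch-mode API referenced later in this thread uses different names). This is an illustration of the technique, not Drogon's actual code:

#include <libpq-fe.h>

// conn is an open PGconn*; error handling and result reading omitted.
PQenterPipelineMode(conn);
// Two queries that may originate from different HTTP requests:
PQsendQueryParams(conn, "SELECT 1", 0, NULL, NULL, NULL, NULL, 0);
PQsendQueryParams(conn, "SELECT 2", 0, NULL, NULL, NULL, NULL, 0);
// A single Sync for the whole batch: everything above shares one implicit
// transaction, and an error in any command skips the rest up to the Sync.
PQpipelineSync(conn);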

Here we have not only implicit batching within the same request (20 queries), but more importantly batching across requests.

Our interpretation of the TechEmpower rule below leads us to believe that this technique violates it:

Except where noted, all database queries should be delivered to the database servers as-is and not coalesced or deduplicated at the database driver. In all cases where a database query is required, it is expected that the query will reach and execute on the database server. However, it is permissible to pipeline traffic between the application and database as a network-level optimization. That is, two or more separate queries may be delivered from the application to the database in a single transmission over the network, and likewise for query results from the database back to the application, as long as the queries themselves are executed separately on the database server and not combined into a single complex query.

We haven't looked into whether the other frameworks showing the same high performance are using the same technique.

/cc @an-tao @roji

roji commented 2 years ago

Note: most of the investigation work above was done by @NinoFloris and @vonzshik from the Npgsql project.

an-tao commented 2 years ago

https://2ndquadrant.github.io/postgres/libpq-batch-mode.html

Here I used the batch mode of the PostgreSQL driver; this work was done two years ago, and the coalescing happens only when:

  1. all SQLs are read-only;
  2. all SQLs in the pipeline are in the same call stack of the current event loop.

So if any write SQL is executed, the Sync command is sent immediately; you can see the network traffic in the update test to confirm it. The rules say it's not permitted to combine multiple SQLs into a single complex query, and I don't think Drogon broke that, because every read SQL is executed separately (they are in the same batch and sent together; it's a network-level optimization).

marty1885 commented 2 years ago

Hi, I'm another maintainer of Drogon.

I've chatted with @an-tao privately; he is not as fluent in English, so poke me if you need help with his replies. Please allow me to explain my view and my reading of the rules in detail. My views may differ from his, so I don't represent his position. I think this is a reasonable framework-level optimization. Hopefully this helps both parties understand the differences.

all database queries should be delivered to the database servers as-is and not coalesced or deduplicated at the database driver.

In Drogon with PQ-batch support, we control when the Sync happens; it's a framework-level optimization. To be specific, a Sync happens when a SQL query that modifies data occurs, when enough commands are batched, or when the web application isn't busy adding more queries to the batch. Therefore a batch either consists only of read queries or ends with a write query. A failing write (transaction conflict, uniqueness violation, etc.) will not cause reads to fail, and no subsequent commands exist to be skipped.
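
As a hypothetical sketch of that policy (names invented for illustration; this is not Drogon's actual implementation):

// Pseudocode for the Sync policy described above.
void onQueryQueued(PgConnection &conn, const Query &q) {
    conn.pipeline.push(q);
    if (isWrite(q)                               // a write ends the batch now
        || conn.pipeline.size() >= kMaxBatch) {  // or the batch is full
        conn.sendSync();
    }
}

void onEventLoopIdle(PgConnection &conn) {
    if (!conn.pipeline.empty())
        conn.sendSync();  // no more queries are being added; flush the batch
}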

In all cases where a database query is required, it is expected that the query will reach and execute on the database server.

Besides the above, the SQL benchmark only reads from the DB, so it's impossible to have a transaction error causing queries to fail; the test itself guarantees that. I think we are safe in this regard too.

Furthermore, I think this is a reasonable optimization in practice. Since a write always ends the batch, we guarantee all reads are executed (note that SELECT .. FOR UPDATE is considered a write). The only edge case is when the web app runs an intentionally bad SELECT query, which could get placed in the middle of a batch. But I consider that an application bug, and I won't blame the framework for it.

Brar commented 2 years ago

@marty1885 Just for clarification: by "intentionally bad SELECT query" you mean any SELECT query that fails and causes the whole transaction to roll back instead of only the failing query - no?

marty1885 commented 2 years ago

No, I mean some sequence of non-transactional queries:

co_await db->execSqlCoro("SELECT xxx FROM my_table WHERE ....");
co_await db->execSqlCoro("SELECT xxx FROM non_exist_table WHERE ....");;
co_await db->execSqlCoro("SELECT xxx FROM my_table WHERE ...."); // This gets dropped.

I think all queries in a transaction are considered writes, and thus are never batched. Is that right? If not, I think we need to add a special case. In any case, it doesn't invalidate this optimization. @an-tao

Also, I think the performance benefit comes from sending all the queries at the same time. Only sending the Sync at the end is a quirk of pq_batch. We could spend some time looking into the updated libpq and resolve the entire issue altogether.

vonzshik commented 2 years ago

We guarantee all reads are executed (Note that SELECT .. FOR UPDATE is considered writing).

@marty1885 I assume you're talking about query parsing, which doesn't check for function and procedure calls (which can do inserts/updates). There is also a problem of PG adding a new keyword (however improbable it is), which might have side effects.

But I consider it an application bug and I won't blame the framework for it.

Speaking from my experience: I'm working on an app which generates queries from SQL with a bit of a meta language. As an example:

SELECT *
FROM "SomeTable"
WHERE 1 = 1
#param(ID, "ID")
#param(Value, "Value")

A user can specify via the UI whether they want to filter by ID or by Value (or both), and the corresponding SQL is generated with the required parameters. With some queries running to hundreds of lines, it's not that rare for some combination of parameters to generate invalid SQL. While that's definitely not great, it requires no immediate fixing, since only the specific combination is broken. But if we were using Drogon, it would start breaking unrelated queries, making the issue critical and requiring an immediate fix.

Just wondering, is batching (and possible issues from it) mentioned in drogon's documentation?

Brar commented 2 years ago

Sorry, by transaction I meant everything that the driver puts before a sync packet.

The issue I see is something like the following:

co_await db->execSqlCoro("SET statement_timeout TO 10000");
co_await db->execSqlCoro("SELECT xxx FROM my_table WHERE ....");

// Simulates a query that might happen to take slightly longer than statement_timeout under some load conditions
co_await db->execSqlCoro("SELECT pg_sleep(15)");

co_await db->execSqlCoro("SELECT xxx FROM my_table WHERE ...."); // This gets dropped.
roji commented 2 years ago

We guarantee all reads are executed (Note that SELECT .. FOR UPDATE is considered writing).

@marty1885 I assume you're talking about query parsing, which doesn't check for function and procedure calls (which can do inserts/updates). There is also a problem of PG adding a new keyword (however improbable it is), which might have side effects.

I want to make sure this crucial point by @vonzshik is clear... As I understand it, if a command is executed which happens to SELECT from a function, and around the same time another unrelated command is executed which produces an error (e.g. unique constraint violation), then any side-effects from the function may get rolled back, without any indication to the program that this happened. That is, the user code executing the function will continue as if the function completed successfully and its side effects were committed, when in fact they were silently rolled back.

There may be other such scenarios where "implicit" side-effects aren't detected. So I think the idea that a database driver can reliably parse SQL to know whether it's read-only is quite problematic.

@marty1885, you wrote:

Also I think the performance benefits comes from sending all the queries at the same time.

In our experience, sending the Sync at the end has quite a significant impact on performance (FWIW Npgsql does send all commands at the same time like Drogon, but with a Sync after each command instead of at the end). This is likely related to the PostgreSQL MVCC overhead of creating and committing many small transactions (Npgsql) vs. a single transaction which encompasses all commands (Drogon).

Only sending the Sync at the end is a quirk of pq_batch. We could spend some time looking into the updated libpq and resolve the entire issue altogether.

Importantly, Sync at the end is indeed the desirable behavior when application code wants to send a set of related commands, as a batch: either all commands complete or none do (I believe this was the intent of the PG pipelining mode). The crucial difference is that the application explicitly opts into this behavior - via some API gesture - whereas Drogon is applying this implicitly on unrelated commands, under the hood.
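
To illustrate the two wire layouts described above, Sync-after-each-command looks roughly like this in the upstream libpq pipeline API (an illustration, not Npgsql's actual code):

PQenterPipelineMode(conn);
PQsendQueryParams(conn, "SELECT 1", 0, NULL, NULL, NULL, NULL, 0);
PQpipelineSync(conn); // command 1 commits as its own implicit transaction
PQsendQueryParams(conn, "SELECT 2", 0, NULL, NULL, NULL, NULL, 0);
PQpipelineSync(conn); // command 2 is unaffected if command 1 failed
// Both commands still go out together on the wire, but each Sync makes the
// preceding command independent of the others.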

marty1885 commented 2 years ago

@vonzshik

I assume you're talking about query parsing, which doesn't check for function and procedure calls (which can do inserts/updates). There is also a problem of PG adding a new keyword (however improbable it is), which might have side effects.

Yeah. We noticed that earlier too. We will patch that.

Just wondering, is batching (and possible issues from it) mentioned in drogon's documentation?

Drogon's batching support requires a patched version of libpq (we haven't implemented support since the feature was upstreamed), so we expect users to know what they are getting themselves into at this point.

@Brar

Yeah... that's an issue, but in a different way. Thanks for finding it.

co_await db->execSqlCoro("SELECT pg_sleep(15)"); suspends current coroutine from execution. No further code in the same coroutine gets executed. Since that statement always error out, it causes co_await <awaiter> to throw exception. And other SELECT statements in the same batch would also error out and throw. (My bad for the code I shared above. It's a quick illustration)

// coroutine 1
{
    co_await db->execSqlCoro("SELECT pg_sleep(15)"); // This fails and throws.
    co_await db->execSqlCoro("SELECT xxx FROM my_table WHERE ...."); // Never runs because of the throw.
}

// coroutine 2
{
    // Throws because the statement got bundled into the same batch and skipped.
    co_await db->execSqlCoro("SELECT xxx FROM my_table WHERE ....");
}

@roji

In our experience, sending the Sync at the end has quite a significant impact on performance

Thanks for pointing that out. I'm discussing with @an-tao what the best way to fix this is. Likely we'll add an option to opt in to batch mode if we detect batch support.


Back to the original issue: I still think batching should be allowed in the benchmark, as it produces known behavior and does not break program flow when an error happens. And it does not violate the benchmark's rules.

roji commented 2 years ago

I assume you're talking about query parsing, which doesn't check for function and procedure calls (which can do inserts/updates). There is also a problem of PG adding a new keyword (however improbable it is), which might have side effects.

Yeah. We noticed that earlier too. We will patch that.

How do you propose to patch this? If the idea is to disable batching, that would mean the driver parsing to detect arbitrary function calls within arbitrary user SQL - that once again sounds quite problematic and brittle?

Back to the original issue: I still think batching should be allowed in the benchmark, as it produces known behavior and does not break program flow when an error happens. And it does not violate the benchmark's rules.

Stepping back here, all the various problems signaled above are a result of Drogon applying implicit batching under the hood; this technique changes actual program behavior in a significant way - and does break program flow in ways that are very hard to predict and diagnose (as everything depends on timing). It's hard to see how this could be treated as a transparent network optimization.

Compare this with sending all the commands together, but with Syncs between each command; this indeed guarantees at the protocol level that each command runs independently of each other, and errors are guaranteed to not affect unrelated commands; IMHO this is what is meant by a network-level optimization.

marty1885 commented 2 years ago

How do you propose to patch this? If the idea is to disable batching, that would mean the driver parsing to detect arbitrary function calls within arbitrary user SQL - that once again sounds quite problematic and brittle?

We'll add additional checks to prevent the common cases. I think it's quite rare for this to happen, since we haven't had users complain about it yet. Then we'll figure out how to solve this.

Drogon applying implicit batching under the hood; this technique changes actual program behavior in a significant way - and does break program flow in ways that are very hard to predict and diagnose ... and errors are guaranteed to not affect unrelated commands; IMHO this is what is meant by a network-level optimization.

I think this is acceptable, at least for me as a C++ developer. We are used to dealing with data races, and it feels fine as long as timing doesn't crash the application or cause an unrecoverable state change. It's perfectly OK for lock-free C++ algorithms to return an error when an attempt doesn't succeed; this characteristic gives me the same vibe. The maintainers are also treating SQL this way.

So in conclusion, is the following allowed?

roji commented 2 years ago

Drogon applying implicit batching under the hood; this technique changes actual program behavior in a significant way - and does break program flow in ways that are very hard to predict and diagnose ... and errors are guaranteed to not affect unrelated commands; IMHO this is what is meant by a network-level optimization.

I think this is acceptable, at least for me as a C++ developer. We are used to dealing with data races, and it feels fine as long as timing doesn't crash the application or cause an unrecoverable state change. It's perfectly OK for lock-free C++ algorithms to return an error when an attempt doesn't succeed; this characteristic gives me the same vibe. The maintainers are also treating SQL this way.

I agree that if an error were returned and the program could try again, this would maybe be acceptable. But here we're talking about database changes being silently rolled back - without an error - because some other unrelated command happened to error. This is quite different from attempting a lock-free operation and retrying it; it's more like your lock-free operation saying it succeeded, when in fact it didn't. I have no idea how I'd write reliable code in this situation, and so IMHO this does qualify as causing "unrecoverable state change". Regardless of how rare such a scenario is or what kind of SQL it requires from users, I don't believe a database driver should do something like this.

I do agree that it would be helpful for the benchmark rules to be more explicit on these nuances - IMHO it's important for the benchmarks not to encourage unsafe/dangerous driver behavior (at least according to my viewpoint).

marty1885 commented 2 years ago

database changes being silently rolled back - without an error

I need to think about this. I think the current behavior is that the driver notices something failed, then the framework understands some SQL commands got dropped, and then the entire list of yet-to-be-executed SQL calls throws an error. Which is then recoverable.

Please help us determine what the acceptable behavior for the benchmark is. We will update our benchmark and framework.

roji commented 2 years ago

database changes being silently rolled back - without an error

I need to think about this. I think the current behavior is that the driver notices something failed, then the framework understands some SQL commands got dropped, and then the entire list of yet-to-be-executed SQL calls throws an error. Which is then recoverable.

@marty1885 I'm not referring to yet-to-be-executed SQL statements - I do understand these are skipped and the user gets an error. The problem is with SQL statements which were already previously executed, already completed, and are now rolled back because of the error.

marty1885 commented 2 years ago

Ahhh... I didn't realize that

an-tao commented 2 years ago

Thanks to all who pointed out the potential risks of this implementation; I now think this is not safe enough, as you said. I wanted to avoid these risks through some keyword matching, but it now seems my original consideration was not comprehensive. Some read-only SQL can break the current transaction (as in the timeout example mentioned by @Brar), and some write operations can occur when selecting from a function or procedure call (as @vonzshik said). So for a general application, if we don't send a Sync command after every query, it's hard to avoid unexpected failures.
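
To make the second case concrete, a query that keyword matching would classify as read-only can still write (the function name here is hypothetical):

// Looks like a pure read to a keyword matcher, but the (hypothetical)
// audit_log_insert function performs an INSERT on the server:
co_await db->execSqlCoro("SELECT audit_log_insert(42)");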

But on the other hand, I don't want to deprive users of the opportunity to use this mode if they know their SQLs are safe with it. So my fix will be an optional parameter (disabled by default) when creating database clients. Considering that Drogon supports creating multiple database clients (with their own connections) with different configurations, users can send 'safe' SQLs with the delayed-Sync database client and send the others with the normal one (one command, one Sync).
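
A rough sketch of what that opt-in could look like (the extra flag is invented for illustration; it is not Drogon's actual API):

#include <drogon/orm/DbClient.h>

// One client with delayed Sync for queries known to be safe with batching,
// and one normal client (one command, one Sync). The third parameter is
// hypothetical:
auto fastClient = drogon::orm::DbClient::newPgClient(connInfo, 4 /*connNum*/, true /*delayedSync*/);
auto safeClient = drogon::orm::DbClient::newPgClient(connInfo, 4 /*connNum*/);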

I think this mode can be used in many scenarios; for example, all the test cases of the TFB benchmark can be run in this mode without any problem. Since the C driver provides a separate Sync command interface, why don't we use it? So I think after fixing this, it's not a problem but a feature for those who want extreme performance.

IMO, I still don't think this breaks the rule, which forbids merging requests into one complex sql, but doesn't forbid pipelining. The above is just my personal opinion; if the organizers of TFB think this is illegal, I will change back to one-command-one-Sync mode (which is still a batch mode) or disable the feature via the option mentioned above.

Brar commented 2 years ago

IMO, I still don't think this breaks the rule, which forbids merging requests into one complex sql, but doesn't forbid pipelining.

In my opinion the rules should not encourage dangerous/tricky driver implementations so no matter how we interpret the rule right now, I'd argue that it should be clarified in a way that requires a sync between what is meant to be unrelated commands in PostgreSQL.

If someone really thinks that a dangerous high-performance mode makes sense, this should be moved to a separate benchmark that isn't meant to reflect common general-purpose usage scenarios.

roji commented 2 years ago

@an-tao @marty1885 thanks for keeping an open mind in the discussion above!

I agree with @Brar about the undesirability of doing this in the TechEmpower benchmarks. While you can (obviously) decide to offer this unsafe mode in Drogon, I personally do not believe this capability makes sense in a database driver (even opt-in), and prefer to not implement it in Npgsql. However, that would mean that drivers which engage in inherently unsafe practices will always perform better, in effect pushing other drivers (like mine) to support them as well. IMHO this isn't the point of the TechEmpower benchmarks - I'd rather we compared the performance of safe, standard practices which real users would want to use in the real world.

In any case, I agree that at this point the TechEmpower organizers need to make a decision on what they want to allow: how exactly to interpret rule 7 of the general requirements, and whether the above constitutes an allowed "network-level optimization".

NateBrady23 commented 2 years ago

Let me ask this, since everyone here has a much better understanding of what's actually happening here. If you were to rewrite rule 7 to explicitly allow or deny the optimization that's happening here, what would that look like? I'm hoping that extra sentence or two will be succinct enough to help me understand.

roji commented 2 years ago

I'd propose something along the lines of:

If any pipelining or network optimization is performed, then the execution behavior and semantics must match what they would be if the multiple SQL statements were executed without the optimization, i.e. in separate roundtrips. For example, it is forbidden to perform any optimization that could change the transactionality and/or error handling of the SQL statements. To avoid any doubt, such optimizations are forbidden even if they wouldn't produce the behavioral change with the specific SQL statements of the TechEmpower benchmarks, but would produce it with other statements a user might supply.

I don't want to introduce too much complexity to this discussion, but note that there's also the question of whether to allow explicit batching, which would be triggered via an API gesture in the benchmark code. This definitely wouldn't be a transparent network optimization (since it's explicit), but from a safety point of view it's entirely ok: the user code controls exactly which statements are included in the batch, and typically expects precisely the transactionality/error-handling behavior described above (skip all subsequent commands on any error in the batch and roll back). The crucial point with implicit batching is that it's done without the user specifying it, and on unrelated commands which can now affect each other.

I'm bringing up this point since rule 7 isn't clear on whether an explicit batching API is OK or not. In my personal opinion, it's fine to allow it since it's safe, and is how a developer would do a real-world implementation of e.g. the multiple query scenario. The sentences I wrote above seem like they disallow all forms of batching which would affect behavior, regardless of whether explicit or implicit; if we wanted to allow explicit batching, that would need a bit more tweaking.

NateBrady23 commented 2 years ago

One thing we do say in the Database Updates Test Rules is:

ix. Using bulk updates—batches of update statements or individual update statements that affect multiple rows—is acceptable but not required. To be clear: bulk reads are not permissible for selecting/reading the rows, but bulk updates are acceptable for writing the updates.

I think the line about bulk reads is something that's accidentally omitted from other test rules.

an-tao commented 2 years ago

https://github.com/TechEmpower/FrameworkBenchmarks/blob/04814eabe5dea7baeffddccf6d65fe9a4d2316f6/toolset/test_types/verifications.py#L405-L413 I think the verify_queries_count function checks if there is a bulk read that breaks the current rules. If we decide to forbid the behavior discussed here, a new function verify_transactions_count should be added. Maybe add the following sentence to clarify the rules: Each individual sql request must be automatically committed in a separate transaction; explicitly or implicitly merging multiple SQLs into a transaction is prohibited; disabling transactions is also not allowed.

Brar commented 2 years ago

explicitly or implicitly merging multiple SQLs into a transaction is prohibited

I think that this touches the point that @roji was discussing above about explicit and implicit batching in the driver.

What we discussed above are the problems that arise when a driver implicitly merges SQL commands that the consumer of the API considers unrelated into one transaction so that they may be skipped or rolled back due to a failure of an unrelated command.

Allowing the consumer of a driver API to explicitly batch related SQL commands that they know are safe to run within a transaction is something different IMHO.

To illustrate with some pseudo API:

// Implicit batching
// These commands should not run in the same transaction because the API consumer considers them unrelated
db.Execute("SELECT * FROM test WHERE id = 1");
db.Execute("SELECT * FROM test WHERE id = 2");

// Or even worse implicit batching
// If these commands end up in the same transaction, how is the API consumer
// supposed to know that the failure of one command will affect the other one
Method_A()
{
    db.Execute("SELECT * FROM test WHERE id = 1");
}

Method_B()
{
    db.Execute("SELECT * FROM test WHERE id = 2");
}

// Explicit batching
// Here the API consumers have the option to explicitly group commands that they consider related
var batch = db.StartBatch();
batch.Add("SELECT * FROM test WHERE id = 1");
batch.Add("SELECT * FROM test WHERE id = 2");
batch.Execute();

That being said I think that the commands in the framework benchmark should be considered unrelated in order to better reflect a real life scenario and for that reason I agree with @an-tao that both explicit and implicit batching should be forbidden and the requirement that "Each individual sql request must be automatically committed in a separate transaction" makes sense to me.

A benchmark allowing explicit batching would be a special case that comes pretty close to what I called a "dangerous high-performance mode" in my comment above - except that it isn't as dangerous because the API consumer controls the dangers. In any case I still think that such a high-perf-mode doesn't reflect general usage scenarios and should live in a separate benchmark if it is considered worthwhile.

roji commented 2 years ago

Agree with @an-tao and @Brar that "Each individual sql request must be automatically committed in a separate transaction" is a good formulation, and pretty clearly disallows batching (explicit or implicit).

A benchmark allowing explicit batching would be a special case that comes pretty close to what I called a "dangerous high-performance mode" in my comment above - except that it isn't as dangerous because the API consumer controls the dangers. In any case I still think that such a high-perf-mode doesn't reflect general usage scenarios and should live in a separate benchmark if it is considered worthwhile.

Here I slightly disagree: I personally consider (explicit) batching to be a pretty basic optimization that anyone should be using where possible, rather than a high-perf mode carrying any dangers. The special transactionality and error handling semantics are usually a desirable feature being opted into - programs typically don't want later commands to continue if an earlier command failed. As I wrote above, I'd expect any developer implementing something like "multiple queries" to use it; for this reason, it may make sense to allow it in the benchmarks (or at least to be very clear on why we want to disallow it).

But whether explicit batching should be allowed is ultimately a question of what the benchmarks are actually meant to measure.

Brar commented 2 years ago

Here I slightly disagree: I personally consider (explicit) batching to be a pretty basic optimization that anyone should be using where possible

I completely agree that batching is a basic optimization that definitely has its place and is even really common, but let's talk about the Multiple Database Queries test, what kind of real-life scenario it is meant to stand for, and whether this is really "a network-level optimization".

You can probably batch queries in the Multiple Database Queries test, but how would that be compatible with requirement vi., which requires that "This test is designed to exercise multiple queries with each resulting row selected individually", and requirement v., which requires that "Each row must be selected randomly in the same fashion as the single database query test"? "Same fashion" isn't a very precise term, but I understand it to mean that you should execute the queries pretty much the same way as in the Single Database Query test and not mess with transaction semantics just because - in the context of the benchmark - you happen to know that all queries are read-only and will never fail.

While this would indeed be a pretty basic optimization in the context of the benchmark, I still have a hard time calling it "a network-level optimization". It is really a PostgreSQL transaction handling optimization (you can call it a wire protocol optimization, but that only distracts from the fact that the real optimization runs inside PostgreSQL), where the speed gain doesn't come from the saved Sync messages that don't get transmitted over the network, but from the completely different code path this aggregated set of queries takes inside the PostgreSQL implementation.

It is definitely possible to find real-life scenarios where that PostgreSQL transaction handling optimization safely applies, and that is the exact reason why it is and should be implemented in drivers supporting PostgreSQL. But I stand by my opinion that this is not what this benchmark is meant to test, and that this optimization should be tested in a different scope if it is considered relevant, since it basically tests the driver's ability to get the most out of the inner workings of PostgreSQL, not how well it manages to concatenate multiple queries into a network buffer.

ajcvickers commented 2 years ago

@nbrady-techempower I think the fundamental point here is that any network-level optimization must not change the observed behavior of the application, and that includes when errors happen. The optimizations discussed here have the same behavior when everything on the network and in the database succeeds, but change the behavior in unexpected and damaging ways when something goes wrong. Like @roji said above, I don't believe a driver should behave in this way.

I don't think it is necessary in the rules to go into specifics about the kinds of behavior that are allowed/not allowed in terms of the batching and sync commands. (Although discussions like these will help clarify these things, of course.) Instead the rules simply need to state:

Network-level optimizations are permitted so long as the observable behavior of the driver does not change in any way, either under normal operating conditions, or when network or database errors occur.

Whether or not batching is allowed is really a separate discussion.

sebastienros commented 2 years ago

Adding @matt-42 (Lithium) @redkale (Redkale) @billywhizz (Just-JS)

Could you look at the recommended changes and confirm whether they would mean your implementations need to be adapted? I noticed that your results on multiple-queries are even faster than Drogon's as of the latest continuous run.

msmith-techempower commented 2 years ago

@ajcvickers

Network-level optimizations are permitted so long as the observable behavior of the driver does not change in any way, either under normal operating conditions, or when network or database errors occur.

This sounds like something we could test for in the verification step. Would there be some reproducible error we could cause to ensure that [something - what?] does/doesn't occur?

roji commented 2 years ago

@msmith-techempower I'm not sure what kind of verifications you guys perform... In order to observe this, one would have to send one command with side-effects (e.g. insert some row), batched before another (unrelated) command which fails. The expected behavior is for the side-effects from the first command to persist, but with the implicit batching described above, those side-effects silently disappear. This sounds like a special scenario that would have to be coded against each driver, so I'm not sure it's relevant for verification...

Otherwise, it's probably possible to run PostgreSQL in detailed logging/trace mode, and verify protocol messages being received. Short of that, a packet capture and a verification at that level should always be possible, in case you do that kind of thing.

Maybe @sebastienros has other ideas...

matt-42 commented 2 years ago

Adding @matt-42 (Lithium) @redkale (Redkale) @billywhizz (Just-JS)

Could you look at the recommended changes and confirm whether they would mean your implementations need to be adapted? I noticed that your results on multiple-queries are even faster than Drogon's as of the latest continuous run.

Lithium is using the upstream libpq batch mode, yes. To me it is fine as long as frameworks using batching are tagged in TFB; that's why I added the -batch suffix to the lithium pgsql implementations in the benchmark. I could also add a lithium-postgres-nobatch if you want. If the TFB team forbids batching I'll remove them from the lithium impl, but I don't consider batching an unsafe optimization for read-only scenarios like the multi/single-request tests.

However if the problem is side-effects (one error in the batch canceling all the next requests of the batch), you can ask drivers to implement a fallback mode where the remaining requests of an aborted batch are re-run automatically. This would remove the side-effect while keeping the same performance in the case of no-error batches.

roji commented 2 years ago

However if the problem is side-effects (one error in the batch canceling all the next requests of the batch)

The problem is not the next requests of the batch getting cancelled, it's the previous requests being rolled back, since unrelated commands are wrapped together with a single Sync message at the end (and so there's an implicit transaction).

Even if these specific TE scenarios are read-only, if a driver does implicit batching, that means that in non-read-only workflows, users may get earlier side effects silently rolled back on later errors, which seems like very problematic driver behavior (see above for more discussion of all this).

matt-42 commented 2 years ago

Thanks @roji for the clarification. So I understand that:

1/ Implicit inter/intra-requests batching should be forbidden in all tests.
2/ Explicit inter/intra-requests batching can be used for all SELECT requests.
3/ Only the UPDATE requests should not use inter-requests batching but can still be batched with other UPDATE queries of the same HTTP request.

Does that sound right? If yes, I'll remove implicit batching from the lithium framework to comply with those rules. But it won't change the framework's performance.

Note: intra-request batching means batching SQL requests issued by one HTTP request; inter-request batching means batching SQL requests issued by several concurrent HTTP requests.

roji commented 2 years ago

1/ Implicit inter/intra-requests batching should be forbidden in all tests.

That is the main point of this issue, yes; to me this is a clearly unsafe practice in a driver.

2/ Explicit inter/intra-requests batching can be used for all SELECT requests.

Explicit batching is a somewhat orthogonal question which we also touched on above, but it's probably better to keep those discussions separate (though I'm curious as to what explicit inter-request batching would even look like!).

3/ Only the UPDATE requests should not use inter-requests batching but can still be batched with other UPDATE queries of the same HTTP request.

So again, explicit batching is maybe better left as a separate conversation. But I'm not sure why we'd make a difference between SELECT and UPDATE - if we allow explicit batching for one, I'm not sure why we'd disallow it for the other (again, implicit batching should IMHO always be forbidden).

If yes, I'll remove implicit batching from the lithium framework to comply with those rules. But it won't change the framework's performance.

To be sure we're talking about the same thing... Inter-request implicit batching even affects single-query scenarios such as Fortunes, since multiple concurrent HTTP requests get executed in a single PG batch (reducing overhead). So if you're doing inter-request implicit batching, removing it should definitely affect performance.

PS I'd advise waiting for a clear decision from the TechEmpower folk about all this before actually making any changes! This is still under discussion.

matt-42 commented 2 years ago

Thanks @roji for your answer. Yes, we have the same inter-request definition. And yes, if I remove inter-request batching in the read-only tests it will affect performance, but I thought it was safe to use batching in those scenarios as long as it is explicit.

I'll wait for the TFB team to make a decision on this. In the meantime I'll work on the explicit-off-by-default batching implementation in the lithium pgsql driver.

msmith-techempower commented 2 years ago

@msmith-techempower I'm not sure what kind of verifications you guys perform... In order to observe this, one would have to send one command with side-effects (e.g. insert some row), batched before another (unrelated) command which fails. The expected behavior is for the side-effects from the first command to persist, but with the implicit batching described above, those side-effects silently disappear. This sounds like a special scenario that would have to be coded against each driver, so I'm not sure it's relevant for verification...

@roji We do a verification step to ensure that all of our verifiable rules are obeyed by framework maintainers/contributors. As a really simple example, for the updates test, we run the application as one would in the benchmark but with N requests made, we then check the (not sure the right word here...) analytics (???) of the specific database to see that a plausible number of updates (because it's random) were attempted and that a higher-than-some-threshold of rows were indeed updated. Since everything is run in a container and not persisted, this is a simple verifiable way to automatically test that implementations are written in good faith.

I am a bit out of my depth on the database side here, but from your explanation it sounds like we could come up with an example situation that we would expect to act one way, run it, then check to see that it did (or did not act the way that would show that this batching scheme is in use), and pass/fail based on it.

roji commented 2 years ago

@msmith-techempower thanks for the info... It's definitely possible to do some sort of verification, though in this case it may be more difficult than in other cases... But I'd be happy to help think about this more if we go in that direction.

billywhizz commented 2 years ago

just-js author here.

imho, the rule seems very much designed to stop people running multiple queries as one (e.g. using a union query for multiple selects) rather than saying anything specific about when transactions should be committed. the checks in the tfb verification process also don't explicitly enforce any rules related to this and all the frameworks you mention are in compliance with the rules and verifications as they currently stand.

so, the question is, i think, should the rules be changed? personally, i am not in favour of this. if we want to test for something other than peak performance it might be better to add some more complex and realistic tests that are more difficult to "game"? right now the tests seem to be mostly concerned with peak performance, specifically raw RPS, which is a very limited metric on its own. you can get a much better picture with the detail shown here i think.

one thing i discovered in my investigations for this is that enforcing a sync on every sql statement means the results are flushed onto the wire at the server process for every sync coming from the client, so you end up putting significantly more load on both sides of the connection.

so, with one sync for every 10 selects we see this:

image

and with a sync for every select we see this:

image

so, this optimisation i think is a valid one if you want to see what "peak performance" looks like in these particular tests.

happy to make any changes required if people decide otherwise. the just-js code here is very much pre-alpha and i am working on more robust, full-featured libraries when i can find the time. it's certainly useful to have this discussion about the issues involved in the approach.

roji commented 2 years ago

@billywhizz my views on this are written above, but here's a tight summary.

First, note that this issue isn't about disallowing Sync-batching per se: it's specifically about disallowing implicit Sync-batching. Whether or not it's OK to do explicit Sync-batching for the queries of a single HTTP request is IMHO another question.

Yes, the goal is to compare peak performance, but we're talking about a technique which is inherently unsafe in a database driver (once again, since a Sync-batch is an implicit transaction, the effects of earlier commands which succeeded can be silently rolled back by later unrelated commands which happen to fail). Assuming we agree on this being unsafe and inappropriate in real-world driver usage, measuring this in TFB becomes not just meaningless, but dangerous: it pushes drivers to implement problematic behavior which can cause silent data loss for actual users, and makes safe drivers look slow.

one thing i discovered in my investigations for this is that enforcing a sync on every sql statement means the results are flushed onto the wire at the server process for every sync coming from the client, so you end up putting significantly more load on both sides of the connection.

I noticed that too, and I think the right thing to do is to be in touch with the PostgreSQL hackers, so that this gets optimized on their side (I'd be happy to start that conversation with you). But IMHO we shouldn't be cutting safety corners because PostgreSQL is slower on the safe path.

billywhizz commented 2 years ago

so, i think we can summarise the critique as

i agree with these and would be happy to see them added to the requirements for the current tests.

@roji as you say, batching of commands "in general" is a different issue and i personally feel it should be allowed. it's a useful optimisation and is not unsafe as long as the database driver handles the errors and surfaces them to the calling program correctly and the calling program explicitly opts into this behaviour.

so, i don't think we should enforce a sync on every sql command. if you look at the multi-query test requirements they boil down to this when N is, for example, 20:

- get 20 random rows from the database as individual sql commands
- coalesce the results and return them to the caller

imho, it's perfectly acceptable here to put those 20 queries on the wire as a "batch" with a single sync at the end. they will still be executed in the server process (1 process per connection) as 20 individual sql queries but likely on an optimized code path and without the need to flush the socket for each individual result. this flushing causes huge syscall overhead at scale.

finally, i've gone back to look at the just-js code for techempower (it's quite out of date now but my latest code is not ready to release yet) and it seems that it violates neither of the constraints above.

you can see here the multi query will only add a sync on every 20th query in the batch of queries for the current request. but here you can see it always sends a sync on the final query in the batch, so there is no sharing or overlapping of batches across different http requests. also, the decision to sync is being made in the "calling program" written specifically for the techempower tests, not inside the database driver code. so, i think adding the constraints above will not require any changes and should not affect the performance in the tests.

i'll hopefully have a new release out soon with a much nicer api and better/some error handling etc. which will likely be slower than the current implementation.

billywhizz commented 2 years ago

@matt-42 i did some further investigation and from what i can see the cross-request batching is also happening on the single query test and the fortunes test. just-js sends a sync with every sql command for these tests but from what i can see on the wire, lithium-postgres-batch seems to send large batches of bind/execs and then separate sync messages at some kind of regular interval.

this is what lithium traffic looks like:

image

and just-js:

image

i still feel personally that this usage is not against the rules and i think it's useful to see what the peak performance case looks like even if it is cutting some corners - it's clearly labelled as a separate implementation. there's nothing stopping all maintainers submitting heavily optimized versions of their tests too and i don't really think the rankings themselves heavily influence people's decisions on what to use as the frameworks at the top are all fairly niche and i think everyone realises that. people take a lot more things into consideration than just RPS when choosing a platform/framework so i'm not sure where all the concern from MSFT folks is coming from.

happy to abide by any changes made by techempower admins of course and happy to help with any further investigations or attempts to improve the verification process. it should be possible to tighten the current checks by verifying the number of txns for each run is the same as the number of http responses received - so, at least one txn per http response received, no? this would disallow cross-request batching for single and fortunes, would allow per-request batching for the multi and update queries, and would make cross-request batching pretty much impossible on those too.

matt-42 commented 2 years ago

Hi @billywhizz! Thanks for the investigation. Lithium is doing cross-request batching. I'm working on cleaning up the API to make it explicit and off by default, but I think that for read-only scenarios like single and fortunes it is a pretty nice optimization, and I don't see how using cross-request batching there is dangerous. It is only dangerous for the update requests since, like @roji explained, successful update requests can be silently rolled back in case of an error.

roji commented 2 years ago

@matt-42

I think that for read-only scenarios like single and fortunes it is a pretty nice optimization, and I don't see how using cross-request batching there is dangerous.

The point is that a driver generally can't know whether user SQL is going to be read-only or not (see discussion above). So even if we know as users that a certain scenario is read-only, implicit batching as a driver feature is not something that can be implemented safely for real-world usage.

@billywhizz

it should be possible to tighten the current checks by verifying the number of txns for each run is the same as the number of http responses received - so, at least one txn per http response received, no?

That's a pretty good idea for verification! (/cc @msmith-techempower) Yes, in e.g. Fortunes the number of transactions should be equal to the number of HTTP requests, any driver doing cross-request batching would violate that and be easily detectable. This seems like it can be accessed via pg_stat_database.xact_commit (docs).
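
As a rough sketch of that check (using libpq directly for illustration and assuming the benchmark database is named hello_world; the real check would live in the verification toolset):

#include <cstdlib>
#include <libpq-fe.h>

// Sample xact_commit before the run, then again after it, and require
// (after - before) to be at least the number of HTTP responses seen by wrk.
PGresult *r = PQexec(conn,
    "SELECT xact_commit FROM pg_stat_database WHERE datname = 'hello_world'");
long before = std::atol(PQgetvalue(r, 0, 0));
PQclear(r);
// ... run the load generator, counting HTTP responses ...
// Re-run the same query to get 'after' and compare the delta.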

billywhizz commented 2 years ago

That's a pretty good idea for verification! (/cc @msmith-techempower) Yes, in e.g. Fortunes the number of transactions should be equal to the number of HTTP requests, any driver doing cross-request batching would violate that and be easily detectable. This seems like it can be accessed via pg_stat_database.xact_commit (docs).

i've already said i am happy to leave rules as they are and leave these decisions up to individual maintainers but if folks think this is a good verification to add i think it would be possible without lots of work - should be able to get the total number of http responses from wrk and total txns from the db stats.

it could cause a lot of disruption and failing tests for maintainers though. i'll leave it up to @msmith-techempower and @nbrady-techempower to decide what is best and happy to help out with a PR if you do want to make changes along these lines.

matt-42 commented 2 years ago

The point is that a driver generally can't know whether user SQL is going to be read-only or not (see discussion above). So even if we know as users that a certain scenario is read-only, implicit batching as a driver feature is not something that can be implemented safely for real-world usage.

Sorry for not being precise: I meant explicit batching as a driver feature, not implicit. This way the driver does not have to guess the read-only scenarios; the user of the framework manually enables it if they consider it safe.

So I would say that frameworks can implement cross-request batching and use it in the single and fortunes tests as long as it is explicit.

roji commented 2 years ago

@matt-42 we've used "explicit batching" to refer to application code explicitly using a batching API in call sites where it wants to send batches. You seem to be talking about allowing users to opt into implicit batching (since I'm not sure what cross-request explicit batching would even mean). That would mean that the application must be 100% read-only, since the driver would be batching across everything, without any specific gesture in specific call sites. That may be safe in some applications, although even in that scenario there are issues.

I get the drive to do "ultimate perf" here, and to get to the top of the charts - but at some point it's good to pause and ask what features we're building as driver authors, and whether they make sense in the real world. Applications which are 100% read-only aren't the norm, and even there, this really is a disaster waiting to happen as someone unknowingly adds some non-read-only workload (or whatever) and data updates silently disappear. It would be a pity if the benchmarks ended up pushing drivers in this direction.

matt-42 commented 2 years ago

Sorry again for not being clear, I was also talking about explicit batching (not about opting into implicit batching). I completely agree with you when you say that implicitly batching everything is error-prone and should not be implemented.

I meant that only when users know what they are doing (i.e. read-only scenarios) is it OK to let them use explicit batching, especially when it can lead to a much better utilization of server resources.

Brar commented 2 years ago

I think what @roji is referring to with explicit batching is an API in the data access library that you can use to explicitly create batches like the one in the last example in my comment above.

Opting into implicit batching would be more like the two examples before that but with a global API to enable or disable batching.

billywhizz commented 2 years ago

there's a very good breakdown of the postgres wire protocol here for anyone implementing drivers.

billywhizz commented 2 years ago

@roji @Brar @matt-42 @sebastienros @nbrady-techempower
i've been pondering this issue and have been doing some research, testing and benchmarking over recent days and wanted to share my thoughts and make a further case for leaving the requirements as-is and allowing the "cross-request" batching discussed above.

from the investigations i have done this kind of cross-request batching has little to no risk involved in it as long as the following conditions are true:

if these conditions are true, then the number of possible errors that could occur which would mean a "batch" of queries are rolled back is minimal. in fact, i have been unable to make any error occur other than some kind of internal database error or protocol violation which would result in the database connection being terminated completely.

on my laptop with 8 cores running the latest framework benchmarks i see a 56% improvement for the single query test and a 31% improvement for fortunes using the cross-request batching technique. these are significant performance gains with little or no loss of security or robustness as long as the criteria above are met and any errors are handled and surfaced correctly to the calling application.

if an error did occur in the middle of a "batch" then the "previous results" returned before the error are still valid at the time they were returned as long as there were no side effects in the queries that selected them, which there aren't in any of the techempower tests.

as such, i feel it would be a pity to rule out the ability to use this optimization - it would effectively hobble the frameworks that are able to use it for no good reason.

vonzshik commented 2 years ago

if these conditions are true, then the number of possible errors that could occur which would mean a "batch" of queries are rolled back is minimal. in fact, i have been unable to make any error occur other than some kind of internal database error or protocol violation which would result in the database connection being terminated completely.

That's definitely true for a benchmark-oriented driver since you have specific requirements which do satisfy your conditions. And that's definitely not true for a general-purpose driver since everyone has different expectations. There is, of course "I want my queries to run fast and not fail" which everyone wants, but saying that a driver isn't going to support calling functions with side effects to make queries run faster in extreme cases doesn't look that great.

And if we do allow this, where exactly are we going to draw the line between what's acceptable and what's not? For example, having a select-only driver might make sense for someone. There are even benchmarks (e.g. Fortunes) which do select-only queries. I have no doubt that that specific driver is going to be at the top of these benchmarks. And that's fine if the only thing we want is to attempt to measure the raw difference between different platforms and providers. That's not going to work if we want users to decide the platform/provider they're going to use based on the benchmark results, since every single driver at the top is going to take shortcuts (which usually are not publicized and are going to blow up in your face eventually).

Even not taking such an extreme example into account, I have no doubt in my mind that every single one of us here can make a 'raw' provider (I actually have one for testing potential optimizations and the platform's overhead) which satisfies every single benchmark requirement but forgoes some of the things you expect from a driver (for example, the driver I linked doesn't check whether the database returned an error; this has to be handled by the user). Again, this doesn't make it useful either to us or to users (due to the extreme requirements you have to meet to use these drivers).

To summarize: having a general-purpose driver optimize some use cases with the user's explicit approval is great. Whether we should have separate benchmarks (or mark these contenders with a tag) is debatable. Normalizing such optimizations by saying that there is "little to no loss of security or robustness" will only encourage people to create drivers that abuse benchmarks in a race to the top. While I do work on a PG driver in my spare time, I'm definitely not an expert, and as such I would rather be a bit more conservative when dealing with potential data loss or even data corruption.