aspnet / DataAccessPerformance

Benchmarks, prototypes and discussions for database access performance investigation
MIT License
116 stars 26 forks source link

Understand impact of batching #18

Open divega opened 6 years ago

divega commented 6 years ago

In general batching is considered good for performance because it can help reduce the number of roundtrips to the database, however understanding more its impact across different database could help us prioritize when to use it and the addition of APIs.

E.g. this would help inform the prioritization of

Some questions we would like to address/answer:

  1. Concatenated SQL in a SQL command vs. proper multi-statement per network packet support: Depending on the server and protocol capabilities, the first leads to either parsing and splitting the query on the client or to polluting the server's query cache. The second one you can still send multiple statements in one network roundtrip but using an API that allows to collect the individual SQL queries and parameters without stitching them together, which avoids having to parse the SQL on the client and polluting the cache on the server. How does this affect different database? We need to measure. We know that PostgreSQL could use this and that SqlClient has the basic capability but only when you use DataAdapter APIs (see https://blogs.msdn.microsoft.com/dataaccess/2005/05/19/does-ado-net-update-batching-really-do-something/), therefore things like EF Core can only use concatenated SQL approach.

  2. Multiple calls to ExecuteReader vs. one call to ExecuteReader and multiple calls NextResult: For reading scenarios batching can help too. To understand how much, we would need to measure. ADO.NET already has the right APIs but the question is how much we should be using it in our higher-level APIs. We have precedent in NHibernate's future queries. For things like EF Core this could be used automatically when we generate multiple queries, or we could come up with a similar API to future queries.

dario-l commented 6 years ago

Proper multi-statement per network packet is a good choice. We are using System.Data.SqlClient.SqlCommandSet but it is internal. Thanks to NHibernate we are using exactly this implementation.

This solution give us big improvement in performance even if we sending just few records (from 5 statements and higher is unbeatable). Unfortunately for us ASP.NET Core doesn't have that implementation.