dotnet / SqlClient

Microsoft.Data.SqlClient provides database connectivity to SQL Server for .NET applications.
MIT License
853 stars 286 forks source link

ReadAsync CanellationTokenSource performance problem #2408

Open TomislavBaljint opened 8 months ago

TomislavBaljint commented 8 months ago

Describe the bug

Reading data from a table in parallel with multiple SqlDataReaders (each reading their part of a table). When a reader is iterated with a cancellation token created from CanellationTokenSource there is a significant performance drop.

Difference in performance: image

cancellation = CancellationToken.None; 
cancellation = new CancellationToken();
cancellation = new CancellationTokenSource().Token;
while (await reader.ReadAsync(cancellation).ConfigureAwait(false)) { }

To reproduce

Create the table in a MS SqlServer database. Use the TABLE_CREATE_SQL script from the code to create a 50M record table with around 1.7GB of data. Change the connection data in the code to your database. Run the benchmark.

    <PackageReference Include="BenchmarkDotNet" Version="0.13.12" />
    <PackageReference Include="Microsoft.Data.SqlClient" Version="5.2.0" />
using BenchmarkDotNet.Attributes;
using Microsoft.Data.SqlClient;

namespace SqlReadAsync;

[SimpleJob(1, 1, 1, 1, "1", false)]
public class Benchmarks
{
    private const string SERVER = "server";
    private const string DATABASE = "db";
    private const string USER = "user";
    private const string PASSWORD = "pass";
    private const string CONNECTION_STRING = $"Data Source={SERVER};Initial Catalog={DATABASE};User ID={USER};Password={PASSWORD};Trust Server Certificate=True;Pooling=False;";

    #region TABLE SQL

    private const string TABLE_CREATE_SQL = """
        DROP TABLE IF EXISTS dbo.test_table;
        WITH t1(n) AS
            (
                SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL
                    UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL
            )
            , t2(n) AS
        (
            SELECT NULL FROM t1 AS t1, t1 AS t2, t1 AS t3, t1 AS t4, t1 AS t5, t1 AS t6, t1 AS t7, t1 AS t8, t1 AS t9
        )
        , t3(n) AS
        (
            SELECT TOP 50000000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM t2
        )
        SELECT
            ID = ISNULL(t.n, 0),
            c1 = CAST(t.n AS VARCHAR(12)),
            d1 = DATEADD(MINUTE, t.n, '1970-01-01')
        INTO dbo.test_table FROM t3 AS t;
        ALTER TABLE dbo.test_table ADD CONSTRAINT PK_test_table PRIMARY KEY(ID);
        """;

    #endregion

    private const string SELECT_PART = "SELECT ID, c1, d1 FROM dbo.test_table";
    private readonly string[] whereParts =
    [
        " WHERE ID > 0 AND ID <= 10000000",
        " WHERE ID > 10000000 AND ID <= 20000000",
        " WHERE ID > 20000000 AND ID <= 30000000",
        " WHERE ID > 30000000 AND ID <= 40000000",
        " WHERE ID > 40000000 AND ID <= 50000000",
    ];

    public enum ReadType
    {
        Sync,
        Async,
        AsyncToken,
        AsyncTokenSource,
    }

    [Params(ReadType.Sync, ReadType.Async, ReadType.AsyncToken, ReadType.AsyncTokenSource)]
    public ReadType Type { get; set; }

    private static Task ReadSync(string? whereClause)
    {
        using var conn = new SqlConnection(CONNECTION_STRING);
        using var cmd = new SqlCommand($"{SELECT_PART}{whereClause}", conn);
        conn.Open();
        using SqlDataReader reader = cmd.ExecuteReader();

        while (reader.Read()) { }

        return Task.CompletedTask;
    }

    private static async Task ReadAsync(string whereClause, CancellationToken cancellation)
    {
        using var conn = new SqlConnection(CONNECTION_STRING);
        using var cmd = new SqlCommand($"{SELECT_PART}{whereClause}", conn) { CommandTimeout = 0 };
        conn.Open();
        using SqlDataReader reader = cmd.ExecuteReader();

        while (await reader.ReadAsync(cancellation).ConfigureAwait(false)) { }
    }

    [Benchmark]
    public async Task TestRead()
    {
        CancellationToken cancellation = CancellationToken.None;
        switch (Type)
        {
            case ReadType.Sync:
            case ReadType.Async:
                cancellation = CancellationToken.None;
                break;
            case ReadType.AsyncToken:
                cancellation = new CancellationToken();
                break;
            case ReadType.AsyncTokenSource:
                cancellation = new CancellationTokenSource().Token;
                break;
        }

        var tasks = new List<Task>();
        foreach (string wherePart in whereParts)
        {
            switch (Type)
            {
                case ReadType.Sync:
                    tasks.Add(Task.Run(() => ReadSync(wherePart)));
                    break;
                case ReadType.Async:
                    tasks.Add(Task.Run(() => ReadAsync(wherePart, cancellation)));
                    break;
                case ReadType.AsyncToken:
                    tasks.Add(Task.Run(() => ReadAsync(wherePart, cancellation)));
                    break;
                case ReadType.AsyncTokenSource:
                    tasks.Add(Task.Run(() => ReadAsync(wherePart, cancellation)));
                    break;
            }
        }
        await Task.WhenAll(tasks);
    }
}

Expected behavior

Similar performance between runs.

Further technical details

Microsoft.Data.SqlClient version: 5.2.0 .NET target: net8.0 SQL Server version: SQL Server 2019 SQL Server docker: mcr.microsoft.com/mssql/server:2019-latest
Operating system: Windows 11, Docker container

Additional context Problem was found on Microsoft.Data.SqlClient version 5.1.4, but still present.

Wraith2 commented 8 months ago

The performance difference is caused by registration and unregistration of a cancellation event handler in ReadAsync and cannot be avoided if you want to use a cancellation token which is provided by a CancellationTokenSource.

Douglasproglima commented 1 month ago

https://github.com/dotnet/SqlClient/issues/593

Wraith2 commented 1 month ago

Can you give context to your comment please? I'm not sure why this is still open. The performance difference is explained and unavoidable, unfortunately.

roji commented 1 month ago

@Wraith2 are you saying that the almost 4 second difference is just a result of registering/unregistering the event handler?

vonzshik commented 1 month ago

@roji that's the result of registering/unregistering the event handler 50 million times distributed over 5 concurrent threads. For example, locally with this repro I see the difference between 1400 ms (if CancellationToken is passed) and 11ms (if it's not).

using System.Diagnostics;

while (true)
{
    using var cts = new CancellationTokenSource();
    var sw = Stopwatch.StartNew();

    var tasks = new List<Task>();

    for (var i = 0; i < 5; i++)
    {
        tasks.Add(Task.Run(() => Callback(cts.Token)));
    }
    await Task.WhenAll(tasks);

    Console.WriteLine($"Elapsed: {sw.ElapsedMilliseconds}ms");
}

void Callback(CancellationToken cancellationToken)
{
    for (var i = 0; i < 10_000_000; i++)
    {
        using var _ = cancellationToken.Register(() =>
        {

        });
    }
}
roji commented 1 month ago

I see, thanks @vonzshik!

Wraith2 commented 1 month ago

@Wraith2 are you saying that the almost 4 second difference is just a result of registering/unregistering the event handler?

Yup, I profiled it. Slightly surprising but it's all down to the registration of the callback and administering it.