Azure / azure-cosmos-table-dotnet

.NET SDK for Azure Cosmos Table API

Significant performance degradation on ExecuteQuerySegmentedAsync between Microsoft.Azure.Cosmos.Table and Microsoft.WindowsAzure.Storage #52

Open JoeSchimo opened 4 years ago

JoeSchimo commented 4 years ago

I've been researching a move from Storage Account table storage to CosmosDB table storage. Currently I am using the WindowsAzure.Storage (9.3.3) library to query data in a .NET Core 3.1 application. As part of this migration I have switched to the Microsoft.Azure.Cosmos.Table 1.0.7 library. I wrote the LINQPad benchmark below to compare the performance of both when doing a full table scan.

async Task Main()
{
    var timer = Stopwatch.StartNew();
    await QueryCosmosDb().ConfigureAwait(false);
    timer.Stop();
    var cosmosExecutionTime = timer.Elapsed;

    timer = Stopwatch.StartNew();
    await QueryTableStorage().ConfigureAwait(false);
    timer.Stop();
    var tableExecutionTime = timer.Elapsed;

    cosmosExecutionTime.Dump();
    tableExecutionTime.Dump();
}

public async Task QueryCosmosDb()
{
    var cosmosTableEndpoint = new Uri($"https://***.table.cosmos.azure.com:443/");
    var storageAccount = new Microsoft.Azure.Cosmos.Table.CloudStorageAccount(new Microsoft.Azure.Cosmos.Table.StorageCredentials("***", "****"), cosmosTableEndpoint);
    var client = storageAccount.CreateCloudTableClient();
    var table = client.GetTableReference("tablename");
    var query = new Microsoft.Azure.Cosmos.Table.TableQuery();
    Microsoft.Azure.Cosmos.Table.TableContinuationToken token = null;
    do
    {
        var segment = await table.ExecuteQuerySegmentedAsync(query, token).ConfigureAwait(false);
        token = segment.ContinuationToken.Dump();
    }
    while (token != null);
}

public async Task QueryTableStorage()
{
    var storageAccount = new Microsoft.WindowsAzure.Storage.CloudStorageAccount(new Microsoft.WindowsAzure.Storage.Auth.StorageCredentials("***", "****"), true);
    var client = storageAccount.CreateCloudTableClient();
    var table = client.GetTableReference("tablename");
    var query = new Microsoft.WindowsAzure.Storage.Table.TableQuery();
    Microsoft.WindowsAzure.Storage.Table.TableContinuationToken token = null;
    do
    {
        var segment = await table.ExecuteQuerySegmentedAsync(query, token).ConfigureAwait(false);
        token = segment.ContinuationToken;
    }
    while (token != null);
}

The Storage Account table and the CosmosDB table have identical datasets of roughly 200k entities.

The Cosmos Table account has a shared provisioned throughput of 2200 RU/s.

When using the Cosmos executor with the Microsoft.Azure.Cosmos.Table library, I am getting an execution time of ~3 hours. The Storage Account table with the Microsoft.WindowsAzure.Storage library takes ~2 minutes. If I switch the Microsoft.Azure.Cosmos.Table library to use the REST executor in the CloudTableClient, I get an execution time of ~3 minutes.
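For anyone reproducing this, here is a minimal sketch of the REST executor switch. It assumes the `TableClientConfiguration.UseRestExecutorForCosmosEndpoint` option of the Microsoft.Azure.Cosmos.Table 1.0.x package. The helper name `CreateRestExecutorClient` is hypothetical, and the credentials and endpoint are the same placeholders used in the benchmark.

```csharp
using System;
using Microsoft.Azure.Cosmos.Table;

public static class RestExecutorSketch
{
    // Hypothetical helper: builds a table client that talks to the Cosmos
    // Table endpoint through the REST executor instead of the Cosmos executor.
    public static CloudTableClient CreateRestExecutorClient()
    {
        // Placeholder account name, key, and endpoint, as in the benchmark.
        var credentials = new StorageCredentials("***", "****");
        var endpoint = new Uri("https://***.table.cosmos.azure.com:443/");
        var storageAccount = new CloudStorageAccount(credentials, endpoint);

        // Assumption: this flag is what selects the REST executor for
        // Cosmos endpoints in the 1.0.x package.
        var configuration = new TableClientConfiguration
        {
            UseRestExecutorForCosmosEndpoint = true
        };

        return storageAccount.CreateCloudTableClient(configuration);
    }
}
```

The rest of `QueryCosmosDb` stays the same; only the client creation changes.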

Has anyone encountered similar behavior, or is anyone aware of issues around empty (unfiltered) table queries?

medevod commented 4 years ago

I encountered similar behavior. The issue may be localized in the internal TableEntity implementation of the Microsoft.Azure.Cosmos.Table package (the ReadEntity method).

As a workaround, I use a different TableEntity class that overrides the ReadEntity method. Its implementation is based on the original Microsoft.WindowsAzure.Storage code.

It all works very well, but we are still waiting for a fix.
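A rough sketch of what such an override can look like follows. This is not the exact workaround class: the entity type `SampleEntity` and its `Name` and `Count` properties are hypothetical, and the properties are mapped by hand rather than with the reflection-based logic from Microsoft.WindowsAzure.Storage.

```csharp
using System.Collections.Generic;
using Microsoft.Azure.Cosmos.Table;

// Hypothetical entity type; the property names are illustrative only.
public class SampleEntity : TableEntity
{
    public string Name { get; set; }
    public int? Count { get; set; }

    // Bypasses the package's default ReadEntity by copying the known
    // properties directly out of the dictionary returned by the service.
    public override void ReadEntity(
        IDictionary<string, EntityProperty> properties,
        OperationContext operationContext)
    {
        if (properties.TryGetValue(nameof(Name), out var name))
        {
            Name = name.StringValue;
        }

        if (properties.TryGetValue(nameof(Count), out var count))
        {
            Count = count.Int32Value;
        }
    }
}
```

To exercise the override, the query has to be typed, for example `new TableQuery<SampleEntity>()` passed to `ExecuteQuerySegmentedAsync`. The untyped `TableQuery` in the benchmark returns `DynamicTableEntity` instances instead.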

KyleMit commented 3 years ago

Related StackOverflow thread