Azure / azure-sdk-for-net

This repository is for active development of the Azure SDK for .NET. For consumers of the SDK we recommend visiting our public developer docs at https://learn.microsoft.com/dotnet/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-net.
MIT License
5.47k stars 4.8k forks source link

[BUG] TableClient.DeleteAsync returns while the table still can be queried #23600

Closed dzendras closed 3 years ago

dzendras commented 3 years ago

Describe the bug TableClient.DeleteAsync returns successfully while the table still can be queried shortly after.

Expected behavior After successfully awaiting TableClient.DeleteAsync, any calls to QueryAsync are supposed to fail with ErrorCode == TableErrorCode.TableNotFound.

Actual behavior (include Exception or Stack Trace) About to delete tables... [aTable3] Success after 1 attempts [aTable0] Success after 1 attempts [aTable4] Success after 1 attempts [aTable5] Success after 1 attempts [aTable1] Success after 1 attempts [aTable2] Success after 1 attempts All tables have been deleted. About to create tables... All tables have been created. About to delete tables... [aTable0] Success after 1 attempts [aTable3] Success after 1 attempts [aTable4] Success after 1 attempts [aTable5] Success after 1 attempts [aTable1] Success after 1 attempts [aTable2] Success after 2 attempts At least one table exists, while all should have been deleted

To Reproduce Run the attached RecreateTableRepro.zip after having provided a valid connection string to a Table Storage.

Environment: Windows 10 Pro, .NET 5.0/.NET 4.8

jsquire commented 3 years ago

Thank you for your feedback. Tagging and routing to the team member best able to assist.

christothes commented 3 years ago

Hi @dzendras - I tried to reproduce this using the following simplified code but was not able to.

var svc = new TableServiceClient(new Uri("https://<yourAccount>.table.core.windows.net"), new DefaultAzureCredential());
var table = svc.GetTableClient("testingFoo");
await table.CreateIfNotExistsAsync();
Console.WriteLine("Table created");

await foreach (var e in svc.QueryAsync())
{
    Console.WriteLine($"{e.Name} Table found");
}

var response = await table.DeleteAsync();
Console.WriteLine($"Table Deleted: {response.Status}");

while (true)
{
    try
    {
        // Should throw
        await foreach (var e in table.QueryAsync<TableEntity>())
        {}
        // Should not write anything to the console
        await foreach (var e in svc.QueryAsync())
        {
            Console.WriteLine($"{e.Name} Table found");
        }
        // Should not get here
        Console.WriteLine("Table Query successful");
        await Task.Delay(1000);
    }
    catch (RequestFailedException ex)
    {
        Console.WriteLine("*** FAILED ***");
        Console.WriteLine(ex.ErrorCode);
        break;
    }
}

Result:

Table created
testingFoo Table found
Table Deleted: 204
*** FAILED ***
TableNotFound

As a side note, if this behavior does reproduce, it would be something we'd need to take up with the service team, as the client SDK doesn't control the service side deletion behavior.

feedback

dzendras commented 3 years ago

Hi @christothes

I wish you ran my repro. It showed that subsequent calls to TableClient.QueryAsync may sequentially 1. throw and then 2. not throw, after having deleted the queried table. I've modified yours to better display the issue with as little extra code as possible. The output I get is as follows:

Table Table3 created
Table Table1 created
Table Table0 created
Table Table5 created
Table Table2 created
Table Table4 created
Table5 Table found as expected
Table2 Table found as expected
Table1 Table found as expected
Table3 Table found as expected
Table4 Table found as expected
Table0 Table found as expected
Table Deleted: 204
Table Deleted: 204
Table Deleted: 204
Table Deleted: 204
Table Deleted: 204
Table Deleted: 204
Table Table4 inexistence confirmed after 1 attempt(s)
Table Table3 inexistence confirmed after 1 attempt(s)
Table Table1 inexistence confirmed after 1 attempt(s)
Table Table0 inexistence confirmed after 1 attempt(s)
Table Table5 inexistence confirmed after 1 attempt(s)
Table Table2 inexistence confirmed after 1 attempt(s)
That was totally UNEXPECTED!The specified table is being deleted. Try operation later.
RequestId:86db4598-f002-0026-39a4-9d73fb000000
Time:2021-08-30T13:37:38.7858219Z
Status: 409 (Conflict)
ErrorCode: TableBeingDeleted

Content:
{"odata.error":{"code":"TableBeingDeleted","message":{"lang":"en-US","value":"The specified table is being deleted. Try operation later.\nRequestId:86db4598-f002-0026-39a4-9d73fb000000\nTime:2021-08-30T13:37:38.7858219Z"}}}

Headers:
Cache-Control: no-cache
Transfer-Encoding: chunked
Server: Windows-Azure-Table/1.0,Microsoft-HTTPAPI/2.0
x-ms-request-id: 86db4598-f002-0026-39a4-9d73fb000000
x-ms-client-request-id: b554d2ce-400d-4dc9-81a9-cde0209bd7fc
x-ms-version: REDACTED
X-Content-Type-Options: REDACTED
Date: Mon, 30 Aug 2021 13:37:38 GMT
Content-Type: application/json; odata=minimalmetadata; streaming=true; charset=utf-8

That was totally UNEXPECTED!The specified table is being deleted. Try operation later.
RequestId:6480b101-f002-0019-3da4-9dbb58000000
Time:2021-08-30T13:37:38.7908136Z

However I have managed to get such an exception: System.InvalidOperationException: '[Table0] Checks show inconsistent results! ServiceClient = False, TableQuery = True' I totally understand that timing is the key here and this is a normal behavior and definitely NOT a bug. I've introduced this check for informational purposes only.

class TableRepro
    {
        public static async Task Start(string connectionString)
        {
            const int totalRuns = 2;
            var tables = Enumerable.Range(0, 6)
                .Select(index => $"Table{index}")
                .Select(tableName => new TableRepro(tableName))
                .Select(x => x.RecreateAndCheckIfExists(connectionString));

            for (int i = 0; i < totalRuns; i++)
            {
                await Task.WhenAll(tables.ToArray());
            }
        }

        private readonly string tableName;

        private TableRepro(string tableName)
        {
            this.tableName = tableName;
        }

        private async Task RecreateAndCheckIfExists(string connectionString)
        {
            try
            {
                var svc = new TableServiceClient(connectionString);
                var table = svc.GetTableClient(tableName);

                async Task<bool> TableExistsUsingServiceClientAsync()
                {
                    await foreach (var e in svc.QueryAsync())
                    {
                        if (e.Name == tableName)
                        {
                            return true;
                        }
                    }
                    return false;
                }
                async Task<bool> TableExistsUsingTableQueryAsync()
                {
                    try
                    {
                        await foreach (var e in table.QueryAsync<TableEntity>())
                        {
                        }
                        return true;
                    }
                    catch (RequestFailedException e) when (e.ErrorCode == TableErrorCode.TableNotFound)
                    {
                        return false;
                    }
                }

                async Task<bool> TableExistsUsingBothChecks()
                {
                    var usingServiceClient = await TableExistsUsingServiceClientAsync();
                    var usingTableQuery = await TableExistsUsingTableQueryAsync();

                    if (usingServiceClient ^ usingTableQuery)
                    {
                        throw new InvalidOperationException($"[{table.Name}] Checks show inconsistent results! ServiceClient = {usingServiceClient}, TableQuery = {usingTableQuery}");
                    }
                    return usingServiceClient;
                }

                await table.CreateAsync();
               Console.WriteLine($"Table {table.Name} created");

                if (await TableExistsUsingBothChecks())
                {
                    Console.WriteLine($"{table.Name} Table found as expected");
                }

                var response = await table.DeleteAsync();
                Console.WriteLine($"Table Deleted: {response.Status}");

                int attemptNumber = 1;
                while (await TableExistsUsingBothChecks())
                {
                    Console.WriteLine($"Table {table.Name} still exists...");
                    attemptNumber++;
                }
                Console.WriteLine($"Table {table.Name} inexistence confirmed after {attemptNumber} attempt(s)");
            }
            catch (Exception e)
            {
                Console.WriteLine("That was totally UNEXPECTED!" + e.Message);
                throw;
            }
        }
    }

This is kind of in line with my first repro. Using queries does not guarantee that a table has been deleted. The previous SDK offered a method ExistsAsync on CloudTable that did the job. Since it's missing I tried to implement it using queries, but it's not enough. So the question is: how do I make sure that a table has been deleted? I can wait and retry, but I need a method that's trustworthy and thus deterministic.

christothes commented 3 years ago

Hi @dzendras - Thanks for the repro code. One interesting thing to note is that the inconsistency exception in your example is also reproduceable with the older client if you compare the result of CloudTable.Exists with the successful result from CloudTable.CreateQuery<DynamicTableEntity>().Execute().

I looked a bit further into the behavior of the older client's Exists() method and it is doing a single table resource GET request for the given table name. The new client doesn't have this method defined. I think the solution here would be to implement an Exists method in the new client, which calls the single table resource URI.

Another item to note: The fact that querying the table is possible after it is marked for deletion with both clients is definitely against the documented behavior of the service. According to the REST API documentation, which is what both of these clients interact with to perform service operations, any operation against a table that is in the process of being deleted should fail with a 409 response. See (https://docs.microsoft.com/en-us/rest/api/storageservices/delete-table#remarks)

christothes commented 3 years ago

@dzendras BTW, what is your use case for the Exists() method? I only ask because if you are using it for client side logic that is timing dependent, this could create a race condition.

Ex. Time Action Result
0 Client 1: Exists() returns true
0 Client 2: Delete Table success
1 Client 1: Do something assuming table exists fails or unexpected behavior
dzendras commented 3 years ago

Hi @christothes,

Having the method Exists(Async) implemented would be great. If you could log an issue for this, that would be much appreciated. As a side note, I was a bit underwhelmed about the lack of useful classes/methods that the old SDK has ( #19881 ). I was forced to write some ugly reflection code to get to the internal classes dealing with connection strings for the SAS token purposes. Going back to the issue, I am aware that the exists checks inconsistency is not a fault of any SDK as it's a matter of timing. Fortunately, deleting tables is not a common scenario for me. One of integration tests verifies it and that's how we discovered the issue. The most common scenario is to check whether a table exists, and if not, create it. For that purpose the QueryAsync workaround should suffice.

christothes commented 3 years ago

Having the method Exists(Async) implemented would be great. If you could log an issue for this, that would be much appreciated.

We're still evaluating if this additional method is needed. It's a trade off between convenience and encouraging scenarios that introduce race conditions.

As a side note, I was a bit underwhelmed about the lack of useful classes/methods that the old SDK has ( #19881 ). I was forced to write some ugly reflection code to get to the internal classes dealing with connection strings for the SAS token purposes.

Thanks for the feedback. Could you talk more about the scenario that forced you to use reflection? Most of the items in the issue you linked have been addressed. If there are compelling scenarios that remain unaddressed we'd be happy to fix them.

The most common scenario is to check whether a table exists, and if not, create it. For that purpose the QueryAsync workaround should suffice.

Why isn't CreateIfNotExists sufficient for this scenario? It's fewer steps and doesn't risk a race condition.

https://github.com/Azure/azure-sdk-for-net/blob/b2bbfd69a246c805e9d3d454e093d6bee6785016/sdk/tables/Azure.Data.Tables/src/TableClient.cs#L458

dzendras commented 3 years ago

Here are two pieces of reflection code I had to write.

  1. In order to get the URI of the table. This is useful for logging purposes given that I have a bunch of storage accounts with tables named in the same way, belonging to all storage accounts. For example, Table01 in SA1 and Table01 in SA2:

    private static readonly Lazy<FieldInfo> EndpointFieldInfo = new(() => typeof(TableClient)
            .GetFields(BindingFlags.NonPublic | BindingFlags.Instance)
            .Single(x => x.Name == "_endpoint"));
    
        public static Uri Uri(this TableClient tableClient)
        {
            var uriFieldInfo = EndpointFieldInfo.Value;
            var baseUri = (Uri) uriFieldInfo.GetValue(tableClient);
            return new Uri(
                baseUri,
                baseUri.IsLoopback ? $"{tableClient.AccountName}/{tableClient.Name}" : tableClient.Name);
        }
  2. This bit is required to generate a SAS token:

    static class ConnectionStringHelper
    {
        // Hacky way, yet the only one, to get the credentials we need
        // Called after the the TableServiceClient constructor, so we are guaranteed to have the valid connection string, hence no null checks etc.
        // https://github.com/Azure/azure-sdk-for-net/issues/11590
    
        public static TableSharedKeyCredential GetSharedKeyCredential(string connectionString)
        {
            var tableConnectionStringType = Type.GetType("Azure.Data.Tables.TableConnectionString, Azure.Data.Tables");
    
            // ReSharper disable PossibleNullReferenceException
            var method = tableConnectionStringType.GetMethod("TryParse", BindingFlags.Public | BindingFlags.Static);
            //https://stackoverflow.com/questions/569249/methodinfo-invoke-with-out-parameter
            object[] parameters = { connectionString, null };
            method.Invoke(null, parameters);
            object parsed = parameters[1];
    
            var credentialsProperty = tableConnectionStringType.GetProperty("Credentials", BindingFlags.Public | BindingFlags.Instance);
            return (TableSharedKeyCredential)credentialsProperty.GetValue(parsed);
            // ReSharper restore PossibleNullReferenceException
        }
    }

    Here's the code that uses the snippet above:

    public string GenerateSasToken(TimeSpan expiresAfter) =>
            this.tableServiceClient.GetSasBuilder(
                TableAccountSasPermissions.List | TableAccountSasPermissions.Read,
                TableAccountSasResourceTypes.Container | TableAccountSasResourceTypes.Object | TableAccountSasResourceTypes.Service,
                DateTimeOffset.UtcNow.Add(expiresAfter)
            ).ToSasQueryParameters(ConnectionStringHelper.GetSharedKeyCredential("<connectionString>")).ToString();

I think both cases boil down to a bit counterinuitive behaviour - the SDK allows us to pass in a connection string which contains all the information we need, yet the SDK does not expose any of it afterwards. I could understand why ExistsAsync is disputable, but I can't find any good reason for the above pieces of information. Surely I could parse the connection string by myself and retrieve the information, but I don't want to spend time writing the code that you had already written.

Even though these are rather minor lackings, they contribute to a rather poor migration experience. On top of this, I took some time to write some unit tests that document the missing features and weird behavior. If I upgrade and they are fixed in the latest version of the SDK, then my tests will notify me. I just wish there wasn't the need to write them.

Regarding the most common scenario, we have 2 separate workflows:

  1. When the app starts, create all required tables if they don't exist. We create a table per month, so this code is used constantly.
  2. A healthcheck that verifies that the tables for the current and the next months have been created. By its nature it should only check not fix if it fails.
dzendras commented 3 years ago

I have just realized that CreateIfNotExistsAsync might actually be bugged. I don't have a repro for this. I just had a look at its code and came up with this conclusion. The method claims that it returns null if the table exists already and the expectation is that after the method returns the table exists and can be used. Internally, the method checks if an attempt to create the table threw RequestFailedException with ex.Status equal to Conflict. If it did, then null is returned. However, the same error code is returned when a table is being deleted (plus ex.ErrorCode == TableErrorCode.TableBeingDeleted). So what happens when I delete a table and then immediately attempt to recreate it using the method above? CreateIfNotExistsAsync is going to return null, making me think that a table had already existed, while in fact the table could not be created as it was being deleted. So I end up without the table.

christothes commented 3 years ago

Here are two pieces of reflection code I had to write.

  1. In order to get the URI of the table.

I think adding Uri as a property on the TableClient and TableServiceClient makes sense. We'll add it shortly.

I think both cases boil down to a bit counterinuitive behaviour - the SDK allows us to pass in a connection string which contains all the information we need, yet the SDK does not expose any of it afterwards.

Could you explain why you'd need to recall the secret in your code if you're able to make authorized requests and generate SAS tokens?

  1. When the app starts, create all required tables if they don't exist. We create a table per month, so this code is used constantly.

I think CreateIfNotExists solves this problem (assuming we work out the TableBeingDeleted behavior mentioned below).

  1. A healthcheck that verifies that the tables for the current and the next months have been created. By its nature it should only check not fix if it fails.

If there is a possibility that a table can be randomly deleted, a health check checking for the existence of a resource is only as reliable as it is frequent. Assuming such a health check is periodic, some check subsequent to the deletion will definitely fail using any of the approaches mentioned in this issue.

I have just realized that CreateIfNotExistsAsync might actually be bugged. I don't have a repro for this. I just had a look at its code and came up with this conclusion. The method claims that it returns null if the table exists already and the expectation is that after the method returns the table exists and can be used. Internally, the method checks if an attempt to create the table threw RequestFailedException with ex.Status equal to Conflict. If it did, then null is returned. However, the same error code is returned when a table is being deleted (plus ex.ErrorCode == TableErrorCode.TableBeingDeleted). So what happens when I delete a table and then immediately attempt to recreate it using the method above? CreateIfNotExistsAsync is going to return null, making me think that a table had already existed, while in fact the table could not be created as it was being deleted. So I end up without the table.

In this scenario, do you think the best result would be for CreateIfNotExists to throw?

dzendras commented 3 years ago

I think both cases boil down to a bit counterinuitive behaviour - the SDK allows us to pass in a connection string which contains all the information we need, yet the SDK does not expose any of it afterwards.

Could you explain why you'd need to recall the secret in your code if you're able to make authorized requests and generate SAS tokens?

I don't know how can I generate a SAS token/URI given a connection string and without having to use my relfection code. ToSasQueryParameters requires TableSharedKeyCredential, which is embedded into the connection string.

  1. When the app starts, create all required tables if they don't exist. We create a table per month, so this code is used constantly.

I think CreateIfNotExists solves this problem (assuming we work out the TableBeingDeleted behavior mentioned below).

True, that's not an issue.

  1. A healthcheck that verifies that the tables for the current and the next months have been created. By its nature it should only check not fix if it fails.

If there is a possibility that a table can be randomly deleted, a health check checking for the existence of a resource is only as reliable as it is frequent. Assuming such a health check is periodic, some check subsequent to the deletion will definitely fail using any of the approaches mentioned in this issue.

It is unlikely as deletion of a table is a very rare scenario. The health check is mostly executed to check if TableStorage service is up and running. Checking for a table's existence was most likely chosen as the easiest/cheapest check you can have.

I have just realized that CreateIfNotExistsAsync might actually be bugged. I don't have a repro for this. I just had a look at its code and came up with this conclusion. The method claims that it returns null if the table exists already and the expectation is that after the method returns the table exists and can be used. Internally, the method checks if an attempt to create the table threw RequestFailedException with ex.Status equal to Conflict. If it did, then null is returned. However, the same error code is returned when a table is being deleted (plus ex.ErrorCode == TableErrorCode.TableBeingDeleted). So what happens when I delete a table and then immediately attempt to recreate it using the method above? CreateIfNotExistsAsync is going to return null, making me think that a table had already existed, while in fact the table could not be created as it was being deleted. So I end up without the table.

In this scenario, do you think the best result would be for CreateIfNotExists to throw? I think so. If the SDK knows that a table is during the deletion process, then it's best to throw to let the user know. An alternative would be to retry until the tables gets deleted, but I guess that would be too much for the SDK as I consider it a thin client. So I would change the current catch block to have a "when" condition checking for ex.ErrorCode to be equal to whatever code is returned during normal circumstances. Having said that, the scenario is pretty unlikely, though.

christothes commented 3 years ago

I don't know how can I generate a SAS token/URI given a connection string and without having to use my relfection code. ToSasQueryParameters requires TableSharedKeyCredential, which is embedded into the connection string.

You can accomplish this with the GenerateSasUri method. It allows you to generate a SaS given a TableClient or TableServiceClient constructed with a connection string or TableSharedKeyCredential. See the sample here

The health check is mostly executed to check if TableStorage service is up and running. Checking for a table's existence was most likely chosen as the easiest/cheapest check you can have.

It sounds like this health scenario doesn't need to worry about whether any one particular table exists then. It's possible to validate service availability in a number of other ways that are much more deterministic.

In this scenario, do you think the best result would be for CreateIfNotExists to throw?

I think so. If the SDK knows that a table is during the deletion process, then it's best to throw to let the user know. An alternative would be to retry until the tables gets deleted, but I guess that would be too much for the SDK as I consider it a thin client. So I would change the current catch block to have a "when" condition checking for ex.ErrorCode to be equal to whatever code is returned during normal circumstances. Having said that, the scenario is pretty unlikely, though.

Sounds good - I think this change makes sense.

Thank you for the great feedback and constructive discussion!

dzendras commented 3 years ago

The health check is mostly executed to check if TableStorage service is up and running. Checking for a table's existence was most likely chosen as the easiest/cheapest check you can have.

It sounds like this health scenario doesn't need to worry about whether any one particular table exists then. It's possible to validate service availability in a number of other ways that are much more deterministic.

What ways do you have in mind?

christothes commented 3 years ago

The health check is mostly executed to check if TableStorage service is up and running. Checking for a table's existence was most likely chosen as the easiest/cheapest check you can have.

It sounds like this health scenario doesn't need to worry about whether any one particular table exists then. It's possible to validate service availability in a number of other ways that are much more deterministic.

What ways do you have in mind?

Any successful operation with the TableServiceClient against the account would validate that the service exists and is available.

dzendras commented 3 years ago

I don't know how can I generate a SAS token/URI given a connection string and without having to use my relfection code. ToSasQueryParameters requires TableSharedKeyCredential, which is embedded into the connection string.

You can accomplish this with the GenerateSasUri method. It allows you to generate a SaS given a TableClient or TableServiceClient constructed with a connection string or TableSharedKeyCredential. See the sample here

I tried to use the GenerateSasUri method, but came across another inconsistency in the SDK related to using connection strings.

public static string GenerateSasToken(TableServiceClient tableServiceClient, TimeSpan expiresAfter) =>
tableServiceClient.GenerateSasUri(
TableAccountSasPermissions.Read | TableAccountSasPermissions.Add | TableAccountSasPermissions.Update | TableAccountSasPermissions.Delete | TableAccountSasPermissions.List | TableAccountSasPermissions.Write,
TableAccountSasResourceTypes.Container | TableAccountSasResourceTypes.Object | TableAccountSasResourceTypes.Service,
DateTimeOffset.UtcNow.Add(expiresAfter)
).Query;

got me this: System.ArgumentNullException HResult=0x80004003 Message=Value cannot be null. Parameter name: uri Source=Azure.Data.Tables StackTrace: at Azure.Data.Tables.TableUriBuilder..ctor(Uri uri) at Azure.Data.Tables.TableServiceClient.GenerateSasUri(TableAccountSasBuilder builder)

christothes commented 3 years ago

This was recently fixed and will be in next week's release. https://github.com/Azure/azure-sdk-for-net/pull/23446

ghost commented 3 years ago

Hi @dzendras. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text “/unresolve” to remove the “issue-addressed” label and continue the conversation.

dzendras commented 3 years ago

OK, cool. Thanks!

ghost commented 3 years ago

Hi @dzendras, since you haven’t asked that we “/unresolve” the issue, we’ll close this out. If you believe further discussion is needed, please add a comment “/unresolve” to reopen the issue.