Azure / azure-sdk-for-net

This repository is for active development of the Azure SDK for .NET. For consumers of the SDK we recommend visiting our public developer docs at https://learn.microsoft.com/dotnet/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-net.
MIT License
5.26k stars 4.61k forks source link

[BUG] UploadAsync() chooses block IDs that can result in a corrupt upload when others occur at same time #45510

Open peterchase opened 1 month ago

peterchase commented 1 month ago

Library name and version

Azure.Storage.Blobs 12.21.2

Describe the bug

We sometimes have multiple processes uploading data to the same-named blob, using UploadAsync(). It doesn't matter to us which one "wins" but it is important that the blob contents that we end up with are those that correspond to one of the uploads, not a mixture of contents from different uploads. Unfortunately, a mixture is what we often get.

We believe that the reason is that, when doing a "partitioned" upload, whereby blocks are staged and then finally knitted together by committing a block list, all processes are using the same sequence of block IDs. Blocks with the same ID, but different contents, are written by the different processes.

It is important to note that the block IDs are chosen by Azure.Storage.Blobs, not by our code. Indeed, the GenerateBlockId(long offset) method contains a TODO comment: -

            // TODO #8162 - Add in a random GUID so multiple simultaneous
            // uploads won't stomp on each other and the first to commit wins.
            // This will require some changes to our test framework's
            // RecordedClientRequestIdPolicy.

This is referencing a previous report of this bug #8162, which unfortunately was closed unfixed.

Expected behavior

When multiple clients upload to the same-named blob simultaneously, the resultant content of the blob should be the exact contents of one of the uploads.

Actual behavior

The content of the blob is sometimes a mixture of the contents of the different uploads.

Reproduction Steps

Here is a small C#.Net program that I believe demonstrates the issue.

using Azure.Storage;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;

namespace UploadBug;

internal static class Program
{
  private const string AzuriteConnectionString = "UseDevelopmentStorage=true";
  private const string ContainerName = "test";
  private const string BlobName = "test.dat";
  private const int NumBytes = 1024 * 1024 * 25;
  private const int NumData = 5;
  private const int NumWriters = 5;

  static async Task Main()
  {
    await Console.Out.WriteLineAsync("Creating random data...");
    var random = new Random();
    var data = Enumerable.Range(0, NumData).Select(_ => new byte[NumBytes]).ToArray();
    foreach (byte[] bytes in data)
    {
      random.NextBytes(bytes);
    }

    await Console.Out.WriteLineAsync("Creating blob container...");
    var container = new BlobContainerClient(AzuriteConnectionString, ContainerName);
    await container.DeleteIfExistsAsync();
    await container.CreateIfNotExistsAsync();
    BlobClient blob = container.GetBlobClient(BlobName);

    using var cts = new CancellationTokenSource();

    var transferOptions = new StorageTransferOptions { MaximumTransferSize = NumBytes / 16, InitialTransferLength = NumBytes / 16 };
    var uploadOptions = new BlobUploadOptions { TransferOptions = transferOptions, Conditions = null };

    await Console.Out.WriteLineAsync($"Starting {NumWriters} writer(s)...");
    var writtenOne = new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously);
    for (int i = 0; i < NumWriters; ++i)
    {
      Task.Run(() => UploadBlobRepeatedlyAsync(cts.Token));
    }
    await writtenOne.Task;

    await Console.Out.WriteLineAsync("Starting checker...");
    Task checkerTask = CheckBlobRepeatedlyAsync();

    await Console.Out.WriteLineAsync("Press a key to stop...");
    Console.ReadKey();
    await cts.CancelAsync();
    await checkerTask;
    return;

    async Task UploadBlobRepeatedlyAsync(CancellationToken cancellation)
    {
      try
      {
        while (!cancellation.IsCancellationRequested)
        {
          foreach (byte[] bytes in data)
          {
            using var stream = new MemoryStream(bytes);
            await blob.UploadAsync(stream, uploadOptions, cancellation);
            writtenOne.TrySetResult();
          }

          await Console.Out.WriteLineAsync("Uploaded all data.");
        }
      }
      catch (OperationCanceledException) { }
      catch (Exception e)
      {
        await Console.Error.WriteLineAsync($"Error in writer: {e}");
      }
    }

    async Task CheckBlobRepeatedlyAsync()
    {
      try
      {
        while (!cts.IsCancellationRequested)
        {
          var response = await blob.DownloadStreamingAsync();
          await using var stream = response.Value.Content;

          var buf = new Memory<byte>(new byte[10240]);
          bool[] matches = Enumerable.Range(0, data.Length).Select(_ => true).ToArray();

          int pos = 0;
          for (;;)
          {
            int numRead = await stream.ReadAsync(buf);
            if (numRead < 1)
            {
              break;
            }

            for (int i = 0; i < numRead; ++i, ++pos)
            {
              for (int j = 0; j < data.Length; ++j)
              {
                if (buf.Span[i] != data[j][pos])
                {
                  matches[j] = false;
                }
              }
            }
          }

          if (pos != NumBytes || !matches.Any(b => b))
          {
            await Console.Out.WriteLineAsync("Blob does not match any data.");
            await cts.CancelAsync();
            break;
          }

          await Console.Out.WriteLineAsync("Checked data.");
        }
      }
      catch (Exception e)
      {
        await Console.Error.WriteLineAsync($"Error in checker: {e}");
      }
    }
  }
}

Note that this is expecting the Azurite storage emulator to be running on the default ports. Alternatively, the code could be changed to use a connection string for real Azure.

Environment

.NET SDK: Version: 8.0.303 Commit: 29ab8e3268 Workload version: 8.0.300-manifests.34944930 MSBuild version: 17.10.4+10fbfbf2e

Runtime Environment: OS Name: Windows OS Version: 10.0.19045 OS Platform: Windows RID: win-x64 Base Path: C:\Program Files\dotnet\sdk\8.0.303\

Visual Studio 2022

github-actions[bot] commented 1 month ago

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @xgithubtriage.

peterchase commented 1 month ago

Note that the pre-existing comment in the Microsoft code, that I mention in the bug report, is a bit misleading and understates the problem. If it was just a case of "the first to commit wins", that would be OK. But this is not the case - the blob ends up with a mixture of contents because the same block gets overwritten by multiple uploads before commit, meaning the contents of each block is an arbitrary choice of that block from any one of the simultaneous uploads and so is the eventual committed blob.