Open peterchase opened 1 month ago
Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @xgithubtriage.
Note that the pre-existing comment in the Microsoft code, which I mention in the bug report, is a bit misleading and understates the problem. If it were just a case of "the first to commit wins", that would be OK. But this is not the case: the blob ends up with a mixture of contents because the same block ID gets overwritten by multiple uploads before commit. Each block in the committed blob is therefore an arbitrary choice of that block from any one of the simultaneous uploads, so the committed blob as a whole can be a patchwork of several uploads.
Library name and version
Azure.Storage.Blobs 12.21.2
Describe the bug
We sometimes have multiple processes uploading data to the same-named blob, using UploadAsync(). It doesn't matter to us which one "wins" but it is important that the blob contents that we end up with are those that correspond to one of the uploads, not a mixture of contents from different uploads. Unfortunately, a mixture is what we often get.
We believe that the reason is that, when doing a "partitioned" upload, whereby blocks are staged and then finally knitted together by committing a block list, all processes are using the same sequence of block IDs. Blocks with the same ID, but different contents, are written by the different processes.
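To make the suspected mechanism concrete, here is a minimal in-memory sketch. It does not call Azure at all: `FakeBlob`, `CollisionDemo` and `OffsetBlockId` are hypothetical names invented for this illustration, and the offset-only ID scheme is an assumption standing in for whatever the library actually does. It only models the two rules that matter here: staging a block under an existing ID replaces the earlier uncommitted block, and commit assembles whatever is currently staged.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for a blob's uncommitted block store (not an Azure SDK type).
public sealed class FakeBlob
{
    private readonly Dictionary<string, string> _staged = new Dictionary<string, string>();

    public string Committed { get; private set; } = "";

    // Staging under an ID that already exists replaces the earlier uncommitted block.
    public void StageBlock(string blockId, string content) => _staged[blockId] = content;

    // Commit concatenates whatever is currently staged under each listed ID.
    public void CommitBlockList(IEnumerable<string> blockIds) =>
        Committed = string.Concat(blockIds.Select(id => _staged[id]));
}

public static class CollisionDemo
{
    // Assumed scheme for illustration: the ID depends only on the block's offset,
    // so every uploader derives the same ID for the same offset.
    public static string OffsetBlockId(long offset) =>
        Convert.ToBase64String(BitConverter.GetBytes(offset));

    public static string Run()
    {
        var blob = new FakeBlob();
        string[] uploadA = { "AAAA", "AAAA" };
        string[] uploadB = { "BBBB", "BBBB" };
        string[] ids = { OffsetBlockId(0), OffsetBlockId(4) };

        // One possible interleaving of two simultaneous uploads:
        blob.StageBlock(ids[0], uploadA[0]); // A stages block 0
        blob.StageBlock(ids[0], uploadB[0]); // B overwrites block 0
        blob.StageBlock(ids[1], uploadB[1]); // B stages block 1
        blob.StageBlock(ids[1], uploadA[1]); // A overwrites block 1
        blob.CommitBlockList(ids);           // A commits

        return blob.Committed;
    }
}
```

With this interleaving, `CollisionDemo.Run()` returns `"BBBBAAAA"`, which matches neither upload. Other interleavings would give other mixtures, or by luck a clean result, which would be consistent with the problem appearing only sometimes.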
It is important to note that the block IDs are chosen by Azure.Storage.Blobs, not by our code. Indeed, the GenerateBlockId(long offset) method contains a TODO comment referencing a previous report of this bug, #8162, which unfortunately was closed unfixed.
Expected behavior
When multiple clients upload to the same-named blob simultaneously, the resultant content of the blob should be the exact contents of one of the uploads.
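One way this behavior could be achieved is for each upload to use block IDs that no other upload can produce. The following is only a sketch of that idea, not the library's implementation: `UniqueBlockIds.Create` is a hypothetical helper, and the exact ID layout (a per-upload GUID followed by a zero-padded offset) is an assumption. The layout keeps every ID the same length, since the Blob service requires all block IDs within a blob to have equal length, and it stays under the 64-byte pre-encoding limit.

```csharp
using System;
using System.Text;

public static class UniqueBlockIds
{
    // Hypothetical helper: combines a per-upload GUID with the block offset.
    // Assumes offset is non-negative.
    // Raw form is 32 hex chars + '-' + 20 digits = 53 ASCII bytes, then Base64-encoded.
    public static string Create(Guid uploadId, long offset)
    {
        string raw = $"{uploadId:N}-{offset:D20}";
        return Convert.ToBase64String(Encoding.ASCII.GetBytes(raw));
    }
}
```

Each upload would call `Guid.NewGuid()` once and pass the same value for all of its blocks. Two simultaneous uploads then stage disjoint sets of block IDs, so whichever one commits produces a blob made entirely of its own blocks.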
Actual behavior
The content of the blob is sometimes a mixture of the contents of the different uploads.
Reproduction Steps
Here is a small C# .NET program that I believe demonstrates the issue.
Note that this is expecting the Azurite storage emulator to be running on the default ports. Alternatively, the code could be changed to use a connection string for real Azure.
Environment
.NET SDK:
  Version: 8.0.303
  Commit: 29ab8e3268
  Workload version: 8.0.300-manifests.34944930
  MSBuild version: 17.10.4+10fbfbf2e

Runtime Environment:
  OS Name: Windows
  OS Version: 10.0.19045
  OS Platform: Windows
  RID: win-x64
  Base Path: C:\Program Files\dotnet\sdk\8.0.303\
Visual Studio 2022