Open DaRosenberg opened 6 years ago
The proposed IAsyncDisposable
feature would be a perfect fix for this.
https://github.com/dotnet/csharplang/issues/43
Hi @DaRosenberg, thanks for reaching out!
I've logged this issue for further investigation.
@zezha-msft I have now completed testing to confirm that this did indeed account for a significant bottleneck, and that changing it to commit asynchronously nearly doubled the scalability of our service.
After that, other bottlenecks came into play that we proceeded to address. Through a combination of a few changes we made in our fork, we were able to achieve perfect scalability. I will be creating separate issues shortly for the other changes we had to make.
Usually, the way you work around async dispose is to call FlushAsync (assuming FlushAsync also causes the commit in this case).
@davidfowl Unfortunately FlushAsync()
does not commit the blocks, so that’s not an option in this case. Nor should it I think. They are semantically very different operations - flushing can be performed repeatedly while writing, whereas committing is a very final operation after which no more writes can occur.
I see CommitAsYnc is exposed directly. Is it an option to call it? Or is some other component writing that you have no control over?
@davidfowl The class is internal
. Making it public is what I suggested in my OP - this would allow us to CommitAsync()
before disposing. This is how we fixed the scalability issue in our fork, but obviously we would rather not be on a forked code base. :)
Ah hah! Yes that's bad 😄
I think we have a solution for this. CloudBlockBlob.OpenWriteAsync() should return a CloudBlobStream, which has a public CommitAsync() method - does that not work? If it does, the issue becomes one of discoverability :) It would probably be good to have OpenWriteAsync() explicitly call out this issue in the comments. I'm not sure how much that will help, though - people are probably fairly used to writing
using (Stream stream = await blob.OpenWriteAsync()) { //... }
it's entirely plausible to me that the comment would be missed. Any suggestions?
@asorrin-msft You are right - that should work without any code change actually, because BlobWriteStream
inherits from CloudBlobStream
. So from client code we should be able to do something like:
using (var stream = await blob.OpenWriteAsync())
{
// Do some writing to the stream here...
if (stream is CloudBlobStream blobStream)
await blobStream.CommitAsync();
}
The issue of discoverability is something I've also been thinking about. An XML doc comment on ´OpenWriteAsync()` will not go very far I'm afraid.
Might one consider taking this so far as to actually throw from Dispose()
if the stream has not been committed, thus always forcing the pattern of committing (asynchronously) before disposing? At least that way, it will be a fail-fast and obvious thing very early in the dev lifecycle, instead of a very elusive and hard-to-diagnose scalability problem 18 months down the road when the application is already in production...
Any plans to solve this with DisposeAsync?
@DaRosenberg
A word of caution with the using blocks around the CloudBlobStream. If there's an exception before you commit, you will end up committing partial writes or an empty stream. This would then overwrite your exist blob with either incomplete or 0 bytes of data.
Ask me how I know :(
@justinSelf I'm a bit perplexed as to how that could happen. Are you saying commits happen even before we even call CloudBlobStream.CommitAsync()
?
@DaRosenberg
Here's a code sample to show what I'm talking about. If you open a CloudBlobStream
inside of a using, and do anything that could cause an exception, you risk losing data if the blob already has data in it.
class Program
{
static async Task Main(string[] args)
{
CloudStorageAccount storageAccount = CloudStorageAccount.Parse("connection string");
var blobClient = storageAccount.CreateCloudBlobClient();
var blobContainer = blobClient.GetContainerReference("test-container");
await blobContainer.CreateIfNotExistsAsync();
var blob = blobContainer.GetBlockBlobReference("test-blob");
await blob.UploadTextAsync("test");
//inspect blob - should say "test"
Console.ReadKey();
using (var stream = await blob.OpenWriteAsync())
{
throw new Exception();
}
//inspect blob - will be empty
}
}
I'm only resurrecting this thread because of searching for this issue.
@DaRosenberg Also, to directly answer your question, Dispose, as you know, will also call Commit
. So, if there's an exception inside the using block, when the stream gets disposed, it will also commit.
@justinSelf Duh, of course, stupid of me. The CommitAsync() happening as part of Dispose() was the whole reason why I opened this issue in the first place. 🤣
Which service(blob, file, queue, table) does this issue concern?
Blob
Which version of the SDK was used?
9.3.0
Which platform are you using? (ex: .NET Core 2.1)
.NET Core 2.1
What problem was encountered?
I've been troubleshooting a scalability issue with our ASP.NET Core Web API, which makes heavy use of blob storage for underlying storage.
We observed:
Long story short, through profiling I have finally pinpointed the issue to be here:
https://github.com/Azure/azure-storage-net/blob/38425e715e1bcdb4cab344bcb9b448c08bf8af5c/Lib/WindowsRuntime/Blob/BlobWriteStream.cs#L200-L220
The problem is the
CommitAsync().Wait()
. Since the dispose pattern in .NET is not async, this stream implementation of course has to block on the commit operation, and the commit is necessary to ensure that the written blocks are committed to blob storage. This blocks the thread for the duration of that commit operation, which under load leads to thread starvation.Have you found a mitigation/solution?
One solution would be for our service to do
await stream.CommitAsync();
before disposing the stream, but unfortunatelyBlobWriteStream
is marked asinternal
so this type is not visible to our code. The only solution I can think of is to makeBlobWriteStream
public, allowing client code to asynchronously commit the stream before disposing it.