Azure / azure-sdk-for-net

This repository is for active development of the Azure SDK for .NET. For consumers of the SDK we recommend visiting our public developer docs at https://learn.microsoft.com/dotnet/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-net.
MIT License

[BUG] BlobBaseClient.OpenRead(BlobOpenReadOptions options) not end reading at the buffer size #24664

Closed 3 months ago

EverettSummer commented 2 years ago

Describe the bug

We have an old abstract parallel download layer that handles Cosmos Store, ADLS, and ADLS Gen2 without issue, but when I try to follow the same logic with the Azure Storage Blob SDK using the OpenRead() method, it does not work as I expect.

Here is code that reads the stream from an offset for the requested length:

private void DownloadRegion(DownloadThreadParameters args, Region region)
{
    long offset = region.Offset + region.CopiedLength;
    long length = region.Length - region.CopiedLength;
    // init client code is omitted
    BlobOpenReadOptions options = new BlobOpenReadOptions(true)
    {
        BufferSize = (int) length,
        Position = offset
    };
    using (Stream source = blobBaseClient.OpenRead(options))
    {
        using (Stream dest = new FileStream(args.DestinationPath, FileMode.Open, FileAccess.Write, FileShare.ReadWrite))
        {
            dest.Seek(offset - args.DestinationOffset, SeekOrigin.Begin);

            if (length > 0)
            {
                var buffer = new byte[100000];
                var bytesRead = FillBuffer(source, buffer);
                int updateCount = 0;
                while (bytesRead > 0)
                {
                    dest.Write(buffer, 0, bytesRead);
                    region.CopiedLength += bytesRead;
                    bytesRead = FillBuffer(source, buffer);
                    if (this.nochunk && (updateCount++ % 15) == 0)
                    {
                        UpdateCopiedSize(args);
                    }
                }
            }
        }
    }

    region.Finished = true;
}

DownloadRegion downloads one chunk of the blob; multiple DownloadRegion calls run in parallel.

int FillBuffer(Stream stream, byte[] buffer)
{
    var totalBufferBytesRead = 0;
    var lastBytesRead = 0;

    lastBytesRead = stream.Read(buffer, totalBufferBytesRead, buffer.Length - totalBufferBytesRead);
    totalBufferBytesRead += lastBytesRead;
    while (lastBytesRead > 0 && totalBufferBytesRead < buffer.Length && !interrupted)
    {
        lastBytesRead = stream.Read(buffer, totalBufferBytesRead, buffer.Length - totalBufferBytesRead);
        totalBufferBytesRead += lastBytesRead;
    }

    return totalBufferBytesRead;
}

FillBuffer reads the stream returned by OpenRead until it is no longer readable, i.e. until lastBytesRead eventually becomes 0.

Expected behavior

I expect the stream returned by OpenRead to stop being readable after BufferSize bytes have been read.

Actual behavior (include Exception or Stack Trace)

The stream can be read all the way to the end of the file.

To Reproduce

The main logic is pasted above; a small modification should make it runnable. Be sure to prepare a large file, for example a 200 MB file.
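A minimal repro along the lines described above might look like the following sketch (the connection string, container, and blob names are hypothetical placeholders; it assumes a blob much larger than BufferSize):

```csharp
using System;
using System.IO;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;

class Repro
{
    static void Main()
    {
        // Hypothetical names; point this at a large (e.g. 200 MB) blob.
        var client = new BlobClient("<connection-string>", "my-container", "large-blob");

        var options = new BlobOpenReadOptions(allowModifications: true)
        {
            BufferSize = 4 * 1024 * 1024, // 4 MB
            Position = 0
        };

        using Stream source = client.OpenRead(options);

        long total = 0;
        var buffer = new byte[81920];
        int read;
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            total += read;
        }

        // Reported behavior: `total` equals the full blob size, not BufferSize.
        Console.WriteLine(total);
    }
}
```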

Environment:

- Library packages used: Azure.Core 1.14.0, Azure.Storage.Blobs 12.8.4, Azure.Storage.Common 12.7.3, Azure.Storage.Files.DataLake 12.6.2, System.Memory 4.5.1
- OS and .NET runtime: Windows 10.0.19043 Build 19043, .NET Framework 4.7.2
- IDE and version: Visual Studio 16.11.3

ghost commented 2 years ago

Thank you for your feedback. This has been routed to the support team for assistance.

ghost commented 2 years ago

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @xgithubtriage.

Issue Details
Author: EverettSummer
Assignees: SaurabhSharma-MSFT
Labels: `Storage`, `Service Attention`, `Client`, `needs-team-attention`
Milestone: -
SaurabhSharma-MSFT commented 2 years ago

@EverettSummer We are currently investigating this. We will get back to you.

amishra-dev commented 2 years ago

@seanmcc-msft Sean could you please look at this?

seanmcc-msft commented 2 years ago

@EverettSummer, this behavior is by design. BlobOpenReadOptions.BufferSize specifies the size of the buffer to be used for the streaming download, not the number of bytes to download.

If you would like to limit your download to a specific number of bytes, I recommend you keep track of the number of bytes you receive when calling stream.Read() and stop once you have read as many as you need.

-Sean
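Sean's suggestion could be sketched as a small helper (hypothetical code, not part of the SDK): cap the copy at the desired length yourself, since the stream returned by OpenRead does not stop at BufferSize:

```csharp
using System;
using System.IO;

static class StreamUtil
{
    // Copies at most `length` bytes from `source` to `dest` and returns
    // the number of bytes actually copied. Stops early only at end of stream.
    public static long CopyExactly(Stream source, Stream dest, long length, int chunkSize = 81920)
    {
        var buffer = new byte[chunkSize];
        long total = 0;
        while (total < length)
        {
            int toRead = (int)Math.Min(buffer.Length, length - total);
            int read = source.Read(buffer, 0, toRead);
            if (read == 0) break; // end of stream
            dest.Write(buffer, 0, read);
            total += read;
        }
        return total;
    }
}
```

In DownloadRegion, calling something like CopyExactly(source, dest, length) in place of the FillBuffer loop would copy exactly the region's length, regardless of how far the underlying stream can be read.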

github-actions[bot] commented 4 months ago

Hi @EverettSummer, we deeply appreciate your input into this project. Regrettably, this issue has remained inactive for over 2 years, leading us to the decision to close it. We've implemented this policy to maintain the relevance of our issue queue and facilitate easier navigation for new contributors. If you still believe this topic requires attention, please feel free to create a new issue, referencing this one. Thank you for your understanding and ongoing support.
