Azure / azure-sdk-for-java

This repository is for active development of the Azure SDK for Java. For consumers of the SDK we recommend visiting our public developer docs at https://docs.microsoft.com/java/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-java.

[BUG] PagedIterable<BlobItem> stream().parallel() is behaving as sequential #40768

Open varenyavv opened 3 weeks ago

varenyavv commented 3 weeks ago

Describe the bug PagedIterable<BlobItem> stream().parallel() behaves as a sequential stream.

Exception or Stack Trace The log shows a single ForkJoinPool worker performing all of the work, even though the pool has 20 workers.

2024-06-21 17:59:21,275 INFO  [ForkJoinPool.commonPool-worker-1] c.o.g.f.v.s.o.IndividualRequestXmlAggregator gpfh-vendor-outbound-job batchId: '2000' sourceSystem: 'GPS': Downloading blob 789/generated/2023-12-03/1232939632.xml
2024-06-21 17:59:22,208 INFO  [ForkJoinPool.commonPool-worker-1] c.o.g.f.v.s.o.IndividualRequestXmlAggregator gpfh-vendor-outbound-job batchId: '2000' sourceSystem: 'GPS': Downloading blob 789/generated/2023-12-03/1232939814.xml
2024-06-21 17:59:23,143 INFO  [ForkJoinPool.commonPool-worker-1] c.o.g.f.v.s.o.IndividualRequestXmlAggregator gpfh-vendor-outbound-job batchId: '2000' sourceSystem: 'GPS': Downloading blob 789/generated/2023-12-03/1232939895.xml
2024-06-21 17:59:24,065 INFO  [ForkJoinPool.commonPool-worker-1] c.o.g.f.v.s.o.IndividualRequestXmlAggregator gpfh-vendor-outbound-job batchId: '2000' sourceSystem: 'GPS': Downloading blob 789/generated/2023-12-03/1232939934.xml
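To separate a pool configuration problem from a problem with how the stream source splits, it may help to check what the common pool does with a source that is known to split well. The following is a minimal sketch using only the JDK; the class name `ParallelismCheck` and the helper `threadsUsed` are hypothetical names introduced for illustration and are not part of the Azure SDK.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

public class ParallelismCheck {

    // Runs a parallel stream over a sized, easily split source and
    // records the name of every thread that processed an element.
    static Set<String> threadsUsed(int count) {
        Set<String> names = ConcurrentHashMap.newKeySet();
        IntStream.range(0, count)
                .parallel()
                .forEach(i -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) {
        // Target parallelism of the common pool used by parallel streams.
        System.out.println("Common pool parallelism: "
                + ForkJoinPool.getCommonPoolParallelism());
        System.out.println("Threads used: " + threadsUsed(100_000));
    }
}
```

If this reports several worker threads while the blob listing still runs on one, the pool itself is probably fine and the difference lies in the stream source.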

To Reproduce Steps to reproduce the behavior: use the code snippet below to list the blobs. The logs show the work running serially, which makes processing slow when the blob count is in the millions.

Code Snippet

BlobServiceClient blobServiceClient = new BlobServiceClientBuilder().connectionString(connectionString).buildClient();
BlobContainerClient blobContainerClient = blobServiceClient.getBlobContainerClient(blobContainerName);
String blobPrefix = "789/generated/";
ListBlobsOptions options = new ListBlobsOptions().setPrefix(blobPrefix).setMaxResultsPerPage(5000);
PagedIterable<BlobItem> blobItems = blobContainerClient.listBlobs(options, null, null);
blobItems.stream()
        .parallel()
        .filter(this::isAnXml)
        .forEach(
            blobItem -> {
              LOGGER.info("Downloading blob {}", blobItem.getName());
              //more business logic
            });

Expected behavior Execution should happen in parallel across multiple worker threads rather than sequentially on a single thread.
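One possible workaround, assuming the item-level stream does not split well across workers (this thread does not confirm the cause), is to walk the pages sequentially and fan out within each page, since a page is an in-memory list that splits cleanly. The sketch below uses plain JDK collections as a stand-in for the SDK types so that it runs on its own; `PageWiseParallel`, `fakePages`, and `processXml` are hypothetical names. With the SDK, the analogous loop would presumably iterate `blobItems.iterableByPage()` and call `parallelStream()` on each page's `getValue()` list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class PageWiseParallel {

    // Hypothetical stand-in for a paged listing:
    // each inner list is one page of blob names.
    static List<List<String>> fakePages(int pages, int pageSize) {
        List<List<String>> result = new ArrayList<>();
        for (int p = 0; p < pages; p++) {
            List<String> page = new ArrayList<>();
            for (int i = 0; i < pageSize; i++) {
                String ext = (i % 2 == 0) ? ".xml" : ".txt";
                page.add("789/generated/p" + p + "/" + i + ext);
            }
            result.add(page);
        }
        return result;
    }

    // Pages are fetched one after another; the work inside
    // each page is spread across the common pool.
    static Set<String> processXml(Iterable<List<String>> pages) {
        Set<String> processed = ConcurrentHashMap.newKeySet();
        for (List<String> page : pages) {
            page.parallelStream()
                .filter(name -> name.endsWith(".xml"))
                .forEach(processed::add); // placeholder for the business logic
        }
        return processed;
    }

    public static void main(String[] args) {
        Set<String> done = processXml(fakePages(3, 1000));
        System.out.println("Processed " + done.size() + " XML blobs");
    }
}
```

Page retrieval stays sequential in this pattern, so it only helps when the per-item work (for example, downloading each blob) dominates the listing time.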

Setup (please complete the following information):
OS: Ubuntu 22.04.4 LTS
IDE: IntelliJ
Library/Libraries: com.azure:azure-storage-blob:12.25.0
Java version: OpenJDK version 17.0.10
App Server/Environment: Tomcat embedded in Spring Boot
Frameworks: Spring Boot v3.2.3

Information Checklist Kindly make sure that you have added all of the information above and checked off the required fields; otherwise we will treat the issue as an incomplete report.

github-actions[bot] commented 3 weeks ago

@ibrahimrabab @ibrandes @seanmcc-msft

github-actions[bot] commented 3 weeks ago

Thank you for your feedback. Tagging and routing to the team member best able to assist.