dapr / components-contrib

Community driven, reusable components for distributed apps
Apache License 2.0
548 stars 480 forks source link

Azure Blob Storage List Blobs does not return Marker #3010

Closed rodyvansambeek closed 1 year ago

rodyvansambeek commented 1 year ago

Expected Behavior

When using the following code on a Azure Blob storage account with more than 10 files the marker value is not returned, but always returns an empty string:

var bindingSettings = new { maxResults = 10 };
var bindingSettingsBytes = JsonSerializer.SerializeToUtf8Bytes(bindingSettings);
bindingRequest.Data = new ReadOnlyMemory<byte>(bindingSettingsBytes);
var response = await _daprClient.InvokeBindingAsync(bindingRequest);

response.Metadata.TryGetValue("marker", out marker);

I would expect the marker variable to have the marker value.

I have tested it with a local Azurite blob storage account, but also an online Azure Blob Storage account. Both just return the first 10 results and an empty string as marker. This way we don't got any way to iterate through all pages.

Tested the same query using the AZ CLI, to confirm that it wasn't an issue in Azure, but that returns the "nextMarker" field correctly as the last object of the result:

az storage blob list --connection-string "[connstring]" -c [containername] --num-results 10 --show-next-marker

Will return:

[
  {
     [blob 1]
  },
  {
    [blob 2]
  },
  .... (total 10 results)
  {
    "nextMarker": "e61534b1-9010-4e8e-838a-46049d01b393/54-spiderfull/property-models/18216611.json"
  }
]

Actual Behavior

The marker value in the response.Metadata object is always ""

Steps to Reproduce the Problem

Use the above code using Dapr with C# and invoke the "list" operation on the binding with a maxResults lower then the number of items in the blob storage account.

ItalyPaleAle commented 1 year ago

Thanks for reporting this, we'll take a look

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged (pinned, good first issue, help wanted or triaged/resolved) or other activity occurs. Thank you for your contributions.

rodyvansambeek commented 1 year ago

I don't think this is fixed, so needs to stay open.

berndverst commented 1 year ago

From what I can see, Dapr will correctly return a marker when there are more than 5000 (default max results) results, or whatever number you configured for this. Additionally, Dapr will return a marker if fewer results had to be returned because of requests across multiple partitions.

However, when partitions are not involved and there are fewer results than max results, Dapr will automatically page for you and reach the last page which has no further additional marker. In this case we return the empty marker.

I do not see any issue here with the marker behavior. Our integration (certification) tests also show that the markers are returned correctly.

@rodyvansambeek perhaps try setting the max results to a lower number and you should see that Dapr is indeed returning markers.

berndverst commented 1 year ago

For convenience I am adding a new property pagesTraversed in the response metadata of the list operation. This will return the number of pages that were internally traversed.

However, I maintain that I do not see any issue with the component @rodyvansambeek

https://github.com/dapr/components-contrib/pull/3126

rodyvansambeek commented 1 year ago

@berndverst Thank you for explaining the inner workings. In the next few days, I'll check the project I'm working on to see if it performs as described when I don't specify the 'maxResults' parameter and want to retrieve more than 5,000 results. In my initial test setup, I'm fairly certain it only returned the first 10 results, even though there were hundreds available in the blob storage account, and it did not include a marker. I'll update you as soon as possible. I also plan to test it with a storage account that has over 5,000 results to see if it returns a marker in that case.

berndverst commented 1 year ago

@rodyvansambeek if you don't mind - wait until Dapr 1.12 - or the 1.12.0-rc.2 version (if you don't mind trying out our release candidate). That will include the change I made yesterday to also show how many pages were traversed, and that will report the number of items were returned (which now can be at most maxResults + pageSize)

berndverst commented 1 year ago

@rodyvansambeek one additional comment here:

If you specify maxResults = 10 -- that is not the page size. This actually tells the server to only return 10 results. Even if more results exist, that does not mean there will pagination for additional results beyond 10 results - the server will not actually return more results.

We do not have a way to control the page size.

rodyvansambeek commented 1 year ago

@berndverst That was the issue. I assumed that would be the page size, and the marker would be returned after that amount of results. I understand the inner workings now. Maybe it's a good idea to change this in the documentation because if you read the part of the data parameters on: https://docs.dapr.io/reference/components-reference/supported-bindings/blobstorage/#list-blobs It looks like you need to specify a maxResults and get a marker back for every page to iterate through all the results.