Azure / azure-storage-cpplite

Lite version of C++ Client Library for Microsoft Azure Storage
MIT License

list_blobs_segmented is not returning directories hierarchically #105

Closed NaraVen closed 3 years ago

NaraVen commented 3 years ago

We are calling list_blobs_segmented from blob_client and get en-GUIB, en_HI, and then en. Shouldn't the directory en be returned first? As the listing below shows, we get an error if we ask for only 2 results (en is the third):

drwxrwxrwx 2 azureuser azureuser 4096 Jan 1 1970 ca
drwxrwxrwx 2 azureuser azureuser 4096 Jan 1 1970 ch
d????????? ? ? ? ? ? codeswitch
drwxrwxrwx 2 azureuser azureuser 4096 Jan 1 1970 cs
drwxrwxrwx 2 azureuser azureuser 4096 Jan 1 1970 da
drwxrwxrwx 2 azureuser azureuser 4096 Jan 1 1970 de
drwxrwxrwx 2 azureuser azureuser 4096 Jan 1 1970 el
d????????? ? ? ? ? ? en
drwxrwxrwx 2 azureuser azureuser 4096 Jan 1 1970 en-GBAU
drwxrwxrwx 2 azureuser azureuser 4096 Jan 1 1970 en-IN

Jinming-Hu commented 3 years ago

Hi @NaraVen, can you give us a code snippet to reproduce this issue?

Here's a program I used to test

std::string continuation_token;
do
{
  auto list_result = client.list_blobs_segmented(container_name, std::string(), continuation_token, std::string(), 2).get();
  if (!list_result.success())
  {
    std::cout << "error" << std::endl;
    break;
  }
  continuation_token = list_result.response().next_marker;
  for (auto b : list_result.response().blobs)
  {
    std::cout << b.name << std::endl;
  }
} while (!continuation_token.empty());

The result is

en
en-GBAU
en-IN

The List Blobs API reference (https://docs.microsoft.com/en-us/rest/api/storageservices/list-blobs) says "Blobs are listed in alphabetical order in the response body, with upper-case letters listed first." So it works exactly as expected.
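To make the quoted ordering rule concrete, here is a minimal sketch. It assumes that, for plain ASCII names, "alphabetical with upper-case letters first" behaves like a byte-wise comparison, which is what std::sort does on std::string. The helper byte_order_sorted is hypothetical and not part of cpplite; the service's actual collation is not shown in this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not a cpplite API): sorts names by raw byte value,
// as a stand-in for the ordering the List Blobs reference describes.
// Assumption: ASCII-only names, so byte order puts 'A'-'Z' before 'a'-'z'.
std::vector<std::string> byte_order_sorted(std::vector<std::string> names)
{
    // std::string's operator< compares character by character; a string
    // that is a strict prefix of another sorts before it.
    std::sort(names.begin(), names.end());
    assert(std::is_sorted(names.begin(), names.end()));
    return names;
}
```

Under that assumption, the names en-IN, de, en, EN, en-GBAU come back as EN, de, en, en-GBAU, en-IN: the upper-case name first, and the bare name en ahead of both en-GBAU and en-IN because it is a prefix of them.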

NaraVen commented 3 years ago

Hi Jinming, I see the opposite order. The storage container I am testing with is in Central US and has 500 petabytes of data. Could that be the reason?

list_segmented_item do loop blob prefix: am_data/en continuation: 2!76!MDAwMDExIWFtX2RhdGEvZW4wITAwMDAyOCE5OTk5LTEyLTMxVDIzOjU5OjU5Ljk5OTk5OTlaIQ--
Sep 26 22:02:20 naraPhilygpu blobfuse[53016]: Function azs_getattr, in file /home/azureuser/azure-storage-fuse/blobfuse/utilities.cpp, line 254: In azs_getattr list_segmented_item 0 file am_data/en-GBAU/
Sep 26 22:02:20 naraPhilygpu blobfuse[53016]: Function azs_getattr, in file /home/azureuser/azure-storage-fuse/blobfuse/utilities.cpp, line 254: In azs_getattr list_segmented_item 1 file am_data/en-IN/
Sep 26 22:02:20 naraPhilygpu blobfuse[53016]: Function azs_getattr, in file /home/azureuser/azure-storage-fuse/blobfuse/utilities.cpp, line 254: In azs_getattr list_segmented_item 2 file am_data/en/

You can ping me at narven@microsoft.com if you want to see this test.

Nara

Jinming-Hu commented 3 years ago

Hi @NaraVen, the cpplite SDK doesn't sort the results at all; it simply returns what the storage server returns.

I just talked with someone who's familiar with the server-side implementation. The Blob service doesn't have real directories, only virtual ones. The storage server stores every blob under a directory, but not the directory itself. For example, in your case you have directory en-IN with blob en-IN/a, and directory en with blob en/b. The storage server only stores

en-IN/a
en/b

So when you list directories, the / also counts: names are compared character by character, and since - (0x2D) sorts before / (0x2F) in ASCII, en/ comes after en-IN/. This is by design.
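The effect described above can be reproduced locally with a small simulation. This is only a sketch under stated assumptions: list_segment is a hypothetical helper, not a cpplite or service API, and it assumes the server orders the flat blob names byte-wise and then collapses each name to its virtual directory at the first delimiter after the prefix. The blob names are illustrative, modelled on the log earlier in this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not a cpplite API): mimics a delimiter listing over
// a flat set of blob names and returns at most max_results entries.
// Assumption: the server compares full blob names byte by byte.
std::vector<std::string> list_segment(std::vector<std::string> blob_names,
                                      const std::string& prefix,
                                      char delimiter,
                                      std::size_t max_results)
{
    // Only full blob names exist, so the '/' takes part in the ordering.
    std::sort(blob_names.begin(), blob_names.end());

    std::vector<std::string> entries;
    for (const std::string& name : blob_names)
    {
        if (entries.size() == max_results) break;
        // Skip names that do not start with the requested prefix.
        if (name.compare(0, prefix.size(), prefix) != 0) continue;

        // Collapse to the virtual directory: keep everything up to and
        // including the first delimiter found after the prefix.
        const std::size_t cut = name.find(delimiter, prefix.size());
        const std::string entry =
            (cut == std::string::npos) ? name : name.substr(0, cut + 1);

        // Names are sorted, so duplicates of a directory are adjacent.
        if (entries.empty() || entries.back() != entry) entries.push_back(entry);
    }
    assert(entries.size() <= max_results);
    return entries;
}
```

With the blobs am_data/en/b, am_data/en-IN/a and am_data/en-GBAU/c, prefix am_data/en and max_results 2, this returns am_data/en-GBAU/ and am_data/en-IN/ only; am_data/en/ appears as the third entry once max_results is raised to 3. That matches the order in the blobfuse log and explains why a lookup limited to 2 results never sees en.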

NaraVen commented 3 years ago

Nice and detailed explanation. Thanks, closing the issue.