Closed NaraVen closed 3 years ago
Hi @NaraVen , can you give us a code snippet to reproduce this issue?
Here's a program I used to test
    std::string continuation_token;
    do
    {
        auto list_result = client.list_blobs_segmented(container_name, std::string(), continuation_token, std::string(), 2).get();
        if (!list_result.success())
        {
            std::cout << "error" << std::endl;
            break;
        }
        continuation_token = list_result.response().next_marker;
        for (auto b : list_result.response().blobs)
        {
            std::cout << b.name << std::endl;
        }
    } while (!continuation_token.empty());
The result is
en
en-GBAU
en-IN
In this API reference, it says "Blobs are listed in alphabetical order in the response body, with upper-case letters listed first." It works exactly as expected.
Hi Jinming, I see the opposite order. The storage container I am testing with is in Central US and has 500 petabytes of data. Could that be the reason?
list_segmented_item do loop blob prefix: am_data/en continuation: 2!76!MDAwMDExIWFtX2RhdGEvZW4wITAwMDAyOCE5OTk5LTEyLTMxVDIzOjU5OjU5Ljk5OTk5OTlaIQ--
Sep 26 22:02:20 naraPhilygpu blobfuse[53016]: Function azs_getattr, in file /home/azureuser/azure-storage-fuse/blobfuse/utilities.cpp, line 254: In azs_getattr list_segmented_item 0 file am_data/en-GBAU/
Sep 26 22:02:20 naraPhilygpu blobfuse[53016]: Function azs_getattr, in file /home/azureuser/azure-storage-fuse/blobfuse/utilities.cpp, line 254: In azs_getattr list_segmented_item 1 file am_data/en-IN/
Sep 26 22:02:20 naraPhilygpu blobfuse[53016]: Function azs_getattr, in file /home/azureuser/azure-storage-fuse/blobfuse/utilities.cpp, line 254: In azs_getattr list_segmented_item 2 file am_data/en/
You can ping me at narven@microsoft.com if you want to see this test.
Nara
From: JinmingHu notifications@github.com Sent: Saturday, September 26, 2020 7:07 PM To: Azure/azure-storage-cpplite azure-storage-cpplite@noreply.github.com Cc: Nara V narven@microsoft.com; Mention mention@noreply.github.com Subject: Re: [Azure/azure-storage-cpplite] list_blobs_segmented is not returning directories hierarchically (#105)
Hi @NaraVen, the cpplite SDK doesn't sort the result at all; it simply returns what the storage server returns.
I just talked with someone who's familiar with the server-side implementation. The Blob service doesn't have real directories, only virtual ones. The storage server stores every blob under a directory, but not the directory itself. For example, in your case you have directory en-IN and blob en-IN/a, and directory en and blob en/b. The storage server only stores
en-IN/a
en/b
So when you list directories, the / also counts as part of the name, and en/ comes after en-IN/ (in ASCII, - sorts before /). This is by design.
Nice and detailed explanation. Thanks, closing the issue.
We are calling list_blobs_segmented from blob_client, we get en-GUIB, en_HI and then en. Shouldn't directory en be returned first? See below, we get the following error if we ask for only 2 results (en is the third):
drwxrwxrwx 2 azureuser azureuser 4096 Jan 1 1970 ca
drwxrwxrwx 2 azureuser azureuser 4096 Jan 1 1970 ch
d????????? ? ? ? ? ? codeswitch
drwxrwxrwx 2 azureuser azureuser 4096 Jan 1 1970 cs
drwxrwxrwx 2 azureuser azureuser 4096 Jan 1 1970 da
drwxrwxrwx 2 azureuser azureuser 4096 Jan 1 1970 de
drwxrwxrwx 2 azureuser azureuser 4096 Jan 1 1970 el
d????????? ? ? ? ? ? en
drwxrwxrwx 2 azureuser azureuser 4096 Jan 1 1970 en-GBAU
drwxrwxrwx 2 azureuser azureuser 4096 Jan 1 1970 en-IN