Open petr-tichy opened 4 years ago
@petr-tichy Could you provide an example of how to create this condition? Do you just list an empty Azure subdirectory?
Well, that is a good question. Unfortunately, I'm not yet certain why this happens, and I'm unable to replicate this behavior manually with a test account. The cause could be anything from Azure slowing down clients or trying to resolve some consistency issue to simply a bug or an undocumented feature. Accessing our account with other native tools shows no problems; these seem to handle the situation gracefully (see below).
We are running FoundationDB backup through s3proxy to Azure Blob Storage. After the backup had been running continuously for several weeks and had created a few snapshots, we tried to start expiration and hit this situation. The storage container currently holds about 1.4M blobs.
When describing the backups using fdbbackup describe, it first successfully lists some control files, and then tries to list the path data/fdbbackup/snapshots/:
GET /fdbbackup/?max-keys=1000&prefix=data/fdbbackup/snapshots/&delimiter=/&marker=
This prefix currently contains just 8 blobs, but Azure first responds with an empty Blobs element and a continuation marker:
[s3proxy] D 08-26 22:02:44.362 S3Proxy-Jetty-17 jclouds.wire:56 |::] << "[0xef][0xbb][0xbf]<?xml version="1.0" encoding="utf-8"?>
<EnumerationResults ServiceEndpoint="https://myaccount.blob.core.windows.net/" ContainerName="fdbbackup">
<Prefix>data/fdbbackup/snapshots/</Prefix>
<MaxResults>1000</MaxResults>
<Blobs />
<NextMarker>2!160!MDAwMDc1IWRhdGEvZmRiYmFja3VwL3NuYXBzaG90cy9zbmFwc2hvdCw2NjM2ODYxNDQzNzAzLDY3MjMyOTExNzcxNjksNDYzMTM0MTYzNDYyOCEwMDAwMjghMTYwMS0wMS0wMVQwMDowMDowMC4wMDAwMDAwWiE-</NextMarker>
</EnumerationResults>"
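The condition in this response can be checked mechanically: the Blobs element has no children, yet NextMarker is non-empty. Below is a minimal sketch using only the Python standard library. The function name `is_empty_page_with_marker` is hypothetical, and the sample XML is a shortened copy of the response above with the marker replaced by a placeholder.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the response above; the marker value is a placeholder.
SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<EnumerationResults ServiceEndpoint="https://myaccount.blob.core.windows.net/" ContainerName="fdbbackup">
<Prefix>data/fdbbackup/snapshots/</Prefix>
<MaxResults>1000</MaxResults>
<Blobs />
<NextMarker>PLACEHOLDER-MARKER</NextMarker>
</EnumerationResults>"""


def is_empty_page_with_marker(xml_text: str) -> bool:
    """Return True if a List Blobs page has no entries but a continuation marker."""
    # Drop a leading byte order mark (visible in the wire log) before parsing.
    root = ET.fromstring(xml_text.lstrip("\ufeff").encode("utf-8"))
    blobs = root.find("Blobs")
    # Any child of <Blobs> (a blob or a prefix entry) counts as a result.
    has_entries = blobs is not None and len(blobs) > 0
    next_marker = (root.findtext("NextMarker") or "").strip()
    return not has_entries and bool(next_marker)


print(is_empty_page_with_marker(SAMPLE))
```

A client that sees this condition has not reached the end of the listing and needs to issue another request with the returned marker.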
Using the Azure Storage Blobs client library for Python, version 12.4.0, to list the prefix, I see that it really performs two requests. The first response contains the empty <Blobs /> node with a continuation marker, and the second has the 8 <Blob> entries (redacted).
Request URL: 'https://myaccount.blob.core.windows.net/fdbbackup?prefix=data%2Ffdbbackup%2Fsnapshots%2F&restype=container&comp=list'
Request method: 'GET'
Request headers:
'Accept': 'application/xml'
'x-ms-version': '2019-12-12'
'x-ms-date': 'Wed, 26 Aug 2020 20:42:33 GMT'
'x-ms-client-request-id': 'aaedab24-e7dc-11ea-9c08-784f435dbd4f'
'User-Agent': 'azsdk-python-storage-blob/12.4.0 Python/3.8.2 (macOS-10.15.6-x86_64-i386-64bit)'
'Authorization': '*****'
Request body:
None
Response status: 200
Response headers:
'Transfer-Encoding': 'chunked'
'Content-Type': 'application/xml'
'Server': 'Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0'
'x-ms-request-id': '79a39425-a01e-00c5-41e9-7bbabb000000'
'x-ms-client-request-id': 'aaedab24-e7dc-11ea-9c08-784f435dbd4f'
'x-ms-version': '2019-12-12'
'Date': 'Wed, 26 Aug 2020 20:42:34 GMT'
Response content:
<?xml version="1.0" encoding="utf-8"?>
<EnumerationResults ServiceEndpoint="https://myaccount.blob.core.windows.net/" ContainerName="fdbbackup">
<Prefix>data/fdbbackup/snapshots/</Prefix>
<Blobs />
<NextMarker>2!160!MDAwMDc1IWRhdGEvZmRiYmFja3VwL3NuYXBzaG90cy9zbmFwc2hvdCw2NjM2ODYxNDQzNzAzLDY3MjMyOTExNzcxNjksNDYzMTM0MTYzNDYyOCEwMDAwMjghMTYwMS0wMS0wMVQwMDowMDowMC4wMDAwMDAwWiE-</NextMarker>
</EnumerationResults>
Request URL: 'https://myaccount.blob.core.windows.net/fdbbackup?prefix=data%2Ffdbbackup%2Fsnapshots%2F&marker=2%21160%21MDAwMDc1IWRhdGEvZmRiYmFja3VwL3NuYXBzaG90cy9zbmFwc2hvdCw2NjM2ODYxNDQzNzAzLDY3MjMyOTExNzcxNjksNDYzMTM0MTYzNDYyOCEwMDAwMjghMTYwMS0wMS0wMVQwMDowMDowMC4wMDAwMDAwWiE-&restype=container&comp=list'
Request method: 'GET'
Request headers:
'Accept': 'application/xml'
'x-ms-version': '2019-12-12'
'x-ms-date': 'Wed, 26 Aug 2020 20:42:34 GMT'
'x-ms-client-request-id': 'ab840dee-e7dc-11ea-9c08-784f435dbd4f'
'User-Agent': 'azsdk-python-storage-blob/12.4.0 Python/3.8.2 (macOS-10.15.6-x86_64-i386-64bit)'
'Authorization': '*****'
Request body:
None
Response status: 200
Response headers:
'Transfer-Encoding': 'chunked'
'Content-Type': 'application/xml'
'Server': 'Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0'
'x-ms-request-id': '79a39499-a01e-00c5-25e9-7bbabb000000'
'x-ms-client-request-id': 'ab840dee-e7dc-11ea-9c08-784f435dbd4f'
'x-ms-version': '2019-12-12'
'Date': 'Wed, 26 Aug 2020 20:42:34 GMT'
Response content:
<?xml version="1.0" encoding="utf-8"?>
<EnumerationResults ServiceEndpoint="https://myaccount.blob.core.windows.net/" ContainerName="fdbbackup">
<Prefix>data/fdbbackup/snapshots/</Prefix>
<Marker>2!160!MDAwMDc1IWRhdGEvZmRiYmFja3VwL3NuYXBzaG90cy9zbmFwc2hvdCw2NjM2ODYxNDQzNzAzLDY3MjMyOTExNzcxNjksNDYzMTM0MTYzNDYyOCEwMDAwMjghMTYwMS0wMS0wMVQwMDowMDowMC4wMDAwMDAwWiE-</Marker>
<Blobs>
<Blob>
<Name>data/fdbbackup/snapshots/snapshot...</Name>
...
</Blob>
...
</Blobs>
<NextMarker />
</EnumerationResults>
The code that generated this is:
import sys
import logging
from azure.storage.blob import BlobServiceClient
logger = logging.getLogger('azure.storage.blob')
logger.setLevel(logging.DEBUG)
handler = logging.StreamHandler(stream=sys.stdout)
logger.addHandler(handler)
blob_service_client = BlobServiceClient.from_connection_string(
    "DefaultEndpointsProtocol=https;AccountName=myaccount;AccountKey=...;EndpointSuffix=core.windows.net",
    logging_enable=True)
container_client = blob_service_client.get_container_client("fdbbackup")
blobs_list = container_client.walk_blobs(name_starts_with="data/fdbbackup/snapshots/")
for blob in blobs_list:
    pass
After some testing, it looks like the fix in #326 is not sufficient. A response to an S3 client with neither CommonPrefixes nor Contents elements is not expected, at least by s3cmd and our backup tool. I suspect that s3proxy would have to handle this internally, issuing the next request(s) and returning the result to the client only when it finally contains either CommonPrefixes or Contents.
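The internal handling suggested here could look roughly like the following. It is only a sketch of the idea in Python, not s3proxy code: the `Page` type, the `fetch_page` callback, and the `max_attempts` limit are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Page:
    """Hypothetical stand-in for one page of a listing response."""
    contents: List[str] = field(default_factory=list)
    common_prefixes: List[str] = field(default_factory=list)
    next_marker: Optional[str] = None


def list_non_empty_page(
    fetch_page: Callable[[Optional[str]], Page],
    marker: Optional[str] = None,
    max_attempts: int = 100,
) -> Page:
    """Follow continuation markers until a page has results or the listing ends."""
    for _ in range(max_attempts):
        page = fetch_page(marker)
        # Return once there is something to show, or no marker is left to follow.
        if page.contents or page.common_prefixes or not page.next_marker:
            return page
        marker = page.next_marker
    raise RuntimeError("too many consecutive empty pages")


def fake_fetch(marker: Optional[str]) -> Page:
    """Simulated backend: an empty first page with a marker, then the entries."""
    if marker is None:
        return Page(next_marker="m1")
    return Page(contents=["data/fdbbackup/snapshots/snapshot-a"])


result = list_non_empty_page(fake_fetch)
print(result.contents)
```

The attempt limit is a design choice in this sketch, there to keep a misbehaving backend from causing an endless loop. Whether a real fix would need one is an open question.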
s3cmd fails with
Originally posted by @petr-tichy in https://github.com/gaul/s3proxy/issues/326#issuecomment-677878461