Azure / azure-storage-python

Microsoft Azure Storage Library for Python
https://azure-storage.readthedocs.io
MIT License
338 stars 240 forks source link

BlockBlobService.get_blob_to_path() crashes even when if_match='*' #625

Open marco-rossi29 opened 4 years ago

marco-rossi29 commented 4 years ago

Which service(blob, file, queue) does this issue concern?

BlockBlobService.get_blob_to_path()

Which version of the SDK was used? Please provide the output of pip freeze.

Same issue is encountered in versions: 0.37.1, 1.3.0, and 2.1.0

Version 0.37.1

pip freeze azure-common==1.1.8 azure-nspkg==2.0.0 azure-storage==0.36.0 azure-storage-blob==0.37.1 azure-storage-common==0.37.1 azure-storage-nspkg==2.0.0

Version 1.3.0

pip freeze azure-common==1.1.8 azure-nspkg==2.0.0 azure-storage==0.36.0 azure-storage-blob==1.3.0 azure-storage-common==1.3.0 azure-storage-nspkg==2.0.0

Latest version

pip freeze azure-common==1.1.8 azure-nspkg==2.0.0 azure-storage==0.36.0 azure-storage-blob==2.1.0 azure-storage-common==2.1.0 azure-storage-nspkg==2.0.0

What problem was encountered?

When trying to download a blob that is currently been written in append mode (i.e., the ETag changes during time of download), BlockBlobService.get_blob_to_path() crashes even when using if_match='*' giving ConditionNotMet Error: Client-Request-ID=9cce10c8-b739-11e9-9366-b32a86ae676b Retry policy did not allow for a retry: Server-Timestamp=Mon, 05 Aug 2019 04:29:28 GMT, Server-Request-ID=86bb1780-a01e-002e-6146-4b01e3000000, HTTP status code=412, Exception=The condition specified using HTTP conditional header(s) is not met. ErrorCode: ConditionNotMet<?xml version="1.0" encoding="utf-8"?><Error><Code>ConditionNotMet</Code><Message>The condition specified using HTTP conditional header(s) is not met.RequestId:86bb1780-a01e-002e-6146-4b01e3000000Time:2019-08-05T04:29:28.3320963Z</Message></Error>

Have you found a mitigation/solution?

Code to repro:

bbs = BlockBlobService(connection_string=connection_string)
bbs.get_blob_to_path(app_id, blob_name, output_fp, start_range=0, end_range=100*1024**2, if_match='*')
zezha-msft commented 4 years ago

Hi @marco-rossi29, thanks for reaching out!

This behavior is by design, as the SDK needs to guarantee the data integrity: once the transfer starts, the downloader will lock onto the blob's Etag, as we cannot possibly know which part of the blob changed.

When given if_match='*', the method checks the condition by performing a GET request on the beginning of the blob; and the Etag to lock onto is obtained in the same call.

You could work around this problem by having your own version of the downloader, would that be a satisfactory solution for you?

marco-rossi29 commented 4 years ago

yes, I'm happy to have my own version of the downloader.

I've found two ways to work around it by modifying this line: https://github.com/Azure/azure-storage-python/blob/master/azure-storage-blob/azure/storage/blob/_download_chunking.py#L130 to be either: 1) self.if_match = None 2) comment out the line and passing if_match='*' in get_blob_to_path() (this in my opinion is the expected behavior of if_match='*')

Do you have other suggestions?

zezha-msft commented 4 years ago

option 1 seems to be the easiest. 😄

marco-rossi29 commented 4 years ago

Thank you for your answer. I'm now puzzeled by the if_match='*' specified in the docs:

if_match (str) – An ETag value, or the wildcard character (*). Specify this header to perform the operation only if the resource’s ETag matches the value specified.

My expectation is that when using if_match='*' the blob is downloaded no matter what the ETag is. But, as we saw above, this is not the case.

Can you give me an example of when using if_match='*' would give a behaviour different than the default, i.e., if_match=None?

zezha-msft commented 4 years ago

if_match='*' guarantees that a blob exists (i.e. there is an actual Etag to look at). if_none_match='*' guarantees that a blob does not exist.

marco-rossi29 commented 4 years ago

I see. Thanks. So the bottom line is that the current API doesn't allow me to download a file while it is modified by an append operation. Assuming the user takes responsibility about the integrity of the file, is it possible to add a flag or any other mechanism on your side to ensure that get_blobto*() would work in this append scenario?

marco-rossi29 commented 4 years ago

Would something like this PR be acceptable? https://github.com/Azure/azure-storage-python/pull/630