Azure / azure-storage-python

Microsoft Azure Storage Library for Python
https://azure-storage.readthedocs.io
MIT License

Downloading a blob fails for v1.3.1+ if blob is written during download #602

Closed: marco-rossi29 closed this issue 5 years ago

marco-rossi29 commented 5 years ago

Note: I understand that this breaking change was made on purpose in commit 8f68597815022e20df5df5402508a2569d8e6bfe. Is there a better way to download a blob that may be written to while the download is in progress?

Which service(blob, file, queue) does this issue concern?

get_blob_to_path()

Which version of the SDK was used? Please provide the output of pip freeze.

azure-storage-blob-1.3.0 works
azure-storage-blob-1.3.1 doesn't work

What problem was encountered?

download fails:

Client-Request-ID=877e7040-7e47-11e9-b109-a08cfde1f962 Retry policy did not allow for a retry: Server-Timestamp=Fri, 24 May 2019 17:15:29 GMT, Server-Request-ID=7b05956d-801e-0008-4b54-12e202000000, HTTP status code=412, Exception=The condition specified using HTTP conditional header(s) is not met. ErrorCode: ConditionNotMet<?xml version="1.0" encoding="utf-8"?><Error><Code>ConditionNotMet</Code><Message>The condition specified using HTTP conditional header(s) is not met.RequestId:7b05956d-801e-0008-4b54-12e202000000Time:2019-05-24T17:15:29.2184347Z</Message></Error>.
Error: The condition specified using HTTP conditional header(s) is not met. ErrorCode: ConditionNotMet
<?xml version="1.0" encoding="utf-8"?><Error><Code>ConditionNotMet</Code><Message>The condition specified using HTTP conditional header(s) is not met.
RequestId:7b05956d-801e-0008-4b54-12e202000000

Have you found a mitigation/solution?

I downgraded to 1.3.0

zezha-msft commented 5 years ago

Hi @marco-rossi29, thanks for reaching out!

Could you please clarify your use case? Are you trying to download a blob (in its entirety) while it's being modified by another process/user?

zezha-msft commented 5 years ago

Which blob type are you using?

marco-rossi29 commented 5 years ago

Hi Ze, thank you for your answer. Our logic is here

We are using BlockBlobService(connection_string=conn_string) (see here)

The use case is: there is a blob in Azure Storage that is being modified by another process/user and I'd like to download the blob, but I'm happy to not download any new portion of it.

To clarify, say I'm accessing Azure Storage and the file I want is 100MB. As I'm downloading it, another process grows the file to 110MB. I'm happy to download only the original 100MB, as long as get_blob_to_path doesn't throw. Up to v1.3.0, the workaround to avoid the exception was to set max_connections = 1. Since v1.3.1 that workaround no longer works: even with max_connections = 1 the call throws.

zezha-msft commented 5 years ago

Hi @marco-rossi29, thanks for the clarification!

I assume your blob is being manipulated in an append-only pattern, right? E.g. the extra 10MB of data was added at the end of the blob.

To be clear, the SDK is locking onto the blob's etag to protect the integrity of the downloaded data. In most use cases, there's no guarantee that the data is being modified in a way that still allows the downloaded data to be "good". If you are only appending blocks at the end of the blob, though, you could work around the protection by specifying if_match='*' (the wildcard matches any etag, so the conditional check always passes).
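For illustration, the workaround described above might look like the following minimal sketch. It assumes the legacy azure-storage-blob 1.x `BlockBlobService` API, where `get_blob_to_path` accepts `if_match` and `max_connections` keyword arguments. The helper name `download_append_only_blob` and the container, blob, and path names in the usage comment are hypothetical. The service object is passed in rather than constructed here, so the sketch itself has no dependency on the SDK being installed.

```python
def download_append_only_blob(service, container_name, blob_name, file_path,
                              max_connections=2):
    """Download a blob that another writer may be appending to.

    `service` is assumed to be an azure-storage-blob 1.x BlockBlobService
    (or any object exposing a compatible get_blob_to_path method).
    """
    # if_match='*' matches any ETag, so the SDK's per-chunk conditional
    # requests should no longer fail with 412 ConditionNotMet when the blob
    # changes mid-download. This gives up the integrity protection, so it is
    # only reasonable when writers strictly append to the end of the blob.
    return service.get_blob_to_path(
        container_name,
        blob_name,
        file_path,
        if_match='*',
        max_connections=max_connections,
    )


# Hypothetical usage (requires azure-storage-blob 1.x and real credentials):
#
#   from azure.storage.blob import BlockBlobService
#   service = BlockBlobService(connection_string=conn_string)
#   download_append_only_blob(service, 'my-container', 'my-blob', 'local.bin')
```

Whether the downloaded file ends at the original length or includes some of the appended data depends on how the SDK chunks the request, so treat the result as "at least what existed when the download started" rather than an exact snapshot.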

marco-rossi29 commented 5 years ago

Perfect, that's a great tip and it will probably solve my issue. Thank you very much for your help!