Azure / Azurite

A lightweight server clone of Azure Storage that simulates most of the commands supported by it with minimal dependencies
MIT License

XStore C++ tests are failing because Azurite returns wrong md5 #2409

Open mikamins opened 3 weeks ago

mikamins commented 3 weeks ago

Which service(blob, file, queue, table) does this issue concern?

blob

Which version of the Azurite was used?

3.30.0

Where do you get Azurite? (npm, DockerHub, NuGet, Visual Studio Code Extension)

npm

What's the Node.js version?

v20.10.0

What problem was encountered?

My team in XStore is trying to migrate tests from Storage Emulator to Azurite. We are blocked by failures that occur only when using Azurite.

The following code is a simplified example of the problem. It works against a real storage account and against Storage Emulator. With Azurite, it throws a storage_exception with the message "Calculated MD5 does not match existing property."

Steps to reproduce the issue?

void Sample(
    const std::string& accountName,
    const std::string& accountKey,
    const std::string& blobUrl)
{
    storage_credentials credentials(
        to_string_t(accountName),
        to_string_t(accountKey));
    auto blob = cloud_block_blob(
        storage_uri(web::http::uri(to_string_t(blobUrl))),
        credentials);
    const auto& client = blob.service_client();

    auto context = operation_context();
    auto options = blob_request_options();
    options.set_retry_policy(no_retry_policy());
    auto conditions = access_condition::generate_empty_condition();

    auto container = client.get_container_reference(blob.container().name());
    container.create_if_not_exists(
        blob_container_public_access_type::off,
        options,
        context);

    // Create an empty blob
    std::vector<BYTE> emptyContent;
    blob.upload_from_stream(
        Concurrency::streams::rawptr_stream<BYTE>::open_istream(
            emptyContent.data(),
            emptyContent.size()),
        access_condition::generate_if_not_exists_condition(),
        options,
        context);

    // Write some data to the blob
    utility::string_t blockId = L"Zm9v";
    blob.upload_block(
        blockId,
        Concurrency::streams::rawptr_stream<BYTE>::open_istream(
            (const BYTE*)"abc",
            3),
        checksum_none,
        conditions,
        options,
        context);
    blob.upload_block_list(
        {blockId},
        conditions,
        options,
        context);

    // Read the data back
    BYTE buffer[10]{};
    blob.download_range_to_stream(
        Concurrency::streams::rawptr_stream<BYTE>::open_ostream(buffer, 10),
        0,
        6,
        conditions,
        options,
        context);
    VERIFY_ARE_EQUAL('a', buffer[0]);
    VERIFY_ARE_EQUAL('b', buffer[1]);
    VERIFY_ARE_EQUAL('c', buffer[2]);
}

If possible, please provide the debug log

azurite.log

Have you found a mitigation/solution?

No

blueww commented 2 weeks ago

@mikamins

The Azurite debug log shows no error, so the error you are seeing must be raised by the C++ SDK.

However, judging from the code above, you appear to be using the old C++ SDK, which is already deprecated (see link).

Could you check whether the issue reproduces with the latest C++ SDK? If it does, we will continue investigating. Here is a migration guide from the deprecated C++ SDK to the latest one: https://github.com/Azure/azure-sdk-for-cpp/blob/main/sdk/storage/MigrationGuide.md

BTW, the code first uploads a blob of 3 bytes and then downloads 6 bytes from it. Azurite currently returns just the 3 bytes in the blob, which matches the server behavior in my test, so I am not sure why the error happens. Please try to reproduce the issue with the latest C++ SDK; if it reproduces, we can ask the SDK team to look into why the error happens.

mikamins commented 2 weeks ago

Azurite is returning the correct status code 206 and returning the partial content as expected. The issue is in the response headers.

The latest C++ SDK performs all operations synchronously, so it can never be adopted by our team within XStore. Could you please investigate why Azurite does not work with Microsoft.Azure.Storage.CPP.v140 v7.5.0?

The MD5 and version headers are the only major differences I see between the responses from Azurite and from Azure/Storage Emulator. One of them is causing issues with the SDK. Considering that the exception message says "Calculated MD5 does not match existing property.", I suspect the MD5 header.

blueww commented 2 weeks ago

@mikamins

Comparing the response headers from the server (captured with Fiddler) and from Azurite (taken from the Azurite debug log) for a GetBlob request with "x-ms-range: bytes=0-5" on a blob that is 3 bytes long, the two are very similar.

Besides the same status code and content, they also have the same Content-Range, Content-Length, and x-ms-blob-content-md5 headers. Azurite returns one additional header, content-md5, whose value is also correct.

So I am not sure why the deprecated C++ SDK reports this error. I can't repro the issue with other SDKs such as .NET.

Finding the cause will need the SDK team to look into the deprecated SDK code. If you would like to continue the investigation, could you file a GitHub issue against the C++ SDK and ask why the error is reported? Once we know why the error happens, we will know how to fix it in Azurite.

Azure Server

HTTP/1.1 206 Partial Content
Content-Length: 3
Content-Type: application/octet-stream
Content-Range: bytes 0-2/3
Last-Modified: Wed, 12 Jun 2024 02:52:43 GMT
Accept-Ranges: bytes
ETag: "0x8DC8A8ABB866A3A"
Server: Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0
x-ms-request-id: cc35a03b-301e-0057-6173-bced59000000
x-ms-client-request-id: a2141546-1af3-4e05-b763-e4f220100d5a
x-ms-version: 2019-07-07
x-ms-creation-time: Wed, 12 Jun 2024 02:52:43 GMT
x-ms-blob-content-md5: kAFQmDzST7DWlj99KOF/cg==
x-ms-lease-status: unlocked
x-ms-lease-state: available
x-ms-blob-type: BlockBlob
x-ms-server-encrypted: true
Date: Wed, 12 Jun 2024 02:52:47 GMT

Azurite

Headers={
"server":"Azurite-Blob/3.30.0",
"last-modified":"Wed, 12 Jun 2024 02:50:58 GMT",
"x-ms-creation-time":"Wed, 12 Jun 2024 02:50:58 GMT",
"content-length":"3",
"content-type":"application/octet-stream",
"content-range":"bytes 0-2/3",
"etag":"\"0x1F4502392A75EB0\"",
"content-md5":"kAFQmDzST7DWlj99KOF/cg==",
"x-ms-blob-type":"BlockBlob",
"x-ms-lease-state":"available",
"x-ms-lease-status":"unlocked",
"x-ms-client-request-id":"4ac1e003-92a8-4722-b5db-819b53abf9fe",
"x-ms-request-id":"579187d7-dd0d-4c3f-9ab5-9197d75ff924",
"x-ms-version":"2024-05-04",
"accept-ranges":"bytes",
"date":"Wed, 12 Jun 2024 02:51:32 GMT",
"x-ms-server-encrypted":"true",
"x-ms-blob-content-md5":"kAFQmDzST7DWlj99KOF/cg=="}

Jinming-Hu commented 2 weeks ago

Hi @mikamins , we were not able to reproduce this issue with latest versions of Azurite and C++ SDK. Was the attached log generated with your sample code?

We found

2024-06-07T15:04:19.988Z 60120bfb-be88-425d-8db6-5d7a9eda1537 info: BlobStorageContextMiddleware: RequestMethod=PUT RequestURL=http://127.0.0.1/devstoreaccount1/unittest/43221676-0E2C-4EF8-AEDD-7FB73B1E18CA?comp=blocklist RequestHeaders:{"connection":"Keep-Alive","content-type":"","authorization":"SharedKey devstoreaccount1:4vOY+2gpZE7ww3MmewE83WaD4yiRuAgPdpEscZLzq2Y=","user-agent":"Azure-Storage/7.5.0 (Native; Windows; MSC_VER 1900)","x-ms-blob-content-md5":"1B2M2Y8AsgTpgAmY7PhCfg==","x-ms-client-request-id":"00eaef02-b1b5-48f1-b585-647d4ec2975f","x-ms-date":"Fri, 07 Jun 2024 15:04:19 GMT","x-ms-version":"2019-12-12","content-length":"90","host":"127.0.0.1:10000"} ClientIP=127.0.0.1 Protocol=http HTTPVersion=1.1

in your log. The header "x-ms-blob-content-md5":"1B2M2Y8AsgTpgAmY7PhCfg==" indicates that the blob content MD5 was set explicitly by the client, which your sample code does not do.

blueww commented 2 weeks ago

Hi @mikamins,

If asynchronous calls in the new C++ SDK are a prerequisite for upgrading to it, you can file an issue at https://github.com/Azure/azure-sdk-for-cpp/issues to request that feature.

mikamins commented 1 week ago

(Replying to @Jinming-Hu's question above about whether the attached log was generated with the sample code.)

Yes, the attached log was created with the sample code, using Azurite 3.30.0 and azure-storage-cpp 7.5.0. If you are unable to reproduce, could you provide the exact versions that you used and attach the log?

mikamins commented 1 week ago

I stepped through the sample code in more detail, and the SDK is doing the right thing. Azurite is returning the incorrect md5 when downloading the blob.

Blob setup:

Failing download:

Log is attached: azurite-2024-06-17.log

Download HTTP Request:

GET http://127.0.0.1:10000/devstoreaccount1/unittest/E2642A3C-58CF-4CA4-A7C5-2CE4C7A29B91 HTTP/1.1
Connection: Keep-Alive
Accept-Encoding: peerdist
Authorization: SharedKey devstoreaccount1:JhenggacHCvhOxTnO7qcK8+OaibtuQcSzPTkZ8zu6zw=
User-Agent: Azure-Storage/7.5.0 (Native; Windows; MSC_VER 1900)
x-ms-client-request-id: 31a5ac86-f1a3-458a-ba22-cc4400be02a9
x-ms-date: Tue, 18 Jun 2024 00:17:02 GMT
x-ms-range: bytes=0-5
x-ms-version: 2019-12-12
X-P2P-PeerDist: Version=1.1
X-P2P-PeerDistEx: MinContentInformation=1.0, MaxContentInformation=2.0
Host: 127.0.0.1:10000

Response with bad content-md5:

HTTP/1.1 206 Partial Content
Server: Azurite-Blob/3.30.0
last-modified: Tue, 18 Jun 2024 00:17:02 GMT
x-ms-creation-time: Tue, 18 Jun 2024 00:17:02 GMT
content-length: 3
content-type: application/octet-stream
content-range: bytes 0-2/3
etag: "0x22AF68371ECD940"
content-md5: 1B2M2Y8AsgTpgAmY7PhCfg==
x-ms-blob-type: BlockBlob
x-ms-lease-state: available
x-ms-lease-status: unlocked
x-ms-client-request-id: 31a5ac86-f1a3-458a-ba22-cc4400be02a9
x-ms-request-id: bff087cc-05f0-4c98-996f-0a39ccd4838e
x-ms-version: 2024-05-04
accept-ranges: bytes
date: Tue, 18 Jun 2024 00:17:02 GMT
x-ms-server-encrypted: true
x-ms-blob-content-md5: 1B2M2Y8AsgTpgAmY7PhCfg==
Connection: keep-alive
Keep-Alive: timeout=5

abc

Jinming-Hu commented 1 week ago

@blueww This seems to be a bug in Azurite. Azurite doesn't clear all blob properties when a blob is overwritten.

blueww commented 1 week ago

@mikamins , @Jinming-Hu

Thanks for the investigation! I will look into it and update later.

blueww commented 1 week ago

@mikamins

I can't repro this with Azurite. Azurite returns the correct content MD5 "kAFQmDzST7DWlj99KOF/cg==" after committing a block list with a block that contains "abc".

After looking into the debug log you shared in the comment above, I see that the header "x-ms-blob-content-md5":"1B2M2Y8AsgTpgAmY7PhCfg==" is set when committing the block list, so the wrong content MD5 is sent from the client side. If the client sets the content MD5, Azurite respects it; otherwise Azurite should return the correct MD5.

2024-06-18T00:17:02.633Z 63c975e5-7b5d-4755-ac9a-a65e617053c7 info: BlobStorageContextMiddleware: RequestMethod=PUT RequestURL=http://127.0.0.1/devstoreaccount1/unittest/E2642A3C-58CF-4CA4-A7C5-2CE4C7A29B91?comp=blocklist RequestHeaders:{"connection":"Keep-Alive","content-type":"","authorization":"SharedKey devstoreaccount1:bprNXXG2v3W9YXS4l8Z9KS6A4MYYoiOhsMWQWqSoKd0=","user-agent":"Azure-Storage/7.5.0 (Native; Windows; MSC_VER 1900)","x-ms-blob-content-md5":"1B2M2Y8AsgTpgAmY7PhCfg==","x-ms-client-request-id":"31a5ac86-f1a3-458a-ba22-cc4400be02a9","x-ms-date":"Tue, 18 Jun 2024 00:17:02 GMT","x-ms-version":"2019-12-12","content-length":"90","host":"127.0.0.1:10000"} ClientIP=127.0.0.1 Protocol=http HTTPVersion=1.1

Jinming-Hu commented 1 week ago

@blueww Track1 SDK keeps state of a blob at client side (state includes blob properties). Is it possible that when we get properties of the old blob (empty content), the local state is populated, then md5 is sent out over the wire when calling CommitBlocks?

This cannot be reproed with public Azure because public Azure service doesn't return blob-md5 for partial read. Hmm, it explains everything.

blueww commented 1 week ago

Thanks @Jinming-Hu for the investigation!

Per the REST API doc, Put Blob should return Content-MD5, and Azurite is aligned with that doc. Besides, Azurite is returning the correct MD5 (if the user sets it, Azurite returns the user-set value).

@mikamins The suggested long-term fix is to upgrade to the latest C++ SDK. Otherwise, a workaround is to clear the contentMD5 property of the blob object before you call blob.upload_block_list(). Could you try it and see if it works in your scenario?

Jinming-Hu commented 1 week ago

@blueww

per REST API doc

If the request is to read a specified range and the x-ms-range-get-content-md5 is set to true, the request returns an MD5 hash for the range, as long as the range size is less than or equal to 4 MiB. If neither of these sets of conditions is true, no value is returned for the Content-MD5 header.

Azurite should fix its wrong behavior.

blueww commented 1 week ago

@Jinming-Hu

The REST API doc you shared is for the Get Blob API. But the API from which the client gets the content MD5 is Put Blob (per the C++ code and the Azurite debug log in this issue, the blob object gets the Content MD5 "1B2M2Y8AsgTpgAmY7PhCfg==" from Put Blob with size 0). The relevant Put Blob API doc is the one I shared: REST API doc. Content-MD5 should be returned per that doc.

Jinming-Hu commented 1 week ago

@blueww I don't think we're on the same page. Anyway, the workaround you proposed does sound good to me.