Azure / Azurite

A lightweight server clone of Azure Storage that simulates most of the commands supported by it with minimal dependencies
MIT License

Azurite cannot handle encoded blob names #2405

Closed viswanr closed 2 weeks ago

viswanr commented 1 month ago

Which service(blob, file, queue, table) does this issue concern?

Blob

Which version of the Azurite was used?

3.29.0

Where do you get Azurite? (npm, DockerHub, NuGet, Visual Studio Code Extension)

DockerHub/MCR

What's the Node.js version?

18

What problem was encountered?

We are running an integration test written in TypeScript using Docker, in which a container with our Java Function app calls into Azurite, the official storage emulator from Azure. The Java Azure Function runs in Docker alongside Azurite on the same network.

The PUT request to upload the file to Azurite is as follows, which is successful: "PUT /devstoreaccount1/tdigests/hourlyDigests/v2/EMEA/t2_1/ChatSwitch/eyJBcHBJbmZvX0NsaWVudFR5cGUiOiJ3ZWIiLCJBcHBJbmZvX0Vudmlyb25tZW50IjoicHJvZCIsIkFwcEluZm9fRXhwZXJpZW5jZU5hbWUiOiJyZWFjdC13ZWItY2xpZW50IiwiRGVyaXZlZF9Vc2VySW5mb19FZHVUeXBlIjoiRURVIiwiY29udGV4dFR5cGUiOiJ0Ml8xIiwiZXZlbnROYW1lIjoiY2hhdF9zd2l0Y2giLCJncm91cGluZ05hbWUiOiJBcHBJbmZvX0NsaWVudFR5cGUiLCJtZXRyaWNLaW5kIjoibGF0ZW5jeSJ9/2024-04-07/21/tdigests.json HTTP/1.1" 201

Now, when we look at the URL that the Java app is using for the GET request,

"GET /devstoreaccount1/tdigests/hourlyDigests%2Fv2%2FROW%2Ft2_1%2FChatSwitch%2FeyJBcHBJbmZvX0NsaWVudFR5cGUiOiJ3ZWIiLCJBcHBJbmZvX0Vudmlyb25tZW50IjoicHJvZCIsIkFwcEluZm9fRXhwZXJpZW5jZU5hbWUiOiJyZWFjdC13ZWItY2xpZW50IiwiRGVyaXZlZF9Vc2VySW5mb19FZHVUeXBlIjoiRURVIiwiY29udGV4dFR5cGUiOiJ0Ml8xIiwiZXZlbnROYW1lIjoiY2hhdF9zd2l0Y2giLCJncm91cGluZ05hbWUiOiJTY2VuYXJpb19TdGF0dXMiLCJtZXRyaWNLaW5kIjoibGF0ZW5jeSJ9%2F2024-04-07%2F23%2Ftdigests.json HTTP/1.1" 404

As you can see, the slashes in the path are being encoded. Consequently, the GET returns a 404 because Azurite cannot find the blob under the encoded path. Our code does not encode the download path before the GET request is sent, so it seems the path is being encoded in transit to Azurite.

This behavior seems to happen only with the Java SDK. We have similar setups with .NET and Node.js Function apps, and they work well with Azurite. https://github.com/Azure/azure-sdk-for-net/pull/30271

Here is an issue we filed on the Java SDK - https://github.com/Azure/azure-sdk-for-java/issues/40370

Since the actual Azure Blob Storage service in production treats slashes and %2F the same, Azurite should emulate the same.
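
To make the requested equivalence concrete, here is a minimal Python sketch (not Azurite's actual code) that splits a request path into account, container, and blob name and percent-decodes the blob part, so that `/` and `%2F` resolve to the same blob. The helper name `split_blob_path` and the shortened example paths are hypothetical.

```python
from urllib.parse import unquote


def split_blob_path(path):
    """Split '/account/container/blob...' and percent-decode the blob name.

    Hypothetical helper for illustration; not Azurite's implementation.
    """
    # Split on the raw path first so an encoded '%2F' cannot be mistaken
    # for the account/container delimiters.
    account, container, raw_blob = path.lstrip("/").split("/", 2)
    return account, container, unquote(raw_blob)


# Shortened versions of the paths in this issue: both forms name the same blob.
plain = split_blob_path("/devstoreaccount1/tdigests/hourlyDigests/v2/ROW/tdigests.json")
encoded = split_blob_path("/devstoreaccount1/tdigests/hourlyDigests%2Fv2%2FROW%2Ftdigests.json")
print(plain == encoded)  # True
```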

Steps to reproduce the issue?

If possible, please provide the debug log using the -d parameter, replacing <pathtodebuglog> with an appropriate path for your OS, or review the instructions for docker containers:

-d "<pathtodebuglog>"

Please be sure to remove any PII or sensitive information before sharing!
The debug log will log raw request headers and bodies, so that we can replay these against Azurite using REST and create tests to validate resolution.

Have you found a mitigation/solution?

blueww commented 1 month ago

@viswanr

Thanks for raising this issue!

Do you mean that you upload and download the same blob with different URIs, uploading with an unencoded URI and downloading with an encoded one? If you could share the Azurite debug log, we could see more clearly which URI Azurite receives.

If this is the case, and if both URIs work on production Azure, this looks like undocumented behavior of production Azure. Adding this support to Azurite will take time: without clear documentation of the behavior, we first need to contact the server team and get a clear picture of how this feature works, including all corner cases. That might take a long time. We also have other feature requests on hand, so we will prioritize this item together with them, and it might not be our highest priority in the near future.

As you said, this issue only occurs with the Java app and not with the .NET and JS apps, so it looks more like a Java app issue, since it uploads and downloads the same blob with different URIs. I see you have already opened an issue for Java. It would be better if the Java app could fix this issue.

viswanr commented 1 month ago

@blueww

  1. The PUT URL and the GET URL above are taken from the Azurite log, exactly as Azurite sees them. Yes, the PUT request isn't encoded, but the GET is. Azurite should still properly map / to %2F and vice versa.
  2. The actual Azure Blob Storage service does treat them the same. The documentation clearly states: "Reserved URL characters must be properly escaped." So technically, Azurite should understand these escaped characters, and this is not undocumented behavior.

cc: @alzimmermsft

blueww commented 1 month ago

@viswanr Thanks for pointing out the documentation! However, "/" in a blob name is treated a little differently from other characters in the doc. The documentation says:

  1. "Certain characters must be percent-encoded to appear in a URL, using UTF-8 (preferred) or MBCS. This encoding occurs automatically when you use the Azure Storage client libraries." However, per my testing, the current storage client libraries do not encode "/" in a storage blob path. So "/" is not among the characters that need encoding.

  2. For blob names, it says: "A path segment is the string between consecutive delimiter characters (for example, a forward slash /) that corresponds to the directory or virtual directory." So "/" is used as a path segment delimiter rather than as a normal character that needs encoding.

Azurite already has complex logic to handle "/" and "\" in the URI (for corner cases), and we have received customer issues about regressions caused by earlier changes to this part. So please understand that we need to investigate and get the whole picture of the server behavior in this case, then see how to apply it to Azurite while avoiding regressions as much as possible. This will take time. We will prioritize this item together with other feature items, so it might not be our highest priority in the near future, since it relates to abnormal client behavior.

A better way is for the Java app to fix its code so that it uploads and downloads the same blob with the same URI. It is really abnormal for the same application or library to handle the same object with different kinds of URI.
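
As an illustration of the point that "/" acts as a delimiter rather than a character to encode, the sketch below percent-encodes a blob name while leaving "/" intact, using Python's standard `urllib.parse.quote`. It only illustrates the described client behavior under that assumption; it is not the code of any Azure SDK, and `encode_blob_name` and the example blob name are hypothetical.

```python
from urllib.parse import quote


def encode_blob_name(blob_name):
    """Percent-encode a blob name for a URL path, keeping '/' as a delimiter.

    Hypothetical helper; real client libraries have their own encoding rules.
    """
    # safe="/" leaves the path-segment delimiter untouched while still
    # escaping characters such as spaces.
    return quote(blob_name, safe="/")


name = "hourlyDigests/v2/EMEA/my file.json"
print(encode_blob_name(name))  # hourlyDigests/v2/EMEA/my%20file.json
print(quote(name, safe=""))    # hourlyDigests%2Fv2%2FEMEA%2Fmy%20file.json
```

The second call shows the form seen in the failing GET request, where the delimiter itself has been escaped.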

blueww commented 1 month ago

@viswanr

I just tested on the latest Azurite, 3.30.0, replacing "/" with "%2F" in the blob name, and the URI still works on Azurite (tested with a Get Blob request). So it looks like encoded blob names already work on Azurite.

Besides that, the upload and download URIs you provided do not point to the same blob. The upload points to "hourlyDigests/v2/EMEA/...", while the download points to "hourlyDigests/v2/ROW/...".
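
A quick way to check this kind of mismatch is to percent-decode both blob names and compare them segment by segment. The following is a minimal Python sketch; `diff_blob_names` is a hypothetical helper, and the two names are shortened from the PUT and GET request lines quoted earlier in this issue (the long Base64 segment, which also differs between the two requests, is left out for brevity).

```python
from itertools import zip_longest
from urllib.parse import unquote


def diff_blob_names(name_a, name_b):
    """Return (index, segment_a, segment_b) for every path segment that differs."""
    segments_a = unquote(name_a).split("/")
    segments_b = unquote(name_b).split("/")
    # zip_longest pads the shorter name with None, so extra segments
    # are reported as differences too.
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip_longest(segments_a, segments_b))
        if a != b
    ]


upload = "hourlyDigests/v2/EMEA/t2_1/ChatSwitch/2024-04-07/21/tdigests.json"
download = "hourlyDigests%2Fv2%2FROW%2Ft2_1%2FChatSwitch%2F2024-04-07%2F23%2Ftdigests.json"
print(diff_blob_names(upload, download))
# [(2, 'EMEA', 'ROW'), (6, '21', '23')]
```

An empty list would mean the two URIs name the same blob once decoded.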

To continue the investigation, would you please provide the Azurite debug log (collected with the "-d" parameter, see details) for an upload and download of the same blob?

The following Azurite debug log is from my testing. You can see that Azurite logs "Retrieved account name from context: devstoreaccount1, container: weitest, blob: aa/bb/cc/dd" when the URI is "http://127.0.0.1/devstoreaccount1/weitest/aa%2Fbb%2Fcc%2Fdd":

2024-05-30T05:18:58.983Z e12f3ac8-5b30-451b-9b19-d33576136229 info BlobStorageContextMiddleware: RequestMethod=GET RequestURL=http://127.0.0.1/devstoreaccount1/weitest/aa%2Fbb%2Fcc%2Fdd?sv=2023-08-03&se=2024-06-05T05%3A18%3A05Z&sr=b&sp=rw&sig=[hidden] RequestHeaders:{"user-agent":"Mozilla/5.0 (Windows NT; Windows NT 10.0; en-US) WindowsPowerShell/5.1.22621.2506","host":"127.0.0.1:10000"} ClientIP=127.0.0.1 Protocol=http HTTPVersion=1.1
2024-05-30T05:18:58.984Z e12f3ac8-5b30-451b-9b19-d33576136229 info BlobStorageContextMiddleware: Account=devstoreaccount1 Container=weitest Blob=aa/bb/cc/dd
2024-05-30T05:18:58.984Z e12f3ac8-5b30-451b-9b19-d33576136229 verbose DispatchMiddleware: Dispatching request...
2024-05-30T05:18:58.984Z e12f3ac8-5b30-451b-9b19-d33576136229 info DispatchMiddleware: Operation=Blob_Download
2024-05-30T05:18:58.984Z e12f3ac8-5b30-451b-9b19-d33576136229 verbose AuthenticationMiddlewareFactory:createAuthenticationMiddleware() Validating authentications.
2024-05-30T05:18:58.984Z e12f3ac8-5b30-451b-9b19-d33576136229 info PublicAccessAuthenticator:validate() Start validation against public access.
2024-05-30T05:18:58.984Z e12f3ac8-5b30-451b-9b19-d33576136229 debug PublicAccessAuthenticator:validate() Getting account properties...
2024-05-30T05:18:58.984Z e12f3ac8-5b30-451b-9b19-d33576136229 debug PublicAccessAuthenticator:validate() Retrieved account name from context: devstoreaccount1, container: weitest, blob: aa/bb/cc/dd
2024-05-30T05:18:58.984Z e12f3ac8-5b30-451b-9b19-d33576136229 debug PublicAccessAuthenticator:validate() Skip public access authentication. Cannot get public access type for container weitest
2024-05-30T05:18:58.984Z e12f3ac8-5b30-451b-9b19-d33576136229 info BlobSharedKeyAuthenticator:validate() Start validation against account shared key authentication.
2024-05-30T05:18:58.984Z e12f3ac8-5b30-451b-9b19-d33576136229 info BlobSharedKeyAuthenticator:validate() Request doesn't include valid authentication header. Skip shared key authentication.
2024-05-30T05:18:58.984Z e12f3ac8-5b30-451b-9b19-d33576136229 info AccountSASAuthenticator:validate() Start validation against account Shared Access Signature pattern.
2024-05-30T05:18:58.984Z e12f3ac8-5b30-451b-9b19-d33576136229 debug AccountSASAuthenticator:validate() Getting account properties...
2024-05-30T05:18:58.984Z e12f3ac8-5b30-451b-9b19-d33576136229 debug AccountSASAuthenticator:validate() Retrieved account name from context: devstoreaccount1, container: weitest, blob: aa/bb/cc/dd
2024-05-30T05:18:58.984Z e12f3ac8-5b30-451b-9b19-d33576136229 debug AccountSASAuthenticator:validate() Got account properties successfully.
2024-05-30T05:18:58.984Z e12f3ac8-5b30-451b-9b19-d33576136229 debug AccountSASAuthenticator:validate() Retrieved signature from URL parameter sig: [hidden]
2024-05-30T05:18:58.984Z e12f3ac8-5b30-451b-9b19-d33576136229 info AccountSASAuthenticator:validate() Failed to get valid account SAS values from request.
2024-05-30T05:18:58.984Z e12f3ac8-5b30-451b-9b19-d33576136229 info BlobSASAuthenticator:validate() Start validation against blob service Shared Access Signature pattern.
2024-05-30T05:18:58.984Z e12f3ac8-5b30-451b-9b19-d33576136229 debug BlobSASAuthenticator:validate() Getting account properties...
2024-05-30T05:18:58.984Z e12f3ac8-5b30-451b-9b19-d33576136229 debug BlobSASAuthenticator:validate() Retrieved account name from context: devstoreaccount1, container: weitest, blob: aa/bb/cc/dd
2024-05-30T05:18:58.984Z e12f3ac8-5b30-451b-9b19-d33576136229 debug BlobSASAuthenticator:validate() Got account properties successfully.
2024-05-30T05:18:58.984Z e12f3ac8-5b30-451b-9b19-d33576136229 debug BlobSASAuthenticator:validate() Retrieved signature from URL parameter sig: Ytu9dRcmOu97+1f7wxjBKPqHb6c/wnKoC5R/XolDZK0=
2024-05-30T05:18:58.984Z e12f3ac8-5b30-451b-9b19-d33576136229 debug BlobSASAuthenticator:validate() Signed resource type is b.
2024-05-30T05:18:58.984Z e12f3ac8-5b30-451b-9b19-d33576136229 debug BlobSASAuthenticator:validate() Successfully got valid blob service SAS values from request. {"version":"2023-08-03","expiryTime":"2024-06-05T05:18:05Z","permissions":"rw","containerName":"weitest","blobName":"aa/bb/cc/dd","signedResource":"b"}
2024-05-30T05:18:58.984Z e12f3ac8-5b30-451b-9b19-d33576136229 info BlobSASAuthenticator:validate() Validate signature based account key1.
2024-05-30T05:18:58.984Z e12f3ac8-5b30-451b-9b19-d33576136229 debug BlobSASAuthenticator:validate() String to sign is: "rw\n\n2024-06-05T05:18:05Z\n/blob/devstoreaccount1/weitest/aa/bb/cc/dd\n\n\n\n2023-08-03\nb\n\n\n\n\n\n\n"
2024-05-30T05:18:58.984Z e12f3ac8-5b30-451b-9b19-d33576136229 debug BlobSASAuthenticator:validate() Calculated signature is: Ytu9dRcmOu97+1f7wxjBKPqHb6c/wnKoC5R/XolDZK0=
2024-05-30T05:18:58.984Z e12f3ac8-5b30-451b-9b19-d33576136229 info BlobSASAuthenticator:validate() Signature based on key1 validation passed.
2024-05-30T05:18:58.984Z e12f3ac8-5b30-451b-9b19-d33576136229 info BlobSASAuthenticator:validate() Validate start and expiry time.
2024-05-30T05:18:58.984Z e12f3ac8-5b30-451b-9b19-d33576136229 info BlobSASAuthenticator:validate() Validate IP range.
2024-05-30T05:18:58.984Z e12f3ac8-5b30-451b-9b19-d33576136229 info BlobSASAuthenticator:validate() Validate request protocol.
2024-05-30T05:18:58.984Z e12f3ac8-5b30-451b-9b19-d33576136229 debug BlobSASAuthenticator:validate() Got permission requirements for operation Blob_Download - {"permission":"r"}
2024-05-30T05:18:58.984Z e12f3ac8-5b30-451b-9b19-d33576136229 info BlobSASAuthenticator:validate() Blob service SAS validation successfully.
2024-05-30T05:18:58.984Z e12f3ac8-5b30-451b-9b19-d33576136229 verbose DeserializerMiddleware: Start deserializing...
2024-05-30T05:18:58.984Z e12f3ac8-5b30-451b-9b19-d33576136229 info HandlerMiddleware: DeserializedParameters={"options":{"leaseAccessConditions":{},"cpkInfo":{},"modifiedAccessConditions":{}}}
2024-05-30T05:18:58.985Z e12f3ac8-5b30-451b-9b19-d33576136229 info BlobHandler:downloadBlockBlobOrAppendBlob() NormalizedDownloadRange=bytes=0-8 RequiredContentLength=9
2024-05-30T05:18:58.985Z e12f3ac8-5b30-451b-9b19-d33576136229 debug OperationQueue.operate() Schedule incoming job 1f602f89-7af0-4f9c-91ec-b45a46f8b630
2024-05-30T05:18:58.985Z e12f3ac8-5b30-451b-9b19-d33576136229 debug OperationQueue:execute() Current runningConcurrency:0 maxConcurrency:100 operations.length:1
2024-05-30T05:18:58.985Z e12f3ac8-5b30-451b-9b19-d33576136229 verbose FSExtentStore:readExtent() Creating read stream. LocationId:Default extentId:31ef7206-f4ee-466d-be40-94f3255793c8 path:c:\temp\Azurite\__blobstorage__\31ef7206-f4ee-466d-be40-94f3255793c8 offset:0 count:9 end:8
2024-05-30T05:18:58.985Z e12f3ac8-5b30-451b-9b19-d33576136229 debug OperationQueue.operate() Job 1f602f89-7af0-4f9c-91ec-b45a46f8b630 completes callback, resolve.
2024-05-30T05:18:58.985Z e12f3ac8-5b30-451b-9b19-d33576136229 verbose SerializerMiddleware: Start serializing...
2024-05-30T05:18:58.985Z e12f3ac8-5b30-451b-9b19-d33576136229 info Serializer: Start returning stream body.
2024-05-30T05:18:58.985Z e12f3ac8-5b30-451b-9b19-d33576136229 debug OperationQueue:execute() Current runningConcurrency:0 maxConcurrency:100 operations.length:0
2024-05-30T05:18:58.985Z e12f3ac8-5b30-451b-9b19-d33576136229 debug OperationQueue:execute() return. Operation.length === 0
2024-05-30T05:18:58.986Z e12f3ac8-5b30-451b-9b19-d33576136229 info EndMiddleware: End response. TotalTimeInMS=3 StatusCode=200 StatusMessage=OK Headers={"server":"Azurite-Blob/3.30.0","last-modified":"Thu, 30 May 2024 05:18:00 GMT","x-ms-creation-time":"Thu, 30 May 2024 05:18:00 GMT","content-length":"9","content-type":"application/octet-stream","etag":"\"0x1EA1684C8C5FF50\"","content-md5":"63M6AMDJ0zbmVpGjerVCkw==","x-ms-blob-type":"BlockBlob","x-ms-lease-state":"available","x-ms-lease-status":"unlocked","x-ms-request-id":"e12f3ac8-5b30-451b-9b19-d33576136229","x-ms-version":"2024-05-04","accept-ranges":"bytes","date":"Thu, 30 May 2024 05:18:58 GMT","x-ms-server-encrypted":"true","x-ms-blob-content-md5":"63M6AMDJ0zbmVpGjerVCkw=="}
2024-05-30T05:18:58.986Z e12f3ac8-5b30-451b-9b19-d33576136229 verbose FSExtentStore:readExtent() Read stream closed. LocationId:Default extentId:31ef7206-f4ee-466d-be40-94f3255793c8 path:c:\temp\Azurite\__blobstorage__\31ef7206-f4ee-466d-be40-94f3255793c8 offset:0 count:9 end:8

blueww commented 2 weeks ago

@viswanr I will close this issue as we have not received further responses from you. Feel free to contact us again if you need any further assistance with Azurite.