Azure / azure-sdk-for-js

This repository is for active development of the Azure SDK for JavaScript (NodeJS & Browser). For consumers of the SDK we recommend visiting our public developer docs at https://docs.microsoft.com/javascript/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-js.
MIT License

[storage] blobClient.download() creates unique url and content is never retrieved from browser cache #24348

Open rafagsiqueira opened 1 year ago

rafagsiqueira commented 1 year ago

For a non-public storage account, we use the Azure Storage SDK to download blobs with an AD access token credential. However, every time blobClient.download() is called, a query parameter of the form _=<timestamp> is appended to the blob URL. Because every request URL is therefore unique, the browser never retrieves the blob from its cache.

Is there an option to avoid appending this to the query string? Perhaps sending it as a header?

I have tried setting CacheControl on the blobs, and also retrieving blobs by specific versionId, but that also results in a URL that is not cached by the browser.

rafagsiqueira commented 1 year ago

A workaround is to download the blob using Bearer Token authentication and a different HTTP client:

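// Angular service method. Assumes this.blobServiceClient is a BlobServiceClient
// and this.http is an HTTP client that already attaches the bearer token.
public downloadBlob(container: string, blobName: string): Observable<any> {
  return of(this.blobServiceClient.getContainerClient(container)).pipe(
    map((containerClient: ContainerClient) => containerClient.getBlockBlobClient(blobName.toLowerCase())),
    // Use the SDK only to build the blob URL; the request itself bypasses the SDK pipeline.
    map((blobClient: BlobClient) => blobClient.url),
    concatMap((url: string) => from(this.http.get(url, {responseType: 'blob', headers: {'x-ms-version': '2021-08-06'}}))),
    map((blob) => ({blob, name: blobName, type: blob.type})),
    catchError((err) => {
      // Log the failure and emit null so subscribers do not receive an error.
      console.error(err);
      return of(null);
    })
  );
}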

The x-ms-version header has to be present for Bearer Token authentication to work. The bearer token was injected into the HTTP client. This sample is from Angular code.
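For readers not using Angular, the same idea can be sketched with the standard fetch API. This is only an illustration of the workaround described above, not an official SDK pattern: buildBlobRequestHeaders and downloadBlobDirect are hypothetical helper names, and the access token is assumed to have been obtained elsewhere (for example from the same AD credential).

```typescript
// Hypothetical helper: builds the headers for a direct GET against a blob URL.
// The x-ms-version value is the one used in the Angular sample above.
function buildBlobRequestHeaders(accessToken: string): Record<string, string> {
  return {
    Authorization: `Bearer ${accessToken}`,
    "x-ms-version": "2021-08-06",
  };
}

// Hypothetical helper: downloads a blob without going through the SDK pipeline,
// so no "_" query parameter is appended to the URL.
async function downloadBlobDirect(blobUrl: string, accessToken: string): Promise<Blob> {
  const response = await fetch(blobUrl, {
    headers: buildBlobRequestHeaders(accessToken),
  });
  if (!response.ok) {
    throw new Error(`Download failed with status ${response.status}`);
  }
  return response.blob();
}
```

Whether the browser then serves the response from cache still depends on the caching headers returned for the blob.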

xirzec commented 1 year ago

@rafagsiqueira if I am correct in understanding you, you are seeing the download URL have a query parameter appended to it named _ that has a timestamp value?

This sounds like the logic here: https://github.com/Azure/azure-sdk-for-js/blob/28b2aa281227b2f1b50f68cb1ac332901c353219/sdk/storage/storage-blob/src/policies/StorageBrowserPolicy.ts#L27

Can you give some more details about how you are generating the download URL? Is everything happening in the browser or are you using the SDK server-side as well?

rafagsiqueira commented 1 year ago

@xirzec you are correct. The download was originally done with the BlobClient download method (blobClient.download()); I was not generating the download URL myself. After realizing the files were never retrieved from cache, I inspected the requests in my browser developer console and saw that the URL had that timestamp appended, which prevented the browser from retrieving from cache regardless of the Cache-Control header. Everything is happening in the browser; nothing runs server-side.
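To illustrate why the timestamp defeats caching, here is a minimal sketch. It is not the SDK's actual policy code: appendCacheBuster is a hypothetical helper, and the account and blob names are placeholders. It only shows that two requests for the same blob end up with different URLs, which the browser cache treats as different resources.

```typescript
// Hypothetical helper that mimics the effect of a cache-busting query parameter.
function appendCacheBuster(rawUrl: string, timestamp: number): string {
  const url = new URL(rawUrl);
  // "_" is the query parameter name discussed in this thread.
  url.searchParams.set("_", String(timestamp));
  return url.toString();
}

// Placeholder blob URL, for illustration only.
const blobUrl = "https://myaccount.blob.core.windows.net/container/photo.png";

// Two downloads of the same blob at different times.
const firstRequestUrl = appendCacheBuster(blobUrl, 1000);
const secondRequestUrl = appendCacheBuster(blobUrl, 2000);

// The URLs differ, so a response cached for the first request
// is never matched by the second one.
console.log(firstRequestUrl === secondRequestUrl); // false
```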

xirzec commented 1 year ago

@rafagsiqueira I think you could manage this by tweaking the pipeline a bit to remove the cache-busting policy, perhaps something like:

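import { BlobServiceClient, newPipeline, StorageBrowserPolicyFactory } from "@azure/storage-blob";

// credential and account are assumed to be defined elsewhere.
const pipeline = newPipeline(credential);
// Find the factory that adds the cache-busting "_" query parameter in browsers.
const policyIndex = pipeline.factories.findIndex(factory => factory instanceof StorageBrowserPolicyFactory);
if (policyIndex > -1) {
  // Remove it so download URLs stay stable and cacheable.
  pipeline.factories.splice(policyIndex, 1);
}
const blobServiceClient = new BlobServiceClient(
    `https://${account}.blob.core.windows.net`,
    pipeline
);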

xirzec commented 1 year ago

@EmmaZhu perhaps we should have an option in StoragePipelineOptions to disable the cache busting for browsers?

rafagsiqueira commented 1 year ago

@xirzec What about this parameter URLConstants.Parameters.FORCE_BROWSER_NO_CACHE? Is the default value true? https://github.com/Azure/azure-sdk-for-js/blob/28b2aa281227b2f1b50f68cb1ac332901c353219/sdk/storage/storage-blob/src/policies/StorageBrowserPolicy.ts#L52

jeremymeng commented 1 year ago

@rafagsiqueira it is just a constant string "_", the name of the query parameter that will have the timestamp value https://github.com/Azure/azure-sdk-for-js/blob/48c3ad1014d04b6cd17d8d3464046430e1118a92/sdk/storage/storage-blob/src/utils/constants.ts#L22

rafagsiqueira commented 1 year ago

The naming of the constant indicates the _ querystring parameter was created precisely to force the browser not to cache. I agree with your suggestion that there should be a way to disable this querystring parameter.

EmmaZhu commented 1 year ago

@xirzec

I agree with your suggestion. It seems we need to add an option to disable the default behavior.