Azure / azure-sdk-for-js

This repository is for active development of the Azure SDK for JavaScript (NodeJS & Browser). For consumers of the SDK we recommend visiting our public developer docs at https://docs.microsoft.com/javascript/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-js.
MIT License
2.09k stars 1.2k forks source link

[Storage][File-share] intermittent SAS live tests failures #7076

Closed jeremymeng closed 3 years ago

jeremymeng commented 4 years ago

Only happening in all three windows build in the most recent run: https://dev.azure.com/azure-sdk/internal/_build/results?buildId=241573&view=ms.vss-test-web.build-test-results-tab

failures like

Shared Access Signature (SAS) generation Node.js only generateAccountSASQueryParameters should work
Failed 5h ago
Duration0:00:00.077
Failing build20200123.1
Ownernot available

Debug

Work items

Attachments

History

Attachments*

Error message
<Error>AuthorizationFailure<Message>This request is not authorized to perform this operation. RequestId:b75819c8-201a-0078-73ef-d1ab3f000000 Time:2020-01-23T13:16:01.7749146Z</Message></Error>

Stack trace
Error: <?xml version="1.0" encoding="utf-8"?><Error><Code>AuthorizationFailure</Code><Message>This request is not authorized to perform this operation.
RequestId:b75819c8-201a-0078-73ef-d1ab3f000000
Time:2020-01-23T13:16:01.7749146Z</Message></Error>
    at new RestError (D:\a\1\s\sdk\core\core-http\lib\restError.ts:23:5)
    at D:\a\1\s\sdk\core\core-http\lib\policies\deserializationPolicy.ts:168:27
    at <anonymous>
    at process._tickDomainCallback (internal/process/next_tick.js:229:7)
jeremymeng commented 4 years ago

Although this test always passed

Shared Access Signature (SAS) generation Node.js only generateFileSASQueryParameters should work for file with access policy

It ran after all the failing tests (right after generateFileSASQueryParameters should work for file in the same file.

ljian3377 commented 4 years ago

https://dev.azure.com/azure-sdk/internal/_test/analytics?definitionId=159&contextType=build These tests succeeded except that three run in the last two weeks. Since it has been 12 days, can't find the verbose service log for more info.

According to @jiacfan, we once find an issue on SAS validation due to clock skew where the start time is not started yet. But we have no proof on what actually happened this time without the verbose log.

Need to repro.

mikeharder commented 4 years ago

Another repro on 2/25/20:

https://dev.azure.com/azure-sdk/internal/_build/results?buildId=271046&view=ms.vss-test-web.build-test-results-tab&runId=9094896&resultId=100188&paneView=debug

jeremymeng commented 4 years ago

Another one

RestError: <?xml version="1.0" encoding="utf-8"?><Error><Code>AuthenticationFailed</Code><Message>Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
RequestId:2fb711e5-301a-0077-3433-ec989a000000
Time:2020-02-25T23:28:05.8770849Z</Message><AuthenticationErrorDetail>SAS identifier cannot be found for specified signed identifier</AuthenticationErrorDetail></Error>
    at new RestError (/home/vsts/work/1/s/sdk/core/core-http/lib/restError.ts:27:5)
    at /home/vsts/work/1/s/sdk/core/core-http/lib/policies/deserializationPolicy.ts:176:21
    at process._tickCallback (internal/process/next_tick.js:68:7)

https://dev.azure.com/azure-sdk/internal/_build/results?buildId=271232&view=ms.vss-test-web.build-test-results-tab&runId=9098166&resultId=100188&paneView=debug

mikeharder commented 4 years ago

I wonder if this could be a race condition between the calls to setAccessPolicy() and listFilesAndDirectories(), where the service has accepted the SAS policy but it hasn't been fully propagated yet, so there is a small delay where the SAS policy cannot be used yet.

https://github.com/Azure/azure-sdk-for-js/blob/0ce15cc/sdk/storage/storage-file-share/test/node/sas.spec.ts#L302-L330

https://github.com/Azure/azure-sdk-for-js/blob/21f43d5/sdk/storage/storage-file-share/recordings/node/shared_access_signature_sas_generation_nodejs_only/recording_generatefilesasqueryparameters_should_work_for_file_with_access_policy.js#L99-L121

If so, I think the best fix would be to do one of the following after calling setAccessPolicy():

  1. If the service has an API to list access policies, poll this until the service returns the new policy, then continue with the test.
  2. Otherwise, wrap the call to listFilesAndDirectories() in a retry loop, and only retry for this specific error message.
mikeharder commented 4 years ago

This still may be a bug in the service, depending on the expected behavior. Is the service supposed to guarantee a SAS policy can be used as soon as it is set?

mikeharder commented 4 years ago

According to the documentation:

When you establish a stored access policy on a container, table, queue, or share, it may take up to 30 seconds to take effect. During this interval, requests against a shared access signature that is associated with the stored access policy may fail with status code 403 (Forbidden), until the access policy becomes active.

So I think the best fix here is to wrap the call which uses the stored access policy in a retry loop, and retry say once every second up to 30 seconds, but only retry on HTTP 403.

mikeharder commented 4 years ago

Another repro on 2/27/20:

https://dev.azure.com/azure-sdk/internal/_build/results?buildId=273380&view=ms.vss-test-web.build-test-results-tab&runId=9144172&resultId=100188&paneView=debug

ljian3377 commented 4 years ago

Partially fixed by #7600. Still need to do Data Lake.

ramya-rao-a commented 3 years ago

Closing as we now have issues being logged for every new live test failure. This should cover any of the failures discussed in this issue