Azure / azure-storage-php

Microsoft Azure Storage Library for PHP

SSL errors / Unreliable #216

Open Vandersteen opened 4 years ago

Vandersteen commented 4 years ago

The blob service seems unreliable; uploads fail quite often. We are dealing with files ranging from 1 to 10 MB.

Our environment is an AKS cluster (CNI) running in a VNet where the Microsoft.Storage service endpoint has been added.

Strangely, the same kinds of errors sometimes also occur for other Azure services (such as MySQL), but blob storage has the most issues.

Which service(blob, file, queue, table) does this issue concern?

Blob

Which version of the SDK was used?

1.4

What's the PHP/OS version?

7.3 / Debian

What problem was encountered?

GuzzleHttp\Exception\ConnectException: cURL error 35: OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to nalintstgstor1.blob.core.windows.net:443  (see https://curl.haxx.se/libcurl/c/libcurl-errors.html)
#19 vendor/guzzlehttp/guzzle/src/Handler/CurlFactory.php(200): createRejection
#18 vendor/guzzlehttp/guzzle/src/Handler/CurlFactory.php(155): finishError
#17 vendor/guzzlehttp/guzzle/src/Handler/CurlFactory.php(105): finish
#16 vendor/guzzlehttp/guzzle/src/Handler/CurlMultiHandler.php(187): processMessages
#15 vendor/guzzlehttp/guzzle/src/Handler/CurlMultiHandler.php(116): tick
#14 vendor/guzzlehttp/guzzle/src/Handler/CurlMultiHandler.php(131): execute
#13 vendor/guzzlehttp/promises/src/Promise.php(246): invokeWaitFn
#12 vendor/guzzlehttp/promises/src/Promise.php(223): waitIfPending
#11 vendor/guzzlehttp/promises/src/Promise.php(267): invokeWaitList
#10 vendor/guzzlehttp/promises/src/Promise.php(225): waitIfPending
#9 vendor/guzzlehttp/promises/src/Promise.php(62): wait
#8 vendor/microsoft/azure-storage-blob/src/Blob/BlobRestProxy.php(1806): createBlockBlob
Jan 27, 2020 8:16:33 AM UTC

MicrosoftAzure\Storage\Common\Exceptions\ServiceException: Fail:
Code: 500
Value: Operation could not be completed within the specified time.
details (if any): <?xml version="1.0" encoding="utf-8"?><Error><Code>OperationTimedOut</Code><Message>Operation could not be completed within the specified time.
RequestId:d6c7bd63-601e-00b1-5ae9-d43fbe000000
Time:2020-01-27T08:16:33.1239417Z</Message></Error>.
#21 vendor/microsoft/azure-storage-common/src/Common/Internal/ServiceRestProxy.php(490): throwIfError
#20 vendor/microsoft/azure-storage-common/src/Common/Internal/ServiceRestProxy.php(406): MicrosoftAzure\Storage\Common\Internal\{closure}
#19 vendor/guzzlehttp/promises/src/Promise.php(203): callHandler
#18 vendor/guzzlehttp/promises/src/Promise.php(174): GuzzleHttp\Promise\{closure}
#17 vendor/guzzlehttp/promises/src/RejectedPromise.php(40): GuzzleHttp\Promise\{closure}
#16 vendor/guzzlehttp/promises/src/TaskQueue.php(47): run
#15 vendor/guzzlehttp/guzzle/src/Handler/CurlMultiHandler.php(104): tick
#14 vendor/guzzlehttp/guzzle/src/Handler/CurlMultiHandler.php(131): execute
#13 vendor/guzzlehttp/promises/src/Promise.php(246): invokeWaitFn
#12 vendor/guzzlehttp/promises/src/Promise.php(223): waitIfPending
#11 vendor/guzzlehttp/promises/src/Promise.php(267): invokeWaitList
#10 vendor/guzzlehttp/promises/src/Promise.php(225): waitIfPending
#9 vendor/guzzlehttp/promises/src/Promise.php(62): wait
#8 vendor/microsoft/azure-storage-blob/src/Blob/BlobRestProxy.php(1806): createBlockBlob

Steps to reproduce the issue?

Upload a blob?

Have you found a mitigation/solution?

No

Is there a failing request ID related to this problem returned by server? What is it?

RequestId:d6c7bd63-601e-00b1-5ae9-d43fbe000000 Time:2020-01-27T08:16:33.1239417Z

RequestId:c1e17fad-901e-0063-382e-c181e6000000 Time:2020-01-02T05:39:04.4147653Z

RequestId:85dad7d7-e01e-00c6-761d-b6baff000000 Time:2019-12-19T03:39:27.0651824Z

What is the storage account name and time frame of your last reproduce? (UTC YYYY/MM/DD hh:mm:ss)

(If you think some of the information should not be shared publicly, you can e-mail the main Microsoft contributors of the repository instead.)

XiaoningLiu commented 4 years ago

Hi @Vandersteen

Sorry for the late response; I was away for the holiday.

For your questions:

  1. An OperationTimedOut or 500 error code means something went wrong in the Azure Storage service. The SDK cannot do much in this case besides making sure a retry policy is enabled.

  2. SSL error code 35 means "A problem occurred somewhere in the SSL/TLS handshake. You really want the error buffer and read the message there as it pinpoints the problem slightly more. Could be certificates (file formats, paths, permissions), passwords, and others." https://curl.haxx.se/libcurl/c/libcurl-errors.html

If the SSL error keeps happening for every request, something is wrong with the SDK certificate or security configuration.

If the SSL error happens randomly, the secure network connection could not be established in the OpenSSL layer, which is beyond the SDK's control. That is difficult to debug; you could try the HTTP endpoint to see whether the error goes away.

Vandersteen commented 4 years ago

> 1. An OperationTimedOut or 500 error code means something went wrong in the Azure Storage service. The SDK cannot do much in this case besides making sure a retry policy is enabled.

Is it enabled by default?

> 2. SSL error code 35 means "A problem occurred somewhere in the SSL/TLS handshake. You really want the error buffer and read the message there as it pinpoints the problem slightly more. Could be certificates (file formats, paths, permissions), passwords, and others." https://curl.haxx.se/libcurl/c/libcurl-errors.html
>
> If the SSL error keeps happening for every request, something is wrong with the SDK certificate or security configuration.
>
> If the SSL error happens randomly, the secure network connection could not be established in the OpenSSL layer, which is beyond the SDK's control. That is difficult to debug; you could try the HTTP endpoint to see whether the error goes away.

Should I write custom middleware to enable CURLOPT_ERRORBUFFER?

spelltwister commented 4 years ago

I see the same problem using the C# SDK. We get bursts of SSL connection errors, and the requests show as Faulted in Application Insights.

XiaoningLiu commented 4 years ago

Feel free to write custom middleware to do what you want, but I'm not sure CURLOPT_ERRORBUFFER will help in this case.

The SDK doesn't enable retries by default; you need to create a retry middleware instance manually and pass it to the client.
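
A minimal sketch of what that could look like, assuming the `RetryMiddlewareFactory` shipped in `azure-storage-common` and the `middlewares` client option. The environment variable, container name, and file name are placeholders, and the fifth argument (retry on connection failures, which is what cURL error 35 surfaces as) may not exist in every SDK version, so check it against the version you have installed:

```php
<?php
// Sketch only: wires a retry middleware into a blob client.
// Assumes the SDK was installed with Composer.
require_once 'vendor/autoload.php';

use GuzzleHttp\Exception\ConnectException;
use MicrosoftAzure\Storage\Blob\BlobRestProxy;
use MicrosoftAzure\Storage\Common\Exceptions\ServiceException;
use MicrosoftAzure\Storage\Common\Middlewares\RetryMiddlewareFactory;

// Placeholder: read the connection string from wherever you keep it.
$connectionString = getenv('AZURE_STORAGE_CONNECTION_STRING');

$retryMiddleware = RetryMiddlewareFactory::create(
    RetryMiddlewareFactory::GENERAL_RETRY_TYPE,                // retry decision logic
    3,                                                         // number of retries
    1000,                                                      // base interval in milliseconds
    RetryMiddlewareFactory::EXPONENTIAL_INTERVAL_ACCUMULATION, // back off exponentially
    true                                                       // also retry connection failures (assumed available)
);

// Pass the middleware when the client is created.
$blobClient = BlobRestProxy::createBlobService(
    $connectionString,
    ['middlewares' => [$retryMiddleware]]
);

try {
    // Placeholder container, blob name, and local file.
    $content = fopen('example.bin', 'r');
    $blobClient->createBlockBlob('example-container', 'example.bin', $content);
} catch (ServiceException $e) {
    // Service-side failure (for example 500 OperationTimedOut) after retries ran out.
    error_log('Storage error ' . $e->getCode() . ': ' . $e->getMessage());
} catch (ConnectException $e) {
    // Connection-level failure (for example cURL error 35) after retries ran out.
    error_log('Connection error: ' . $e->getMessage());
}
```

Retries only paper over transient failures, so it is still worth logging the final exception to see how often they run out.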

@spelltwister I'm curious whether Azure imposes any limits on SSL connections.

Vandersteen commented 4 years ago

I want to confirm that adding retries did reduce the number of errors we receive. They are not fully gone, but it did help.