Closed johnwheffner closed 3 years ago
The following unit test demonstrates the issue more directly
diff --git a/sdk/core/azure-core/src/http/curl/curl_connection_pool_private.hpp b/sdk/core/azure-core/src/http/curl/curl_connection_pool_private.hpp
index e9c2be4..6f1495e 100644
--- a/sdk/core/azure-core/src/http/curl/curl_connection_pool_private.hpp
+++ b/sdk/core/azure-core/src/http/curl/curl_connection_pool_private.hpp
@@ -24,6 +24,7 @@
// Define the class name that reads from ConnectionPool private members
namespace Azure { namespace Core { namespace Test {
class CurlConnectionPool_connectionPoolTest_Test;
+ class CurlConnectionPool_pooledConnectionClosed_Test;
}}} // namespace Azure::Core::Test
#endif
@@ -40,6 +41,7 @@ namespace Azure { namespace Core { namespace Http {
#if defined(TESTING_BUILD)
// Give access to private to this tests class
friend class Azure::Core::Test::CurlConnectionPool_connectionPoolTest_Test;
+ friend class Azure::Core::Test::CurlConnectionPool_pooledConnectionClosed_Test;
#endif
public:
/**
diff --git a/sdk/core/azure-core/src/http/curl/curl_connection_private.hpp b/sdk/core/azure-core/src/http/curl/curl_connection_private.hpp
index 906fe68..14273e6 100644
--- a/sdk/core/azure-core/src/http/curl/curl_connection_private.hpp
+++ b/sdk/core/azure-core/src/http/curl/curl_connection_private.hpp
@@ -15,6 +15,14 @@
#include <curl/curl.h>
#include <string>
+#if defined(TESTING_BUILD)
+// Define the class name that reads from ConnectionPool private members
+namespace Azure { namespace Core { namespace Test {
+ class CurlConnectionPool_connectionPoolTest_Test;
+ class CurlConnectionPool_pooledConnectionClosed_Test;
+}}} // namespace Azure::Core::Test
+#endif
+
namespace Azure { namespace Core { namespace Http {
namespace Details {
@@ -83,6 +91,11 @@ namespace Azure { namespace Core { namespace Http {
* @brief CURL HTTP connection.
*/
class CurlConnection : public CurlNetworkConnection {
+#if defined(TESTING_BUILD)
+ // Give access to private to this tests class
+ friend class Azure::Core::Test::CurlConnectionPool_pooledConnectionClosed_Test;
+#endif
+
private:
CURL* m_handle;
curl_socket_t m_curlSocket;
diff --git a/sdk/core/azure-core/src/http/curl/curl_session_private.hpp b/sdk/core/azure-core/src/http/curl/curl_session_private.hpp
index 49bd8d9..1355c06 100644
--- a/sdk/core/azure-core/src/http/curl/curl_session_private.hpp
+++ b/sdk/core/azure-core/src/http/curl/curl_session_private.hpp
@@ -23,6 +23,7 @@
// Define the class name that reads from ConnectionPool private members
namespace Azure { namespace Core { namespace Test {
class CurlConnectionPool_connectionPoolTest_Test;
+ class CurlConnectionPool_pooledConnectionClosed_Test;
}}} // namespace Azure::Core::Test
#endif
@@ -43,6 +44,7 @@ namespace Azure { namespace Core { namespace Http {
#ifdef TESTING_BUILD
// Give access to private to this tests class
friend class Azure::Core::Test::CurlConnectionPool_connectionPoolTest_Test;
+ friend class Azure::Core::Test::CurlConnectionPool_pooledConnectionClosed_Test;
#endif
private:
/**
diff --git a/sdk/core/azure-core/test/ut/curl_connection_pool.cpp b/sdk/core/azure-core/test/ut/curl_connection_pool.cpp
index 86ab67a..818555b 100644
--- a/sdk/core/azure-core/test/ut/curl_connection_pool.cpp
+++ b/sdk/core/azure-core/test/ut/curl_connection_pool.cpp
@@ -10,6 +10,8 @@
#include "azure/core/http/curl/curl.hpp"
#endif
+#include <sys/socket.h>
+
#include <iostream>
#include <string>
#include <thread>
@@ -172,5 +174,47 @@ namespace Azure { namespace Core { namespace Test {
#endif
}
+ TEST(CurlConnectionPool, pooledConnectionClosed)
+ {
+ Azure::Core::Http::CurlConnectionPool::ClearIndex();
+ // Make sure there are nothing in the pool
+ EXPECT_EQ(Azure::Core::Http::CurlConnectionPool::ConnectionPoolIndex.size(), 0);
+
+ std::vector<std::unique_ptr<Azure::Core::Http::HttpPolicy>> policies;
+ policies.emplace_back(std::make_unique<Azure::Core::Http::TransportPolicy>());
+ Azure::Core::Http::HttpPipeline pipeline(policies);
+
+ Azure::Core::Http::Url url("http://httpbin.org/get");
+ Azure::Core::Http::Request request(Azure::Core::Http::HttpMethod::Get, url);
+
+ {
+ std::unique_ptr<Azure::Core::Http::RawResponse> response;
+ ASSERT_NO_THROW(response = pipeline.Send(Azure::Core::GetApplicationContext(), request));
+ auto responseCode = response->GetStatusCode();
+ int expectedCode = 200;
+ EXPECT_EQ(expectedCode, static_cast<int>(responseCode));
+ }
+
+ // Check that after the connection is gone, it is moved back to the pool
+ ASSERT_EQ(Azure::Core::Http::CurlConnectionPool::ConnectionPoolIndex.size(), 1);
+ auto connectionFromPool = Azure::Core::Http::CurlConnectionPool::ConnectionPoolIndex.begin()
+ ->second.begin()
+ ->get();
+
+ {
+ auto curlConnection = dynamic_cast<Azure::Core::Http::CurlConnection *>(connectionFromPool);
+ ASSERT_NE(nullptr, curlConnection);
+ ::shutdown(curlConnection->m_curlSocket, SHUT_RDWR);
+ }
+
+ {
+ std::unique_ptr<Azure::Core::Http::RawResponse> response;
+ ASSERT_NO_THROW(response = pipeline.Send(Azure::Core::GetApplicationContext(), request));
+ auto responseCode = response->GetStatusCode();
+ int expectedCode = 200;
+ EXPECT_EQ(expectedCode, static_cast<int>(responseCode));
+ }
+ }
+
#endif
}}} // namespace Azure::Core::Test
An approach that's worked for me is upon removing from the pool to first check the connection's liveness by doing a recv
with MSG_PEEK
; if this returns success or EAGAIN
then use the connection, otherwise discard and try again immediately.
I have created a draft PR for this.
The problem with this fix, is that we still have to pay the performance implication of calling recv()
for every connection, even when it might be valid already. So, we are basically moving the check from the retry policy to the connection pool.
I wonder if Azurite can be configurable to change the time for closing idle connections?
The syscall overhead is about 1 usec, a tax I'm happy enough to pay in this context. The other option would be to handle this condition in the error path when using the connection, which is arguably more correct but does not encapsulate the complexity.
I'm not sure whether the Azurite timeout can be tuned -- I didn't find anything when I looked, but I didn't look too hard. However, it's not too surprising for other systems like proxies to close idle connections.
@johnwheffner what you mentioned made me remember that the curl transport adapter implementation should be taking care of getting a new connection if the one that it used initially didn't work. Since you have the repro env already, can you tell me what do you get from this line: https://github.com/Azure/azure-sdk-for-cpp/blob/master/sdk/core/azure-core/src/http/curl/curl.cpp#L178
while reproducing the issue? I would expect libcurl to return CURLE_UNSUPPORTED_PROTOCOL
because the connection is closed and request a new connection from the pool.
Updating to a new PR where the curl transport adapter will react to CURLE_SEND_ERROR
and request a new connection.
@johnwheffner I was expecting CURLE_UNSUPPORTED_PROTOCOL
but it looks like CURLE_SEND_ERROR
will also be used when the data transfer didn't work.
I tried out this change and unfortunately it doesn't solve the problem. When the remote server closes the connection, it puts the client connection into a CLOSE_WAIT
state, in which writes to the socket are still allowed. So sending the request will not return CURLE_SEND_ERROR
immediately, and the error will only be surfaced later while trying to read the response.
You can see this in the unit test by changing the CurlConnection::Shutdown
call to use SHUT_RD
instead of SHUT_RDWR
.
Idle connections sit in a pool for reuse, for up to 90 seconds. Some servers are pretty eager to close idle connections. In particular, the Azurite storage emulator uses a 5 seconds idle timeout. Connections that are closed by the server while sitting in the idle pool sit in a FIN_WAIT2 state on the client side (and CLOSE_WAIT on the storage service side).
When such a connection gets picked for use, an error will occur after sending out a request header, when waiting for a response. (When the remote end gets the header data, it'll send back a TCP RST.)
This error surfaces up through the SDK pipeline. The upper layers will retry, but (a) we don't want to pay the back-off and retry cost, and (b) if we have enough such idle connections -- more than a few by default, it causes operation failure.
To reproduce, on a build using the curl client, run an Azurite storage emulator, establish at least four concurrent requests, close them all, wait for >5 seconds then make another request, reusing the pooled curl connections. The connections will all have been closed by the remote side, and the request will fail after retrying three times.