Azure / azure-sdk-for-cpp

This repository is for active development of the Azure SDK for C++. For consumers of the SDK we recommend visiting our versioned developer docs at https://azure.github.io/azure-sdk-for-cpp.
MIT License
170 stars 118 forks source link

_CrtIsValidHeapPointer(block) assert in curl CleanupThread #5579

Open ArneWouters opened 1 month ago

ArneWouters commented 1 month ago

Describe the bug We are getting a _CrtIsValidHeapPointer(block) assertion when using the SDK with curl on windows. The issue only occurs when the CleanupThread function is executed. At first we thought the issue was due to the library not using our custom memory allocators so we fixed that by setting them at the beginning of the program. Looking at the Stack Trace, it seems that the free is not using our custom free though. We're not sure why that is happening. Curl is definitely using our custom alloc and free functions during the executions before the cleanup. We have also noticed that the SDK statically initializes curl which could lead to a deadlock on windows according to the documentation of curl_global_init.

Exception or Stack Trace

    ntdll.dll!RtlpBreakPointHeap()  Unknown
    ntdll.dll!RtlpValidateHeapEntry()   Unknown
    ntdll.dll!RtlValidateHeap() Unknown
    KernelBase.dll!00007ffe14248e1b()   Unknown
>   ucrtbased.dll!_CrtIsValidHeapPointer(const void * block) Line 1407  C++
    ucrtbased.dll!free_dbg_nolock(void * const block, const int block_use) Line 904 C++
    ucrtbased.dll!_free_dbg(void * block, int block_use) Line 1030  C++
    ucrtbased.dll!free(void * block) Line 32    C++
    libcurl-d.dll!schannel_session_free(void * ptr) Line 2462   C
    libcurl-d.dll!Curl_ssl_kill_session(Curl_ssl_session * session) Line 610    C
    libcurl-d.dll!Curl_ssl_close_all(Curl_easy * data) Line 759 C
    libcurl-d.dll!Curl_close(Curl_easy * * datap) Line 268  C
    libcurl-d.dll!curl_easy_cleanup(Curl_easy * data) Line 812  C
    azure-core-1.0.0d.dll!Azure::Core::_internal::UniqueHandleDeleter<void,&curl_easy_cleanup>::operator()(void * obj) Line 57  C++
    azure-core-1.0.0d.dll!std::unique_ptr<void,Azure::Core::_internal::UniqueHandleDeleter<void,&curl_easy_cleanup>>::~unique_ptr<void,Azure::Core::_internal::UniqueHandleDeleter<void,&curl_easy_cleanup>>() Line 3252    C++
    azure-core-1.0.0d.dll!Azure::Core::Http::CurlConnection::~CurlConnection() Line 185 C++
    azure-core-1.0.0d.dll!Azure::Core::Http::CurlConnection::`scalar deleting destructor'(unsigned int) C++
    azure-core-1.0.0d.dll!std::default_delete<Azure::Core::Http::CurlNetworkConnection>::operator()(Azure::Core::Http::CurlNetworkConnection * _Ptr) Line 3141  C++
    azure-core-1.0.0d.dll!std::unique_ptr<Azure::Core::Http::CurlNetworkConnection,std::default_delete<Azure::Core::Http::CurlNetworkConnection>>::~unique_ptr<Azure::Core::Http::CurlNetworkConnection,std::default_delete<Azure::Core::Http::CurlNetworkConnection>>() Line 3254  C++
    azure-core-1.0.0d.dll!std::unique_ptr<Azure::Core::Http::CurlNetworkConnection,std::default_delete<Azure::Core::Http::CurlNetworkConnection>>::`scalar deleting destructor'(unsigned int)   C++
    azure-core-1.0.0d.dll!std::_Default_allocator_traits<std::allocator<std::_List_node<std::unique_ptr<Azure::Core::Http::CurlNetworkConnection,std::default_delete<Azure::Core::Http::CurlNetworkConnection>>,void *>>>::destroy<std::unique_ptr<Azure::Core::Http::CurlNetworkConnection,std::default_delete<Azure::Core::Http::CurlNetworkConnection>>>(std::allocator<std::_List_node<std::unique_ptr<Azure::Core::Http::CurlNetworkConnection,std::default_delete<Azure::Core::Http::CurlNetworkConnection>>,void *>> & __formal, std::unique_ptr<Azure::Core::Http::CurlNetworkConnection,std::default_delete<Azure::Core::Http::CurlNetworkConnection>> * const _Ptr) Line 693  C++
    azure-core-1.0.0d.dll!std::_List_node<std::unique_ptr<Azure::Core::Http::CurlNetworkConnection,std::default_delete<Azure::Core::Http::CurlNetworkConnection>>,void *>::_Freenode<std::allocator<std::_List_node<std::unique_ptr<Azure::Core::Http::CurlNetworkConnection,std::default_delete<Azure::Core::Http::CurlNetworkConnection>>,void *>>>(std::allocator<std::_List_node<std::unique_ptr<Azure::Core::Http::CurlNetworkConnection,std::default_delete<Azure::Core::Http::CurlNetworkConnection>>,void *>> & _Al, std::_List_node<std::unique_ptr<Azure::Core::Http::CurlNetworkConnection,std::default_delete<Azure::Core::Http::CurlNetworkConnection>>,void *> * _Ptr) Line 317   C++
    azure-core-1.0.0d.dll!std::_List_node<std::unique_ptr<Azure::Core::Http::CurlNetworkConnection,std::default_delete<Azure::Core::Http::CurlNetworkConnection>>,void *>::_Free_non_head<std::allocator<std::_List_node<std::unique_ptr<Azure::Core::Http::CurlNetworkConnection,std::default_delete<Azure::Core::Http::CurlNetworkConnection>>,void *>>>(std::allocator<std::_List_node<std::unique_ptr<Azure::Core::Http::CurlNetworkConnection,std::default_delete<Azure::Core::Http::CurlNetworkConnection>>,void *>> & _Al, std::_List_node<std::unique_ptr<Azure::Core::Http::CurlNetworkConnection,std::default_delete<Azure::Core::Http::CurlNetworkConnection>>,void *> * _Head) Line 329 C++
    azure-core-1.0.0d.dll!std::list<std::unique_ptr<Azure::Core::Http::CurlNetworkConnection,std::default_delete<Azure::Core::Http::CurlNetworkConnection>>,std::allocator<std::unique_ptr<Azure::Core::Http::CurlNetworkConnection,std::default_delete<Azure::Core::Http::CurlNetworkConnection>>>>::_Tidy() Line 1424 C++
    azure-core-1.0.0d.dll!std::list<std::unique_ptr<Azure::Core::Http::CurlNetworkConnection,std::default_delete<Azure::Core::Http::CurlNetworkConnection>>,std::allocator<std::unique_ptr<Azure::Core::Http::CurlNetworkConnection,std::default_delete<Azure::Core::Http::CurlNetworkConnection>>>>::~list<std::unique_ptr<Azure::Core::Http::CurlNetworkConnection,std::default_delete<Azure::Core::Http::CurlNetworkConnection>>,std::allocator<std::unique_ptr<Azure::Core::Http::CurlNetworkConnection,std::default_delete<Azure::Core::Http::CurlNetworkConnection>>>>() Line 1036    C++
    azure-core-1.0.0d.dll!`anonymous namespace'::CleanupThread() Line 274   C++
    azure-core-1.0.0d.dll!std::invoke<void (__cdecl*)(void)>(void(*)() && _Obj) Line 1549   C++
    azure-core-1.0.0d.dll!std::thread::_Invoke<std::tuple<void (__cdecl*)(void)>,0>(void * _RawVals) Line 56    C++
    ucrtbased.dll!thread_start<unsigned int (__cdecl*)(void *),1>(void * const parameter) Line 97   C++
    kernel32.dll!00007ffe14544cb0() Unknown
    ntdll.dll!RtlUserThreadStart()  Unknown

To Reproduce We build the SDK with vcpkg, we have tried linking curl statically and dynamically into the SDK, both are giving us this problem. We employ an in house memory allocation solution. As a workaround we are currently using the SDK with WinHTTP, which works fine.

Code Snippet

curl_global_cleanup();
curl_global_init_mem(CURL_GLOBAL_ALL, om_malloc, om_free, om_realloc, om_strdup, om_calloc);

auto clientSecret = "--clientSecret--";
auto clientId = "--clientId--";
auto tenantId = "--tenantId--";
auto keyVaultUrl = "--url--";
auto secretName = "mySecret";

auto credential = std::make_shared<Azure::Identity::ClientSecretCredential>(tenantId, clientId, clientSecret);
auto secretClient = Azure::Security::KeyVault::Secrets::SecretClient(keyVaultUrl, credential);

auto secret = secretClient.GetSecret(secretName).Value;
auto secretValue = secret.Value.Value();

for (;;)
{
}

Expected behavior We expect it to work without throwing an exception.

Screenshots N/A

Setup (please complete the following information):

Additional context Logging output

29/04/2024 07:32:54.587       HTTP Request : POST https://login.microsoftonline.com/REDACTED/oauth2/v2.0/token
   content-length : 178
   content-type : application/x-www-form-urlencoded
   host : REDACTED
   user-agent : azsdk-cpp-identity/1.6.0 (Windows Server 2022 Datacenter 6.3 20348 20348.1.amd64fre.fe_release.210507-1500)
   x-ms-client-request-id : aabdad55-2ab2-4ce9-b225-c96393ae7b5a
29/04/2024 07:32:57.008       [CURL Transport Adapter]: Creating a new session.
29/04/2024 07:33:03.570       [CURL Transport Adapter]: Spawn new connection.
29/04/2024 07:34:12.005       [CURL Transport Adapter]: Send request without payload
29/04/2024 07:34:12.005       [CURL Transport Adapter]: Parse server response
29/04/2024 07:34:12.140       [CURL Transport Adapter]: Request completed. Moving response out of session and session to response.
29/04/2024 07:34:12.140       Moving connection to pool...
29/04/2024 07:34:12.140       Start clean thread
29/04/2024 07:34:12.141       HTTP Response (77553ms) : 200 OK
   cache-control : no-store, no-cache
   content-length : 1313
   content-type : application/json; charset=utf-8
   date : Mon, 29 Apr 2024 05:34:11 GMT
   expires : -1
   p3p : REDACTED
   pragma : no-cache
   set-cookie : REDACTED
   strict-transport-security : REDACTED
   x-content-type-options : REDACTED
   x-ms-ests-server : REDACTED
   x-ms-request-id : b307215f-9681-4daf-bf27-8bbea7ac3500
   x-ms-srs : REDACTED
   x-xss-protection : REDACTED
29/04/2024 07:34:12.141       HTTP status code 200 won't be retried.
29/04/2024 07:34:12.142       HTTP Request : GET --url--/secrets/mySecret?api-version=7.3
   authorization : REDACTED
   content-type : application/json
   user-agent : azsdk-cpp-keyvault-secrets/4.2.1 (Windows Server 2022 Datacenter 6.3 20348 20348.1.amd64fre.fe_release.210507-1500)
   x-ms-client-request-id : 2e5bbd75-bd11-4292-a534-ca51cebe1222
29/04/2024 07:34:12.142       [CURL Transport Adapter]: Creating a new session.
29/04/2024 07:34:12.142       [CURL Transport Adapter]: Spawn new connection.
29/04/2024 07:34:15.271       [CURL Transport Adapter]: No Host in request headers. Adding it
29/04/2024 07:34:15.271       [CURL Transport Adapter]: Send request without payload
29/04/2024 07:34:15.271       [CURL Transport Adapter]: Parse server response
29/04/2024 07:34:15.308       [CURL Transport Adapter]: Request completed. Moving response out of session and session to response.
29/04/2024 07:34:15.308       Moving connection to pool...
29/04/2024 07:34:15.308       Clean thread running. Won't start a new one.
29/04/2024 07:34:15.308       HTTP Response (3165ms) : 200 OK
   cache-control : no-cache
   content-length : 343
   content-type : application/json; charset=utf-8
   date : Mon, 29 Apr 2024 05:34:14 GMT
   expires : -1
   pragma : no-cache
   strict-transport-security : REDACTED
   x-content-type-options : REDACTED
   x-ms-client-request-id : 2e5bbd75-bd11-4292-a534-ca51cebe1222
   x-ms-keyvault-network-info : REDACTED
   x-ms-keyvault-region : REDACTED
   x-ms-keyvault-service-version : REDACTED
   x-ms-request-id : 3bfb75f6-6459-4598-9434-f57b8329caba
29/04/2024 07:34:15.308       HTTP status code 200 won't be retried.

Information Checklist Kindly make sure that you have added all the following information above and checkoff the required fields otherwise we will treat the issuer as an incomplete report

github-actions[bot] commented 1 month ago

Thank you for your feedback. Tagging and routing to the team member best able to assist.

LarryOsterman commented 1 month ago

I believe that the root cause of this may be in curl.

The code which called free is from lib\vtls\schannel.c.

Note the call to curlx_unicodefree below:

static void schannel_session_free(void *sessionid, size_t idsize)
{
  /* this is expected to be called under sessionid lock */
  struct Curl_schannel_cred *cred = sessionid;

  (void)idsize;
  if(cred) {
    cred->refcount--;
    if(cred->refcount == 0) {
      s_pSecFn->FreeCredentialsHandle(&cred->cred_handle);
      curlx_unicodefree(cred->sni_hostname);
#ifdef HAS_CLIENT_CERT_PATH
      if(cred->client_cert_store) {
        CertCloseStore(cred->client_cert_store, 0);
        cred->client_cert_store = NULL;
      }
#endif
      Curl_safefree(cred);
    }
  }
}

Looking at that code (in the file curl_multibyte.h) we find the following code:

#define curlx_unicodefree(ptr)                          \
  do {                                                  \
    if(ptr) {                                           \
      (free)(ptr);                                      \
      (ptr) = NULL;                                     \
    }                                                   \
  } while(0)

I believe that unilaterally calls the free function, not the user configured free function.

So I strongly suspect that the root cause of this issue is in libcurl and not in the Azure SDK for C++.

But I'm more than willing to be proven wrong.

LarryOsterman commented 1 month ago

Looking deeper, it appears that I may be wrong - the cred->sni_hostname is created using curlx_convert_UTF8_to_tchar which uses malloc directly and not the custom allocator.

However Curl_safefree may have a similar problem, which is more likely to be the problem.

ArneWouters commented 1 month ago

Ah, it indeed mentions at the top of curl_multibyte.h that all curlx functions do not use the configured memory allocators. I completely missed that but that already explains one thing. So far I tried to use the internal memory debugging tools from curl but I haven't gotten them to work with our vcpkg pipeline yet.

LarryOsterman commented 1 month ago

Ok, given this, I believe we can close this out, unless there is something else we can help you with.

ArneWouters commented 1 month ago

Still having the crash that we can't explain though, is there anything that we can still investigate or try? We can also log an issue on the curl github if you think this is not related to the SDK.

LarryOsterman commented 1 month ago

I'm not 100% sure, to be honest. The symptoms are those of a heap corruption, but it's not clear where the corruption is coming from.

Have you tried using AddressSanitizer on your reproduction scenario? I've found it's incredibly helpful for this kind of problem.