aws / aws-sdk-cpp

AWS SDK for C++
Apache License 2.0
1.97k stars 1.06k forks source link

[aws-cpp-sdk-s3-crt] S3CrtRequestFinishCallback race condition in releasing the s3 meta request #2024

Closed grrtrr closed 2 years ago

grrtrr commented 2 years ago

Describe the bug

This was found when running unit tests with the clang thread sanitizer.

Expected Behavior

Clean shutdown, no race conditions.

Current Behavior

We saw the following race condition(s):

WARNING: ThreadSanitizer: data race (pid=12)
  Read of size 8 at 0x7b30000021b8 by thread T8:
    #0 S3CrtRequestFinishCallback(aws_s3_meta_request*, aws_s3_meta_request_result const*, void*) external/com_github_aws-sdk-cpp/aws-cpp-sdk-s3-crt/source/S3CrtClient.cpp:384 (libexternal_Scom_Ugithub_Uaws-sdk-cpp_Slibaws-s3-crt.so+0x34466a)
    #1 aws_s3_meta_request_finish_default external/aws-c-s3/source/s3_meta_request.c:1442 (libexternal_Saws-c-s3_Slibaws-c-s3.so+0x327bb)
    #2 aws_s3_meta_request_finish external/aws-c-s3/source/s3_meta_request.c:1363 (libexternal_Saws-c-s3_Slibaws-c-s3.so+0x32125)
    #3 s_s3_meta_request_default_update external/aws-c-s3/source/s3_default_meta_request.c:213 (libexternal_Saws-c-s3_Slibaws-c-s3.so+0x27b4c)
    #4 aws_s3_meta_request_update external/aws-c-s3/source/s3_meta_request.c:430 (libexternal_Saws-c-s3_Slibaws-c-s3.so+0x2e5c2)
    #5 aws_s3_client_update_meta_requests_threaded external/aws-c-s3/source/s3_client.c:1237 (libexternal_Saws-c-s3_Slibaws-c-s3.so+0x1d0bd)
    #6 s_s3_client_process_work_default external/aws-c-s3/source/s3_client.c:1076 (libexternal_Saws-c-s3_Slibaws-c-s3.so+0x2038e)
    #7 s_s3_client_process_work_task external/aws-c-s3/source/s3_client.c:984 (libexternal_Saws-c-s3_Slibaws-c-s3.so+0x214af)
    #8 aws_task_run external/aws-c-common/source/task_scheduler.c:44 (libexternal_Saws-c-common_Slibaws-c-common.so+0x68ca0)
    #9 s_run_all external/aws-c-common/source/task_scheduler.c:249 (libexternal_Saws-c-common_Slibaws-c-common.so+0x69689)
    #10 aws_task_scheduler_run_all external/aws-c-common/source/task_scheduler.c:188 (libexternal_Saws-c-common_Slibaws-c-common.so+0x6a0ca)
    #11 s_main_loop external/aws-c-io/source/linux/epoll_event_loop.c:633 (libexternal_Saws-c-io_Slibaws-c-io.so+0x3642a)
    #12 thread_fn external/aws-c-common/source/posix/thread.c:137 (libexternal_Saws-c-common_Slibaws-c-common.so+0x61464)

  Previous write of size 8 at 0x7b30000021b8 by main thread:
    #0 Aws::S3Crt::S3CrtClient::PutObjectAsync(Aws::S3Crt::Model::PutObjectRequest const&, std::function<void (Aws::S3Crt::S3CrtClient const*, Aws::S3Crt::Model::PutObjectRequest const&, Aws::Utils::Outcome<Aws::S3Crt::Model::PutObjectResult, Aws::S3Crt::S3CrtError> const&, std::shared_ptr<Aws::Client::AsyncCallerContext const> const&)> const&, std::shared_ptr<Aws::Client::AsyncCallerContext const> const&) const external/com_github_aws-sdk-cpp/aws-cpp-sdk-s3-crt/source/S3CrtClient.cpp:592 (libexternal_Scom_Ugithub_Uaws-sdk-cpp_Slibaws-s3-crt.so+0x349962)
    #1 Aws::S3Crt::S3CrtClient::PutObject(Aws::S3Crt::Model::PutObjectRequest const&) const external/com_github_aws-sdk-cpp/aws-cpp-sdk-s3-crt/source/S3CrtClient.cpp:605 (libexternal_Scom_Ugithub_Uaws-sdk-cpp_Slibaws-s3-crt.so+0x349d76)
    #2 av::cloud::aws::s3::S3Ostreambuf::PutObject(std::basic_streambuf<char, std::char_traits<char> >*) cloud/aws/s3/s3_streambuf.cc:86 (libcloud_Saws_Ss3_Slibs3_Ustreambuf.so+0x23586)
    #3 av::cloud::aws::s3::S3Ostream::PutObject(std::basic_streambuf<char, std::char_traits<char> >*) cloud/aws/s3/s3_stream.hh:23 (libcloud_Saws_Ss3_Slibs3_Ublobstore_Uplugin_Ucpp.so+0x5707f)
    #4 av::cloud::aws::s3::S3OutputBlob::write_and_close(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) cloud/aws/s3/s3_blobstore_plugin.cc:305 (libcloud_Saws_Ss3_Slibs3_Ublobstore_Uplugin_Ucpp.so+0x4ea9d)
    #5 av::blobstore::write_blob(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) common/blobstore/blobstore.cc:91 (libcommon_Sblobstore_Slibblobstore_Uno_Ulocal.so+0x45814)
    #6 av::blobstore::(anonymous namespace)::S3BlobstorePluginTest_SmokeTest_Test::TestBody() cloud/aws/s3/s3_blobstore_plugin_test.cc:44 (s3_blobstore_plugin_test+0x4e9994)
    #7 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) external/com_google_googletest/googletest/src/gtest.cc:2580 (libexternal_Scom_Ugoogle_Ugoogletest_Slibgtest.so+0x14e30c)
    #8 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) external/com_google_googletest/googletest/src/gtest.cc:2616 (libexternal_Scom_Ugoogle_Ugoogletest_Slibgtest.so+0x128d2e)
    #9 testing::Test::Run() external/com_google_googletest/googletest/src/gtest.cc:2655 (libexternal_Scom_Ugoogle_Ugoogletest_Slibgtest.so+0x1071c1)
    #10 testing::TestInfo::Run() external/com_google_googletest/googletest/src/gtest.cc:2832 (libexternal_Scom_Ugoogle_Ugoogletest_Slibgtest.so+0x108221)
    #11 testing::TestSuite::Run() external/com_google_googletest/googletest/src/gtest.cc:2986 (libexternal_Scom_Ugoogle_Ugoogletest_Slibgtest.so+0x108f5e)
    #12 testing::internal::UnitTestImpl::RunAllTests() external/com_google_googletest/googletest/src/gtest.cc:5697 (libexternal_Scom_Ugoogle_Ugoogletest_Slibgtest.so+0x11c9a4)
    #13 bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) external/com_google_googletest/googletest/src/gtest.cc:2580 (libexternal_Scom_Ugoogle_Ugoogletest_Slibgtest.so+0x15684c)
    #14 bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) external/com_google_googletest/googletest/src/gtest.cc:2616 (libexternal_Scom_Ugoogle_Ugoogletest_Slibgtest.so+0x12d35e)
    #15 testing::UnitTest::Run() external/com_google_googletest/googletest/src/gtest.cc:5280 (libexternal_Scom_Ugoogle_Ugoogletest_Slibgtest.so+0x11c1db)
    #16 RUN_ALL_TESTS() external/com_google_googletest/googletest/include/gtest/gtest.h:2485 (libexternal_Scom_Ugoogle_Ugoogletest_Slibgtest_Umain.so+0xdf7)

This is equally present in PutObjectAsync and GetObjectAsync, due to the same construction.

The problem is that (using GetObject as an example), the finish_callback is set in the options :

void S3CrtClient::InitCommonCrtRequestOption(CrtRequestCallbackUserData *userData, aws_s3_meta_request_options *options, const Aws::AmazonWebServiceRequest *request, const Aws::Http::URI &uri, Aws::Http::HttpMethod method) const
{
 // ...
  options->finish_callback = S3CrtRequestFinishCallback;
}

The options are then passed to aws_s3_client_make_meta_request, which returns a populated rawRequest.

void S3CrtClient::GetObjectAsync(const GetObjectRequest& request, const GetObjectResponseReceivedHandler& handler, const std::shared_ptr<const Aws::Client::AsyncCallerContext>& context) const
{ 
  // ...
  aws_s3_meta_request *rawRequest = aws_s3_client_make_meta_request(m_s3CrtClient, &options);
  userData->underlyingS3Request = rawRequest;
}

However, before the last line can be executed after aws_s3_client_make_meta_request returns, the finish_callback may already have been invoked, trying to release a rawRequest that is still set to the nullptr value:

static void S3CrtRequestFinishCallback(struct aws_s3_meta_request *meta_request,
    const struct aws_s3_meta_request_result *meta_request_result, void *user_data)
{ 
  // ...
  aws_s3_meta_request_release(userData->underlyingS3Request);
}

When userData->underlyingS3Request has not been set (nullptr value after new allocation), the call to aws_s3_meta_request_release becomes a no-op:

// aws-c-s3/source/s3_meta_request.c
void aws_s3_meta_request_release(struct aws_s3_meta_request *meta_request) {
    if (meta_request == NULL) {  // <== HERE
        return;
    }

    aws_ref_count_release(&meta_request->ref_count);
}

Back in GetObjectAsync, aws_s3_client_make_meta_request returns a meta request with a non-zero reference count which, as a result of the race condition, may never be decremented.

This can cause programs to hang at shutdown, which is what we are currently observing.

Reproduction Steps

Run unit tests under clang11 using the thread sanitizer.

Possible Solution

The userData struct needs a mutex to protect concurrent access to the aws_s3_client_make_meta_request.

Additional Information/Context

No response

AWS CPP SDK version used

1.9.x (1.9.170, but problem still exists on master).

Compiler and Version used

clang 11

Operating System and version

Linux, ubuntu 18.04

github-actions[bot] commented 2 years ago

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see. If you need more assistance, please either tag a team member or open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.

sbiscigl commented 2 years ago

merged, thanks for the contribution!