TileDB-Inc / TileDB

The Universal Storage Engine
https://tiledb.com
MIT License
1.84k stars 186 forks source link

S3 VFS: `init_client()` needs to happen under a _global_ mutex #2201

Closed depthwise closed 3 years ago

depthwise commented 3 years ago

While AWS SDK documentation states that S3 client is thread safe, initialization of S3 client doesn't seem to be thread safe. In particular, there's some global error variable that gets accessed when parsing JSON, so if you initialize two separate VFS instances from different threads, and TSAN is enabled, you will end up with a stack trace like the one below (truncated):

WARNING: ThreadSanitizer: data race (pid=12)
  Write of size 8 at 0x7fe93658a1b0 by thread T16 (mutexes: write M144250874682678808, write M150443169551677688):
    #0 cJSON_ParseWithOpts external/aws/aws-cpp-sdk-core/source/external/cjson/cJSON.cpp:1006 (libexternal_Saws_Slibaws.so+0x299bef)
    #1 Aws::Utils::Json::JsonValue::JsonValue(std::__cxx11::basic_string<char, std::char_traits<char>, Aws::Allocator<char> > const&) external/aws/aws-cpp-sdk-core/source/utils/json/JsonSerializer.cpp:39 (libexternal_Saws_Slibaws.so+0x32d09c)
    #2 Aws::Config::EC2InstanceProfileConfigLoader::LoadInternal() external/aws/aws-cpp-sdk-core/source/config/AWSProfileConfigLoader.cpp:341 (libexternal_Saws_Slibaws.so+0x286d88)
    #3 Aws::Config::AWSProfileConfigLoader::Load() external/aws/aws-cpp-sdk-core/source/config/AWSProfileConfigLoader.cpp:47 (libexternal_Saws_Slibaws.so+0x282b4f)
    #4 Aws::Auth::InstanceProfileCredentialsProvider::Reload() external/aws/aws-cpp-sdk-core/source/auth/AWSCredentialsProvider.cpp:265 (libexternal_Saws_Slibaws.so+0x235987)
    #5 Aws::Auth::InstanceProfileCredentialsProvider::RefreshIfExpired() external/aws/aws-cpp-sdk-core/source/auth/AWSCredentialsProvider.cpp:283 (libexternal_Saws_Slibaws.so+0x23613b)
    #6 Aws::Auth::InstanceProfileCredentialsProvider::GetAWSCredentials() external/aws/aws-cpp-sdk-core/source/auth/AWSCredentialsProvider.cpp:250 (libexternal_Saws_Slibaws.so+0x23be9a)
    #7 Aws::Auth::AWSCredentialsProviderChain::GetAWSCredentials() external/aws/aws-cpp-sdk-core/source/auth/AWSCredentialsProviderChain.cpp:35 (libexternal_Saws_Slibaws.so+0x244550)
    #8 Aws::Client::AWSAuthV4Signer::AWSAuthV4Signer(std::shared_ptr<Aws::Auth::AWSCredentialsProvider> const&, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, Aws::Allocator<char> > const&, Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy, bool) external/aws/aws-cpp-sdk-core/source/auth/AWSAuthSigner.cpp:170 (libexternal_Saws_Slibaws.so+0x2262ed)
    #9 std::shared_ptr<Aws::Client::AWSAuthV4Signer> Aws::MakeShared<Aws::Client::AWSAuthV4Signer, std::shared_ptr<Aws::Auth::DefaultAWSCredentialsProviderChain>, char const*&, std::__cxx11::basic_string<char, std::char_traits<char>, Aws::Allocator<char> > const&, Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy&, bool>(char const*, std::shared_ptr<Aws::Auth::DefaultAWSCredentialsProviderChain>&&, char const*&, std::__cxx11::basic_string<char, std::char_traits<char>, Aws::Allocator<char> > const&, Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy&, bool&&) /usr/include/c++/9/ext/new_allocator.h:147 (libexternal_Saws_Slibaws.so+0x39a78b)
    #10 Aws::S3::S3Client::S3Client(Aws::Client::ClientConfiguration const&, Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy, bool, Aws::S3::US_EAST_1_REGIONAL_ENDPOINT_OPTION) external/aws/aws-cpp-sdk-s3/source/S3Client.cpp:134 (libexternal_Saws_Slibaws.so+0x39a78b)
    #11 std::shared_ptr<Aws::S3::S3Client>::shared_ptr<Aws::Allocator<Aws::S3::S3Client>, Aws::Client::ClientConfiguration&, Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy, bool const&>(std::_Sp_alloc_shared_tag<Aws::Allocator<Aws::S3::S3Client> >, Aws::Client::ClientConfiguration&, Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy&&, bool const&) /usr/include/c++/9/ext/new_allocator.h:147 (libexternal_Scom_Ugithub_Utiledb_Utiledb_Slibtiledb.so+0x2879bd)
    #12 std::shared_ptr<Aws::S3::S3Client> std::allocate_shared<Aws::S3::S3Client, Aws::Allocator<Aws::S3::S3Client>, Aws::Client::ClientConfiguration&, Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy, bool const&>(Aws::Allocator<Aws::S3::S3Client> const&, Aws::Client::ClientConfiguration&, Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy&&, bool const&) /usr/include/c++/9/bits/shared_ptr.h:702 (libexternal_Scom_Ugithub_Utiledb_Utiledb_Slibtiledb.so+0x2879bd)
    #13 std::shared_ptr<Aws::S3::S3Client> Aws::MakeShared<Aws::S3::S3Client, Aws::Client::ClientConfiguration&, Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy, bool const&>(char const*, Aws::Client::ClientConfiguration&, Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy&&, bool const&) external/aws/aws-cpp-sdk-core/include/aws/core/utils/memory/stl/AWSAllocator.h:106 (libexternal_Scom_Ugithub_Utiledb_Utiledb_Slibtiledb.so+0x2879bd)
    #14 tiledb::sm::S3::init_client() const external/com_github_tiledb_tiledb/tiledb/sm/filesystem/s3.cc:1104 (libexternal_Scom_Ugithub_Utiledb_Utiledb_Slibtiledb.so+0x2879bd)

Other TSAN output suggests that "Location is global 'global_error' of size 16 at 0x7fe93658a1b0 (libexternal_Saws_Slibaws.so+0x0000007dc1b0)" is to blame.

I haven't looked into this too deeply, but it'd seem that if there is a global variable, then initialization of the client itself needs to be protected by a global mutex, i.e. something like

static std::mutex mu;
std::unique_lock<std::mutex> global_lock(mu);

needs to be added to init_client(). The other mutex becomes redundant in this case. Performance impact should be fairly negligible, since S3 is slow as molasses anyway.

depthwise commented 3 years ago

Related AWS SDK bug I just filed: https://github.com/aws/aws-sdk-cpp/issues/1614