Closed: kou closed this issue 1 year ago
I agree that S3 shutdown should happen earlier than on final exit (otherwise an explicit shutdown wouldn't be needed at all).
I am unsure about the proposed solution, though.
A potential solution is to:
- [...] AwsInstance destructor is called)
- when FinalizeS3 is called, invalidate all live S3 filesystems before calling Aws::ShutdownAPI (by invalidate, I mean changing their internal state such that all subsequent requests fail)

But I don't know if that would work reliably in a multi-threaded context.
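The "invalidate all live filesystems" step could be sketched roughly as follows. This is only an illustration under assumptions: FakeS3FileSystem and FileSystemRegistry are hypothetical names, not Arrow classes, and the registry of weak pointers is one possible way to track live filesystems, not necessarily the one intended above.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-in for an S3 filesystem; not Arrow's S3FileSystem.
class FakeS3FileSystem {
 public:
  // After this call, every request is refused.
  void Invalidate() { valid_.store(false); }
  // Returns true if the request was allowed to proceed.
  bool Request() const { return valid_.load(); }

 private:
  std::atomic<bool> valid_{true};
};

// Hypothetical registry that remembers every filesystem it created.
class FileSystemRegistry {
 public:
  std::shared_ptr<FakeS3FileSystem> Create() {
    auto fs = std::make_shared<FakeS3FileSystem>();
    std::lock_guard<std::mutex> lock(mutex_);
    live_.push_back(fs);  // stored as weak_ptr, so it does not extend lifetime
    return fs;
  }

  // Invalidates every filesystem that is still alive and returns the count.
  // A real finalizer would call the SDK shutdown only after this step.
  int InvalidateAll() {
    std::lock_guard<std::mutex> lock(mutex_);
    int invalidated = 0;
    for (auto& weak : live_) {
      if (auto fs = weak.lock()) {
        fs->Invalidate();
        ++invalidated;
      }
    }
    live_.clear();
    return invalidated;
  }

 private:
  std::mutex mutex_;
  std::vector<std::weak_ptr<FakeS3FileSystem>> live_;
};

// Small example: one filesystem is still alive at finalization, one is not.
inline void RegistryExample() {
  FileSystemRegistry registry;
  auto kept = registry.Create();
  { auto dropped = registry.Create(); }  // destroyed before finalization
  assert(registry.InvalidateAll() == 1);
  assert(!kept->Request());
}
```

Note that the flag only stops new requests. A request that passed the check just before Invalidate() can still be running, which is the multi-threading concern raised above.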
Edit: C++17 has a shared-exclusive mutex that would be used for that: grab the mutex in shared mode for all S3 client calls, and in exclusive mode in FinalizeS3.
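A minimal sketch of the shared-exclusive idea, assuming the C++17 mutex in question is std::shared_mutex. S3Gate, WithClient and Finalize are hypothetical names used only for illustration; this is not Arrow's implementation.

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <stdexcept>

// Hypothetical gate in front of all S3 client calls.
class S3Gate {
 public:
  // Runs fn while holding the mutex in shared mode, so many requests can
  // run concurrently. Throws if Finalize() has already completed.
  template <typename Fn>
  auto WithClient(Fn&& fn) {
    std::shared_lock<std::shared_mutex> lock(mutex_);
    if (finalized_) {
      throw std::runtime_error("S3 subsystem is finalized");
    }
    return fn();
  }

  // Takes the mutex in exclusive mode: this waits for in-flight requests to
  // finish and keeps new ones out. The SDK shutdown call would go here.
  void Finalize() {
    std::unique_lock<std::shared_mutex> lock(mutex_);
    finalized_ = true;
  }

 private:
  std::shared_mutex mutex_;
  bool finalized_ = false;  // guarded by mutex_
};

// Small example: a call succeeds before Finalize() and fails after it.
inline void GateExample() {
  S3Gate gate;
  assert(gate.WithClient([] { return 42; }) == 42);
  gate.Finalize();
  bool threw = false;
  try {
    gate.WithClient([] { return 0; });
  } catch (const std::runtime_error&) {
    threw = true;
  }
  assert(threw);
}
```

The trade-off in this sketch is that Finalize() blocks until every request holding the shared lock has returned, so a long-running request delays shutdown.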
@westonpace Do you have any idea for this problem?
@pitrou and I had some discussion of this ticket.
As @pitrou mentioned, we are currently calling finalize too late. We can't wait for the thread pools to shut down, because that happens at program exit, and by then it is too late to call AWS shutdown.
Edit: C++17 has a shared-exclusive mutex that would be used for that: grab the mutex in shared mode for all S3 client calls, and in exclusive mode in FinalizeS3.
This approach sounds like the simplest solution.
How about just using AwsInstance::is_finalized_ instead of introducing a shared-exclusive mutex?
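For comparison, a flag-based variant might look like the sketch below. Only the member name is_finalized_ comes from the discussion; AwsInstanceSketch, Finalize, IsFinalized and GetObject are hypothetical, and the real AwsInstance may differ.

```cpp
#include <atomic>
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for the AwsInstance mentioned above.
class AwsInstanceSketch {
 public:
  // Returns true only for the first caller, so shutdown work runs once.
  bool Finalize() {
    bool expected = false;
    if (!is_finalized_.compare_exchange_strong(expected, true)) {
      return false;  // someone else already finalized
    }
    // The SDK shutdown call would go here.
    return true;
  }

  bool IsFinalized() const { return is_finalized_.load(); }

 private:
  std::atomic<bool> is_finalized_{false};
};

// Hypothetical request path that fails fast once the instance is finalized.
inline std::optional<std::string> GetObject(const AwsInstanceSketch& aws,
                                            const std::string& key) {
  if (aws.IsFinalized()) {
    return std::nullopt;
  }
  return "contents of " + key;
}

// Small example: requests work before Finalize() and are refused after it.
inline void FlagExample() {
  AwsInstanceSketch aws;
  assert(GetObject(aws, "a").has_value());
  assert(aws.Finalize());
  assert(!GetObject(aws, "a").has_value());
}
```

Unlike the shared-exclusive mutex, a bare flag does not wait for requests that passed the check before Finalize() ran, so it is simpler but gives a weaker guarantee.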
[...] AwsInstance::is_finalized_ approach.
Describe the bug, including details regarding any error messages, version, and platform.
Since #33858, arrow::FinalizeS3() calls neither RegionResolver::ResetDefaultInstance() nor Aws::ShutdownAPI(). This may cause a crash on exit in the "SubTreeFileSystem$create() with URI" R test: https://github.com/apache/arrow/blob/0344a2cdf6219708a25f39e580406e0ce692b61e/r/tests/testthat/test-filesystem.R#L154-L164

For example, it doesn't happen on the current main, but it does happen on #36230:
https://github.com/apache/arrow/actions/runs/5384835055/jobs/9793825156?pr=36230#step:6:33597
I could reproduce this by running only that test (I commented out all other tests). Here is the backtrace for that case:
[...] is here: https://github.com/aws/aws-sdk-cpp/blob/1fb97256a2ae7211a741fda0033ef1e18d29e2f0/aws-cpp-sdk-core/source/http/curl/CurlHandleContainer.cpp#L27

And AWS_LOGSTREAM_INFO is here: https://github.com/aws/aws-sdk-cpp/blob/1fb97256a2ae7211a741fda0033ef1e18d29e2f0/aws-cpp-sdk-core/include/aws/core/utils/logging/LogMacros.h#L159-L168

It seems that Aws::Utils::Logging::GetLogSystem() returns a destroyed object in this context. Note that this is called in exit() (#33 0x00007fd53f209a60 in exit () from /usr/lib/x86_64-linux-gnu/libc.so.6), so the order in which objects are destroyed is undefined.

Can we call RegionResolver::ResetDefaultInstance() and Aws::ShutdownAPI() from arrow::FinalizeS3() again? For example:

I can avoid the crash in my environment with this patch.
@westonpace What do you think about this problem?
Component(s)
C++