aws / aws-sdk-cpp

AWS SDK for C++
Apache License 2.0
1.97k stars 1.06k forks source link

aws-core: segfault if CleanupConfigAndCredentialsCacheManager() is called too late #1850

Closed grrtrr closed 1 year ago

grrtrr commented 2 years ago

Describe the bug

CleanupConfigAndCredentialsCacheManager() is not robust against being called late, i.e. after the static variable s_configManager has been destructed already once (as part of the exit sequence, calling the destructors of static variables).

In this case, the ConfigAndCredentialsCacheManager destructor gets called twice:

AWSConfigFileProfileConfigLoader() 0x7fac9d0d9408
AWSConfigFileProfileConfigLoader() 0x7fac9d0d9580
ConfigAndCredentialsCacheManager() 0x7fac9d0d9300
InitConfigAndCredentialsCacheManager() 0x7fac9d0d9300
AWSConfigFileProfileConfigLoader() 0x7fac9d032508
... test runs ...
ConfigAndCredentialsCacheManager() 0x7fac9d0d9300
~AWSConfigFileProfileConfigLoader(/home/grenker/.aws/config) 0x7fac9d0d9580
~AWSProfileConfigLoader() 0 0x7fac9d0d9580
~AWSConfigFileProfileConfigLoader(/home/grenker/.aws/credentials) 0x7fac9d0d9408
~AWSProfileConfigLoader() 10 0x7fac9d0d9408
... shutdown ...
CleanupConfigAndCredentialsCacheManager() 0x7fac9d0d9300
~AWSConfigFileProfileConfigLoader(/home/grenker/.aws/config) 0x7fac9d0d9580
~AWSProfileConfigLoader() 0 0x7fac9d0d9580
~AWSConfigFileProfileConfigLoader(/home/grenker/.aws/credentials) 0x7fac9d0d9408
~AWSProfileConfigLoader() 10 0x7fac9d0d9408

This prints out the object addresses and the sequence of constructor/destructor calls. The 0 and 10 numbers print the length of the m_profiles element at the time of call. The call to CleanupConfigAndCredentialsCacheManager happens after "shutdown", during the application exit sequence.

Here is the _detailed gdb trace:

(gdb) bt
#0  0x00007ffff6b67fbf in free ()
   from _solib_k8/libthird_Uparty_Sjemalloc_Sv4.5.0_Slibjemalloc.so
#1  0x00007ffff7534294 in std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > 
const, Aws::Config::Profile>, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, Aws::Config::Profile> >, std::less<std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, Aws::Config::Profile> > >::_M_erase(std::_Rb_tree
_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, Aws::Config::Profile> >*) ()
   from _solib_k8/libexternal_Scom_Ugithub_Uaws-sdk-cpp_Slibaws-core.
so
#2  0x00007ffff75341a5 in std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > 
const, Aws::Config::Profile>, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, Aws::Config::Profile> >, std::less<std::__cxx11::basic_string<char, 
std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, Aws::Config::Profile> > >::_M_erase(std::_Rb_tree
_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, Aws::Config::Profile> >*) ()
   from _solib_k8/libexternal_Scom_Ugithub_Uaws-sdk-cpp_Slibaws-core.
so
#3  0x00007ffff75341a5 in std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, Aws::Config::Profile>, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, Aws::Config::Profile> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, Aws::Config::Profile> > >::_M_erase(std::_Rb_tree_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, Aws::Config::Profile> >*) ()
   from _solib_k8/libexternal_Scom_Ugithub_Uaws-sdk-cpp_Slibaws-core.so
#4  0x00007ffff75341a5 in std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, Aws::Config::Profile>, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, Aws::Config::Profile> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, Aws::Config::Profile> > >::_M_erase(std::_Rb_tree_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, Aws::Config::Profile> >*) ()
   from _solib_k8/libexternal_Scom_Ugithub_Uaws-sdk-cpp_Slibaws-core.so
#5  0x00007ffff75341a5 in std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, Aws::Config::Profile>, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, Aws::Config::Profile> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, Aws::Config::Profile> > >::_M_erase(std::_Rb_tree_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, Aws::Config::Profile> >*) ()
   from _solib_k8/libexternal_Scom_Ugithub_Uaws-sdk-cpp_Slibaws-core.so
#6  0x00007ffff756de2f in std::enable_if<!std::is_polymorphic<Aws::Config::ConfigAndCredentialsCacheManager>::value, void>::type Aws::Delete<Aws::Config::ConfigAndCredentialsCacheManager>(Aws::Config::ConfigAndCredentialsCacheManager*) ()
   from _solib_k8/libexternal_Scom_Ugithub_Uaws-sdk-cpp_Slibaws-core.so
#7  0x00007ffff7567fe8 in Aws::Config::CleanupConfigAndCredentialsCacheManager() ()

The free is happening for the second time on the std::map instances of the 2 AWSConfigFileProfileConfigLoader embedded into the ConfigAndCredentialsCacheManager. The first destructor call has already de-allocated them (double-free).

By virtue of using std::unique_ptr::reset, the deleter is invoked in

// source/config/AWSProfileConfigLoader.cpp
        void CleanupConfigAndCredentialsCacheManager()
        {   
            if (!s_configManager)
            {   
                return;
            }
            s_configManager = nullptr;  // Calls deleter / destructor.
        }

Expected behavior

Under no circumstances should a bad timing of calling ConfigAndCredentialsCacheManager produce a crash.

Current behavior

Crash, as described above.

Steps to Reproduce

Call CleanupConfigAndCredentialsCacheManager() (or ApiShutdown) late in the sequence, e.g. in the destructor of an object that gets deleted as part of the application exit sequence.

It happened in this way in the unit test, ApiShutdown was called late (as evident in the gdb trace above).

Possible Solution

What to fix

The AWSProfileConfigLoader embeds a std::map directly, not via std::unique_ptr. A function that performs a clean-up on a static global variable should be able to deal with timing issues such as the above.

The crash can be avoided by clearing the map in the AWSConfigFileProfileConfigLoader (or the base class):

// include/aws/core/config/AWSProfileConfigLoader.h

        class AWS_CORE_API AWSConfigFileProfileConfigLoader : public AWSProfileConfigLoader
        {
        public:

            virtual ~AWSConfigFileProfileConfigLoader() {  m_profiles.clear(); }

With this change, under bad timing conditions as above, destructors are again invoked twice, but the code no longer segfaults as the std::map elements have not only been freed, but the map has also been emptied:

~ConfigAndCredentialsCacheManager() 0x7fb79a8d9300
~AWSConfigFileProfileConfigLoader(/home/grenker/.aws/config) 0x7fb79a8d9580
~AWSProfileConfigLoader() 0 0x7fb79a8d9580
~AWSConfigFileProfileConfigLoader(/home/grenker/.aws/credentials) 0x7fb79a8d9408
~AWSProfileConfigLoader() 0 0x7fb79a8d9408
E0121 19:38:32.314947  6754 s3_blobstore_plugin.cc:85] DESTRUCTOR 
CleanupConfigAndCredentialsCacheManager() 0x7fb79a8d9300
~ConfigAndCredentialsCacheManager() 0x7fb79a8d9300
~AWSConfigFileProfileConfigLoader(/home/grenker/.aws/config) 0x7fb79a8d9580
~AWSProfileConfigLoader() 0 0x7fb79a8d9580
~AWSConfigFileProfileConfigLoader(/home/grenker/.aws/credentials) 0x7fb79a8d9408
~AWSProfileConfigLoader() 0 0x7fb79a8d9408

Note that now the length of all m_profiles elements is reported as 0.

AWS CPP SDK version used

1.9.170 - problem still present on master

Compiler and Version used

7.5.0-3ubuntu1~18.04

Operating System and version

ubuntu 18.04

SergeyRyabinin commented 2 years ago

Hello, Please sorry for a long time no reply. Thank you a lot for a through dive deep I’m trying to follow your report but unfortunately, can’t understand the transition

The free is happening for the second time

Call CleanupConfigAndCredentialsCacheManager() (or ApiShutdown) late in the sequence, e.g. in the destructor of an object that gets deleted as part of the application exit sequence.

It happened in this way in the unit test, ApiShutdown was called late (as evident in the gdb trace above).

Are you sure that ApiShutdown is not getting called twice? Indeed having such API design is not the best practice and 2nd call of this API function is not safe, but so far I did not manage to spot who calls the destructor the second time. As far as I’m aware, static object is destroyed at the termination of program, I don’t think that someone could call ApiShutdown somewhere after main function. Long story short, would it be possible to provide a sample app/reproducer for the issue? I’ve tried to run existing unit/integration tests with a memory sanitizer (valgrind) enabled and could not reproduce the issue reported. Best regards, Sergey

grrtrr commented 2 years ago

Hi Sergey,

thank you so much for taking the time to understand the underlying problem, and getting back with good questions.

With regard to reproducing the double-free problem - this happened for us reliably in unit tests: we use googletest. Outside of googletest runs, the problem still exists, but it may not be as easy to reproduce (as per your comment).

With regard to calling the destructor twice, I should have explained the traces better.

I put std::cout statements that print the destructor/function name and the value of this. The following trace was produced after the fix (the std::map entries are empty, hence they are no longer double-freed).

As far as I remember, the whole sequence below happens after main() returns, which makes sense since the ConfigAndCredentialsCacheManager object is kept in a static variable via an Aws::UniquePtr:

// source/config/AWSProfileConfigLoader.cpp
 static Aws::UniquePtr<ConfigAndCredentialsCacheManager> s_configManager(nullptr); 

static objects are destructed according to § 3.6.3 of the C++03 standard:

Destructors (12.4) for initialized objects (that is, objects whose lifetime (3.8) has begun) with static storage duration are called as a result of returning from main and as a result of calling std::exit (18.5).

We have an object, s3_blobstore_plugin.cc which is also tied to a static variable, you see its destructor called in the line that says DESTRUCTOR.

~ConfigAndCredentialsCacheManager() 0x7fb79a8d9300
~AWSConfigFileProfileConfigLoader(/home/grenker/.aws/config) 0x7fb79a8d9580
~AWSProfileConfigLoader() 0 0x7fb79a8d9580
~AWSConfigFileProfileConfigLoader(/home/grenker/.aws/credentials) 0x7fb79a8d9408
~AWSProfileConfigLoader() 0 0x7fb79a8d9408

E0121 19:38:32.314947  6754 s3_blobstore_plugin.cc:85] DESTRUCTOR

CleanupConfigAndCredentialsCacheManager() 0x7fb79a8d9300
~ConfigAndCredentialsCacheManager() 0x7fb79a8d9300
~AWSConfigFileProfileConfigLoader(/home/grenker/.aws/config) 0x7fb79a8d9580
~AWSProfileConfigLoader() 0 0x7fb79a8d9580
~AWSConfigFileProfileConfigLoader(/home/grenker/.aws/credentials) 0x7fb79a8d9408
~AWSProfileConfigLoader() 0 0x7fb79a8d9408

In the SDK code, there is only one call to CleanupConfigAndCredentialsCacheManager(), from ShutdownAPI() (source/Aws.cpp).

This is an important detail that I had not included in the above description: in our code, the call to Aws::ShutdownAPI() is tied to another (static) object, which is set up in such a way that it is always called after the objects that use the SDK (in particular the line labeled "DESTRUCTOR" above). That is, we are ensuring that Aws::ShutdownAPI() gets called after main() terminates.

Now, what happens when ~ConfigAndCredentialsCacheManager is called:

The destructor of ConfigAndCredentialsCacheManager is called twice here, because

Long story short

The AWSProfileConfigLoader base class (include/aws/core/config/AWSProfileConfigLoader.h) contains a std::map member that is not wrapped in e.g. a std::unique_ptr.

When the destructor of AWSProfileConfigLoader gets called twice, a double-free of elements in that std::map happens. We were unfortunate in our setup that in unit tests we (reliably) hit this sequence.

The problem can be avoided by explicitly clearing the std::map in ~AWSProfileConfigLoader(), as in #1851.

SergeyRyabinin commented 2 years ago

Hi Gerrit @grrtrr Great news! I managed to reproduce the reported issue. Bad news: even if you try to manually clear the m_profiles variable, you still have double free and memory corruption, but with less symptoms.

Thanks to your really great and deep investigation, I managed to understand more context of the issue, especially with the order of destructions. However, double destructor call still sounded too fishy, this simply should not happen under normal conditions.

Please refer to the unit test I created to reproduce the issue: AWSMemoryTest.cpp#L34-L109, especially the line 76 that will re-enable double-free behavior once uncommented. Please also note that a Release build config with optimizations enabled is required for a reproduction.

Long story short: the original problem is a compiler optimizing out setting of underlying pointer value to a nullptr (default-initialized pointer() to be exact precise) in a unique_ptr destructor:, i.e. these lines:

__ptr = pointer(); // on gcc
__ptr_.first() = pointer(); // on clang

, thus CleanupConfigAndCredentialsCacheManager results in additional destructor call on the already destroyed object. Additionally, our API design encourages usage of global variables set and deleted by global Init/Shutdown API calls, implicitly encouraging wrapping it in some kind of RAII-wrapper, possible defined static...

I have submitted a PR#2105 that attempts to avoid compiler optimizations in our smart pointer wrapper by using a custom deleter function that explicitly releases and resets the value of a static pointer. I have also found 2 additional places with this problematic pattern (and some more in tests but skipped them). Could you please try this fix instead of yours and confirm if you are still observing the initial issue reported?

Thank you again for a great finding.

SergeyRyabinin commented 2 years ago

Btw, the design of the fix is atrocious and I wish I've never seen this or had to write something like this... but ç'est la vie dans le monde du legacy. I will be happy to hear about better approach of fixing or work-arounding it.

SergeyRyabinin commented 2 years ago

Hi @grrtrr, I've just merged another take on this issue: https://github.com/aws/aws-sdk-cpp/pull/2108 (instead of using custom deleter, I'm inheriting from UniquePtr and make sure that the underlying pointer is getting set to NULL on destruction. I would really like to know if it fixes the original issue. Best regards, Sergey

grrtrr commented 2 years ago

@SergeyRyabinin -- thank you for following up and getting to the root of this. In some of our tests we have seen occasional crashes (at exit time) in aws-cpp-sdk-core/source/monitoring/MonitoringManager.cpp, I will backport your #2108 our codebase.

While thinking about, would it still make sense to make AWSConfigFileProfileConfigLoader / AWSProfileConfigLoader robust against being destructed twice, as suggested in #1851; or should that be closed?

jmklix commented 1 year ago

Are you still seeing any segfaults @grrtrr?

grrtrr commented 1 year ago

@jmklix - no, issue can be closed.

github-actions[bot] commented 1 year ago

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see. If you need more assistance, please either tag a team member or open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.