Closed grrtrr closed 1 year ago
Hello, Please sorry for a long time no reply. Thank you a lot for a through dive deep I’m trying to follow your report but unfortunately, can’t understand the transition
The free is happening for the second time
Call CleanupConfigAndCredentialsCacheManager() (or ApiShutdown) late in the sequence, e.g. in the destructor of an object that gets deleted as part of the application exit sequence.
It happened in this way in the unit test, ApiShutdown was called late (as evident in the gdb trace above).
Are you sure that ApiShutdown
is not getting called twice? Indeed having such API design is not the best practice and 2nd call of this API function is not safe, but so far I did not manage to spot who calls the destructor the second time.
As far as I’m aware, static object is destroyed at the termination of program
, I don’t think that someone could call ApiShutdown somewhere after main
function.
Long story short, would it be possible to provide a sample app/reproducer for the issue? I’ve tried to run existing unit/integration tests with a memory sanitizer (valgrind) enabled and could not reproduce the issue reported.
Best regards,
Sergey
Hi Sergey,
thank you so much for taking the time to understand the underlying problem, and getting back with good questions.
With regard to reproducing the double-free problem - this happened for us reliably in unit tests: we use googletest. Outside of googletest runs, the problem still exists, but it may not be as easy to reproduce (as per your comment).
With regard to calling the destructor twice, I should have explained the traces better.
I put std::cout
statements that print the destructor/function name and the value of this
. The following trace was produced after the fix (the std::map
entries are empty, hence they are no longer double-freed).
As far as I remember, the whole sequence below happens after main()
returns, which makes sense since the ConfigAndCredentialsCacheManager
object is kept in a static
variable via an Aws::UniquePtr
:
// source/config/AWSProfileConfigLoader.cpp
static Aws::UniquePtr<ConfigAndCredentialsCacheManager> s_configManager(nullptr);
static
objects are destructed according to § 3.6.3 of the C++03 standard:
Destructors (12.4) for initialized objects (that is, objects whose lifetime (3.8) has begun) with static storage duration are called as a result of returning from main and as a result of calling std::exit (18.5).
We have an object, s3_blobstore_plugin.cc
which is also tied to a static
variable, you see its destructor called in the line that says DESTRUCTOR
.
~ConfigAndCredentialsCacheManager() 0x7fb79a8d9300
~AWSConfigFileProfileConfigLoader(/home/grenker/.aws/config) 0x7fb79a8d9580
~AWSProfileConfigLoader() 0 0x7fb79a8d9580
~AWSConfigFileProfileConfigLoader(/home/grenker/.aws/credentials) 0x7fb79a8d9408
~AWSProfileConfigLoader() 0 0x7fb79a8d9408
E0121 19:38:32.314947 6754 s3_blobstore_plugin.cc:85] DESTRUCTOR
CleanupConfigAndCredentialsCacheManager() 0x7fb79a8d9300
~ConfigAndCredentialsCacheManager() 0x7fb79a8d9300
~AWSConfigFileProfileConfigLoader(/home/grenker/.aws/config) 0x7fb79a8d9580
~AWSProfileConfigLoader() 0 0x7fb79a8d9580
~AWSConfigFileProfileConfigLoader(/home/grenker/.aws/credentials) 0x7fb79a8d9408
~AWSProfileConfigLoader() 0 0x7fb79a8d9408
In the SDK code, there is only one call to CleanupConfigAndCredentialsCacheManager()
, from ShutdownAPI()
(source/Aws.cpp
).
This is an important detail that I had not included in the above description: in our code, the call to Aws::ShutdownAPI()
is tied to another (static) object, which is set up in such a way that it is always called after the objects that use the SDK (in particular the line labeled "DESTRUCTOR"
above). That is, we are ensuring that Aws::ShutdownAPI()
gets called after main()
terminates.
Now, what happens when ~ConfigAndCredentialsCacheManager
is called:
AWSConfigFileProfileConfigLoader
member variables (m_credentialsFileLoader
and m_configFileLoader
) get destructed;AWSProfileConfigLoader
base class,ProfilesContainer
m_profiles
member variable,Aws::Map<Aws::String, Aws::Config::Profile>
(a std::map
),m_credentialsFileLoader
and m_configFileLoader
the base-class destructor will destruct the std::map
,The destructor of ConfigAndCredentialsCacheManager
is called twice here, because
static
global variable (unique_ptr) s_configManager
gets destructed (call number 1),Aws::ShutdownAPI
gets called, which calls CleanupConfigAndCredentialsCacheManager
, setting s_configManager = nullptr
, which has the same effect as calling reset()
,reset()
is to call ~ConfigAndCredentialsCacheManager()
(call number 2).The AWSProfileConfigLoader
base class (include/aws/core/config/AWSProfileConfigLoader.h
) contains a std::map
member that is not wrapped in e.g. a std::unique_ptr
.
When the destructor of AWSProfileConfigLoader
gets called twice, a double-free of elements in that std::map
happens.
We were unfortunate in our setup that in unit tests we (reliably) hit this sequence.
The problem can be avoided by explicitly clearing the std::map
in ~AWSProfileConfigLoader()
, as in #1851.
Hi Gerrit @grrtrr Great news! I managed to reproduce the reported issue. Bad news: even if you try to manually clear the m_profiles variable, you still have double free and memory corruption, but with less symptoms.
Thanks to your really great and deep investigation, I managed to understand more context of the issue, especially with the order of destructions. However, double destructor call still sounded too fishy, this simply should not happen under normal conditions.
Please refer to the unit test I created to reproduce the issue: AWSMemoryTest.cpp#L34-L109, especially the line 76 that will re-enable double-free behavior once uncommented. Please also note that a Release build config with optimizations enabled is required for a reproduction.
Long story short: the original problem is a compiler optimizing out setting of underlying pointer value to a nullptr
(default-initialized pointer()
to be exact precise) in a unique_ptr destructor:, i.e. these lines:
__ptr = pointer(); // on gcc
__ptr_.first() = pointer(); // on clang
, thus CleanupConfigAndCredentialsCacheManager
results in additional destructor call on the already destroyed object. Additionally, our API design encourages usage of global variables set and deleted by global Init/Shutdown API calls, implicitly encouraging wrapping it in some kind of RAII-wrapper, possible defined static...
I have submitted a PR#2105 that attempts to avoid compiler optimizations in our smart pointer wrapper by using a custom deleter function that explicitly releases and resets the value of a static pointer. I have also found 2 additional places with this problematic pattern (and some more in tests but skipped them). Could you please try this fix instead of yours and confirm if you are still observing the initial issue reported?
Thank you again for a great finding.
Btw, the design of the fix is atrocious and I wish I've never seen this or had to write something like this... but ç'est la vie dans le monde du legacy. I will be happy to hear about better approach of fixing or work-arounding it.
Hi @grrtrr, I've just merged another take on this issue: https://github.com/aws/aws-sdk-cpp/pull/2108 (instead of using custom deleter, I'm inheriting from UniquePtr and make sure that the underlying pointer is getting set to NULL on destruction. I would really like to know if it fixes the original issue. Best regards, Sergey
@SergeyRyabinin -- thank you for following up and getting to the root of this.
In some of our tests we have seen occasional crashes (at exit
time) in aws-cpp-sdk-core/source/monitoring/MonitoringManager.cpp
, I will backport your #2108 our codebase.
While thinking about, would it still make sense to make AWSConfigFileProfileConfigLoader / AWSProfileConfigLoader
robust against being destructed twice, as suggested in #1851; or should that be closed?
Are you still seeing any segfaults @grrtrr?
@jmklix - no, issue can be closed.
Comments on closed issues are hard for our team to see. If you need more assistance, please either tag a team member or open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.
Describe the bug
CleanupConfigAndCredentialsCacheManager()
is not robust against being called late, i.e. after thestatic
variables_configManager
has been destructed already once (as part of theexit
sequence, calling the destructors of static variables).In this case, the
ConfigAndCredentialsCacheManager
destructor gets called twice:This prints out the object addresses and the sequence of constructor/destructor calls. The
0
and10
numbers print the length of them_profiles
element at the time of call. The call toCleanupConfigAndCredentialsCacheManager
happens after "shutdown", during the application exit sequence.Here is the _detailed
gdb
trace:The
free
is happening for the second time on thestd::map
instances of the 2AWSConfigFileProfileConfigLoader
embedded into theConfigAndCredentialsCacheManager
. The first destructor call has already de-allocated them (double-free).By virtue of using
std::unique_ptr::reset
, the deleter is invoked inExpected behavior
Under no circumstances should a bad timing of calling
ConfigAndCredentialsCacheManager
produce a crash.Current behavior
Crash, as described above.
Steps to Reproduce
Call
CleanupConfigAndCredentialsCacheManager()
(or ApiShutdown) late in the sequence, e.g. in the destructor of an object that gets deleted as part of the application exit sequence.It happened in this way in the unit test,
ApiShutdown
was called late (as evident in thegdb
trace above).Possible Solution
What to fix
The
AWSProfileConfigLoader
embeds astd::map
directly, not viastd::unique_ptr
. A function that performs a clean-up on a static global variable should be able to deal with timing issues such as the above.The crash can be avoided by clearing the map in the
AWSConfigFileProfileConfigLoader
(or the base class):With this change, under bad timing conditions as above, destructors are again invoked twice, but the code no longer segfaults as the
std::map
elements have not only been freed, but the map has also been emptied:Note that now the length of all
m_profiles
elements is reported as 0.AWS CPP SDK version used
1.9.170 - problem still present on master
Compiler and Version used
7.5.0-3ubuntu1~18.04
Operating System and version
ubuntu 18.04