Closed pkarneliuk closed 10 months ago
Thanks for a very detailed description, I managed to reproduce the issue. The reproducing example looks correct, so yeah, probably something is happening in the library internals.
This should be fixed in v1.32.1
Describe the bug
Sporadic SEGFAULTs occur in `Aws::Iotsecuretunneling::SecureTunnel` internal routines when starting and stopping a connection to an invalid hostname. The test case is simple:
1) Create a SecureTunnel instance in destination mode with an invalid hostname, like `data.region.fake`;
2) `Start()` the SecureTunnel instance;
3) Wait for the error notification in the `OnConnectionFailure` callback with the `AWS_IO_DNS_INVALID_NAME` or `AWS_IO_DNS_QUERY_FAILED` error;
4) `Stop()` the SecureTunnel instance;
5) Wait for the transition to the Stopped state in the `OnStopped` callback;
6) Release the SecureTunnel instance.

The test application may crash in different `lib-aws-c-*.so` routines.
Expected Behavior
The process should not crash with a SEGFAULT during or after calls to the SecureTunnel::Start() and SecureTunnel::Stop() methods.
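The expected sequence (start, wait for the failure callback, stop, wait for the stopped callback, release) can be sketched with standard-library promises. This is only an illustration of the waiting pattern: `FakeTunnel`, `kFakeDnsError` and `RunStartStopOnce` are hypothetical names invented here, and the class is a stand-in that mimics the callback shape described in this report, not the SDK's `SecureTunnel` API.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <memory>
#include <thread>
#include <utility>

// Placeholder value, NOT a real AWS error code.
constexpr int kFakeDnsError = 1;

// Hypothetical stand-in for the tunnel object. It only invokes the two
// callbacks from a background thread, the way the report describes.
class FakeTunnel {
public:
    FakeTunnel(std::function<void(int)> onConnectionFailure,
               std::function<void()> onStopped)
        : m_onConnectionFailure(std::move(onConnectionFailure)),
          m_onStopped(std::move(onStopped)) {
        assert(m_onConnectionFailure && m_onStopped);
    }

    // Releasing the object waits for any callback still in flight.
    ~FakeTunnel() { Join(); }

    // Pretends that DNS resolution fails on a worker thread.
    void Start() {
        Join();
        m_worker = std::thread([this] { m_onConnectionFailure(kFakeDnsError); });
    }

    // Reports the Stopped state from a worker thread.
    void Stop() {
        Join();
        m_worker = std::thread([this] { m_onStopped(); });
    }

private:
    void Join() {
        if (m_worker.joinable()) {
            m_worker.join();
        }
    }

    std::function<void(int)> m_onConnectionFailure;
    std::function<void()> m_onStopped;
    std::thread m_worker;
};

// Runs steps 2 to 6 of the test case once and returns the reported error code.
int RunStartStopOnce() {
    std::promise<int> failed;
    std::promise<void> stopped;

    auto tunnel = std::make_unique<FakeTunnel>(
        [&failed](int errorCode) { failed.set_value(errorCode); },
        [&stopped] { stopped.set_value(); });

    tunnel->Start();
    const int errorCode = failed.get_future().get(); // step 3: wait for failure
    tunnel->Stop();
    stopped.get_future().get();                      // step 5: wait for Stopped
    tunnel.reset();                                  // step 6: release
    return errorCode;
}
```

With the stand-in, `RunStartStopOnce()` returns without crashing every time; the report is that the same sequence against the real library occasionally does not.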
Current Behavior
`Aws::Iotsecuretunneling::SecureTunnel` may crash with a SEGFAULT in internal routines on creation or when starting and stopping a connection to an invalid hostname (low reproducibility). In the `Debug` build the current code triggers assertions in the `lib-aws-c-*.so` libs:
Crash 1: corrupted linked list in AwsEventLoop threads:
At the assertion in `AwsEventLoop 2`, the `main` thread has called the `Stop()` method and is waiting for the `OnStopped` notification in a callback via a promise:
The detailed GDB backtrace per thread
```shell
=thread-group-added,id="i1"
GNU gdb (Ubuntu 10.2-0ubuntu1~20.04~1) 10.2
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
```
Crash 2: related to DNS resolving thread:
The detailed GDB backtrace per thread
```shell
=thread-group-added,id="i1"
GNU gdb (Ubuntu 10.2-0ubuntu1~20.04~1) 10.2
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
```
Reproduction Steps
There is a small test based on `samples/secure_tunneling/secure_tunnel` which can reproduce these crashes: https://github.com/pkarneliuk/aws-iot-device-sdk-cpp-v2/blob/main/samples/secure_tunneling/secure_tunnel/main.cpp
Possible Solution
It looks like the SEGFAULTs (and the assertions in Debug) are caused by missing thread synchronization in `libaws-c-io.so` and `libaws-c-common.so`, so data structures get corrupted.
The assertion `"aws_event_loop_thread_is_callers_thread(event_loop)"` during DNS resolving may be caused by an event task being scheduled to an unexpected thread as some default behavior.
Additional Information/Context
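As context for the thread-affinity assertion mentioned under Possible Solution, here is a minimal sketch of what such a check guards. `TinyEventLoop` and `TaskRunsOnLoopThread` are hypothetical names invented for illustration; this is not the aws-c-io implementation, only the general idea that a task may be queued from any thread but must run on the loop's own thread.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical single-threaded event loop, for illustration only.
class TinyEventLoop {
public:
    // Safe from any thread: the queue is protected by a mutex.
    void Schedule(std::function<void()> task) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_tasks.push(std::move(task));
    }

    // True only when called from the thread that is draining the queue.
    bool IsCallersThread() const {
        std::lock_guard<std::mutex> lock(m_mutex);
        return std::this_thread::get_id() == m_owner;
    }

    // Drains the queue on the calling thread, which becomes the owner.
    void RunPending() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_owner = std::this_thread::get_id();
        }
        for (;;) {
            std::function<void()> task;
            {
                std::lock_guard<std::mutex> lock(m_mutex);
                if (m_tasks.empty()) {
                    break;
                }
                task = std::move(m_tasks.front());
                m_tasks.pop();
            }
            task(); // runs outside the lock, on the owner thread
        }
    }

private:
    mutable std::mutex m_mutex;
    std::queue<std::function<void()>> m_tasks;
    std::thread::id m_owner; // default id matches no running thread
};

// Queues a task from a foreign thread and reports whether it then
// executed on the loop's own thread.
bool TaskRunsOnLoopThread() {
    TinyEventLoop loop;
    bool ranOnLoopThread = false;

    std::thread producer([&] {
        loop.Schedule([&] { ranOnLoopThread = loop.IsCallersThread(); });
    });
    producer.join();

    std::thread loopThread([&] { loop.RunPending(); });
    loopThread.join();

    return ranOnLoopThread;
}
```

If loop-owned state were touched directly from the producer thread instead of going through `Schedule`, a check like `IsCallersThread()` would fail, which is the kind of condition the Debug assertion in this report appears to detect.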
The Ubuntu 20.04.6 LTS x86_64 host has 8 cores, so there are 4 threads in the internal default `Aws::Crt::Io::EventLoopGroup` instance.
SDK version used
1.31.0
Environment details (OS name and version, etc.)
Ubuntu 20.04.6 LTS on x86_64 (8 cores)
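The relation between the 8 cores and the 4 event-loop threads noted above can be written down as a one-line rule. This is a sketch under an assumption: that the default group uses half of the reported cores (and a single thread on a single-core host), which matches the figures in this report. `DefaultEventLoopThreads` is a hypothetical helper, and the library's actual rule may differ.

```cpp
#include <cassert>
#include <cstdint>

// Assumed rule: half of the logical cores, but never fewer than the
// single thread available on a one-core host.
std::uint16_t DefaultEventLoopThreads(std::uint16_t logicalCores) {
    return logicalCores > 1
               ? static_cast<std::uint16_t>(logicalCores / 2)
               : logicalCores;
}
```

Under this assumption, `DefaultEventLoopThreads(8)` yields 4, the thread count seen on the host described here.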