aws / aws-iot-device-sdk-cpp-v2

Next generation AWS IoT Client SDK for C++ using the AWS Common Runtime
Apache License 2.0

SEGFAULTs in Aws::Iotsecuretunneling::SecureTunnel internals #669

Closed pkarneliuk closed 10 months ago

pkarneliuk commented 11 months ago

Describe the bug

Sporadic SEGFAULTs occur in Aws::Iotsecuretunneling::SecureTunnel internal routines when starting and stopping a connection to an invalid hostname. The test case is simple:

1) Create a SecureTunnel instance in destination mode with an invalid hostname, such as data.region.fake;
2) Start() the SecureTunnel instance;
3) Wait for the OnConnectionFailure callback to report an AWS_IO_DNS_INVALID_NAME or AWS_IO_DNS_QUERY_FAILED error;
4) Stop() the SecureTunnel instance;
5) Wait for the transition to the Stopped state, reported by the OnStopped callback;
6) Release the SecureTunnel instance.

The test application may crash in different libaws-c-*.so routines.

Expected Behavior

The process should not crash with a SEGFAULT during or after calls to SecureTunnel::Start() and SecureTunnel::Stop().

Current Behavior

Aws::Iotsecuretunneling::SecureTunnel may crash with a SEGFAULT in internal routines on creation or while starting/stopping a connection to an invalid hostname (low reproducibility). In a Debug build, the current code triggers assertions in the libaws-c-*.so libraries:

Crash 1: corrupted linked list in the AwsEventLoop threads.

When the assertion fires in AwsEventLoop 2, the main thread has already called Stop() and is waiting on a promise for the OnStopped callback.

The detailed GDB backtrace per threads

```shell
=thread-group-added,id="i1"
GNU gdb (Ubuntu 10.2-0ubuntu1~20.04~1) 10.2
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see: .
Find the GDB manual and other documentation resources online at: .
For help, type "help".
Type "apropos word" to search for commands related to "word".
Warning: Debuggee TargetArchitecture not detected, assuming x86_64.
=cmd-param-changed,param="pagination",value="off"
Stopped due to shared library event (no libraries added or removed)
Loaded '/lib64/ld-linux-x86-64.so.2'. Symbols loaded.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Breakpoint 1, main (argc=1, argv=0x7fff951b2578) at /build/aws-iot-device-sdk-cpp-v2/samples/secure_tunneling/secure_tunnel/main.cpp:81
81 {
Loaded '/usr/local/lib/libIotSecureTunneling-cpp.so'. Symbols loaded.
Loaded '/usr/local/lib/libaws-crt-cpp.so'. Symbols loaded.
Loaded '/usr/local/lib/libaws-c-common.so.1'. Symbols loaded.
Loaded '/lib/x86_64-linux-gnu/libstdc++.so.6'. Symbols loaded.
Loaded '/lib/x86_64-linux-gnu/libgcc_s.so.1'. Symbols loaded.
Loaded '/lib/x86_64-linux-gnu/libc.so.6'. Symbols loaded.
Loaded '/usr/local/lib/libaws-c-iot.so.0unstable'. Symbols loaded.
Loaded '/usr/local/lib/libaws-c-mqtt.so.1.0.0'. Symbols loaded.
Loaded '/usr/local/lib/libaws-c-event-stream.so.1.0.0'. Symbols loaded.
Loaded '/usr/local/lib/libaws-c-s3.so.0unstable'. Symbols loaded.
Loaded '/usr/local/lib/libaws-c-auth.so.1.0.0'. Symbols loaded.
Loaded '/usr/local/lib/libaws-c-http.so.1.0.0'. Symbols loaded.
Loaded '/usr/local/lib/libaws-c-io.so.1.0.0'. Symbols loaded.
Loaded '/usr/local/lib/libaws-c-cal.so.1.0.0'. Symbols loaded.
Loaded '/usr/local/lib/libaws-c-sdkutils.so.1.0.0'. Symbols loaded.
Loaded '/lib/x86_64-linux-gnu/libdl.so.2'. Symbols loaded.
Loaded '/lib/x86_64-linux-gnu/libm.so.6'. Symbols loaded.
Loaded '/lib/x86_64-linux-gnu/libpthread.so.0'. Symbols loaded.
Loaded '/usr/local/lib/libaws-checksums.so.1.0.0'. Symbols loaded.
Loaded '/usr/local/lib/libaws-c-compression.so.1.0.0'. Symbols loaded.
Loaded '/usr/local/lib/libs2n.so.1'. Symbols loaded.
Loaded '/usr/local/lib/libcrypto.so.3'. Symbols loaded.
[New Thread 0x7f1212f00700 (LWP 3950171)]
[New Thread 0x7f12126ff700 (LWP 3950172)]
[New Thread 0x7f1211efe700 (LWP 3950173)]
[New Thread 0x7f12116fd700 (LWP 3950174)]
[New Thread 0x7f1210edb700 (LWP 3950175)]
Thread 3 "AwsEventLoop 2" received signal SIGTRAP, Trace/breakpoint trap.
[Switching to Thread 0x7f12126ff700 (LWP 3950172)]
__GI_raise (sig=) at ../sysdeps/unix/sysv/linux/raise.c:50
Loaded '/lib/x86_64-linux-gnu/libnss_files.so.2'. Symbols loaded.
Loaded '/lib/x86_64-linux-gnu/libnss_dns.so.2'. Symbols loaded.
Loaded '/lib/x86_64-linux-gnu/libresolv.so.2'. Symbols loaded.
Execute debugger commands using "-exec ", for example "-exec info registers" will list registers in use (when GDB is the debugger)

-exec info threads
  Id Target Id Frame
  1 Thread 0x7f1212f03c40 (LWP 3950149) "secure-tunnel" syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  2 Thread 0x7f1212f00700 (LWP 3950171) "AwsEventLoop 1" 0x00007f1213aca68e in epoll_wait (epfd=4, events=0x7f1212eff880, maxevents=100, timeout=100000) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
* 3 Thread 0x7f12126ff700 (LWP 3950172) "AwsEventLoop 2" __GI_raise (sig=) at ../sysdeps/unix/sysv/linux/raise.c:50
  4 Thread 0x7f1211efe700 (LWP 3950173) "AwsEventLoop 3" 0x00007f1213aca68e in epoll_wait (epfd=8, events=0x7f1211efd880, maxevents=100, timeout=100000) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
  5 Thread 0x7f12116fd700 (LWP 3950174) "AwsEventLoop 4" 0x00007f1213aca68e in epoll_wait (epfd=10, events=0x7f12116fc880, maxevents=100, timeout=100000) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
  6 Thread 0x7f1210edb700 (LWP 3950175) "AwsHostResolver" futex_abstimed_wait_cancelable (private=, abstime=0x7f1210edaa50, clockid=, expected=0, futex_word=0x7f11fc0019a4) at ../sysdeps/nptl/futex-internal.h:320

-exec thread 1
[Switching to thread 1 (Thread 0x7f1212f03c40 (LWP 3950149))]
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
38 ../sysdeps/unix/sysv/linux/x86_64/syscall.S: No such file or directory.
=thread-selected,id="1",frame={level="0",addr="0x00007f1213ac395d",func="syscall",args=[],file="../sysdeps/unix/sysv/linux/x86_64/syscall.S",fullname="/build/glibc-wuryBv/glibc-2.31/misc/../sysdeps/unix/sysv/linux/x86_64/syscall.S",line="38",arch="i386:x86-64"}

-exec bt
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1 0x00007f1213ca5911 in std::__atomic_futex_unsigned_base::_M_futex_wait_until(unsigned int*, unsigned int, bool, std::chrono::duration >, std::chrono::duration >) () from /lib/x86_64-linux-gnu/libstdc++.so.6
#2 0x000055b0b0726de6 in std::__atomic_futex_unsigned<2147483648u>::_M_load_and_test_until (this=0x55b0b0fc8770, __assumed=0, __operand=1, __equal=true, __mo=std::memory_order_acquire, __has_timeout=false, __s=std::chrono::duration = { 0s }, __ns=std::chrono::duration = { 0ns }) at /usr/include/c++/11/bits/atomic_futex.h:110
#3 0x000055b0b07261ff in std::__atomic_futex_unsigned<2147483648u>::_M_load_and_test (this=0x55b0b0fc8770, __assumed=0, __operand=1, __equal=true, __mo=std::memory_order_acquire) at /usr/include/c++/11/bits/atomic_futex.h:159
#4 0x000055b0b0724b90 in std::__atomic_futex_unsigned<2147483648u>::_M_load_when_equal (__mo=std::memory_order_acquire, __val=1, this=0x55b0b0fc8770) at /usr/include/c++/11/bits/atomic_futex.h:213
#5 std::__future_base::_State_baseV2::wait (this=0x55b0b0fc8760) at /usr/include/c++/11/future:336
#6 0x000055b0b0726b38 in std::__basic_future::_M_get_result (this=0x7fff951b1f60) at /usr/include/c++/11/future:719
#7 0x000055b0b0725f1b in std::future::get (this=0x7fff951b1f60) at /usr/include/c++/11/future:805
#8 0x000055b0b0722f3d in test () at /build/aws-iot-device-sdk-cpp-v2/samples/secure_tunneling/secure_tunnel/main.cpp:74
#9 0x000055b0b07231d6 in main (argc=1, argv=0x7fff951b2578) at /build/aws-iot-device-sdk-cpp-v2/samples/secure_tunneling/secure_tunnel/main.cpp:87

-exec thread 3
[Switching to thread 3 (Thread 0x7f12126ff700 (LWP 3950172))]
#0 __GI_raise (sig=) at ../sysdeps/unix/sysv/linux/raise.c:50
50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
=thread-selected,id="3",frame={level="0",addr="0x00007f12139ee00b",func="__GI_raise",args=[{name="sig",value=""}],file="../sysdeps/unix/sysv/linux/raise.c",fullname="/build/glibc-wuryBv/glibc-2.31/signal/../sysdeps/unix/sysv/linux/raise.c",line="50",arch="i386:x86-64"}

-exec bt
#0 __GI_raise (sig=) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007f1213e7361a in aws_debug_break () at /build/aws-iot-device-sdk-cpp-v2/crt/aws-crt-cpp/crt/aws-c-common/source/posix/system_info.c:164
#2 0x00007f1213e4d635 in aws_fatal_assert (cond_str=0x7f1213e904a8 "aws_linked_list_node_prev_is_valid(before)", file=0x7f1213e90320 "/build/aws-iot-device-sdk-cpp-v2/crt/aws-crt-cpp/crt/aws-c-common/include/aws/common/linked_list.inl", line=250) at /build/aws-iot-device-sdk-cpp-v2/crt/aws-crt-cpp/crt/aws-c-common/source/assert.c:14
#3 0x00007f1213e7b992 in aws_linked_list_insert_before (before=0x7f12126fe780, to_add=0x55b0b0eee728) at /build/aws-iot-device-sdk-cpp-v2/crt/aws-crt-cpp/crt/aws-c-common/include/aws/common/linked_list.inl:250
#4 0x00007f1213e7bb79 in aws_linked_list_push_back (list=0x7f12126fe770, node=0x55b0b0eee728) at /build/aws-iot-device-sdk-cpp-v2/crt/aws-crt-cpp/crt/aws-c-common/include/aws/common/linked_list.inl:275
#5 0x00007f1213e7c992 in s_run_all (scheduler=0x55b0b0df8e90, current_time=277935556911975, status=AWS_TASK_STATUS_RUN_READY) at /build/aws-iot-device-sdk-cpp-v2/crt/aws-crt-cpp/crt/aws-c-common/source/task_scheduler.c:242
#6 0x00007f1213e7c7fd in aws_task_scheduler_run_all (scheduler=0x55b0b0df8e90, current_time=277935556911975) at /build/aws-iot-device-sdk-cpp-v2/crt/aws-crt-cpp/crt/aws-c-common/source/task_scheduler.c:188
#7 0x00007f1213782e24 in aws_event_loop_thread (args=0x55b0b0df8ae0) at /build/aws-iot-device-sdk-cpp-v2/crt/aws-crt-cpp/crt/aws-c-io/source/linux/epoll_event_loop.c:666
#8 0x00007f1213e74b1e in thread_fn (arg=0x55b0b0df9080) at /build/aws-iot-device-sdk-cpp-v2/crt/aws-crt-cpp/crt/aws-c-common/source/posix/thread.c:177
#9 0x00007f12135ad609 in start_thread (arg=) at pthread_create.c:477
#10 0x00007f1213aca353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
```

Crash 2: related to the DNS resolving thread (AwsHostResolver).

The detailed GDB backtrace per threads

```shell
=thread-group-added,id="i1"
GNU gdb (Ubuntu 10.2-0ubuntu1~20.04~1) 10.2
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see: .
Find the GDB manual and other documentation resources online at: .
For help, type "help".
Type "apropos word" to search for commands related to "word".
Warning: Debuggee TargetArchitecture not detected, assuming x86_64.
=cmd-param-changed,param="pagination",value="off"
Stopped due to shared library event (no libraries added or removed)
Loaded '/lib64/ld-linux-x86-64.so.2'. Symbols loaded.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Breakpoint 1, main (argc=1, argv=0x7ffe8dcc3438) at /build/aws-iot-device-sdk-cpp-v2/samples/secure_tunneling/secure_tunnel/main.cpp:81
81 {
Loaded '/usr/local/lib/libIotSecureTunneling-cpp.so'. Symbols loaded.
Loaded '/usr/local/lib/libaws-crt-cpp.so'. Symbols loaded.
Loaded '/usr/local/lib/libaws-c-common.so.1'. Symbols loaded.
Loaded '/lib/x86_64-linux-gnu/libstdc++.so.6'. Symbols loaded.
Loaded '/lib/x86_64-linux-gnu/libgcc_s.so.1'. Symbols loaded.
Loaded '/lib/x86_64-linux-gnu/libc.so.6'. Symbols loaded.
Loaded '/usr/local/lib/libaws-c-iot.so.0unstable'. Symbols loaded.
Loaded '/usr/local/lib/libaws-c-mqtt.so.1.0.0'. Symbols loaded.
Loaded '/usr/local/lib/libaws-c-event-stream.so.1.0.0'. Symbols loaded.
Loaded '/usr/local/lib/libaws-c-s3.so.0unstable'. Symbols loaded.
Loaded '/usr/local/lib/libaws-c-auth.so.1.0.0'. Symbols loaded.
Loaded '/usr/local/lib/libaws-c-http.so.1.0.0'. Symbols loaded.
Loaded '/usr/local/lib/libaws-c-io.so.1.0.0'. Symbols loaded.
Loaded '/usr/local/lib/libaws-c-cal.so.1.0.0'. Symbols loaded.
Loaded '/usr/local/lib/libaws-c-sdkutils.so.1.0.0'. Symbols loaded.
Loaded '/lib/x86_64-linux-gnu/libdl.so.2'. Symbols loaded.
Loaded '/lib/x86_64-linux-gnu/libm.so.6'. Symbols loaded.
Loaded '/lib/x86_64-linux-gnu/libpthread.so.0'. Symbols loaded.
Loaded '/usr/local/lib/libaws-checksums.so.1.0.0'. Symbols loaded.
Loaded '/usr/local/lib/libaws-c-compression.so.1.0.0'. Symbols loaded.
Loaded '/usr/local/lib/libs2n.so.1'. Symbols loaded.
Loaded '/usr/local/lib/libcrypto.so.3'. Symbols loaded.
[New Thread 0x7fc65fa1f700 (LWP 3751708)]
[New Thread 0x7fc65f21e700 (LWP 3751709)]
[New Thread 0x7fc65ea1d700 (LWP 3751710)]
[New Thread 0x7fc65e21c700 (LWP 3751711)]
[New Thread 0x7fc65d9fa700 (LWP 3751712)]
Thread 6 "AwsHostResolver" received signal SIGTRAP, Trace/breakpoint trap.
[Switching to Thread 0x7fc65d9fa700 (LWP 3751712)]
__GI_raise (sig=) at ../sysdeps/unix/sysv/linux/raise.c:50
Loaded '/lib/x86_64-linux-gnu/libnss_files.so.2'. Symbols loaded.
Loaded '/lib/x86_64-linux-gnu/libnss_dns.so.2'. Symbols loaded.
Loaded '/lib/x86_64-linux-gnu/libresolv.so.2'. Symbols loaded.
Execute debugger commands using "-exec ", for example "-exec info registers" will list registers in use (when GDB is the debugger)

-exec info threads
  Id Target Id Frame
  1 Thread 0x7fc65fa22c40 (LWP 3751663) "secure-tunnel" 0x00007fc660560b0b in tcache_put (tc_idx=0, chunk=0x557958307fc0) at malloc.c:2926
  2 Thread 0x7fc65fa1f700 (LWP 3751708) "AwsEventLoop 1" 0x00007fc6605e968e in epoll_wait (epfd=4, events=0x7fc65fa1e880, maxevents=100, timeout=100000) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
  3 Thread 0x7fc65f21e700 (LWP 3751709) "AwsEventLoop 2" 0x00007fc6605e968e in epoll_wait (epfd=6, events=0x7fc65f21d880, maxevents=100, timeout=100000) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
  4 Thread 0x7fc65ea1d700 (LWP 3751710) "AwsEventLoop 3" 0x00007fc6605e968e in epoll_wait (epfd=8, events=0x7fc65ea1c880, maxevents=100, timeout=100000) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
  5 Thread 0x7fc65e21c700 (LWP 3751711) "AwsEventLoop 4" 0x00007fc6605e968e in epoll_wait (epfd=10, events=0x7fc65e21b880, maxevents=100, timeout=100000) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
* 6 Thread 0x7fc65d9fa700 (LWP 3751712) "AwsHostResolver" __GI_raise (sig=) at ../sysdeps/unix/sysv/linux/raise.c:50

-exec thread 1
[Switching to thread 1 (Thread 0x7fc65fa22c40 (LWP 3751663))]
#0 0x00007fc660560b0b in tcache_put (tc_idx=0, chunk=0x557958307fc0) at malloc.c:2926
2926 malloc.c: No such file or directory.
=thread-selected,id="1",frame={level="0",addr="0x00007fc660560b0b",func="tcache_put",args=[{name="tc_idx",value="0"},{name="chunk",value="0x557958307fc0"}],file="malloc.c",fullname="/build/glibc-wuryBv/glibc-2.31/malloc/malloc.c",line="2926",arch="i386:x86-64"}

-exec bt
#0 0x00007fc660560b0b in tcache_put (tc_idx=0, chunk=0x557958307fc0) at malloc.c:2926
#1 _int_free (av=0x7fc6606b6b80 , p=0x557958307fc0, have_lock=0) at malloc.c:4208
#2 0x00007fc65fbebdb3 in ossl_namemap_name2num_n () from /usr/local/lib/libcrypto.so.3
#3 0x00007fc65fbc3856 in evp_is_a () from /usr/local/lib/libcrypto.so.3
#4 0x00007fc65fbcc81e in EVP_KEYMGMT_is_a () from /usr/local/lib/libcrypto.so.3
#5 0x00007fc65fb933db in ossl_decoder_ctx_setup_for_pkey () from /usr/local/lib/libcrypto.so.3
#6 0x00007fc65fb936f8 in OSSL_DECODER_CTX_new_for_pkey () from /usr/local/lib/libcrypto.so.3
#7 0x00007fc65fc91554 in x509_pubkey_ex_d2i_ex () from /usr/local/lib/libcrypto.so.3
#8 0x00007fc65faf42ab in asn1_item_embed_d2i () from /usr/local/lib/libcrypto.so.3
#9 0x00007fc65faf4e28 in asn1_template_noexp_d2i () from /usr/local/lib/libcrypto.so.3
#10 0x00007fc65faf4489 in asn1_item_embed_d2i () from /usr/local/lib/libcrypto.so.3
#11 0x00007fc65faf4e28 in asn1_template_noexp_d2i () from /usr/local/lib/libcrypto.so.3
#12 0x00007fc65faf4489 in asn1_item_embed_d2i () from /usr/local/lib/libcrypto.so.3
#13 0x00007fc65faf5603 in ASN1_item_d2i_ex () from /usr/local/lib/libcrypto.so.3
#14 0x00007fc65fc12699 in PEM_X509_INFO_read_bio_ex () from /usr/local/lib/libcrypto.so.3
#15 0x00007fc65fc64954 in X509_load_cert_crl_file_ex () from /usr/local/lib/libcrypto.so.3
#16 0x00007fc65fc64acd in by_file_ctrl_ex () from /usr/local/lib/libcrypto.so.3
#17 0x00007fc65fc80f97 in X509_STORE_load_file_ex () from /usr/local/lib/libcrypto.so.3
#18 0x00007fc65fc810e9 in X509_STORE_load_locations_ex () from /usr/local/lib/libcrypto.so.3
#19 0x00007fc65fffa93c in s2n_x509_trust_store_from_ca_file (store=0x5579582e75a0, ca_pem_filename=0x7fc6602d29f0 "/etc/ssl/certs/ca-certificates.crt", ca_dir=0x7fc6602d28d0 "/etc/ssl/certs") at /build/aws-iot-device-sdk-cpp-v2/crt/aws-crt-cpp/crt/s2n/tls/s2n_x509_validator.c:122
#20 0x00007fc65ff8609f in s2n_config_set_verification_ca_location (config=0x5579582e7470, ca_pem_filename=0x7fc6602d29f0 "/etc/ssl/certs/ca-certificates.crt", ca_dir=0x7fc6602d28d0 "/etc/ssl/certs") at /build/aws-iot-device-sdk-cpp-v2/crt/aws-crt-cpp/crt/s2n/tls/s2n_config.c:516
#21 0x00007fc6602b8d97 in s_tls_ctx_new (alloc=0x7fc6609bec40 , options=0x7ffe8dcc2780, mode=S2N_CLIENT) at /build/aws-iot-device-sdk-cpp-v2/crt/aws-crt-cpp/crt/aws-c-io/source/s2n/s2n_tls_channel_handler.c:1585
#22 0x00007fc6602b92a9 in aws_tls_client_ctx_new (alloc=0x7fc6609bec40 , options=0x7ffe8dcc2780) at /build/aws-iot-device-sdk-cpp-v2/crt/aws-crt-cpp/crt/aws-c-io/source/s2n/s2n_tls_channel_handler.c:1663
#23 0x00007fc6604b8622 in aws_secure_tunnel_new (allocator=0x7fc6609bec40 , options=0x7ffe8dcc2980) at /build/aws-iot-device-sdk-cpp-v2/crt/aws-c-iot/source/secure_tunneling.c:2570
#24 0x00007fc660c17c95 in Aws::Iotsecuretunneling::SecureTunnel::SecureTunnel(aws_allocator*, Aws::Crt::Io::ClientBootstrap*, Aws::Crt::Io::SocketOptions const&, std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&, aws_secure_tunneling_local_proxy_mode, std::__cxx11::basic_string, std::allocator > const&, Aws::Crt::Io::TlsConnectionOptions*, std::__cxx11::basic_string, std::allocator > const&, Aws::Crt::Http::HttpClientConnectionProxyOptions*, std::function, std::function, std::function, std::function, std::function, std::function, std::function, std::function, std::function, std::function, std::function, std::function, std::function, std::function, std::function, std::function) (this=0x557958130410, allocator=0x7fc6609bec40 , clientBootstrap=0x5579581302d0, socketOptions=..., accessToken="token", clientToken="", localProxyMode=AWS_SECURE_TUNNELING_DESTINATION_MODE, endpointHost="data.region.fake", tslOptions=0x0, rootCa="", httpClientConnectionProxyOptions=0x0, onConnectionSuccess=..., onConnectionFailure=..., onConnectionComplete=..., onConnectionShutdown=..., onSendMessageComplete=..., onSendDataComplete=..., onMessageReceived=..., onDataReceive=..., onStreamStarted=..., onStreamStart=..., onStreamStopped=..., onStreamReset=..., onConnectionStarted=..., onConnectionReset=..., onSessionReset=..., onStopped=...) at /build/aws-iot-device-sdk-cpp-v2/secure_tunneling/source/SecureTunnel.cpp:613
#25 0x00007fc660c173d6 in Aws::Iotsecuretunneling::SecureTunnelBuilder::Build (this=0x7ffe8dcc2eb0) at /build/aws-iot-device-sdk-cpp-v2/secure_tunneling/source/SecureTunnel.cpp:502
#26 0x00005579567ceda8 in test () at /build/aws-iot-device-sdk-cpp-v2/samples/secure_tunneling/secure_tunnel/main.cpp:55
#27 0x00005579567cf1d6 in main (argc=1, argv=0x7ffe8dcc3438) at /build/aws-iot-device-sdk-cpp-v2/samples/secure_tunneling/secure_tunnel/main.cpp:87

-exec thread 6
[Switching to thread 6 (Thread 0x7fc65d9fa700 (LWP 3751712))]
#0 __GI_raise (sig=) at ../sysdeps/unix/sysv/linux/raise.c:50
50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
=thread-selected,id="6",frame={level="0",addr="0x00007fc66050d00b",func="__GI_raise",args=[{name="sig",value=""}],file="../sysdeps/unix/sysv/linux/raise.c",fullname="/build/glibc-wuryBv/glibc-2.31/signal/../sysdeps/unix/sysv/linux/raise.c",line="50",arch="i386:x86-64"}

-exec bt
#0 __GI_raise (sig=) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007fc66099261a in aws_debug_break () at /build/aws-iot-device-sdk-cpp-v2/crt/aws-crt-cpp/crt/aws-c-common/source/posix/system_info.c:164
#2 0x00007fc66096c635 in aws_fatal_assert (cond_str=0x7fc6602c1098 "aws_event_loop_thread_is_callers_thread(event_loop)", file=0x7fc6602c0f40 "/build/aws-iot-device-sdk-cpp-v2/crt/aws-crt-cpp/crt/aws-c-io/source/event_loop.c", line=478) at /build/aws-iot-device-sdk-cpp-v2/crt/aws-crt-cpp/crt/aws-c-common/source/assert.c:14
#3 0x00007fc660296b6c in aws_event_loop_cancel_task (event_loop=0x55795812e350, task=0x557958301120) at /build/aws-iot-device-sdk-cpp-v2/crt/aws-crt-cpp/crt/aws-c-io/source/event_loop.c:478
#4 0x00007fc6604b7ac0 in s_reevaluate_service_task (secure_tunnel=0x557958301080) at /build/aws-iot-device-sdk-cpp-v2/crt/aws-c-iot/source/secure_tunneling.c:2315
#5 0x00007fc6604b5daf in s_change_current_state (secure_tunnel=0x557958301080, next_state=AWS_STS_STOPPED) at /build/aws-iot-device-sdk-cpp-v2/crt/aws-c-iot/source/secure_tunneling.c:1490
#6 0x00007fc6604b4a08 in s_on_websocket_shutdown (websocket=0x0, error_code=1059, user_data=0x557958301080) at /build/aws-iot-device-sdk-cpp-v2/crt/aws-c-iot/source/secure_tunneling.c:1079
#7 0x00007fc6604b4aff in s_on_websocket_setup (setup=0x7fc65d9f97f0, user_data=0x557958301080) at /build/aws-iot-device-sdk-cpp-v2/crt/aws-c-iot/source/secure_tunneling.c:1099
#8 0x00007fc6603479ea in s_ws_bootstrap_invoke_setup_callback (ws_bootstrap=0x7fc6580018e0, error_code=1059) at /build/aws-iot-device-sdk-cpp-v2/crt/aws-crt-cpp/crt/aws-c-http/source/websocket_bootstrap.c:413
#9 0x00007fc660347b56 in s_ws_bootstrap_on_http_setup (http_connection=0x0, error_code=1059, user_data=0x7fc6580018e0) at /build/aws-iot-device-sdk-cpp-v2/crt/aws-crt-cpp/crt/aws-c-http/source/websocket_bootstrap.c:440
#10 0x00007fc6602f7c3b in s_client_bootstrap_on_channel_setup (channel_bootstrap=0x5579581303a0, error_code=1059, channel=0x0, user_data=0x55795816e330) at /build/aws-iot-device-sdk-cpp-v2/crt/aws-crt-cpp/crt/aws-c-http/source/connection.c:789
#11 0x00007fc660290e06 in s_connection_args_setup_callback (args=0x7fc658000fc0, error_code=1059, channel=0x0) at /build/aws-iot-device-sdk-cpp-v2/crt/aws-crt-cpp/crt/aws-c-io/source/channel_bootstrap.c:194
#12 0x00007fc660292033 in s_on_host_resolved (resolver=0x557958130190, host_name=0x7fc650001a70, err_code=1059, host_addresses=0x0, user_data=0x7fc658000fc0) at /build/aws-iot-device-sdk-cpp-v2/crt/aws-crt-cpp/crt/aws-c-io/source/channel_bootstrap.c:636
#13 0x00007fc66029dc24 in aws_host_resolver_thread (arg=0x7fc6500018f0) at /build/aws-iot-device-sdk-cpp-v2/crt/aws-crt-cpp/crt/aws-c-io/source/host_resolver.c:1035
#14 0x00007fc660993b1e in thread_fn (arg=0x7fc650002c50) at /build/aws-iot-device-sdk-cpp-v2/crt/aws-crt-cpp/crt/aws-c-common/source/posix/thread.c:177
#15 0x00007fc6600cc609 in start_thread (arg=) at pthread_create.c:477
#16 0x00007fc6605e9353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
```

Reproduction Steps

There is a small test, based on samples/secure_tunneling/secure_tunnel, that reproduces these crashes: https://github.com/pkarneliuk/aws-iot-device-sdk-cpp-v2/blob/main/samples/secure_tunneling/secure_tunnel/main.cpp

/**
 * Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
 * SPDX-License-Identifier: Apache-2.0.
 */
#include <aws/crt/Api.h>
#include <aws/iotdevicecommon/IotDevice.h>
#include <aws/iotsecuretunneling/SecureTunnel.h>

#include <cstdlib>
#include <exception>
#include <future>
#include <iostream>

using namespace Aws::Crt;
using namespace Aws::Iotsecuretunneling;

void test()
{
    std::promise<bool> clientStopped;
    std::promise<bool> clientConnected;

    const auto setConnected = [&](bool value){
        try {
            clientConnected.set_value(value);
        }
        catch(const std::exception& e) {
            std::cerr << "Set Started value:" << value << " " << e.what() << std::endl;
        }
    };

    auto builder = SecureTunnelBuilder(
        aws_default_allocator(), "token", AWS_SECURE_TUNNELING_DESTINATION_MODE, "data.region.fake");
    // OR 
    // auto bootstrap = Aws::Crt::ApiHandle::GetOrCreateStaticDefaultClientBootstrap();
    // auto builder = SecureTunnelBuilder{aws_default_allocator(), *bootstrap, {},
    //                 "token", AWS_SECURE_TUNNELING_DESTINATION_MODE, "data.region.fake"};

    builder.WithOnConnectionStarted([&](SecureTunnel *,
                                        int errorCode,
                                        const ConnectionStartedEventData &) {
        std::cout << "Connection Started:" << ErrorDebugString(errorCode) << std::endl;
        setConnected(true);
    });
    builder.WithOnConnectionFailure([&](SecureTunnel *, int errorCode) {
        std::cout << "Connection Failure:" << ErrorDebugString(errorCode) << std::endl;
        setConnected(false);
    });
    builder.WithOnConnectionShutdown([&]() {
        std::cout << "Connection Shutdown" << std::endl;
    });
    builder.WithOnStopped([&](SecureTunnel *secureTunnel) {
        std::cout << "Secure Tunnel has entered Stopped State" << std::endl;
        clientStopped.set_value(true);
    });

    // Create Secure Tunnel using the options set with the builder
    auto secureTunnel = builder.Build();

    auto err = secureTunnel->Start();
    if (err)
    {
        std::cerr << "Start with: " << err << " " << ErrorDebugString(LastError()) << std::endl;
        exit(-1);
    }

    bool isConnected = clientConnected.get_future().get();
    std::cout << "isConnected: " << isConnected << std::endl;

    // Set the Secure Tunnel Client to desire a stopped state
    if (secureTunnel->Stop() == AWS_OP_ERR)
    {
        std::cerr << "Secure Tunnel Stop call failed: " << ErrorDebugString(LastError()) << std::endl;
    }

    // The Secure Tunnel Client at this point will report they are stopped and can be safely removed.
    if (clientStopped.get_future().get())
    {
        secureTunnel = nullptr;
    }
}

int main(int argc, char *argv[])
{
    ApiHandle apiHandle;

    for(std::size_t i = 0, iterations = 10000; i < iterations; ++i)
    {
        std::cout << "Iteration: " << i << "/" << iterations << '\n';
        test();
    }

    return 0;
}
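
A note on the try/catch in setConnected above: both OnConnectionStarted and OnConnectionFailure can fire within one iteration, and a second set_value() on the same std::promise throws std::future_error. A minimal standalone sketch of that behaviour, using only the standard library (trySetValue is a hypothetical helper, not an SDK function):

```cpp
#include <future>

// Hypothetical helper (not part of the SDK): fulfil a promise at most once.
// Returns true if this call stored the value, false if a value was already set.
inline bool trySetValue(std::promise<bool> &promise, bool value)
{
    try
    {
        promise.set_value(value);
        return true;
    }
    catch (const std::future_error &e)
    {
        // A repeated set_value() reports promise_already_satisfied;
        // anything else is unexpected and is passed on to the caller.
        if (e.code() == std::future_errc::promise_already_satisfied)
        {
            return false;
        }
        throw;
    }
}
```

Only the first value reaches the future, so in the test above the main thread sees whichever callback fired first.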

Possible Solution

It looks like the SEGFAULTs (and the assertions in Debug builds) are caused by missing thread synchronization in libaws-c-io.so and libaws-c-common.so, which leaves shared data structures corrupted.
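
To make the hypothesis concrete: a list like the one in the Crash 1 backtrace (aws_linked_list_push_back called from s_run_all) is only safe if every thread that touches it is serialized. The sketch below is not aws-c-common code. GuardedTaskQueue and pushFromManyThreads are hypothetical names, and it only illustrates the kind of locking that, if missing, could produce a corrupted list.

```cpp
#include <cstddef>
#include <list>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical task queue, for illustration only; this is not aws-c-common code.
class GuardedTaskQueue
{
  public:
    // Safe to call from any thread: the mutex serializes list mutation.
    void push(int taskId)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_tasks.push_back(taskId);
    }

    std::size_t size() const
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_tasks.size();
    }

  private:
    mutable std::mutex m_mutex;
    std::list<int> m_tasks;
};

// Pushes `perThread` tasks from each of `threadCount` threads and returns the final size.
inline std::size_t pushFromManyThreads(GuardedTaskQueue &queue, int threadCount, int perThread)
{
    std::vector<std::thread> workers;
    for (int t = 0; t < threadCount; ++t)
    {
        workers.emplace_back([&queue, perThread] {
            for (int i = 0; i < perThread; ++i)
            {
                queue.push(i);
            }
        });
    }
    for (auto &worker : workers)
    {
        worker.join();
    }
    return queue.size();
}
```

Without the lock_guard in push(), concurrent push_back calls on the std::list would be a data race, which is undefined behaviour and can show up as broken prev/next links.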

The assertion "aws_event_loop_thread_is_callers_thread(event_loop)" during DNS resolution may be caused by an event loop task being scheduled on an unexpected thread as part of some default behavior.

Additional Information/Context

The Ubuntu 20.04.6 LTS x86_64 host has 8 cores, so the internal default Aws::Crt::Io::EventLoopGroup instance runs 4 threads.
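
For readers checking their own machines, the arithmetic above can be sketched as follows. This assumes the default is half the reported processor count with a minimum of one thread, which matches the 8 cores / 4 threads observed here but is not taken from the SDK documentation; defaultLoopThreads is a hypothetical helper.

```cpp
// Hypothetical helper: expected number of default event loop threads.
// Assumption: half the processor count, never fewer than one thread.
inline unsigned defaultLoopThreads(unsigned processorCount)
{
    return processorCount > 1 ? processorCount / 2 : 1;
}
```

Passing std::thread::hardware_concurrency() (from <thread>) gives the estimate for the current host; the four AwsEventLoop threads in the backtraces above are consistent with it.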

SDK version used

1.31.0

Environment details (OS name and version, etc.)

Ubuntu 20.04.6 LTS on x86_64 (8 cores)

sfod commented 11 months ago

Thanks for a very detailed description, I managed to reproduce the issue. The reproducing example looks correct, so yeah, probably something is happening in the library internals.

bretambrose commented 10 months ago

https://github.com/awslabs/aws-c-io/pull/618

bretambrose commented 10 months ago

This should be fixed in v1.32.1
