Closed PengZheng closed 8 months ago
To be honest, it is really annoying to see any ASAN report on the framework (pubsub ASAN report is more acceptable), especially in the last minutes before a shiny new release.
After celix_scheduledEvent_create
returns a newly created event, ASAN reports that we are trying to read its ID via celix_scheduledEvent_getId
from a freed memory region. From its size, it is most probably a Schedule Event. From the logging, we can tell there is only one ScheduleEvent ever existing: the first one-shot event. I re-checked the event handling path, and found no double-free.
After the framework startup, there is no further dynamic memory allocation/deallocation except for ScheduleEvent, thus it is not the framework who deallocated a newly created event. The interesting part is ASAN does not report use-after-free within celix_scheduledEvent_create
, which initialized the event:
From what I saw, it does not seem like a Celix bug. But let's watch for a longer time if it reproduces. I've encountered some weird memory corruption bugs when KSM is used on some embedded devices in the past. Considering such memory saving techs are often used in cloud environments, my unreliable wild guess is that this issue is one of them.
Another use-after-free is reported: https://github.com/apache/celix/actions/runs/6741052039/job/18325045856
[ RUN ] ScheduledEventTestSuite.OnceShotEventTest
[2023-11-03T03:58:12] [ debug] [celix_framework] Framework started event received -> registering framework.ready condition service
[2023-11-03T03:58:12] [ info] [celix_framework] [framework_start:476] Celix framework started
[2023-11-03T03:58:12] [ trace] [celix_framework] [framework_start:480] Celix framework started with uuid C113B138-D3CF-43A6-BEF1-A44853FBCA1F
[2023-11-03T03:58:12] [ debug] [celix_framework] [celix_framework_scheduleEvent:2540] Added scheduled event 'unnamed' (id=0) for bundle 'celix_framework' (id=0).
[2023-11-03T03:58:12] [ trace] [celix_framework] [celix_scheduledEvent_process:181] Processing scheduled event unnamed for bundle id 0
[2023-11-03T03:58:12] [ debug] [celix_framework] [celix_framework_processScheduledEvents:1400] Removing processed one-shot scheduled event 'unnamed' (id=0) for bundle if 0.
=================================================================
==18763==ERROR: AddressSanitizer: heap-use-after-free on address 0x6110000c6d88 at pc 0x00010b353b8b bp 0x7ff7b4f77430 sp 0x7ff7b4f77428
READ of size 8 at 0x6110000c6d88 thread T0
#0 0x10b353b8a in celix_scheduledEvent_getId+0x2a (test_framework:x86_64+0x1003cdb8a)
#1 0x10b31ccef in celix_bundleContext_scheduleEvent+0x10f (test_framework:x86_64+0x100396cef)
#2 0x10b2a2018 in ScheduledEventTestSuite_OnceShotEventTest_Test::TestBody()+0x2f8 (test_framework:x86_64+0x10031c018)
#3 0x10b38eab0 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*)+0x160 (test_framework:x86_64+0x100408ab0)
#4 0x10b38e6c1 in testing::Test::Run()+0x971 (test_framework:x86_64+0x1004086c1)
#5 0x10b393586 in testing::TestInfo::Run()+0x9f6 (test_framework:x86_64+0x10040d586)
#6 0x10b396606 in testing::TestSuite::Run()+0x346 (test_framework:x86_64+0x100410606)
#7 0x10b3d8870 in testing::internal::UnitTestImpl::RunAllTests()+0x1420 (test_framework:x86_64+0x100452870)
#8 0x10b3d6cec in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*)+0x15c (test_framework:x86_64+0x100450cec)
#9 0x10b3d6a49 in testing::UnitTest::Run()+0x1a9 (test_framework:x86_64+0x100450a49)
#10 0x10b424abf in main+0xef (test_framework:x86_64+0x10049eabf)
#11 0x11716652d in start+0x1cd (dyld:x86_64+0x552d)
0x6110000c6d88 is located 8 bytes inside of 248-byte region [0x6110000c6d80,0x6110000c6e78)
freed by thread T401 here:
#0 0x10bb39019 in wrap_free+0xa9 (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x4b019)
#1 0x10b3216e8 in fw_eventDispatcher+0x498 (test_framework:x86_64+0x10039b6e8)
#2 0x7ff8136794e0 in _pthread_start+0x7c (libsystem_pthread.dylib:x86_64+0x64e0)
#3 0x7ff813674f6a in thread_start+0xe (libsystem_pthread.dylib:x86_64+0x1f6a)
previously allocated by thread T0 here:
#0 0x10bb38ed0 in wrap_malloc+0xa0 (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x4aed0)
#1 0x10b353619 in celix_scheduledEvent_create+0x39 (test_framework:x86_64+0x1003cd619)
#2 0x10b32e7eb in celix_framework_scheduleEvent+0xcb (test_framework:x86_64+0x1003a87eb)
#3 0x10b31ccef in celix_bundleContext_scheduleEvent+0x10f (test_framework:x86_64+0x100396cef)
#4 0x10b2a2018 in ScheduledEventTestSuite_OnceShotEventTest_Test::TestBody()+0x2f8 (test_framework:x86_64+0x10031c018)
#5 0x10b38eab0 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*)+0x160 (test_framework:x86_64+0x100408ab0)
#6 0x10b38e6c1 in testing::Test::Run()+0x971 (test_framework:x86_64+0x1004086c1)
#7 0x10b393586 in testing::TestInfo::Run()+0x9f6 (test_framework:x86_64+0x10040d586)
#8 0x10b396606 in testing::TestSuite::Run()+0x346 (test_framework:x86_64+0x100410606)
#9 0x10b3d8870 in testing::internal::UnitTestImpl::RunAllTests()+0x1420 (test_framework:x86_64+0x100452870)
#10 0x10b3d6cec in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*)+0x15c (test_framework:x86_64+0x100450cec)
#11 0x10b3d6a49 in testing::UnitTest::Run()+0x1a9 (test_framework:x86_64+0x100450a49)
#12 0x10b424abf in main+0xef (test_framework:x86_64+0x10049eabf)
#13 0x11716652d in start+0x1cd (dyld:x86_64+0x552d)
Thread T401 created by T0 here:
#0 0x10bb3299c in wrap_pthread_create+0x5c (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x4499c)
#1 0x10b790981 in celixThread_create+0x11 (libcelix_utils.3.dylib:x86_64+0x13981)
#2 0x10b320c35 in fw_init+0x195 (test_framework:x86_64+0x10039ac35)
#3 0x10b321f02 in framework_start+0x132 (test_framework:x86_64+0x10039bf02)
#4 0x10b3452f0 in celix_frameworkFactory_createFramework+0x120 (test_framework:x86_64+0x1003bf2f0)
#5 0x10afb5919 in celix::createFramework(celix::Properties const&)+0x49 (test_framework:x86_64+0x10002f919)
#6 0x10b2cd06e in ScheduledEventTestSuite::ScheduledEventTestSuite()+0x3ae (test_framework:x86_64+0x10034706e)
#7 0x10b2ccc6b in testing::internal::TestFactoryImpl<ScheduledEventTestSuite_OnceShotEventTest_Test>::CreateTest()+0x1b (test_framework:x86_64+0x100346c6b)
#8 0x10b393a7c in testing::Test* testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::TestFactoryBase, testing::Test*>(testing::internal::TestFactoryBase*, testing::Test* (testing::internal::TestFactoryBase::*)(), char const*)+0x15c (test_framework:x86_64+0x10040da7c)
#9 0x10b392d80 in testing::TestInfo::Run()+0x1f0 (test_framework:x86_64+0x10040cd80)
#10 0x10b396606 in testing::TestSuite::Run()+0x346 (test_framework:x86_64+0x100410606)
#11 0x10b3d8870 in testing::internal::UnitTestImpl::RunAllTests()+0x1420 (test_framework:x86_64+0x100452870)
#12 0x10b3d6cec in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*)+0x15c (test_framework:x86_64+0x100450cec)
#13 0x10b3d6a49 in testing::UnitTest::Run()+0x1a9 (test_framework:x86_64+0x100450a49)
#14 0x10b424abf in main+0xef (test_framework:x86_64+0x10049eabf)
#15 0x11716652d in start+0x1cd (dyld:x86_64+0x552d)
SUMMARY: AddressSanitizer: heap-use-after-free (test_framework:x86_64+0x1003cdb8a) in celix_scheduledEvent_getId+0x2a
Shadow bytes around the buggy address:
0x1c2200018d60: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x1c2200018d70: fd fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa
0x1c2200018d80: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
0x1c2200018d90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x1c2200018da0: 00 fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x1c2200018db0: fd[fd]fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x1c2200018dc0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa
0x1c2200018dd0: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
0x1c2200018de0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x1c2200018df0: fd fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x1c2200018e00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==18763==ABORTING
I inspected scheduled CI actions, and found the following failure. Though this is very unlikely to be a real bug, which I'll analyze later, I record it for further observation.
https://github.com/apache/celix/actions/runs/6242193992/job/16945710698