apache / celix

Apache Celix is a framework for C and C++14 to develop dynamic modular software applications using component and in-process service-oriented programming.
https://celix.apache.org/
Apache License 2.0
160 stars 85 forks source link

Use-after-free of ScheduledEvents #647

Closed PengZheng closed 8 months ago

PengZheng commented 10 months ago

I inspected scheduled CI actions, and found the following failure. Though this is very unlikely to be a real bug, which I'll analyze later, I record it for further observation.

https://github.com/apache/celix/actions/runs/6242193992/job/16945710698

[ RUN      ] ScheduledEventTestSuite.ManyScheduledEventTest
[2023-09-20T00:08:14] [  debug] [celix_framework] Framework started event received -> registering framework.ready condition service
[2023-09-20T00:08:14] [   info] [celix_framework] [framework_start:476] Celix framework started
[2023-09-20T00:08:14] [  trace] [celix_framework] [framework_start:480] Celix framework started with uuid BC664877-2CD1-4AC1-BD79-32D8B629A16F
[2023-09-20T00:08:14] [  debug] [celix_framework] [celix_framework_scheduleEvent:2540] Added scheduled event 'Scheduled event test' (id=0) for bundle 'celix_framework' (id=0).
[2023-09-20T00:08:14] [  trace] [celix_framework] [celix_scheduledEvent_process:190] Processing scheduled event Scheduled event test for bundle id 0
[2023-09-20T00:08:14] [  debug] [celix_framework] [celix_framework_processScheduledEvents:1400] Removing processed one-shot scheduled event 'Scheduled event test' (id=0) for bundle if 0.
=================================================================
==15597==ERROR: AddressSanitizer: heap-use-after-free on address 0x6110000eaa00 at pc 0x000102e26137 bp 0x7ff7bd493530 sp 0x7ff7bd493528
READ of size 8 at 0x6110000eaa00 thread T0
    #0 0x102e26136 in celix_scheduledEvent_getId+0x26 (test_framework_with_cxx14:x86_64+0x1003bc136)
    #1 0x102def15f in celix_bundleContext_scheduleEvent+0x10f (test_framework_with_cxx14:x86_64+0x10038515f)
    #2 0x102d78b13 in ScheduledEventTestSuite_ManyScheduledEventTest_Test::TestBody()+0x423 (test_framework_with_cxx14:x86_64+0x10030eb13)
    #3 0x102e61050 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*)+0x160 (test_framework_with_cxx14:x86_64+0x1003f7050)
    #4 0x102e60c61 in testing::Test::Run()+0x971 (test_framework_with_cxx14:x86_64+0x1003f6c61)
    #5 0x102e65b26 in testing::TestInfo::Run()+0x9f6 (test_framework_with_cxx14:x86_64+0x1003fbb26)
    #6 0x102e68ba6 in testing::TestSuite::Run()+0x346 (test_framework_with_cxx14:x86_64+0x1003feba6)
    #7 0x102eaae10 in testing::internal::UnitTestImpl::RunAllTests()+0x1420 (test_framework_with_cxx14:x86_64+0x100440e10)
    #8 0x102ea928c in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*)+0x15c (test_framework_with_cxx14:x86_64+0x10043f28c)
    #9 0x102ea8fe9 in testing::UnitTest::Run()+0x1a9 (test_framework_with_cxx14:x86_64+0x10043efe9)
    #10 0x102ef705f in main+0xef (test_framework_with_cxx14:x86_64+0x10048d05f)
    #11 0x112f5852d in start+0x1cd (dyld:x86_64+0x552d)

0x6110000eaa00 is located 0 bytes inside of 248-byte region [0x6110000eaa00,0x6110000eaaf8)
freed by thread T401 here:
    #0 0x1036a30a9 in wrap_free+0xa9 (libclang_rt.asan_osx_dynamic.dylib:x86_64+0x4a0a9)
    #1 0x102df3b58 in fw_eventDispatcher+0x498 (test_framework_with_cxx14:x86_64+0x100389b58)
    #2 0x7ff81b1ec4e0 in _pthread_start+0x7c (libsystem_pthread.dylib:x86_64+0x64e0)
    #3 0x7ff81b1e7f6a in thread_start+0xe (libsystem_pthread.dylib:x86_64+0x1f6a)

previously allocated by thread T0 here:
    #0 0x1036a2f60 in wrap_malloc+0xa0 (libclang_rt.asan_osx_dynamic.dylib:x86_64+0x49f60)
    #1 0x102e25bc9 in celix_scheduledEvent_create+0x39 (test_framework_with_cxx14:x86_64+0x1003bbbc9)
    #2 0x102e00c5b in celix_framework_scheduleEvent+0xcb (test_framework_with_cxx14:x86_64+0x100396c5b)
    #3 0x102def15f in celix_bundleContext_scheduleEvent+0x10f (test_framework_with_cxx14:x86_64+0x10038515f)
    #4 0x102d78b13 in ScheduledEventTestSuite_ManyScheduledEventTest_Test::TestBody()+0x423 (test_framework_with_cxx14:x86_64+0x10030eb13)
    #5 0x102e61050 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*)+0x160 (test_framework_with_cxx14:x86_64+0x1003f7050)
    #6 0x102e60c61 in testing::Test::Run()+0x971 (test_framework_with_cxx14:x86_64+0x1003f6c61)
    #7 0x102e65b26 in testing::TestInfo::Run()+0x9f6 (test_framework_with_cxx14:x86_64+0x1003fbb26)
    #8 0x102e68ba6 in testing::TestSuite::Run()+0x346 (test_framework_with_cxx14:x86_64+0x1003feba6)
    #9 0x102eaae10 in testing::internal::UnitTestImpl::RunAllTests()+0x1420 (test_framework_with_cxx14:x86_64+0x100440e10)
    #10 0x102ea928c in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*)+0x15c (test_framework_with_cxx14:x86_64+0x10043f28c)
    #11 0x102ea8fe9 in testing::UnitTest::Run()+0x1a9 (test_framework_with_cxx14:x86_64+0x10043efe9)
    #12 0x102ef705f in main+0xef (test_framework_with_cxx14:x86_64+0x10048d05f)
    #13 0x112f5852d in start+0x1cd (dyld:x86_64+0x552d)

Thread T401 created by T0 here:
    #0 0x10369ca3c in wrap_pthread_create+0x5c (libclang_rt.asan_osx_dynamic.dylib:x86_64+0x43a3c)
    #1 0x10325d9a1 in celixThread_create+0x11 (libcelix_utils.2.dylib:x86_64+0x139a1)
    #2 0x102df30a5 in fw_init+0x195 (test_framework_with_cxx14:x86_64+0x1003890a5)
    #3 0x102df4372 in framework_start+0x132 (test_framework_with_cxx14:x86_64+0x10038a372)
    #4 0x102e17770 in celix_frameworkFactory_createFramework+0x120 (test_framework_with_cxx14:x86_64+0x1003ad770)
    #5 0x102a99d99 in celix::createFramework(celix::Properties const&)+0x49 (test_framework_with_cxx14:x86_64+0x10002fd99)
    #6 0x102d9efce in ScheduledEventTestSuite::ScheduledEventTestSuite()+0x3ae (test_framework_with_cxx14:x86_64+0x100334fce)
    #7 0x102da006b in testing::internal::TestFactoryImpl<ScheduledEventTestSuite_ManyScheduledEventTest_Test>::CreateTest()+0x1b (test_framework_with_cxx14:x86_64+0x10033606b)
    #8 0x102e6601c in testing::Test* testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::TestFactoryBase, testing::Test*>(testing::internal::TestFactoryBase*, testing::Test* (testing::internal::TestFactoryBase::*)(), char const*)+0x15c (test_framework_with_cxx14:x86_64+0x1003fc01c)
    #9 0x102e65320 in testing::TestInfo::Run()+0x1f0 (test_framework_with_cxx14:x86_64+0x1003fb320)
    #10 0x102e68ba6 in testing::TestSuite::Run()+0x346 (test_framework_with_cxx14:x86_64+0x1003feba6)
    #11 0x102eaae10 in testing::internal::UnitTestImpl::RunAllTests()+0x1420 (test_framework_with_cxx14:x86_64+0x100440e10)
    #12 0x102ea928c in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*)+0x15c (test_framework_with_cxx14:x86_64+0x10043f28c)
    #13 0x102ea8fe9 in testing::UnitTest::Run()+0x1a9 (test_framework_with_cxx14:x86_64+0x10043efe9)
    #14 0x102ef705f in main+0xef (test_framework_with_cxx14:x86_64+0x10048d05f)
    #15 0x112f5852d in start+0x1cd (dyld:x86_64+0x552d)

SUMMARY: AddressSanitizer: heap-use-after-free (test_framework_with_cxx14:x86_64+0x1003bc136) in celix_scheduledEvent_getId+0x26
Shadow bytes around the buggy address:
  0x1c220001d4f0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x1c220001d500: fd fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa
  0x1c220001d510: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
  0x1c220001d520: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1c220001d530: 00 fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x1c220001d540:[fd]fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x1c220001d550: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa
  0x1c220001d560: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
  0x1c220001d570: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x1c220001d580: fd fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c220001d590: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==15597==ABORTING
PengZheng commented 10 months ago

To be honest, it is really annoying to see any ASAN report on the framework (pubsub ASAN report is more acceptable), especially in the last minutes before a shiny new release.

After celix_scheduledEvent_create returns a newly created event, ASAN reports that we are trying to read its ID via celix_scheduledEvent_getId from a freed memory region. From its size, it is most probably a Schedule Event. From the logging, we can tell there is only one ScheduleEvent ever existing: the first one-shot event. I re-checked the event handling path, and found no double-free.

After the framework startup, there is no further dynamic memory allocation/deallocation except for ScheduleEvent, thus it is not the framework who deallocated a newly created event. The interesting part is ASAN does not report use-after-free within celix_scheduledEvent_create, which initialized the event:

From what I saw, it does not seem like a Celix bug. But let's watch for a longer time if it reproduces. I've encountered some weird memory corruption bugs when KSM is used on some embedded devices in the past. Considering such memory saving techs are often used in cloud environments, my unreliable wild guess is that this issue is one of them.

PengZheng commented 8 months ago

Another use-after-free is reported: https://github.com/apache/celix/actions/runs/6741052039/job/18325045856

[ RUN      ] ScheduledEventTestSuite.OnceShotEventTest
[2023-11-03T03:58:12] [  debug] [celix_framework] Framework started event received -> registering framework.ready condition service
[2023-11-03T03:58:12] [   info] [celix_framework] [framework_start:476] Celix framework started
[2023-11-03T03:58:12] [  trace] [celix_framework] [framework_start:480] Celix framework started with uuid C113B138-D3CF-43A6-BEF1-A44853FBCA1F
[2023-11-03T03:58:12] [  debug] [celix_framework] [celix_framework_scheduleEvent:2540] Added scheduled event 'unnamed' (id=0) for bundle 'celix_framework' (id=0).
[2023-11-03T03:58:12] [  trace] [celix_framework] [celix_scheduledEvent_process:181] Processing scheduled event unnamed for bundle id 0
[2023-11-03T03:58:12] [  debug] [celix_framework] [celix_framework_processScheduledEvents:1400] Removing processed one-shot scheduled event 'unnamed' (id=0) for bundle if 0.
=================================================================
==18763==ERROR: AddressSanitizer: heap-use-after-free on address 0x6110000c6d88 at pc 0x00010b353b8b bp 0x7ff7b4f77430 sp 0x7ff7b4f77428
READ of size 8 at 0x6110000c6d88 thread T0
    #0 0x10b353b8a in celix_scheduledEvent_getId+0x2a (test_framework:x86_64+0x1003cdb8a)
    #1 0x10b31ccef in celix_bundleContext_scheduleEvent+0x10f (test_framework:x86_64+0x100396cef)
    #2 0x10b2a2018 in ScheduledEventTestSuite_OnceShotEventTest_Test::TestBody()+0x2f8 (test_framework:x86_64+0x10031c018)
    #3 0x10b38eab0 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*)+0x160 (test_framework:x86_64+0x100408ab0)
    #4 0x10b38e6c1 in testing::Test::Run()+0x971 (test_framework:x86_64+0x1004086c1)
    #5 0x10b393586 in testing::TestInfo::Run()+0x9f6 (test_framework:x86_64+0x10040d586)
    #6 0x10b396606 in testing::TestSuite::Run()+0x346 (test_framework:x86_64+0x100410606)
    #7 0x10b3d8870 in testing::internal::UnitTestImpl::RunAllTests()+0x1420 (test_framework:x86_64+0x100452870)
    #8 0x10b3d6cec in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*)+0x15c (test_framework:x86_64+0x100450cec)
    #9 0x10b3d6a49 in testing::UnitTest::Run()+0x1a9 (test_framework:x86_64+0x100450a49)
    #10 0x10b424abf in main+0xef (test_framework:x86_64+0x10049eabf)
    #11 0x11716652d in start+0x1cd (dyld:x86_64+0x552d)

0x6110000c6d88 is located 8 bytes inside of 248-byte region [0x6110000c6d80,0x6110000c6e78)
freed by thread T401 here:
    #0 0x10bb39019 in wrap_free+0xa9 (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x4b019)
    #1 0x10b3216e8 in fw_eventDispatcher+0x498 (test_framework:x86_64+0x10039b6e8)
    #2 0x7ff8136794e0 in _pthread_start+0x7c (libsystem_pthread.dylib:x86_64+0x64e0)
    #3 0x7ff813674f6a in thread_start+0xe (libsystem_pthread.dylib:x86_64+0x1f6a)

previously allocated by thread T0 here:
    #0 0x10bb38ed0 in wrap_malloc+0xa0 (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x4aed0)
    #1 0x10b353619 in celix_scheduledEvent_create+0x39 (test_framework:x86_64+0x1003cd619)
    #2 0x10b32e7eb in celix_framework_scheduleEvent+0xcb (test_framework:x86_64+0x1003a87eb)
    #3 0x10b31ccef in celix_bundleContext_scheduleEvent+0x10f (test_framework:x86_64+0x100396cef)
    #4 0x10b2a2018 in ScheduledEventTestSuite_OnceShotEventTest_Test::TestBody()+0x2f8 (test_framework:x86_64+0x10031c018)
    #5 0x10b38eab0 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*)+0x160 (test_framework:x86_64+0x100408ab0)
    #6 0x10b38e6c1 in testing::Test::Run()+0x971 (test_framework:x86_64+0x1004086c1)
    #7 0x10b393586 in testing::TestInfo::Run()+0x9f6 (test_framework:x86_64+0x10040d586)
    #8 0x10b396606 in testing::TestSuite::Run()+0x346 (test_framework:x86_64+0x100410606)
    #9 0x10b3d8870 in testing::internal::UnitTestImpl::RunAllTests()+0x1420 (test_framework:x86_64+0x100452870)
    #10 0x10b3d6cec in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*)+0x15c (test_framework:x86_64+0x100450cec)
    #11 0x10b3d6a49 in testing::UnitTest::Run()+0x1a9 (test_framework:x86_64+0x100450a49)
    #12 0x10b424abf in main+0xef (test_framework:x86_64+0x10049eabf)
    #13 0x11716652d in start+0x1cd (dyld:x86_64+0x552d)

Thread T401 created by T0 here:
    #0 0x10bb3299c in wrap_pthread_create+0x5c (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x4499c)
    #1 0x10b790981 in celixThread_create+0x11 (libcelix_utils.3.dylib:x86_64+0x13981)
    #2 0x10b320c35 in fw_init+0x195 (test_framework:x86_64+0x10039ac35)
    #3 0x10b321f02 in framework_start+0x132 (test_framework:x86_64+0x10039bf02)
    #4 0x10b3452f0 in celix_frameworkFactory_createFramework+0x120 (test_framework:x86_64+0x1003bf2f0)
    #5 0x10afb5919 in celix::createFramework(celix::Properties const&)+0x49 (test_framework:x86_64+0x10002f919)
    #6 0x10b2cd06e in ScheduledEventTestSuite::ScheduledEventTestSuite()+0x3ae (test_framework:x86_64+0x10034706e)
    #7 0x10b2ccc6b in testing::internal::TestFactoryImpl<ScheduledEventTestSuite_OnceShotEventTest_Test>::CreateTest()+0x1b (test_framework:x86_64+0x100346c6b)
    #8 0x10b393a7c in testing::Test* testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::TestFactoryBase, testing::Test*>(testing::internal::TestFactoryBase*, testing::Test* (testing::internal::TestFactoryBase::*)(), char const*)+0x15c (test_framework:x86_64+0x10040da7c)
    #9 0x10b392d80 in testing::TestInfo::Run()+0x1f0 (test_framework:x86_64+0x10040cd80)
    #10 0x10b396606 in testing::TestSuite::Run()+0x346 (test_framework:x86_64+0x100410606)
    #11 0x10b3d8870 in testing::internal::UnitTestImpl::RunAllTests()+0x1420 (test_framework:x86_64+0x100452870)
    #12 0x10b3d6cec in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*)+0x15c (test_framework:x86_64+0x100450cec)
    #13 0x10b3d6a49 in testing::UnitTest::Run()+0x1a9 (test_framework:x86_64+0x100450a49)
    #14 0x10b424abf in main+0xef (test_framework:x86_64+0x10049eabf)
    #15 0x11716652d in start+0x1cd (dyld:x86_64+0x552d)

SUMMARY: AddressSanitizer: heap-use-after-free (test_framework:x86_64+0x1003cdb8a) in celix_scheduledEvent_getId+0x2a
Shadow bytes around the buggy address:
  0x1c2200018d60: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x1c2200018d70: fd fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa
  0x1c2200018d80: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
  0x1c2200018d90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1c2200018da0: 00 fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x1c2200018db0: fd[fd]fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x1c2200018dc0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa
  0x1c2200018dd0: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
  0x1c2200018de0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x1c2200018df0: fd fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c2200018e00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==18763==ABORTING