apache / celix

Apache Celix is a framework for C and C++14 to develop dynamic modular software applications using component and in-process service-oriented programming.
https://celix.apache.org/
Apache License 2.0
162 stars 86 forks source link

Feature/585 celix conditions #588

Closed pnoltes closed 1 year ago

pnoltes commented 1 year ago

This PR introduces the celix_condition service interface and several framework-specific celix_condition services. The celix_condition is based on the OSGi Condition Specification, but has been adapted for Apache Celix / C.

This PR includes:

PengZheng commented 1 year ago

Please note that ScheduledEventTestSuite.CxxWaitForScheduledEvent failed.

PengZheng commented 1 year ago

The crash of CelixFrameworkUtilsErrorInjectionTestSuite.testIsBundleUrlValid is very interesting.

https://github.com/apache/celix/actions/runs/5525141973/jobs/10078372242

[2023-07-11T21:47:10] AddressSanitizer[  error] [celix_framework] [celix_framework_utils_resolveFileBundleUrl:96] Failed(No such file or directory) to resolve bundle location 'non-existing.zip', taking into account the cwd and Celix bundle path 'bundles'.
:DEADLYSIGNAL
=================================================================
[2023-07-11T21:47:10] [  error] [celix_framework] [celix_framework_utils_locateEmbeddedBundle:147] Framework exception(0x11177): "Failed to locate embedded bundle symbols for bundle 'embedded://simple_test_bundle1'.";
 Cause: Inject dl error
[2023-07-11T21:47:10] [  error] [celix_framework] [celix_framework_utils_locateEmbeddedBundle:147] Cannot allocate memory(0xc): "Failed to locate embedded bundle symbols for bundle 'embedded://simple_test_bundle1'.";
 Cause: Cannot allocate memory
[2023-07-11T21:47:10] [  error] [celix_framework] [celix_framework_utils_locateEmbeddedBundle:147] Cannot allocate memory(0xc): "Failed to locate embedded bundle symbols for bundle 'embedded://simple_test_bundle1'.";
 Cause: Cannot allocate memory
==35135==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7f1cd907e839 bp 0x7f1cd3cfe9c0 sp 0x7f1cd3cfe128 T13)
==35135==The signal is caused by a READ memory access.
==35135==Hint: address points to the zero page.
    #0 0x7f1cd907e838 in __sanitizer::internal_strnlen(char const*, unsigned long) ../../../../src/libsanitizer/sanitizer_common/sanitizer_libc.cc:212
    #1 0x7f1cd8fce234 in __interceptor_strndup ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:379
    #2 0x559452ee5d88 in serviceRegistration_createInternal /home/runner/work/celix/celix/libs/framework/src/service_registration.c:54
    #3 0x559452ee5a6e in serviceRegistration_create /home/runner/work/celix/celix/libs/framework/src/service_registration.c:35
    #4 0x559452ee86b1 in serviceRegistry_registerServiceInternal /home/runner/work/celix/celix/libs/framework/src/service_registry.c:194
    #5 0x559452eec2f4 in celix_serviceRegistry_registerService /home/runner/work/celix/celix/libs/framework/src/service_registry.c:734
    #6 0x559452ed07ed in fw_handleEventRequest /home/runner/work/celix/celix/libs/framework/src/framework.c:1374
    #7 0x559452ed1225 in fw_handleEvents /home/runner/work/celix/celix/libs/framework/src/framework.c:1432
    #8 0x559452ed28a9 in fw_eventDispatcher /home/runner/work/celix/celix/libs/framework/src/framework.c:1597
    #9 0x7f1cd8414608 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x8608)
    #10 0x7f1cd7fed132 in __clone (/lib/x86_64-linux-gnu/libc.so.6+0x11f132)

It was caused by celix_utils_strdup failure in one of celix_framework_registerServiceAsync calls introduced by this PR.

Of course, CELIX_EI_UNKNOWN_CALLER should be avoided at the first place. This crash tells us more: an unexpected service registration together with corresponding events process could interact with tests more subtly than we thought.

If we remove "components.ready" for now, adding celix_framework_waitForEmptyEventQueue after a framework instance is created should protect us from most hazards in tests. In production, we shall never do that since it will hurt startup speed.

If we keep "components.ready", this will be more difficult to fix.

pnoltes commented 1 year ago

Please note that ScheduledEventTestSuite.CxxWaitForScheduledEvent failed.

I though I had this covered, but I will look into this.

pnoltes commented 1 year ago

The crash of CelixFrameworkUtilsErrorInjectionTestSuite.testIsBundleUrlValid is very interesting.

https://github.com/apache/celix/actions/runs/5525141973/jobs/10078372242

[2023-07-11T21:47:10] AddressSanitizer[  error] [celix_framework] [celix_framework_utils_resolveFileBundleUrl:96] Failed(No such file or directory) to resolve bundle location 'non-existing.zip', taking into account the cwd and Celix bundle path 'bundles'.
:DEADLYSIGNAL
=================================================================
[2023-07-11T21:47:10] [  error] [celix_framework] [celix_framework_utils_locateEmbeddedBundle:147] Framework exception(0x11177): "Failed to locate embedded bundle symbols for bundle 'embedded://simple_test_bundle1'.";
 Cause: Inject dl error
[2023-07-11T21:47:10] [  error] [celix_framework] [celix_framework_utils_locateEmbeddedBundle:147] Cannot allocate memory(0xc): "Failed to locate embedded bundle symbols for bundle 'embedded://simple_test_bundle1'.";
 Cause: Cannot allocate memory
[2023-07-11T21:47:10] [  error] [celix_framework] [celix_framework_utils_locateEmbeddedBundle:147] Cannot allocate memory(0xc): "Failed to locate embedded bundle symbols for bundle 'embedded://simple_test_bundle1'.";
 Cause: Cannot allocate memory
==35135==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7f1cd907e839 bp 0x7f1cd3cfe9c0 sp 0x7f1cd3cfe128 T13)
==35135==The signal is caused by a READ memory access.
==35135==Hint: address points to the zero page.
    #0 0x7f1cd907e838 in __sanitizer::internal_strnlen(char const*, unsigned long) ../../../../src/libsanitizer/sanitizer_common/sanitizer_libc.cc:212
    #1 0x7f1cd8fce234 in __interceptor_strndup ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:379
    #2 0x559452ee5d88 in serviceRegistration_createInternal /home/runner/work/celix/celix/libs/framework/src/service_registration.c:54
    #3 0x559452ee5a6e in serviceRegistration_create /home/runner/work/celix/celix/libs/framework/src/service_registration.c:35
    #4 0x559452ee86b1 in serviceRegistry_registerServiceInternal /home/runner/work/celix/celix/libs/framework/src/service_registry.c:194
    #5 0x559452eec2f4 in celix_serviceRegistry_registerService /home/runner/work/celix/celix/libs/framework/src/service_registry.c:734
    #6 0x559452ed07ed in fw_handleEventRequest /home/runner/work/celix/celix/libs/framework/src/framework.c:1374
    #7 0x559452ed1225 in fw_handleEvents /home/runner/work/celix/celix/libs/framework/src/framework.c:1432
    #8 0x559452ed28a9 in fw_eventDispatcher /home/runner/work/celix/celix/libs/framework/src/framework.c:1597
    #9 0x7f1cd8414608 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x8608)
    #10 0x7f1cd7fed132 in __clone (/lib/x86_64-linux-gnu/libc.so.6+0x11f132)

It was caused by celix_utils_strdup failure in one of celix_framework_registerServiceAsync calls introduced by this PR.

Of course, CELIX_EI_UNKNOWN_CALLER should be avoided at the first place. This crash tells us more: an unexpected service registration together with corresponding events process could interact with tests more subtly than we thought.

If we remove "components.ready" for now, adding celix_framework_waitForEmptyEventQueue after a framework instance is created should protect us from most hazards in tests. In production, we shall never do that since it will hurt startup speed.

If we keep "components.ready", this will be more difficult to fix.

Yeah indeed interesting and good that you already analyzed the root cause. I did introduce a "CELIX_FRAMEWORK_CONDITION_SERVICES_ENABLED" config property so that the condition services can be disabled when testing. But this is a warning that we should be careful with timing sensitive actions in the framework, especially in combination with error injection test suites.

codecov-commenter commented 1 year ago

Codecov Report

Merging #588 (ab9f1f8) into master (d40cada) will increase coverage by 0.04%. The diff coverage is 99.65%.

:exclamation: Current head ab9f1f8 differs from pull request most recent head 43a9131. Consider uploading reports for the commit 43a9131 to get more accurate results

@@            Coverage Diff             @@
##           master     #588      +/-   ##
==========================================
+ Coverage   78.56%   78.60%   +0.04%     
==========================================
  Files         234      237       +3     
  Lines       35431    35623     +192     
==========================================
+ Hits        27836    28002     +166     
- Misses       7595     7621      +26     
Files Changed Coverage Δ
libs/framework/src/framework.c 84.95% <98.86%> (+1.97%) :arrow_up:
...nts_ready_check/src/celix_components_ready_check.c 100.00% <100.00%> (ø)
...check/src/celix_components_ready_check_activator.c 100.00% <100.00%> (ø)
bundles/shell/shell/src/query_command.c 79.36% <100.00%> (-4.64%) :arrow_down:
libs/framework/include/celix/Framework.h 100.00% <100.00%> (ø)
libs/framework/include/celix/FrameworkFactory.h 100.00% <100.00%> (ø)
libs/framework/include/celix/UseServiceBuilder.h 100.00% <100.00%> (ø)
libs/framework/src/celix_framework_bundle.c 100.00% <100.00%> (ø)
libs/framework/src/celix_framework_factory.c 100.00% <100.00%> (+12.50%) :arrow_up:

... and 8 files with indirect coverage changes

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

pnoltes commented 1 year ago

Added a ENABLE_TESTING_ON_CI cmake/conan option that is enabled when building on CI servers.

The ENABLE_TESTING_ON_CI is used in the scheduled event and components_ready_check tests to increase to allowed timing error margin when running on a CI environment.

Note that on my physical systems the error margin is a factor 10 stricter, but - based on the flaky test results - this is not possible on the virtual CI servers. And because the goal of the scheduled events is not to support very precise or high frequent scheduled events, IMO relaxing the timing requirements on the CI servers is acceptable.

pnoltes commented 1 year ago

Separated the "components.,ready" from the framework bundle to a separate bundle named "components_ready_check". Also decreased the component.ready check frequency, which was (incorrectly) too high.