apache / celix

Apache Celix is a framework for C and C++14 to develop dynamic modular software applications using component and in-process service-oriented programming.
https://celix.apache.org/
Apache License 2.0
160 stars 85 forks source link

Race Condition and ABBA Deadlock #645

Open PengZheng opened 10 months ago

PengZheng commented 10 months ago

Checking whether a bundle is active in celix_bundleContext_useBundles callback suffers from check-then-act race condition. This race condition is a minor one, but immediate uninstall of a bundle after framework startup may crash the program.

codecov-commenter commented 10 months ago

Codecov Report

Merging #645 (0c2c973) into master (91ff81c) will increase coverage by 0.02%. The diff coverage is 100.00%.

:exclamation: Current head 0c2c973 differs from pull request most recent head beb910e. Consider uploading reports for the commit beb910e to get more accurate results

@@            Coverage Diff             @@
##           master     #645      +/-   ##
==========================================
+ Coverage   81.58%   81.61%   +0.02%     
==========================================
  Files         260      260              
  Lines       34679    34678       -1     
==========================================
+ Hits        28294    28303       +9     
+ Misses       6385     6375      -10     
Files Changed Coverage Δ
libs/framework/src/dm_dependency_manager_impl.c 83.04% <100.00%> (+2.09%) :arrow_up:

... and 3 files with indirect coverage changes

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

PengZheng commented 10 months ago

This PR manifests a deadlock in test_components_ready. More investigation is needed.

It turns out to be an insidious ABBA deadlock:

  1. When celix_componentsReadyCheck_destroy is called, ready_check's fsmMutex is locked, and celix_componentsReadyCheck_destroy will try to use the Celix event loop.
  2. When celix_componentReadyCheck_check calls celix_dependencyManager_allComponentsActive, which in turn tries to check ready_check's readiness, the Celix event loop is occupied and the current thread tries to acquire ready_check's fsmMutex.

We need to work out a fix for this deadlock before fixing the race condition. @pnoltes A proper fix might be to exclude ready_check from readiness checking so that ready_check's fsmMutex don't have to be acquired for that purpose. For example, the following interface should allow us to deselect ready_check:

CELIX_FRAMEWORK_EXPORT size_t celix_framework_useActiveSelectedBundles(celix_framework_t* fw,
                                                                       bool includeFrameworkBundle,
                                                                       void* callbackHandle,
                                                                       bool (*select)(void* handle, const celix_bundle_t* bnd),
                                                                       void (*use)(void* handle, const celix_bundle_t* bnd));

More generally, useBundlesWithOptions could be added.

Such API changes should be postponed after 2.4.0.

The above proposed fix will not work, because the same ABBA deadlock could happen for other bundles when stopping them.

For now I'll leave this PR open as a reminder.