apache / celix

Apache Celix is a framework for C and C++14 to develop dynamic modular software applications using component and in-process service-oriented programming.
https://celix.apache.org/
Apache License 2.0
160 stars 85 forks source link

Workaround the lifecycle management issue of IImportRegistration. #657

Closed PengZheng closed 10 months ago

PengZheng commented 10 months ago

It avoids a testing crash.

Considering we just make bundle uninstall product-ready in 2.4.0 and OSGi permits uninstalling a bundle independent of other bundles, we can delay #653 until the 3.0.0 release.

codecov-commenter commented 10 months ago

Codecov Report

Merging #657 (9d1f39d) into master (e7aee12) will increase coverage by 0.00%. The diff coverage is 100.00%.

:exclamation: Current head 9d1f39d differs from pull request most recent head de76411. Consider uploading reports for the commit de76411 to get more accurate results

@@           Coverage Diff           @@
##           master     #657   +/-   ##
=======================================
  Coverage   81.60%   81.61%           
=======================================
  Files         260      260           
  Lines       34675    34677    +2     
=======================================
+ Hits        28297    28301    +4     
+ Misses       6378     6376    -2     
Files Coverage Δ
...xx_remote_services/admin/src/RemoteServiceAdmin.cc 78.70% <ø> (ø)
...s/remote_services/topology_manager/tms_tst/main.cc 100.00% <100.00%> (ø)

... and 4 files with indirect coverage changes

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

PengZheng commented 10 months ago

The last commit enabling RelWithDebInfo manifests additional memory leaks:

     25 - run_test_tm_scoped (Failed)
     26 - run_test_rsa_dfi (Failed)
     41 - pubsub_websocket_v2_tests (Failed)

I can verify that loading and unloading remote_service_admin_dfi will lead to ASAN report and that adding CELIX_LOAD_BUNDLES_WITH_NODELETE=true will make the issue disappear.

I suspect they relate to some global shared object resource.

PengZheng commented 10 months ago

One leak turns out to be caused by mg_init_library:

This function is new in version 1.9 (as dummy implementation) and effective only from version 1.10. For compatibility reasons, other functions (such as mg_start();) will initialize the required features as well, but they will no longer do a de-initialization, leaving a memory leak when the library is unloaded.

https://github.com/civetweb/civetweb/blob/master/docs/api/mg_init_library.md

This shows that doing global initialization (for CURL/civetweb) inside a customized launcher library is a must. @xuzhenbao It also shows that moving away from do global initialization inside the framework is correct, and that we shall emphasize this in the documentation to help our users avoid the same mistakes. @pnoltes

PengZheng commented 10 months ago

Another leak in run_test_tm_scoped is caused by tramp_table_alloc in libffi. The memory allocated is attached to global tramp_globals in libffi, and libffi provides no explicit way to deallocate it. If we keep libffi in address space during the lifetime of the PROGRAM, then there is no risk of memory leak. Otherwise if we keep loading/unloading libffi, then the memory leaks are real.

The solution is simple, link libffi to the executable:

target_link_libraries(test_tm_scoped PRIVATE
        Celix::framework
        GTest::gtest
        jansson::jansson
        calculator_api
        Celix::rsa_common
        CURL::libcurl
        civetweb::civetweb
        libffi::libffi
)

Again, we should remind our users to link libffi to their custom launcher if they need Celix components depend on libffi.

PengZheng commented 10 months ago

The failure of run_test_rsa_dfi brings libxml2 into attention. As noted in its documentation of xmlCleanupParser:

It's sometimes very hard to guess if libxml2 is in use in the application, some libraries or plugins may use it without notice. In case of doubt abstain from calling this function or do it just before calling exit() to avoid leak reports from valgrind !

https://gnome.pages.gitlab.gnome.org/libxml2/devhelp/libxml2-parser.html#xmlCleanupParser And we are a plugin framework! Therefore, we should warn our user of this kind of pitfall.

PengZheng commented 10 months ago

Whew! I have fixed these leaks lurking in our tests for a very long time. Note that to debug such issue, linux ASLR should be turned off by echo 0 | sudo tee /proc/sys/kernel/randomize_va_space so that we can ask debugger which shared object these leakers reported by ASAN belong to at a appropriate time when the offending shared object is still in address space.

IMHO, we should warn our users very loudly in our documentation of these shared object initialization/de-initialization pitfalls. Also, it should be pointed out to our users that the standard place to deal with these is a customized launcher. Last but not the lease, libffi is very special in that there is no way to clean its global allocation (correct me if I were wrong), thus it should be kept in address space if possible (by linking it to the executable). Keep loading and unloading libffi will wreak a havoc.

WDYT? @pnoltes @rlenferink

pnoltes commented 10 months ago

Whew! I have fixed these leaks lurking in our tests for a very long time. Note that to debug such issue, linux ASLR should be turned off by echo 0 | sudo tee /proc/sys/kernel/randomize_va_space so that we can ask debugger which shared object these leakers reported by ASAN belong to at a appropriate time when the offending shared object is still in address space.

IMHO, we should warn our users very loudly in our documentation of these shared object initialization/de-initialization pitfalls. Also, it should be pointed out to our users that the standard place to deal with these is a customized launcher. Last but not the lease, libffi is very special in that there is no way to clean its global allocation (correct me if I were wrong), thus it should be kept in address space if possible (by linking it to the executable). Keep loading and unloading libffi will wreak a havoc.

WDYT? @pnoltes @rlenferink

I agree, both in warning for init/deinit library requirements in the documentation and that this should be solved in a custom launcher.

An alternative to consider is to provided init/deinit libs that use the __attribute__((constructor)) / __attribute__((destructor)) to initialize / deinitialize libraries. These library then needs to be linked against the generated launcher.

But IMO it is clearer to do this in the launcher. Maybe some effort can be made to make it possible that a custom launcher can easily use the PROPERTIES provided in the add_celix_container (as is currently done in the generated main.c/main.cc file).

pnoltes commented 10 months ago

IMO this PR is also needed for the 2.4.0 release, so I will wait for a merge before starting a new vote.

PengZheng commented 10 months ago

An alternative to consider is to provided init/deinit libs that use the attribute((constructor)) / attribute((destructor)) to initialize / deinitialize libraries. These library then needs to be linked against the generated launcher.

But IMO it is clearer to do this in the launcher. Maybe some effort can be made to make it possible that a custom launcher can easily use the PROPERTIES provided in the add_celix_container (as is currently done in the generated main.c/main.cc file).

Indeed, manual initialization in launcher is less than ideal. The biggest trouble is these initialization functions have options, e.g. mg_init_library. Different bundles may have different requirements, and thus set different options.

PengZheng commented 10 months ago

It's interesting to observe that libxml2 does have its destructor defined:

https://github.com/GNOME/libxml2/blob/c7ff438b830d8ebeac684f3f8ad3229587c88373/threads.c#L665-L677

#if defined(HAVE_ATTRIBUTE_DESTRUCTOR) && !defined(LIBXML_STATIC) && \
    !defined(_WIN32)
static void
ATTRIBUTE_DESTRUCTOR
xmlDestructor(void) {
    /*
     * Calling custom deallocation functions in a destructor can cause
     * problems, for example with Nokogiri.
     */
    if (xmlFree == free)
        xmlCleanupParser();
}
#endif

Thus there is no need to call xmlCleanupParser in our application code, unless we are using static libxml2.