Closed PengZheng closed 10 months ago
Merging #657 (9d1f39d) into master (e7aee12) will increase coverage by
0.00%
. The diff coverage is100.00%
.:exclamation: Current head 9d1f39d differs from pull request most recent head de76411. Consider uploading reports for the commit de76411 to get more accurate results
@@ Coverage Diff @@
## master #657 +/- ##
=======================================
Coverage 81.60% 81.61%
=======================================
Files 260 260
Lines 34675 34677 +2
=======================================
+ Hits 28297 28301 +4
+ Misses 6378 6376 -2
Files | Coverage Δ | |
---|---|---|
...xx_remote_services/admin/src/RemoteServiceAdmin.cc | 78.70% <ø> (ø) |
|
...s/remote_services/topology_manager/tms_tst/main.cc | 100.00% <100.00%> (ø) |
... and 4 files with indirect coverage changes
:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more
The last commit enabling RelWithDebInfo manifests additional memory leaks:
25 - run_test_tm_scoped (Failed)
26 - run_test_rsa_dfi (Failed)
41 - pubsub_websocket_v2_tests (Failed)
I can verify that loading and unloading remote_service_admin_dfi will lead to ASAN report and that adding CELIX_LOAD_BUNDLES_WITH_NODELETE=true
will make the issue disappear.
I suspect they relate to some global shared object resource.
One leak turns out to be caused by mg_init_library
:
This function is new in version 1.9 (as dummy implementation) and effective only from version 1.10. For compatibility reasons, other functions (such as mg_start();) will initialize the required features as well, but they will no longer do a de-initialization, leaving a memory leak when the library is unloaded.
https://github.com/civetweb/civetweb/blob/master/docs/api/mg_init_library.md
This shows that doing global initialization (for CURL/civetweb) inside a customized launcher library is a must. @xuzhenbao It also shows that moving away from do global initialization inside the framework is correct, and that we shall emphasize this in the documentation to help our users avoid the same mistakes. @pnoltes
Another leak in run_test_tm_scoped
is caused by tramp_table_alloc
in libffi.
The memory allocated is attached to global tramp_globals
in libffi, and libffi provides no explicit way to deallocate it.
If we keep libffi in address space during the lifetime of the PROGRAM, then there is no risk of memory leak.
Otherwise if we keep loading/unloading libffi, then the memory leaks are real.
The solution is simple, link libffi to the executable:
target_link_libraries(test_tm_scoped PRIVATE
Celix::framework
GTest::gtest
jansson::jansson
calculator_api
Celix::rsa_common
CURL::libcurl
civetweb::civetweb
libffi::libffi
)
Again, we should remind our users to link libffi to their custom launcher if they need Celix components depend on libffi.
The failure of run_test_rsa_dfi
brings libxml2 into attention.
As noted in its documentation of xmlCleanupParser
:
It's sometimes very hard to guess if libxml2 is in use in the application, some libraries or plugins may use it without notice. In case of doubt abstain from calling this function or do it just before calling exit() to avoid leak reports from valgrind !
https://gnome.pages.gitlab.gnome.org/libxml2/devhelp/libxml2-parser.html#xmlCleanupParser And we are a plugin framework! Therefore, we should warn our user of this kind of pitfall.
Whew! I have fixed these leaks lurking in our tests for a very long time.
Note that to debug such issue, linux ASLR should be turned off by echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
so that we can ask debugger which shared object these leakers reported by ASAN belong to at a appropriate time when the offending shared object is still in address space.
IMHO, we should warn our users very loudly in our documentation of these shared object initialization/de-initialization pitfalls.
Also, it should be pointed out to our users that the standard place to deal with these is a customized launcher.
Last but not the lease, libffi
is very special in that there is no way to clean its global allocation (correct me if I were wrong), thus it should be kept in address space if possible (by linking it to the executable). Keep loading and unloading libffi will wreak a havoc.
WDYT? @pnoltes @rlenferink
Whew! I have fixed these leaks lurking in our tests for a very long time. Note that to debug such issue, linux ASLR should be turned off by
echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
so that we can ask debugger which shared object these leakers reported by ASAN belong to at a appropriate time when the offending shared object is still in address space.IMHO, we should warn our users very loudly in our documentation of these shared object initialization/de-initialization pitfalls. Also, it should be pointed out to our users that the standard place to deal with these is a customized launcher. Last but not the lease,
libffi
is very special in that there is no way to clean its global allocation (correct me if I were wrong), thus it should be kept in address space if possible (by linking it to the executable). Keep loading and unloading libffi will wreak a havoc.WDYT? @pnoltes @rlenferink
I agree, both in warning for init/deinit library requirements in the documentation and that this should be solved in a custom launcher.
An alternative to consider is to provided init/deinit libs that use the __attribute__((constructor))
/ __attribute__((destructor))
to initialize / deinitialize libraries. These library then needs to be linked against the generated launcher.
But IMO it is clearer to do this in the launcher. Maybe some effort can be made to make it possible that a custom launcher can easily use the PROPERTIES
provided in the add_celix_container
(as is currently done in the generated main.c/main.cc file).
IMO this PR is also needed for the 2.4.0 release, so I will wait for a merge before starting a new vote.
An alternative to consider is to provided init/deinit libs that use the attribute((constructor)) / attribute((destructor)) to initialize / deinitialize libraries. These library then needs to be linked against the generated launcher.
But IMO it is clearer to do this in the launcher. Maybe some effort can be made to make it possible that a custom launcher can easily use the PROPERTIES provided in the add_celix_container (as is currently done in the generated main.c/main.cc file).
Indeed, manual initialization in launcher is less than ideal.
The biggest trouble is these initialization functions have options, e.g. mg_init_library
. Different bundles may have different requirements, and thus set different options.
It's interesting to observe that libxml2 does have its destructor defined:
https://github.com/GNOME/libxml2/blob/c7ff438b830d8ebeac684f3f8ad3229587c88373/threads.c#L665-L677
#if defined(HAVE_ATTRIBUTE_DESTRUCTOR) && !defined(LIBXML_STATIC) && \
!defined(_WIN32)
static void
ATTRIBUTE_DESTRUCTOR
xmlDestructor(void) {
/*
* Calling custom deallocation functions in a destructor can cause
* problems, for example with Nokogiri.
*/
if (xmlFree == free)
xmlCleanupParser();
}
#endif
Thus there is no need to call xmlCleanupParser
in our application code, unless we are using static libxml2.
It avoids a testing crash.
Considering we just make bundle uninstall product-ready in 2.4.0 and OSGi permits uninstalling a bundle independent of other bundles, we can delay #653 until the 3.0.0 release.