apache / celix

Apache Celix is a framework for C and C++14 to develop dynamic modular software applications using component and in-process service-oriented programming.
https://celix.apache.org/
Apache License 2.0

Unlimited Spin of Glibc's Read-write-lock Implementation #739

Open PengZheng opened 3 months ago

PengZheng commented 3 months ago

Last week, I investigated a Read-write-lock implementation issue affecting ALL versions of Glibc since 2.25, which is detailed in the following ML thread: https://sourceware.org/pipermail/libc-alpha/2024-March/155278.html

In summary, a reader with high real-time (RT) priority that cannot acquire the lock can spin indefinitely, consuming all available CPUs, while the writer that holds the lock cannot release it and stop the reader from spinning, because the writer never gets a chance to run.

Considering that rwlock is used in the central piece of our framework and glibc is the most extensively used C library, we should pay close attention to the progress of this issue.

Note that musl does not suffer from this issue, since it only spins a limited number of times (up to 100; see the following email for an example). uclibc is not affected either.
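To illustrate what "limited spin" means from the caller's side, here is a minimal sketch of a bounded-spin read-lock wrapper. It is not musl's implementation (which lives inside the C library); `bounded_rdlock`, `SPIN_LIMIT`, and `BACKOFF_NS` are hypothetical names chosen for this example. It assumes that `pthread_rwlock_tryrdlock` returns `EBUSY` promptly instead of spinning internally, which should be verified against the glibc version in use.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical tuning values, not taken from musl or glibc sources. */
#define SPIN_LIMIT 100
#define BACKOFF_NS 50000L /* 50 microseconds */

/* Hypothetical helper: try to take a read lock with a bounded spin.
 * Returns 0 on success, or the error reported by pthread_rwlock_tryrdlock
 * if it is anything other than EBUSY. */
static int bounded_rdlock(pthread_rwlock_t *lock)
{
    assert(lock != NULL);
    for (;;) {
        for (int i = 0; i < SPIN_LIMIT; ++i) {
            int rc = pthread_rwlock_tryrdlock(lock);
            if (rc != EBUSY) {
                return rc; /* 0 on success, otherwise a real error */
            }
        }
        /* Spin budget exhausted: sleep so that a lower-priority
         * writer holding the lock gets a chance to run and release it. */
        struct timespec ts = { .tv_sec = 0, .tv_nsec = BACKOFF_NS };
        nanosleep(&ts, NULL);
    }
}
```

The lock is released with the usual `pthread_rwlock_unlock`. Sleeping with `nanosleep` rather than calling `sched_yield` is deliberate in this sketch: under a real-time policy, yielding does not hand the CPU to a lower-priority thread.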

Even if glibc addresses this issue quickly, we should warn our users about it. If it is ignored upstream, then in the worst case we may need to implement our own rdlock. @pnoltes @xuzhenbao
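As a rough idea of what a home-grown lock could look like, below is a minimal sketch of a blocking, writer-preferring read-write lock built only on a mutex and condition variables. It is an assumption for illustration, not the fix referenced in this thread and not code from Celix or glibc; `simple_rwlock_t` and its functions are hypothetical names. Waiters block on condition variables rather than busy-waiting in this code.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical type for illustration only. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t readers_ok;  /* signalled when readers may enter */
    pthread_cond_t writer_ok;   /* signalled when a writer may enter */
    unsigned active_readers;
    unsigned waiting_writers;
    int writer_active;
} simple_rwlock_t;

static int simple_rwlock_init(simple_rwlock_t *l)
{
    l->active_readers = 0;
    l->waiting_writers = 0;
    l->writer_active = 0;
    int rc = pthread_mutex_init(&l->mutex, NULL);
    if (rc == 0) rc = pthread_cond_init(&l->readers_ok, NULL);
    if (rc == 0) rc = pthread_cond_init(&l->writer_ok, NULL);
    return rc; /* error cleanup omitted to keep the sketch short */
}

static void simple_rwlock_rdlock(simple_rwlock_t *l)
{
    pthread_mutex_lock(&l->mutex);
    /* Readers wait while a writer is active or queued (writer preference). */
    while (l->writer_active || l->waiting_writers > 0) {
        pthread_cond_wait(&l->readers_ok, &l->mutex);
    }
    l->active_readers++;
    pthread_mutex_unlock(&l->mutex);
}

static void simple_rwlock_wrlock(simple_rwlock_t *l)
{
    pthread_mutex_lock(&l->mutex);
    l->waiting_writers++;
    while (l->writer_active || l->active_readers > 0) {
        pthread_cond_wait(&l->writer_ok, &l->mutex);
    }
    l->waiting_writers--;
    l->writer_active = 1;
    pthread_mutex_unlock(&l->mutex);
}

static void simple_rwlock_unlock(simple_rwlock_t *l)
{
    pthread_mutex_lock(&l->mutex);
    assert(l->writer_active || l->active_readers > 0);
    if (l->writer_active) {
        l->writer_active = 0;
    } else {
        l->active_readers--;
    }
    if (l->waiting_writers > 0 && l->active_readers == 0) {
        pthread_cond_signal(&l->writer_ok);     /* hand over to one writer */
    } else if (l->waiting_writers == 0) {
        pthread_cond_broadcast(&l->readers_ok); /* let all readers in */
    }
    pthread_mutex_unlock(&l->mutex);
}
```

This trades throughput for predictability: every operation takes the internal mutex, and readers can starve under a steady stream of writers. A production version would also need destroy, try-lock, and timed variants, plus a decision on priority inheritance for the internal mutex.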

Bug Report: https://sourceware.org/bugzilla/show_bug.cgi?id=31477

PengZheng commented 3 months ago

The unlimited spin is introduced by this commit: https://sourceware.org/git/?p=glibc.git;a=commit;h=cc25c8b4c1196a8c29e9a45b1e096b99a87b7f8c

The current glibc rwlock is completely unusable with real-time priority tasks, though it is OK to use with SCHED_OTHER. Considering that the current design is very complex, I don't expect an upstream fix to be available within a year or two. The workaround in my day job is to revert it to Ulrich Drepper's original design and implementation.

PengZheng commented 1 month ago

It seems that glibc upstream is not interested in fixing it, so here is my fix for glibc 2.29: https://github.com/PengZheng/glibc/commits/release/2.29/rw_fix/ (commit 03a1fca315a07800639acc5b333d5c08cc00fba9)

pnoltes commented 1 month ago

Interesting issue.

It could be good to warn users, maybe in CHANGES.md (known issues), but this can of course also occur in already released Apache Celix versions.

But this also reminds me that we are currently not building and testing with musl or uclibc. Is that something we should also consider?

PengZheng commented 1 month ago

> But this also reminds me that we are currently not building and testing with musl or uclibc. Is that something we should also consider?

Yes. As for uclibc, we may use uclibc-ng instead, which is still actively maintained. IIRC, toolchains using uclibc do not have complete support for C++14.

I am also considering RTOS support, but before that we need to support both static bundles and a fully static build.