Open sim642 opened 3 years ago
In https://github.com/goblint/analyzer/pull/618 I realized that these benchmarks have some problems and not immediately usable.
A README says:
Thread creation is modeled by a function
pthread_create
with default signature, which creates a single thread, orpthread_create_N
, which creates several instances at once. Locking is organized by functionsldv_mutex_model_lock
/ldv_mutex_model_unlock
andldv_spin_model_lock
/ldv_spin_model_unlock
.
The benchmarks don't define these LDV locking/unlocking functions themselves in terms of pthread, but rather assume the analyzer itself supports them. Similarly they use pthread_create_N
(and a corresponding pthread_join_N
) which are nonstandard and must be modeled as non-unique creations.
The benchmarks are originally for CPAlockator, which is based on CPAchecker, which has a related configuration file: https://github.com/sosy-lab/cpachecker/blob/trunk/config/includes/lockator/linux.properties. As seen there, it adds the above mentioned locking functions (and a few which weren't mentioned) for special lock handling.
Regarding the special thread creation, that is hard-coded into CPAchecker itself: https://github.com/sosy-lab/cpachecker/blob/2cfbe22b515fe9fedbd7629c62e9b93ddeda0a9a/src/org/sosy_lab/cpachecker/cfa/postprocessing/function/ThreadCreateTransformer.java#L68-L73.
Moreover, it defines a bunch of skippedvariables
for other LDV-specific functions that also don't have definitions.
Somehow Klever has been used to generate these benchmarks from Linux source code. It's very much not described anywhere, how it does the environment modeling etc, but I managed to find this: https://github.com/ldv-klever/klever/blob/master/presets/jobs/specifications/linux/concurrency%20safety/synchronization%20primitives.aspect. That somehow defines the transformations that are applied to original source code to replace them with LDV-specific functions instead.
Since Goblint has been used to analyze non-transformed Linux source code, it actually has a bunch of LibraryFunctions
handling for the original names of the functions (such that locking, etc. is recognized), but not for the LDV-transformed names.
All the benchmarks have the following header comment:
/* Generated by Frama-C */
I'm not sure what that exactly means, but my best guess is that the combining/merging mechanism from Frama-C has been used (a la cilly
). Although it might also have involved something additional (simplifications?).
For example, linux-4.2.6-races/Unsafes/u__linux-concurrency_safety__drivers---net---irda---ksdazzle-sir.ko.cil.i
contains in usb_endpoint_dir_in
the following:
__retres = (int)epd->bEndpointAddress < 0;
Crucially, bEndpointAddress
is of unsigned integer type, so __retres
is always false, which in turn makes most of the code dead.
And that's not the only instance of such useless expression appearing.
When looking for the same function in the source of that version of Linux, the code is completely different:
return ((epd->bEndpointAddress & USB_ENDPOINT_DIR_MASK) == USB_DIR_IN);
These bit operations on an unsigned integer aren't trivially constant, so somewhere along the way the semantics have been changed.
I recently read the following paper about CPALockator: Analysis of Correct Synchronization of Operating System Components. Basically CPALockator is a CPAchecker fork/branch/something that also does thread-modular analysis.
In the evaluation section, they mention having constructed a set of 425 data race detection benchmarks from Linux 4.2.6 device drivers. There's no mention of the availability of this benchmark set, but I managed to find this, which seems like it's the same set: https://gitlab.com/sosy-lab/software/ldv-benchmarks/-/tree/master/linux-4.2.6-races.
I wasn't aware of this earlier and I'm not sure why this didn't get added to sv-benchmarks for data-race. Nevertheless, this might be a useful set of benchmarks for us to look at because:
The same repository also seems to have some data-race benchmarks for Linux 4.18: https://gitlab.com/sosy-lab/software/ldv-benchmarks/-/tree/master/linux-4.18-races. And a handful of commit-based data-race benchmarks in Linux (although most don't have before vs after like the README claims): https://gitlab.com/sosy-lab/software/ldv-benchmarks/-/tree/main/ldv-commits-races.
Some of the benchmarks are in
Unknowns
subdirectories but still contain task definitions with expected verdicts (not sure based on what). These should probably be excluded to be on the safe side.