sync objects on stack cause both false positives and false negatives

GoogleCodeExporter commented 9 years ago

Tsan does not reset the state of sync objects on the stack when they go out of 
scope. A brand new sync object created at the same address in a new frame 
inherits all state from the previous one.

This can cause false negatives for the race detector, because tsan sees fake 
happens-before (HB) edges between the old and the new sync objects.
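
A toy model may make the fake HB edge concrete. This is only a sketch, not tsan's code: `MetaMap`, `SyncState`, `Release`, `Acquire` and `AcquireThroughReusedSlot` are hypothetical names, a single integer stands in for a vector clock, and the address is made up.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Toy model only: one integer stands in for the clock a real detector keeps.
struct SyncState {
  std::uint64_t released_clock = 0;  // clock published by the last release
};

// Hypothetical stand-in for the meta map: sync state keyed by address.
using MetaMap = std::unordered_map<std::uintptr_t, SyncState>;

// Unlock: publish the releasing thread's clock on the sync object.
void Release(MetaMap& meta, std::uintptr_t addr, std::uint64_t thread_clock) {
  meta[addr].released_clock = thread_clock;
}

// Lock: return the clock the acquiring thread now "happens after".
std::uint64_t Acquire(MetaMap& meta, std::uintptr_t addr) {
  return meta[addr].released_clock;
}

// Returns the clock acquired through a new mutex placed at a reused address.
std::uint64_t AcquireThroughReusedSlot(bool reset_on_scope_exit) {
  MetaMap meta;
  const std::uintptr_t slot = 0x7000;  // made-up stack address
  Release(meta, slot, 42);             // old mutex: unlocked at clock 42
  if (reset_on_scope_exit) {
    meta[slot] = SyncState{};          // old mutex leaves scope, state cleared
  }
  return Acquire(meta, slot);          // unrelated new mutex, same address
}
```

Without the reset the new mutex acquires clock 42 from an object that no longer exists, so later accesses look ordered when they are not. With the reset it acquires nothing.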

This can also cause false positives for the deadlock detector, because tsan 
effectively merges two unrelated mutexes and all edges between them.
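
The merging effect can be sketched with a toy lock-order graph keyed by address. Everything here is an assumption made for illustration: `LockOrderGraph` and `ReportsInversionAfterReuse` are hypothetical, the addresses are made up, and the check only looks for a directly reversed edge rather than a general cycle.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Toy lock-order graph keyed by mutex address (hypothetical, not tsan's code).
class LockOrderGraph {
 public:
  // Records edges held -> addr. Returns true if the reverse edge was already
  // recorded, i.e. a lock-order inversion would be reported.
  bool Lock(std::uintptr_t addr) {
    bool inversion = false;
    for (std::uintptr_t h : held_) {
      if (edges_.count(std::make_pair(addr, h)) != 0) inversion = true;
      edges_.emplace(h, addr);
    }
    held_.push_back(addr);
    return inversion;
  }

  void UnlockAll() { held_.clear(); }

  // Models a reset on scope exit: drop every edge that touches addr.
  void Forget(std::uintptr_t addr) {
    for (auto it = edges_.begin(); it != edges_.end();) {
      if (it->first == addr || it->second == addr) {
        it = edges_.erase(it);
      } else {
        ++it;
      }
    }
  }

 private:
  std::set<std::pair<std::uintptr_t, std::uintptr_t>> edges_;
  std::vector<std::uintptr_t> held_;
};

bool ReportsInversionAfterReuse(bool forget_on_scope_exit) {
  LockOrderGraph g;
  const std::uintptr_t slot_a = 0x7000, slot_b = 0x7040;  // made-up addresses
  // Frame 1: m1 at slot_a and m2 at slot_b, locked in the order m1, m2.
  g.Lock(slot_a);
  g.Lock(slot_b);
  g.UnlockAll();
  if (forget_on_scope_exit) {
    g.Forget(slot_a);
    g.Forget(slot_b);
  }
  // Frame 2: unrelated m3 at slot_b and m4 at slot_a, locked m3 then m4.
  g.Lock(slot_b);
  const bool inversion = g.Lock(slot_a);
  g.UnlockAll();
  return inversion;
}
```

The four mutexes are never involved in a real inversion, but because two pairs share addresses the graph sees a->b followed by b->a.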

Several aspects to consider:
- the same actually applies to the heap, if the program recreates a sync object 
in the same place by explicitly calling dtor/ctor
- we can't remove sync objects from the meta map if there can be concurrent 
accesses, because linked lists in the meta map are append-only
- sync objects can be recreated on the stack/in globals in the same way as on 
the heap
- we can't reset the state of sync objects in the ctor (e.g. pthread_mutex_init) 
in all cases, because atomics do not have ctors and because of global 
LINKER_INITIALIZED mutexes
- we can't remove sync objects from the meta map in dtors even for the stack, 
because if it is an explicit call to the dtor, there can be concurrent accesses 
to the same 8 bytes from a different thread and the meta map is append-only
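
For the explicit dtor/ctor case, a minimal sketch in standard C++ (nothing tsan-specific; `RecreateInPlace` is a hypothetical helper and `std::mutex` is just one possible sync object):

```cpp
#include <cassert>
#include <mutex>
#include <new>

// Destroys the mutex and constructs a fresh one in the same storage.
// The new object is logically unrelated to the old one but has the same
// address, so a tool that keys sync state by address cannot tell them apart.
std::mutex* RecreateInPlace(std::mutex* m) {
  m->~mutex();                  // explicit dtor call, storage stays allocated
  return new (m) std::mutex();  // placement new at the same address
}
```

A caller would typically own the storage, e.g. a heap block or an aligned buffer, and keep using the returned pointer as if it were a brand new mutex.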

There are several solutions of varying complexity, depending on which of the 
above we want to fix.
We can remember SP in func_entry and reset sync objects in func_exit. Care must 
be taken not to slow down execution in the common case.
We can reset some sync objects in ctors, e.g. pthread_mutex_init should be safe.
We can reset some sync objects in dtors (but not remove them from the meta map), 
e.g. pthread_mutex_destroy should be safe.
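
The func_entry/func_exit idea could look roughly like the toy model below. It is a sketch under loud assumptions: `ThreadModel` and its methods are hypothetical, the stack is assumed to grow toward lower addresses, a `std::map` stands in for the meta map, and no attempt is made at the fast path needed for the common case. Entries are reset in place and never erased, in line with the append-only constraint above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

struct SyncState {
  std::uint64_t released_clock = 0;  // stand-in for real sync state
};

// Hypothetical per-thread model; the stack grows toward lower addresses.
class ThreadModel {
 public:
  explicit ThreadModel(std::uintptr_t stack_low) : stack_low_(stack_low) {}

  // func_entry: remember SP at the moment the frame is entered.
  void FuncEntry(std::uintptr_t sp) { entry_sp_.push_back(sp); }

  // func_exit: everything below the remembered SP is dead, so reset (but do
  // not erase) every sync object in [stack_low_, entry SP).
  void FuncExit() {
    assert(!entry_sp_.empty());
    const std::uintptr_t frame_top = entry_sp_.back();
    entry_sp_.pop_back();
    for (auto it = meta_.lower_bound(stack_low_);
         it != meta_.end() && it->first < frame_top; ++it) {
      it->second = SyncState{};
    }
  }

  void Release(std::uintptr_t addr, std::uint64_t clock) {
    meta_[addr].released_clock = clock;
  }

  std::uint64_t Acquire(std::uintptr_t addr) const {
    auto it = meta_.find(addr);
    return it == meta_.end() ? 0 : it->second.released_clock;
  }

  std::size_t TrackedObjects() const { return meta_.size(); }

 private:
  std::uintptr_t stack_low_;
  std::vector<std::uintptr_t> entry_sp_;
  std::map<std::uintptr_t, SyncState> meta_;
};
```

Note that this only covers objects that die with their frame. It does not help with several mutexes sharing one slot inside a single frame, or with the heap/global cases listed above.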

Original issue reported on code.google.com by dvyu...@google.com on 17 Dec 2014 at 9:51

GoogleCodeExporter commented 9 years ago

Will it help to move the local vars to a fake stack?

Does the problem apply to the case when several mutexes created in local scopes 
share a single stack slot?

Original comment by gli...@google.com on 17 Dec 2014 at 9:58

GoogleCodeExporter commented 9 years ago

> Will it help to move the local vars to a fake stack?

I don't see how it changes anything.

> Does the problem apply to the case when several mutexes created in local 
> scopes share a single stack slot?

Yes, and it is even more difficult to handle. However, better handling of 
pthread_mutex_init/destroy will help to some degree.

Original comment by dvyu...@google.com on 17 Dec 2014 at 10:00

GoogleCodeExporter commented 9 years ago

Submitted a test case for one of the possible reincarnations of the issue:
http://llvm.org/viewvc/llvm-project?view=revision&revision=224422
Other possible reincarnations are: reuse of globals/heap, explicit dtor/ctor, 
reuse of stack space in the same stack frame.

Original comment by dvyu...@google.com on 17 Dec 2014 at 10:21

GoogleCodeExporter commented 9 years ago

See also issue 70

Original comment by dvyu...@google.com on 17 Dec 2014 at 10:26

sync objects on stack cause both false positives and false negatives #87