charmplusplus / charm

The Charm++ parallel programming system. Visit https://charmplusplus.org/ for more information.
Apache License 2.0
203 stars 49 forks source link

Data races in global variable initialization, from command-line arguments and otherwise #538

Closed PhilMiller closed 9 years ago

PhilMiller commented 10 years ago

Original issue: https://charm.cs.illinois.edu/redmine/issues/538


// Miscellaneous global variable init
SUMMARY: ThreadSanitizer: data race isomalloc.c:2033 init_ranges
SUMMARY: ThreadSanitizer: data race isomalloc.c:2247 init_ranges
SUMMARY: ThreadSanitizer: data race convcore.c:319 CmiArgInit
SUMMARY: ThreadSanitizer: data race convcore.c:951 CmiTimerInit
SUMMARY: ThreadSanitizer: data race memory.c:305 CmiMemoryInit
SUMMARY: ThreadSanitizer: data race threads.c:1586 CthInit
SUMMARY: ThreadSanitizer: data race sockRoutines.c:51 skt_set_abort
SUMMARY: ThreadSanitizer: data race NullLB.C:70 NullLB::init()

// Command-line argument initialization
SUMMARY: ThreadSanitizer: data race convcore.c:2124 CsdInit
SUMMARY: ThreadSanitizer: data race convcore.c:941 CmiTimerInit
SUMMARY: ThreadSanitizer: data race conv-ccs.c:568 CcsInit
SUMMARY: ThreadSanitizer: data race cputopology.C:398 LrtsInitCpuTopo
SUMMARY: ThreadSanitizer: data race ck.C:2578 CkMessageWatcherInit(char**, CkCoreState*)

SUMMARY: ThreadSanitizer: data race register.C:142 CkDisableTracing // init calls getting tracing disabled

These probably need locks or checks for rank-0 only

PhilMiller commented 5 years ago

Original date: 2014-07-28 22:57:05


Forgot one, now added above.

ericjbohm commented 5 years ago

Original date: 2014-10-14 16:13:13


Most of these have been easy to resolve by extending an existing exclusion zone (typically CmiRank==0) to include the adjacent initialization. A few might require the introduction of stronger ordering methods.

PhilMiller commented 5 years ago

Original date: 2014-11-09 16:56:51


Ping. We'd like to have a fix for this merged before the core meeting. This should be fairly simple, and would make a substantial difference for people who want to use ThreadSanitizer.

ericjbohm commented 5 years ago

Original date: 2014-11-10 16:49:58


I have most of these fixed. Should it go in one commit?

PhilMiller commented 5 years ago

Original date: 2014-11-10 17:06:57


One commit should be fine, unless there's something about some particular change that might need more attention

ericjbohm commented 5 years ago

Original date: 2014-11-11 19:34:05


Most of these were pretty straightforward to implement, but there was a lot of code reading to determine the minimal safe intervention. Also found that fixes for the initialization of conv-ccs made the sockroutines problem go away and had to go digging to make sure that was correct.

ericjbohm commented 5 years ago

Original date: 2014-12-02 18:51:24


I ran into a bug with the isomalloc fix that may take some time to unravel, that was where I left off with this before SC14. I haven't had a chance to get back to it yet.

PhilMiller commented 5 years ago

Original date: 2015-02-10 21:20:52


The NullLB fix isn't quite right, since other threads are still going to read from _theNullLB, and they'll just see a zero group ID, which causes a crash. I guess that assignment should be locked rather than gated on rank == 0.

ericjbohm commented 5 years ago

Original date: 2015-09-01 18:17:03


This was fixed months ago.

PhilMiller commented 5 years ago

Original date: 2015-11-11 03:30:57


https://charm.cs.illinois.edu/gerrit/429 https://github.com/UIUC-PPL/charm/commit/be9a0cef3e724e97a3abbc5bf19cd8ca53670cb5

I'll open a separate issue to follow up on my comment about potential problems with _theNullLB