epics-modules / mrfioc2

EPICS driver for Micro Research Finland event timing system devices
http://epics-modules.github.io/mrfioc2/
Other
8 stars 30 forks source link

Changing :TimeSrc-Sel at runtime causes EVR to lock-up #25

Closed hinxx closed 4 years ago

hinxx commented 4 years ago

I'm starting the EVR IOC with :TimeSrc-Sel PV set to 2. After I change the PV value to 0 the IOC locks-up; OPI goes pink

Here is what I get from gdb backtrace - provoked by trying to exit the IOC shell:

epics> exit
^C
Program received signal SIGINT, Interrupt.
0x00007ffff7bcd4ed in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007ffff7bcd4ed in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007ffff7bc8de6 in _L_lock_941 () from /lib64/libpthread.so.0
#2  0x00007ffff7bc8cdf in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00000000005980c6 in mutexLock (id=0x842770) at ../../src/osi/os/posix/osdMutex.c:46
#4  epicsMutexOsdLock (pmutex=0x842770) at ../../src/osi/os/posix/osdMutex.c:130
#5  0x0000000000590f5c in epicsMutex::lock (this=this@entry=0x842750) at ../../src/osi/epicsMutex.cpp:275
#6  0x000000000048d5a4 in epicsGuard (mutexIn=..., this=<synthetic pointer>) at /data/bdee/devel/evr_dev/epics-base/include/epicsGuard.h:70
#7  mrf::Object::visitObjects (cb=0x466de0 <disableIRQ(mrf::Object*, void*)>, arg=0x0) at ../object.cpp:153
#8  0x000000000058c063 in epicsExitCallAtExitsPvt (pep=<optimized out>) at ../../src/misc/epicsExit.c:97
#9  epicsExitCallAtExits () at ../../src/misc/epicsExit.c:115
#10 0x000000000058c408 in epicsExit (status=0) at ../../src/misc/epicsExit.c:185
#11 0x0000000000465c6d in main (argc=<optimized out>, argv=<optimized out>) at ../superMain.cpp:21
(gdb) c
Continuing.

^C
Program received signal SIGINT, Interrupt.
0x00007ffff7bcd4ed in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) info threads 
  Id   Target Id         Frame 
  35   Thread 0x7fffaf97a700 (LWP 24053) "TimeStampTimeou" 0x00007ffff7bcad12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  34   Thread 0x7fffaf9fb700 (LWP 24052) "SoftTimeSrc" 0x00007ffff7bcd4ed in __lll_lock_wait () from /lib64/libpthread.so.0
  33   Thread 0x7fffafbfc700 (LWP 24044) "CAS-client" 0x00007ffff7bcad12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  32   Thread 0x7fffafcfd700 (LWP 24043) "CAS-event" 0x00007ffff7bca965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  31   Thread 0x7fffafefe700 (LWP 24030) "CAS-client" 0x00007ffff7bcd4ed in __lll_lock_wait () from /lib64/libpthread.so.0
  30   Thread 0x7fffaffff700 (LWP 24029) "CAS-event" 0x00007ffff7bca965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  23   Thread 0x7ffff4164700 (LWP 23993) "CAC-event" 0x00007ffff7bca965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  22   Thread 0x7ffff41e5700 (LWP 23992) "CAS-beacon" 0x00007ffff7bcde3d in nanosleep () from /lib64/libpthread.so.0
  21   Thread 0x7ffff42e6700 (LWP 23991) "CAS-UDP" 0x00007ffff7bcd4ed in __lll_lock_wait () from /lib64/libpthread.so.0
  20   Thread 0x7ffff43e7700 (LWP 23990) "CAS-TCP" 0x00007ffff7bcd97d in accept () from /lib64/libpthread.so.0
  19   Thread 0x7ffff45e8700 (LWP 23988) "scan-0.1" 0x00007ffff7bcd4ed in __lll_lock_wait () from /lib64/libpthread.so.0
  18   Thread 0x7ffff47e9700 (LWP 23987) "scan-0.2" 0x00007ffff7bcd4ed in __lll_lock_wait () from /lib64/libpthread.so.0
  17   Thread 0x7ffff49ea700 (LWP 23986) "scan-0.5" 0x00007ffff7bcd4ed in __lll_lock_wait () from /lib64/libpthread.so.0
  16   Thread 0x7ffff4beb700 (LWP 23985) "scan-1" 0x00007ffff7bcd4ed in __lll_lock_wait () from /lib64/libpthread.so.0
  15   Thread 0x7ffff4dec700 (LWP 23984) "scan-2" 0x00007ffff7bcd4ed in __lll_lock_wait () from /lib64/libpthread.so.0
  14   Thread 0x7ffff4fed700 (LWP 23983) "scan-5" 0x00007ffff7bcd4ed in __lll_lock_wait () from /lib64/libpthread.so.0
  13   Thread 0x7ffff51ee700 (LWP 23982) "scan-10" 0x00007ffff7bcd4ed in __lll_lock_wait () from /lib64/libpthread.so.0
  12   Thread 0x7ffff53ef700 (LWP 23981) "scanOnce" 0x00007ffff7bca965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  11   Thread 0x7ffff57d0700 (LWP 23980) "dbCaLink" 0x00007ffff7bca965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  10   Thread 0x7ffff59d1700 (LWP 23979) "cbHigh" 0x00007ffff7bca965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  9    Thread 0x7ffff5bd2700 (LWP 23978) "cbMedium" 0x00007ffff7bca965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  8    Thread 0x7ffff5dd3700 (LWP 23977) "cbLow" 0x00007ffff7bcd4ed in __lll_lock_wait () from /lib64/libpthread.so.0
  7    Thread 0x7ffff5ed4700 (LWP 23976) "timerQueue" 0x00007ffff7bcad12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  6    Thread 0x7ffff5f55700 (LWP 23975) "taskwd" 0x00007ffff7bcad12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  5    Thread 0x7ffff7f12700 (LWP 23974) "PCIISR0000:0a:0" 0x00007ffff7bcd6fd in read () from /lib64/libpthread.so.0
  4    Thread 0x7ffff6756700 (LWP 23973) "EVRFIFO" 0x00007ffff7bcd4ed in __lll_lock_wait () from /lib64/libpthread.so.0
  3    Thread 0x7ffff7fd3700 (LWP 23971) "errlog" 0x00007ffff7bca965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
* 1    Thread 0x7ffff7fd5740 (LWP 23966) "evrApp" 0x00007ffff7bcd4ed in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) thread 34
[Switching to thread 34 (Thread 0x7fffaf9fb700 (LWP 24052))]
#0  0x00007ffff7bcd4ed in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007ffff7bcd4ed in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007ffff7bc8de6 in _L_lock_941 () from /lib64/libpthread.so.0
#2  0x00007ffff7bc8cdf in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00000000005980c6 in mutexLock (id=0x8f7a70) at ../../src/osi/os/posix/osdMutex.c:46
#4  epicsMutexOsdLock (pmutex=0x8f7a70) at ../../src/osi/os/posix/osdMutex.c:130
#5  0x0000000000590f5c in epicsMutex::lock (this=this@entry=0x8f0a90) at ../../src/osi/epicsMutex.cpp:275
#6  0x0000000000469f6f in scopedLock (lock=true, mutex=..., this=<synthetic pointer>) at ../../../include/mrfCommon.h:212
#7  EVRMRM::setEvtCode (this=0x8f09f0, code=125) at ../drvem.cpp:1009
#8  0x000000000048b538 in runSrc (this=0x8fc7b0) at ../mrmtimesrc.cpp:110
#9  epicsThreadRunableMethod<TimeStampSource::Impl, &TimeStampSource::Impl::runSrc>::run (this=<optimized out>) at ../../../include/mrfCommon.h:301
#10 0x000000000058fd59 in epicsThreadCallEntryPoint (pPvt=0x7fffa002bdb0) at ../../src/osi/epicsThread.cpp:95
#11 0x0000000000595afc in start_routine (arg=0x7fffa002bf40) at ../../src/osi/os/posix/osdThread.c:411
#12 0x00007ffff7bc6dd5 in start_thread () from /lib64/libpthread.so.0
#13 0x00007ffff6a7f02d in clone () from /lib64/libc.so.6
(gdb) thread 35
[Switching to thread 35 (Thread 0x7fffaf97a700 (LWP 24053))]
#0  0x00007ffff7bcad12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007ffff7bcad12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000000000598719 in epicsEventWaitWithTimeout (pevent=0x8fc8a0, timeout=1.1000000000000001) at ../../src/osi/os/posix/osdEvent.c:130
#2  0x000000000059138c in epicsEvent::wait (this=this@entry=0x8fc7c0, timeOut=timeOut@entry=1.1000000000000001) at ../../src/osi/epicsEvent.cpp:75
#3  0x000000000048b45e in runTimeout (this=0x8fc7b0) at ../mrmtimesrc.cpp:76
#4  epicsThreadRunableMethod<TimeStampSource::Impl, &TimeStampSource::Impl::runTimeout>::run (this=<optimized out>) at ../../../include/mrfCommon.h:301
#5  0x000000000058fd59 in epicsThreadCallEntryPoint (pPvt=0x7fff900008c0) at ../../src/osi/epicsThread.cpp:95
#6  0x0000000000595afc in start_routine (arg=0x7fff90000a50) at ../../src/osi/os/posix/osdThread.c:411
#7  0x00007ffff7bc6dd5 in start_thread () from /lib64/libpthread.so.0
#8  0x00007ffff6a7f02d in clone () from /lib64/libc.so.6
(gdb) 

I can start the IOC with :TimeSrc-Sel PV set to 0 and then change it to 2 and back to 0 and observe the same lock-up. Changing :TimeSrc-Sel PV between 0 and 1 does not cause a lock-up. As soon as value 2 is set and attempt is made to change to 0 or 1 lock-up occurs.

mdavidsaver commented 4 years ago

Could you test with with master from https://github.com/mdavidsaver/mrfioc ? I think that a0d6f714cb23cde3c4ec4078fa2f6d9c3abb57a2 should fix a deadlock (joining a worker thread with a mutex held is a bad idea)

hinxx commented 4 years ago

Cloned your repo, compile, against epics-base 7.0.3, fails with:

/usr/bin/g++  -D_GNU_SOURCE -D_DEFAULT_SOURCE         -DUSE_TYPED_RSET   -D_X86_64_  -DUNIX  -Dlinux     -O3 -g   -Wall      -mtune=generic      -m64 -fPIC -I. -I../O.Common -I. -I. -I.. -I../../../include/compiler/gcc -I../../../include/os/Linux -I../../../include -I/data/bdee/devel/evr_dev/epics-base/include/compiler/gcc -I/data/bdee/devel/evr_dev/epics-base/include/os/Linux -I/data/bdee/devel/evr_dev/epics-base/include  -I/data/bdee/devel/evr_dev/autosave/include/os/Linux -I/data/bdee/devel/evr_dev/autosave/include   -I/data/bdee/devel/evr_dev/devlib2/include   -I/data/bdee/devel/evr_dev/mrfioc2/include           -I../../../mrfCommon/src -I../../../evrMrmApp/src    -c ../mrmtimesrc.cpp
../mrmtimesrc.cpp: In member function ‘void TimeStampSource::softSecondsSrc(bool)’:
../mrmtimesrc.cpp:250:21: error: ‘class std::auto_ptr<epicsThread>’ has no member named ‘swap’
             cleanup.swap(impl->softsrc);
                     ^
make[3]: *** [mrmtimesrc.o] Error 1
make[3]: Leaving directory `/data/bdee/devel/evr_dev/md-mrfioc2/mrmShared/src/O.linux-x86_64'
make[2]: *** [install.linux-x86_64] Error 2
make[2]: Leaving directory `/data/bdee/devel/evr_dev/md-mrfioc2/mrmShared/src'
make[1]: *** [src.install] Error 2
make[1]: Leaving directory `/data/bdee/devel/evr_dev/md-mrfioc2/mrmShared'
make: *** [mrmShared.install] Error 2
mdavidsaver commented 4 years ago

Sorry about that, now fixed. If I'd waited 10 more minutes I would have seen the CI failure.

hinxx commented 4 years ago

Compiles. Still locks-up, though.

Info from gdb:

epics> 
Program received signal SIGINT, Interrupt.
0x00007ffff7bcd6fd in read () from /lib64/libpthread.so.0
Missing separate debuginfos, use: debuginfo-install glibc-2.17-260.el7_6.6.x86_64 libgcc-4.8.5-36.el7_6.2.x86_64 libstdc++-4.8.5-36.el7_6.2.x86_64 ncurses-libs-5.9-14.20130511.el7_4.x86_64 readline-6.2-10.el7.x86_64
(gdb) info threads 
  Id   Target Id         Frame 
  39   Thread 0x7fffaf376700 (LWP 20604) "TimeStampTimeou" 0x00007ffff7bcad12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  38   Thread 0x7fffaf3f7700 (LWP 20602) "SoftTimeSrc" 0x00007ffff7bcd4ed in __lll_lock_wait () from /lib64/libpthread.so.0
  37   Thread 0x7fffaf5f8700 (LWP 20586) "CAS-client" 0x00007ffff7bcda8b in recv () from /lib64/libpthread.so.0
  36   Thread 0x7fffaf6f9700 (LWP 20585) "CAS-event" 0x00007ffff7bca965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  35   Thread 0x7fffaf8fa700 (LWP 20580) "CAS-client" 0x00007ffff7bcda8b in recv () from /lib64/libpthread.so.0
  34   Thread 0x7fffaf9fb700 (LWP 20578) "CAS-event" 0x00007ffff7bca965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  33   Thread 0x7fffafefe700 (LWP 20576) "CAS-client" 0x00007ffff7bcda8b in recv () from /lib64/libpthread.so.0
  32   Thread 0x7fffaffff700 (LWP 20575) "CAS-event" 0x00007ffff7bca965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  27   Thread 0x7fffafbfc700 (LWP 20552) "CAS-client" 0x00007ffff7bcad12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  26   Thread 0x7fffafcfd700 (LWP 20551) "CAS-event" 0x00007ffff7bca965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  23   Thread 0x7ffff4164700 (LWP 20539) "CAC-event" 0x00007ffff7bca965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  22   Thread 0x7ffff41e5700 (LWP 20538) "CAS-beacon" 0x00007ffff7bcde3d in nanosleep () from /lib64/libpthread.so.0
  21   Thread 0x7ffff42e6700 (LWP 20537) "CAS-UDP" 0x00007ffff7bcd4ed in __lll_lock_wait () from /lib64/libpthread.so.0
  20   Thread 0x7ffff43e7700 (LWP 20536) "CAS-TCP" 0x00007ffff7bcd97d in accept () from /lib64/libpthread.so.0
  19   Thread 0x7ffff45e8700 (LWP 20534) "scan-0.1" 0x00007ffff7bcd4ed in __lll_lock_wait () from /lib64/libpthread.so.0
  18   Thread 0x7ffff47e9700 (LWP 20533) "scan-0.2" 0x00007ffff7bcd4ed in __lll_lock_wait () from /lib64/libpthread.so.0
  17   Thread 0x7ffff49ea700 (LWP 20532) "scan-0.5" 0x00007ffff7bcd4ed in __lll_lock_wait () from /lib64/libpthread.so.0
  16   Thread 0x7ffff4beb700 (LWP 20531) "scan-1" 0x00007ffff7bcd4ed in __lll_lock_wait () from /lib64/libpthread.so.0
  15   Thread 0x7ffff4dec700 (LWP 20530) "scan-2" 0x00007ffff7bcd4ed in __lll_lock_wait () from /lib64/libpthread.so.0
  14   Thread 0x7ffff4fed700 (LWP 20529) "scan-5" 0x00007ffff7bcd4ed in __lll_lock_wait () from /lib64/libpthread.so.0
  13   Thread 0x7ffff51ee700 (LWP 20528) "scan-10" 0x00007ffff7bcd4ed in __lll_lock_wait () from /lib64/libpthread.so.0
  12   Thread 0x7ffff53ef700 (LWP 20527) "scanOnce" 0x00007ffff7bca965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  11   Thread 0x7ffff57d0700 (LWP 20526) "dbCaLink" 0x00007ffff7bca965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  10   Thread 0x7ffff59d1700 (LWP 20525) "cbHigh" 0x00007ffff7bca965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  9    Thread 0x7ffff5bd2700 (LWP 20524) "cbMedium" 0x00007ffff7bca965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  8    Thread 0x7ffff5dd3700 (LWP 20523) "cbLow" 0x00007ffff7bcd4ed in __lll_lock_wait () from /lib64/libpthread.so.0
  7    Thread 0x7ffff5ed4700 (LWP 20522) "timerQueue" 0x00007ffff7bcad12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  6    Thread 0x7ffff5f55700 (LWP 20521) "taskwd" 0x00007ffff7bcad12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  5    Thread 0x7ffff7f12700 (LWP 20520) "PCIISR0000:0a:0" 0x00007ffff7bcd6fd in read () from /lib64/libpthread.so.0
  4    Thread 0x7ffff6756700 (LWP 20519) "EVRFIFO" 0x00007ffff7bca965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  3    Thread 0x7ffff7fd3700 (LWP 20517) "errlog" 0x00007ffff7bca965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
* 1    Thread 0x7ffff7fd5740 (LWP 20512) "evrApp" 0x00007ffff7bcd6fd in read () from /lib64/libpthread.so.0
(gdb) thread 38
[Switching to thread 38 (Thread 0x7fffaf3f7700 (LWP 20602))]
#0  0x00007ffff7bcd4ed in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007ffff7bcd4ed in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007ffff7bc8de6 in _L_lock_941 () from /lib64/libpthread.so.0
#2  0x00007ffff7bc8cdf in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00000000005981b6 in mutexLock (id=0x8f7b90) at ../../src/osi/os/posix/osdMutex.c:46
#4  epicsMutexOsdLock (pmutex=0x8f7b90) at ../../src/osi/os/posix/osdMutex.c:130
#5  0x000000000059104c in epicsMutex::lock (this=this@entry=0x8f0bb0) at ../../src/osi/epicsMutex.cpp:275
#6  0x0000000000469f6f in scopedLock (lock=true, mutex=..., this=<synthetic pointer>) at ../../../include/mrfCommon.h:212
#7  EVRMRM::setEvtCode (this=0x8f0b10, code=125) at ../drvem.cpp:1014
#8  0x000000000048b628 in runSrc (this=0x8fc8d0) at ../mrmtimesrc.cpp:110
#9  epicsThreadRunableMethod<TimeStampSource::Impl, &TimeStampSource::Impl::runSrc>::run (this=<optimized out>) at ../../../include/mrfCommon.h:301
#10 0x000000000058fe49 in epicsThreadCallEntryPoint (pPvt=0x7fff94035ba0) at ../../src/osi/epicsThread.cpp:95
#11 0x0000000000595bec in start_routine (arg=0x7fff94035d30) at ../../src/osi/os/posix/osdThread.c:411
#12 0x00007ffff7bc6dd5 in start_thread () from /lib64/libpthread.so.0
#13 0x00007ffff6a7f02d in clone () from /lib64/libc.so.6
(gdb) thread 39
[Switching to thread 39 (Thread 0x7fffaf376700 (LWP 20604))]
#0  0x00007ffff7bcad12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007ffff7bcad12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000000000598809 in epicsEventWaitWithTimeout (pevent=0x8fc9c0, timeout=1.1000000000000001) at ../../src/osi/os/posix/osdEvent.c:130
#2  0x000000000059147c in epicsEvent::wait (this=this@entry=0x8fc8e0, timeOut=timeOut@entry=1.1000000000000001) at ../../src/osi/epicsEvent.cpp:75
#3  0x000000000048b54e in runTimeout (this=0x8fc8d0) at ../mrmtimesrc.cpp:76
#4  epicsThreadRunableMethod<TimeStampSource::Impl, &TimeStampSource::Impl::runTimeout>::run (this=<optimized out>) at ../../../include/mrfCommon.h:301
#5  0x000000000058fe49 in epicsThreadCallEntryPoint (pPvt=0x7fff840008c0) at ../../src/osi/epicsThread.cpp:95
#6  0x0000000000595bec in start_routine (arg=0x7fff84000a50) at ../../src/osi/os/posix/osdThread.c:411
#7  0x00007ffff7bc6dd5 in start_thread () from /lib64/libpthread.so.0
#8  0x00007ffff6a7f02d in clone () from /lib64/libc.so.6

I added some printfs (my diff of the code is below):

softSecondsSrc:238 AAAA
[New Thread 0x7fffaf3f7700 (LWP 20602)]
softSecondsSrc:248 BBBB
[New Thread 0x7fffaf376700 (LWP 20604)]
softSecondsSrc:251 CCCC
softSecondsSrc:254 DDDD
softSecondsSrc:258 EEEE

Location of the printf()s, diff:

diff --git a/mrmShared/src/mrmtimesrc.cpp b/mrmShared/src/mrmtimesrc.cpp
index 951e086..16f1ac1 100644
--- a/mrmShared/src/mrmtimesrc.cpp
+++ b/mrmShared/src/mrmtimesrc.cpp
@@ -231,8 +231,11 @@ double TimeStampSource::deltaSeconds() const
 void TimeStampSource::softSecondsSrc(bool enable)
 {
 #ifdef HAVE_CNS
+    mrf::auto_ptr<epicsThread> cleanup;
+    {
         Guard G(impl->mutex);
         if(enable && !impl->softsrc.get()) {
+            printf("%s:%d AAAA\n", __func__, __LINE__);
             // start it
             impl->stopsrc = false;
             impl->softsrc.reset(new epicsThread(impl->softsrcRun,
@@ -242,16 +245,20 @@ void TimeStampSource::softSecondsSrc(bool enable)
             impl->softsrc->start();

             resyncSecond();
+            printf("%s:%d BBBB\n", __func__, __LINE__);

         } else if(!enable && impl->softsrc.get()) {
+            printf("%s:%d CCCC\n", __func__, __LINE__);
             impl->stopsrc = true;
-        {
-            UnGuard U(G);
-            impl->wakeup.signal();
-            impl->softsrc->exitWait();
+            cleanup = PTRMOVE(impl->softsrc);
+            printf("%s:%d DDDD\n", __func__, __LINE__);
         }
-        impl->softsrc.reset();
-
+    }
+    if(cleanup.get()) {
+        printf("%s:%d EEEE\n", __func__, __LINE__);
+        impl->wakeup.signal();
+        cleanup->exitWait();
+        printf("%s:%d FFFF\n", __func__, __LINE__);
     }
 #else
     if(enable)

As you can see the cleanup->exitWait(); does not finish, since no FFFF line is printed.

Note that I just dumped the two changed source files into my source tree which is ESS specific - https://github.com/icshwi/mrfioc2.git; Not sure what are all the differences compared to your tree and if they would matter at all in this case, as I'm not part of the group that maintains that fork. I can try to run IOC with your code to be sure I get the same behavior.

mdavidsaver commented 4 years ago

Given all the threads in __lll_lock_wait, there are more locks involved. In gdb, please run thread apply all backtrace and attach the (very long) output as a file to this issue.

hinxx commented 4 years ago

thread apply all backtrace.txt

mdavidsaver commented 4 years ago

I see what is happening. Trying to join a thread from a device support function was a mistake. It is also an unnecessary. So the fix I'll try will be to lazily start the worker, but not to stop it.

mdavidsaver commented 4 years ago

I believe this issue was fixed with a0d6f714cb23cde3c4ec4078fa2f6d9c3abb57a2, which was included in the recent 2.3.0 release.