epics-modules / autosave

APS BCDA synApps module: autosave
https://epics-modules.github.io/autosave/
Other
8 stars 31 forks source link

Segmentation fault when setting save_restoreSet_Debug to a non 0 value #17

Closed ebernalru closed 5 years ago

ebernalru commented 6 years ago

Dear all, I have a Segmentation fault when setting save_restoreSet_Debug to a non 0 value. The SegFault is at IOC exit. My test IOC is under: Debian GNU/Linux 8.10 (jessie), EPICS Base 3.15.3 and autosave: master branch (same error under R5.9 and R5.8) Here is the GDB output when exiting IOC,

BEGIN save_restoreShow
  Current date-time (yymmdd-hhmmss): [180216-122021] 
  Status: 'No Status' - 'Enabling list...'
  Debug level: 15
  File permissions: 0664
  Save/restore incomplete save sets? YES
  Write dated backup files? YES
  Number of sequence files to maintain: 3
  Time interval between sequence files: 60 seconds
  Time interval between .sav-file write failure and retry: 60 seconds
  NFS host: ''; address:''
  NFS mount point:
    '/home/vagrant/development/iocdata/autosave/ioctest'
  NFS mount status: not managed by save_restore
  I/O errors: 0
  request file path list:
    '/home/vagrant/development/iocdata/autosave/ioctest/'
  save file path:
    '/home/vagrant/development/iocdata/autosave/ioctest'
lockList: listLock=1
  info_settings.req: 
    macro string: ''
    Status PV: 
    Status: 'No Status' - 'Enabling list...'
    Last save time  :Fri Feb 16 12:20:17 2018
    Last backup time:Fri Feb 16 12:20:17 2018
    methods: [ TIMER+CHANGE ]
    path PV: None
    name PV: None
    backups: YES
    save_state = 0x8
    period: 4; trigger chan: ''; monitor period: 5
    last saved file - 
    0 channels not connected (or ca_get failed)

Thread 26 "save_restore" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffdedf5700 (LWP 15561)]
0x0000000000000000 in ?? ()
(gdb) info threads
  Id   Target Id         Frame 
  1    Thread 0x7ffff7fd6740 (LWP 15534) "test" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  3    Thread 0x7ffff7fd4700 (LWP 15536) "errlog" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  4    Thread 0x7ffff7f53700 (LWP 15538) "timerQueue" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
  5    Thread 0x7ffff4d2c700 (LWP 15539) "timerQueue" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
  6    Thread 0x7ffff4c2b700 (LWP 15540) "udpTestPortName" __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
  7    Thread 0x7ffff4b2a700 (LWP 15541) "taskwd" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
  9    Thread 0x7ffff552d700 (LWP 15543) "test" 0x00007ffff6b33aed in poll () at ../sysdeps/unix/syscall-template.S:81
  10   Thread 0x7ffff49a8700 (LWP 15544) "timerQueue" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
  11   Thread 0x7ffff48a7700 (LWP 15545) "cbLow" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  12   Thread 0x7ffff46a6700 (LWP 15546) "cbMedium" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  13   Thread 0x7ffff44a5700 (LWP 15547) "cbHigh" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  14   Thread 0x7ffff42a4700 (LWP 15548) "dbCaLink" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  15   Thread 0x7fffdffff700 (LWP 15549) "scanOnce" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  16   Thread 0x7fffdfdfe700 (LWP 15550) "scan-10" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
  17   Thread 0x7fffdfbfd700 (LWP 15551) "scan-5" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
  18   Thread 0x7fffdf9fc700 (LWP 15552) "scan-2" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
  19   Thread 0x7fffdf7fb700 (LWP 15553) "scan-1" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
  20   Thread 0x7fffdf5fa700 (LWP 15554) "scan-0.5" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
  21   Thread 0x7fffdf3f9700 (LWP 15555) "scan-0.2" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
  22   Thread 0x7fffdf1f8700 (LWP 15556) "scan-0.1" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
  23   Thread 0x7fffdeff7700 (LWP 15557) "CAS-TCP" 0x00007ffff6b3d2dd in accept () at ../sysdeps/unix/syscall-template.S:81
  24   Thread 0x7ffff40a3700 (LWP 15558) "CAS-beacon" 0x00007ffff6b0df2d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
  25   Thread 0x7fffdeef6700 (LWP 15559) "CAS-UDP" 0x00007ffff6b3d543 in recvfrom () at ../sysdeps/unix/syscall-template.S:81
* 26   Thread 0x7fffdedf5700 (LWP 15561) "save_restore" 0x0000000000000000 in ?? ()
  27   Thread 0x7ffff4aa9700 (LWP 15562) "CAC-event" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
(gdb) thread 26
[Switching to thread 26 (Thread 0x7fffdedf5700 (LWP 15561))]
#0  0x0000000000000000 in ?? ()
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x00007ffff683606e in ca_state (pChan=0x7fffc0000cf0) at ../oldChannelNotify.cpp:644
#2  0x00007ffff7bc56f9 in save_restoreShow (verbose=-1073663648, verbose@entry=1) at ../save_restore.c:2525
#3  0x00007ffff7bcaeb7 in save_restore () at ../save_restore.c:1305
#4  0x00007ffff6e435cc in start_routine (arg=0x6f98e0) at ../../../src/libCom/osi/os/posix/osdThread.c:403
#5  0x00007ffff65d8064 in start_thread (arg=0x7fffdedf5700) at pthread_create.c:309
#6  0x00007ffff6b3c62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb) frame 3
#3  0x00007ffff7bcaeb7 in save_restore () at ../save_restore.c:1305
1305            save_restoreShow(1);
(gdb) frame 2
#2  0x00007ffff7bc56f9 in save_restoreShow (verbose=-1073663648, verbose@entry=1) at ../save_restore.c:2525
2525                            pchannel->chid, pchannel->chid?ca_state_string[ca_state(pchannel->chid)]:"noChid",
(gdb) frame 1
#1  0x00007ffff683606e in ca_state (pChan=0x7fffc0000cf0) at ../oldChannelNotify.cpp:644
644     if ( pChan->io.connected ( guard ) ) {
(gdb) frame 0
#0  0x0000000000000000 in ?? ()
(gdb) 
anjohnson commented 6 years ago

The autosave module does register and use an epicsAtExit() routine — that's what triggered the output of the long BEGIN save_restoreShow message you posted. However I suspect it's being registered too early so the callback occurs after the CA client library has already cleaned up, which probably explains the NULL pointers in the stack trace.

If a client application wants to be able to use CA resources inside its exit routine, I believe it has to create its CA context before registering the exit routine (so it gets called first at exit time; exit calls are made in the reverse order of their registration).

timmmooney commented 6 years ago

I think the problem is that autosave disconnects all of its channel-access stuff, and then it calls save_restoreShow(1), which tries to print that stuff out. I modified save_restoreShow(), so it doesn't try to report channel-access stuff after having received an epicsAtExit callback.

Tim Mooney (mooney@aps.anl.gov) (630)252-5417 Beamline Controls Group (www.aps.anl.gov) Advanced Photon Source, Argonne National Lab


From: Enrique Bernal notifications@github.com Sent: Friday, February 16, 2018 1:32:36 PM To: epics-modules/autosave Cc: Subscribed Subject: [epics-modules/autosave] Segmentation fault when setting save_restoreSet_Debug to a non 0 value (#17)

Dear all, I have a Segmentation fault when setting save_restoreSet_Debug to a non 0 value. The SegFault is at IOC exit. My test IOC is under: Debian GNU/Linux 8.10 (jessie), EPICS Base 3.15.3 and autosave: master branch (same error under R5.9 and R5.8) Here is the GDB output when exiting IOC,

BEGIN save_restoreShow Current date-time (yymmdd-hhmmss): [180216-122021] Status: 'No Status' - 'Enabling list...' Debug level: 15 File permissions: 0664 Save/restore incomplete save sets? YES Write dated backup files? YES Number of sequence files to maintain: 3 Time interval between sequence files: 60 seconds Time interval between .sav-file write failure and retry: 60 seconds NFS host: ''; address:'' NFS mount point: '/home/vagrant/development/iocdata/autosave/ioctest' NFS mount status: not managed by save_restore I/O errors: 0 request file path list: '/home/vagrant/development/iocdata/autosave/ioctest/' save file path: '/home/vagrant/development/iocdata/autosave/ioctest' lockList: listLock=1 info_settings.req: macro string: '' Status PV: Status: 'No Status' - 'Enabling list...' Last save time :Fri Feb 16 12:20:17 2018 Last backup time:Fri Feb 16 12:20:17 2018 methods: [ TIMER+CHANGE ] path PV: None name PV: None backups: YES save_state = 0x8 period: 4; trigger chan: ''; monitor period: 5 last saved file - 0 channels not connected (or ca_get failed)

Thread 26 "save_restore" received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x7fffdedf5700 (LWP 15561)] 0x0000000000000000 in ?? () (gdb) info threads Id Target Id Frame 1 Thread 0x7ffff7fd6740 (LWP 15534) "test" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185 3 Thread 0x7ffff7fd4700 (LWP 15536) "errlog" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185 4 Thread 0x7ffff7f53700 (LWP 15538) "timerQueue" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238 5 Thread 0x7ffff4d2c700 (LWP 15539) "timerQueue" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238 6 Thread 0x7ffff4c2b700 (LWP 15540) "udpTestPortName" __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135 7 Thread 0x7ffff4b2a700 (LWP 15541) "taskwd" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238 9 Thread 0x7ffff552d700 (LWP 15543) "test" 0x00007ffff6b33aed in poll () at ../sysdeps/unix/syscall-template.S:81 10 Thread 0x7ffff49a8700 (LWP 15544) "timerQueue" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238 11 Thread 0x7ffff48a7700 (LWP 15545) "cbLow" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185 12 Thread 0x7ffff46a6700 (LWP 15546) "cbMedium" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185 13 Thread 0x7ffff44a5700 (LWP 15547) "cbHigh" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185 14 Thread 0x7ffff42a4700 (LWP 15548) "dbCaLink" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185 15 Thread 0x7fffdffff700 (LWP 15549) "scanOnce" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185 16 Thread 0x7fffdfdfe700 (LWP 15550) "scan-10" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238 17 Thread 0x7fffdfbfd700 (LWP 15551) "scan-5" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238 18 Thread 0x7fffdf9fc700 (LWP 15552) "scan-2" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238 19 Thread 0x7fffdf7fb700 (LWP 15553) "scan-1" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238 20 Thread 0x7fffdf5fa700 (LWP 15554) "scan-0.5" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238 21 Thread 0x7fffdf3f9700 (LWP 15555) "scan-0.2" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238 22 Thread 0x7fffdf1f8700 (LWP 15556) "scan-0.1" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238 23 Thread 0x7fffdeff7700 (LWP 15557) "CAS-TCP" 0x00007ffff6b3d2dd in accept () at ../sysdeps/unix/syscall-template.S:81 24 Thread 0x7ffff40a3700 (LWP 15558) "CAS-beacon" 0x00007ffff6b0df2d in nanosleep () at ../sysdeps/unix/syscall-template.S:81 25 Thread 0x7fffdeef6700 (LWP 15559) "CAS-UDP" 0x00007ffff6b3d543 in recvfrom () at ../sysdeps/unix/syscall-template.S:81

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://github.com/epics-modules/autosave/issues/17, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AGfnl_ai_UUhvf70dcPQKyLT1SBu2xdfks5tVdfUgaJpZM4SI0Tc.

ebernalru commented 6 years ago

Thanks for the quick reply, I tried your modification and It works on my test unit. Best regards.

keenanlang commented 5 years ago

Fixed in 12aaa9f