Closed graeme-winter closed 3 years ago
OK, this appears to manifest only when using xds_par, not xds, so it is almost certainly to do with (lack of) thread safety.
Total elapsed wall-clock time for XDS 7.7 sec
Process 78252 stopped
* thread #2, stop reason = EXC_BAD_ACCESS (code=1, address=0x101e741e0)
frame #0: 0x0000000101e741e0
error: memory read failed for 0x101e74000
thread #3, stop reason = EXC_BAD_ACCESS (code=1, address=0x101e741e0)
frame #0: 0x0000000101e741e0
error: memory read failed for 0x101e74000
Target 0: (xds_par) stopped.
Looks like the error is purely inside the thread library?
WEAK SPOTS OMITTED 0
NUMBER OF DIFFRACTION SPOTS ACCEPTED 216
total elapsed wall-clock time for COLSPOT 2.1 sec
TASK cpu time (sec) elapsed wall-clock time (sec)
1 13.3 2.0
[generic_data_plugin] - INFO - 'call generic_close()'
Total elapsed wall-clock time for XDS 7.8 sec
Process 78877 stopped
* thread #2, stop reason = EXC_BAD_ACCESS (code=1, address=0x101e96390)
frame #0: 0x0000000101e96390
error: memory read failed for 0x101e96200
Target 0: (xds_par) stopped.
(lldb) thread list
Process 78877 stopped
thread #1: tid = 0x1290416, 0x00007fff70e959de libsystem_kernel.dylib`__ulock_wait + 10, queue = 'com.apple.main-thread'
* thread #2: tid = 0x129042d, 0x0000000101e96390, stop reason = EXC_BAD_ACCESS (code=1, address=0x101e96390)
thread #3: tid = 0x129042e, 0x00007fff70e9686a libsystem_kernel.dylib`__psynch_cvwait + 10
thread #4: tid = 0x129042f, 0x00007fff70e9686a libsystem_kernel.dylib`__psynch_cvwait + 10
thread #5: tid = 0x1290430, 0x00007fff70e9686a libsystem_kernel.dylib`__psynch_cvwait + 10
thread #6: tid = 0x1290431, 0x00007fff70e9686a libsystem_kernel.dylib`__psynch_cvwait + 10
thread #7: tid = 0x1290432, 0x00007fff70e9686a libsystem_kernel.dylib`__psynch_cvwait + 10
thread #8: tid = 0x1290433, 0x00007fff70e9686a libsystem_kernel.dylib`__psynch_cvwait + 10
(lldb) t 2
* thread #2, stop reason = EXC_BAD_ACCESS (code=1, address=0x101e96390)
frame #0: 0x0000000101e96390
error: memory read failed for 0x101e96200
(lldb) thread backtrace
* thread #2, stop reason = EXC_BAD_ACCESS (code=1, address=0x101e96390)
* frame #0: 0x0000000101e96390
frame #1: 0x00007fff70f52660 libsystem_pthread.dylib`_pthread_tsd_cleanup + 476
frame #2: 0x00007fff70f55655 libsystem_pthread.dylib`_pthread_exit + 70
frame #3: 0x00007fff70f522f6 libsystem_pthread.dylib`_pthread_body + 137
frame #4: 0x00007fff70f55249 libsystem_pthread.dylib`_pthread_start + 66
frame #5: 0x00007fff70f5140d libsystem_pthread.dylib`thread_start + 13
Apart from the faulting frame #0, which has no symbol at all, the full stack for the thread which gave EXC_BAD_ACCESS is within libsystem_pthread 🤔
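A quick way to check that observation mechanically rather than by eye: the sketch below parses pasted `thread backtrace` output and reports which modules the frames belong to, plus any frame addresses lldb could not attribute to a loaded image. The helper names (`parse_frames`, `summarize`) are made up for illustration and are not part of lldb or XDS. The only background fact relied on is that `_pthread_tsd_cleanup` is where thread-specific-data destructors get run at thread exit, so an unmapped frame #0 called from there would be consistent with a destructor whose code is no longer loaded. That reading is a guess, not something the trace proves.

```python
import re

# Matches lldb frame lines such as
#   frame #1: 0x00007fff70f52660 libsystem_pthread.dylib`_pthread_tsd_cleanup + 476
# and bare frames that carry no module or symbol, such as
#   * frame #0: 0x0000000101e96390
FRAME_RE = re.compile(
    r"frame #(?P<index>\d+): (?P<addr>0x[0-9a-fA-F]+)"
    r"(?: (?P<module>[^`\s]+)`(?P<symbol>\S+))?"
)


def parse_frames(backtrace_text):
    """Return (index, address, module, symbol) for each frame line found.

    module and symbol are None when lldb printed only an address.
    """
    frames = []
    for line in backtrace_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append(
                (
                    int(match["index"]),
                    match["addr"],
                    match["module"],
                    match["symbol"],
                )
            )
    return frames


def summarize(backtrace_text):
    """Return (sorted module names, addresses of frames with no module)."""
    frames = parse_frames(backtrace_text)
    modules = sorted({f[2] for f in frames if f[2] is not None})
    unresolved = [f[1] for f in frames if f[2] is None]
    return modules, unresolved
```

Fed the thread #2 backtrace above, `summarize` gives `['libsystem_pthread.dylib']` as the only module and `['0x0000000101e96390']` as the one unresolved address, i.e. the crash is a jump to an address outside any loaded image, made from the pthread exit path.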
Of course, the process which fails is actually mcolspot_par, not xds_par, so intercept it in forkxds and run it inside a debugger. Same outcome though:
Grey-Area durin-segv :) $ lldb `which mcolspot_par`
(lldb) target create "/Users/graeme/xtal/XDS/mcolspot_par"
Current executable set to '/Users/graeme/xtal/XDS/mcolspot_par' (x86_64).
(lldb) run
Process 80562 launched: '/Users/graeme/xtal/XDS/mcolspot_par' (x86_64)
2^D
] master_file=/Volumes/Blue/Data/i03-ins-fdp-small-i/insu_13_1_master.h5
[generic_data_plugin] - INFO - generic_open
+ library = </Users/graeme/xtal/XDS/durin-plugin.so>
+ template_name = <
/Volumes/Blue/Data/i03-ins-fdp-small-i/insu_13_1_master.h5>
+ dll_filename = </Users/graeme/xtal/XDS/durin-plugin.so>
+ image_data_filename = <
/Volumes/Blue/Data/i03-ins-fdp-small-i/insu_13_1_master.h5>
[generic_data_plugin] - INFO - generic_get_header
INFO(1:5)=vendor/major version/minor version/patch/timestamp= 1 0 0 0 -1
generic_getfrm: data are from Dectris
[generic_data_plugin] - INFO - 'call generic_close()'
Process 80562 stopped
* thread #2, stop reason = EXC_BAD_ACCESS (code=1, address=0x10fdf9390)
frame #0: 0x000000010fdf9390
error: memory read failed for 0x10fdf9200
Target 0: (mcolspot_par) stopped.
(lldb) thread list
Process 80562 stopped
thread #1: tid = 0x12c5251, 0x00007fff70e959de libsystem_kernel.dylib`__ulock_wait + 10, queue = 'com.apple.main-thread'
* thread #2: tid = 0x12c527c, 0x000000010fdf9390, stop reason = EXC_BAD_ACCESS (code=1, address=0x10fdf9390)
(lldb) thread backtrace
* thread #2, stop reason = EXC_BAD_ACCESS (code=1, address=0x10fdf9390)
* frame #0: 0x000000010fdf9390
frame #1: 0x00007fff70f52660 libsystem_pthread.dylib`_pthread_tsd_cleanup + 476
frame #2: 0x00007fff70f55655 libsystem_pthread.dylib`_pthread_exit + 70
frame #3: 0x00007fff70f522f6 libsystem_pthread.dylib`_pthread_body + 137
frame #4: 0x00007fff70f55249 libsystem_pthread.dylib`_pthread_start + 66
frame #5: 0x00007fff70f5140d libsystem_pthread.dylib`thread_start + 13
https://strucbio.biologie.uni-konstanz.de/xdswiki/index.php/LIB
Should try with a threaded driver application to see if the problem can be recreated.
The page above indicates that "this just happens" 🤔
Also, the driver code is FORTRAN only, which is not super helpful...
Compiled the test_generic_host program and I can reproduce the issue: e.g. with OMP_NUM_THREADS=1 the program works correctly
Grey-Area driver :( $ OMP_NUM_THREADS=1 ./test_generic_host < driver.in
enter parameter of LIB= keyword:
enter parameter of NAME_TEMPLATE_OF_DATA_FRAMES= keyword:
enter parameters of the DATA_RANGE= keyword:
master_file=/Users/graeme/data/i03-screen19/Protk_1/Protk_1_8_master.h5
[generic_data_plugin] - INFO - generic_open
+ library = </Users/graeme/xtal/XDS/durin-plugin.so>
+ template_name = </Users/graeme/data/i03-screen19/Protk_1/Protk_1_8_master.h5>
+ dll_filename = </Users/graeme/xtal/XDS/durin-plugin.so>
+ image_data_filename = </Users/graeme/data/i03-screen19/Protk_1/Protk_1_8_master.h5>
[generic_data_plugin] - INFO - generic_get_header
nx,ny,nbyte,qx,qy,number_of_frames= 4148 4362 2 0.000075 0.000075 150
INFO(1:5)=vendor/major version/minor version/patch/timestamp= 1 0 0 0 -1
generic_getfrm: data are from Dectris
average counts: 1.62359679
[generic_data_plugin] - INFO - 'call generic_close()'
but with more than one thread (OMP_NUM_THREADS=2 here) I get a SEGV
Grey-Area driver :) $ OMP_NUM_THREADS=2 ./test_generic_host < driver.in
enter parameter of LIB= keyword:
enter parameter of NAME_TEMPLATE_OF_DATA_FRAMES= keyword:
enter parameters of the DATA_RANGE= keyword:
master_file=/Users/graeme/data/i03-screen19/Protk_1/Protk_1_8_master.h5
[generic_data_plugin] - INFO - generic_open
+ library = </Users/graeme/xtal/XDS/durin-plugin.so>
+ template_name = </Users/graeme/data/i03-screen19/Protk_1/Protk_1_8_master.h5>
+ dll_filename = </Users/graeme/xtal/XDS/durin-plugin.so>
+ image_data_filename = </Users/graeme/data/i03-screen19/Protk_1/Protk_1_8_master.h5>
[generic_data_plugin] - INFO - generic_get_header
nx,ny,nbyte,qx,qy,number_of_frames= 4148 4362 2 0.000075 0.000075 150
INFO(1:5)=vendor/major version/minor version/patch/timestamp= 1 0 0 0 -1
generic_getfrm: data are from Dectris
average counts: 1.62359643
[generic_data_plugin] - INFO - 'call generic_close()'
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0x10dd589cc
#1 0x10dd57dc5
#2 0x7fff6bcb45fc
Segmentation fault: 11
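To make the thread-count comparison repeatable, something like the following could sweep OMP_NUM_THREADS and report how the process ended. It is only a sketch: `run_with_threads` and `describe` are hypothetical helper names, and it relies on the POSIX behaviour of Python's `subprocess`, where a child killed by signal N is reported with return code -N.

```python
import os
import signal
import subprocess


def run_with_threads(cmd, n_threads, stdin_path=None):
    """Run cmd with OMP_NUM_THREADS set to n_threads and return its status.

    On POSIX a negative value means the child was killed by that signal.
    """
    env = dict(os.environ, OMP_NUM_THREADS=str(n_threads))
    stdin = open(stdin_path, "rb") if stdin_path else subprocess.DEVNULL
    try:
        result = subprocess.run(
            cmd,
            env=env,
            stdin=stdin,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    finally:
        if stdin_path:
            stdin.close()
    return result.returncode


def describe(returncode):
    """Turn a subprocess return code into a short human-readable string."""
    if returncode < 0:
        return "killed by " + signal.Signals(-returncode).name
    return "exit %d" % returncode
```

With the driver above this would be called as `describe(run_with_threads(["./test_generic_host"], n, "driver.in"))` for each thread count `n` of interest. Output is discarded on purpose, since the point is only whether the run ends cleanly or on a signal.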
Now resolved on the XDS side in the latest release, by changing the manner in which the plugin code is unloaded at the end of execution.
Though the plugin "works fine", I keep getting a segmentation fault at the end of processing, which is untidy at best.