BIC-MNI / libminc

libminc is the core library and API of the MINC toolkit

file I/O race conditions when running ctest in parallel #110

Open bcdarwin opened 4 years ago

bcdarwin commented 4 years ago

Possibly not too severe ... ?

31/52 Test #36: minc2-large-attribute-100k .......***Failed    0.00 sec
/build/source/libsrc2/volume.c:236 (from MINC): Unable to create file '3D_image_a.mnc'
Error reported on line #113, create_3D_image: 0
1 error reported
Creating 3D image with attribute 100000 ! (3D_image_a.mnc)

      Start 39: minc2-dimension-test
32/52 Test #37: minc2-large-attribute-1m .........***Failed    0.00 sec
/build/source/libsrc2/volume.c:236 (from MINC): Unable to create file '3D_image_a.mnc'
Error reported on line #113, create_3D_image: 0
1 error reported
Creating 3D image with attribute 1000000 ! (3D_image_a.mnc)

gdevenyi commented 4 years ago

"make test" in libminc with the develop-1.9.18 (HDF5 1.10.6) superbuild passes all tests. Can you please compare your HDF5 build config to: https://github.com/BIC-MNI/minc-toolkit-v2/blob/develop-1.9.18/cmake-modules/BuildHDF5.cmake

Thanks.

bcdarwin commented 4 years ago

Looking at libhdf5.settings, one main difference seems to be the use of -O3, but I haven't verified this yet ...

gdevenyi commented 4 years ago

Any updates on this? My autobuild dockers are having issues with a couple of the HDF5 runs, and I'm wondering if it's related: https://github.com/BIC-MNI/build_packages/pull/14

bcdarwin commented 4 years ago

After disabling parallel building and bumping some dependencies, the tests now fail as follows instead (possibly more evidence that this is a race condition or memory corruption):

37/50 Test #46: minc2-valid-test .................***Failed    0.02 sec
/build/source/libsrc2/volume.c:1399 (from MINC): Unable to open file '/build/source/build/testdir/3D_minc2.mnc'
Error reported on line #20, can't open input: -1
/build/source/libsrc2/volume.c:1399 (from MINC): Unable to open file '/build/source/build/testdir/3D_minc2_int.mnc'
Error reported on line #20, can't open input: -1
min -32768.000000 max 32767.000000
min -340282346638528859811704183484516925440.000000 max 340282346638528859811704183484516925440.000000
min 0.000000 max 255.000000
38/50 Test #45: minc2-slice-test .................   Passed    0.02 sec

gdevenyi commented 4 years ago

Interesting, any chance you could throw together a reproducer in a Docker container or such so we can play with it?

Does it error if you run the tests a second time? An strace might be helpful as well.

vfonov commented 4 years ago

There are two tests in CMakeLists.txt that use files with the same names:

add_minc_test(minc2-slice-test            minc2-slice-test 
                                          ${CMAKE_CURRENT_BINARY_DIR}/3D_minc2.mnc 
                                          ${CMAKE_CURRENT_BINARY_DIR}/3D_minc2_int.mnc 
                                          ${CMAKE_CURRENT_BINARY_DIR}/3D_minc2_float.mnc
                                          )

add_minc_test(minc2-valid-test            minc2-valid-test
                                          ${CMAKE_CURRENT_BINARY_DIR}/2D_minc2.mnc 
                                          ${CMAKE_CURRENT_BINARY_DIR}/3D_minc2.mnc 
                                          ${CMAKE_CURRENT_BINARY_DIR}/3D_minc2_int.mnc 
                                          ${CMAKE_CURRENT_BINARY_DIR}/3D_minc2_float.mnc
                                          ${CMAKE_CURRENT_BINARY_DIR}/4D_minc2.mnc
                                          )

So if these tests are executed in parallel, they will conflict over those files.
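
One way to keep these two tests from overlapping under `ctest -j` would be CTest's `RESOURCE_LOCK` test property, which stops tests that name the same lock from running at the same time. This is only a sketch: it assumes `add_minc_test` registers each test under its first argument, and the lock name `minc2_3d_files` is made up for illustration.

```cmake
# Sketch only: assumes add_minc_test() registers tests named by its first argument.
# "minc2_3d_files" is a hypothetical lock name, not something libminc defines.
# Tests that share a RESOURCE_LOCK value are never scheduled concurrently.
set_tests_properties(minc2-slice-test minc2-valid-test
  PROPERTIES RESOURCE_LOCK minc2_3d_files)
```

If a separate test is what creates these .mnc files, it would presumably need the same lock, plus an explicit ordering (for example the `DEPENDS` test property), for this to be sufficient.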

vfonov commented 4 years ago

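Also, these three tests all run the same minc2-large-attribute executable, and the failure output above shows the 100k and 1m runs both creating 3D_image_a.mnc:
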
add_minc_test(minc2-large-attribute-10k   minc2-large-attribute 10000)
add_minc_test(minc2-large-attribute-100k  minc2-large-attribute 100000)
add_minc_test(minc2-large-attribute-1m    minc2-large-attribute 1000000)

bcdarwin commented 4 years ago

Thanks Vlad! Looks like running the tests sequentially fixes things.
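
To keep that behaviour even when someone passes `-j` to ctest, the tests that write the same file could be marked with CTest's `RUN_SERIAL` property, which prevents a test from running alongside any other test. A minimal sketch, assuming the registered test names match the first argument of `add_minc_test`:

```cmake
# Sketch only: assumes these are the names the tests are registered under.
# RUN_SERIAL makes CTest run each of these tests with nothing else in parallel,
# so they can no longer race on 3D_image_a.mnc.
set_tests_properties(
  minc2-large-attribute-10k
  minc2-large-attribute-100k
  minc2-large-attribute-1m
  PROPERTIES RUN_SERIAL TRUE)
```

This trades some parallel speed-up for determinism; giving each test its own output file name would avoid the conflict without serializing anything.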