Closed paskino closed 1 year ago
Currently failing all pytest
unit tests like
ctest -V -R REG_TESTS_PYTHON
UpdateCTestConfiguration from :/opt/SIRF-SuperBuild/builds/SIRF/build/DartConfiguration.tcl
UpdateCTestConfiguration from :/opt/SIRF-SuperBuild/builds/SIRF/build/DartConfiguration.tcl
Test project /opt/SIRF-SuperBuild/builds/SIRF/build
Constructing a list of tests
Done constructing a list of tests
Updating test list for fixtures
Added 0 tests to meet fixture requirements
Checking test dependency graph...
Checking test dependency graph end
test 6
Start 6: REG_TESTS_PYTHON
6: Test command: /opt/conda/bin/python3 "-m" "pytest" "--cov=sirf.Reg" "--cov-config=.coveragerc-Reg" "src/Registration/pReg"
6: Test timeout computed to be: 10000000
6: ERROR: usage: __main__.py [options] [file_or_dir] [file_or_dir] [...]
6: __main__.py: error: unrecognized arguments: --cov=sirf.Reg --cov-config=.coveragerc-Reg
6: inifile: None
6: rootdir: /opt/SIRF-SuperBuild/sources/SIRF
6:
1/1 Test #6: REG_TESTS_PYTHON .................***Failed 0.22 sec
errors are due to the missing pytest-cov
package
I moved to Ubuntu 22.04 and stumbled upon some errors with CUDA while building parallelproj
.
nvcc fatal : Unsupported gpu architecture 'compute_86+PTX'
I also upgraded CMake to 3.25.1 because with 3.17 there was the following error:
CMake Error: Error required internal CMake variable not set, cmake may not be built correctly. Missing variable is: CMAKE_CUDA17_EXTENSION_COMPILE_OPTION
building which project?
building which project?
parallelproj
Possible @gschramm can help, but a quick look at the internet says this is a conflict between driver and toolkit version
using parallelproj v1.2.11 solved the issue, now the next one is
Linking CXX executable ismrmrd_test_xml
ESC[91m/usr/bin/ld: /usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.so: undefined reference to `curl_global_init@CURL_OPENSSL_4'
ESC[0mESC[91m/usr/bin/ld: /usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.so: undefined reference to `curl_easy_perform@CURL_OPENSSL_4'
ESC[0mESC[91m/usr/bin/ld: /usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.so: undefined reference to `curl_slist_free_all@CURL_OPENSSL_4'
ESC[0mESC[91m/usr/bin/ld: /usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.so: undefined reference to `curl_easy_setopt@CURL_OPENSSL_4'
ESC[0mESC[91m/usr/bin/ld: /usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.so: undefined reference to `curl_easy_init@CURL_OPENSSL_4'
ESC[0mESC[91m/usr/bin/ld: /usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.so: undefined reference to `curl_slist_append@CURL_OPENSSL_4'
ESC[0mESC[91m/usr/bin/ld: /usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.so: undefined reference to `curl_easy_cleanup@CURL_OPENSSL_4'
ESC[0mESC[91m/usr/bin/ld: /usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.so: undefined reference to `curl_global_cleanup@CURL_OPENSSL_4'
All non-CUDA jobs are failing because parallelproj
by defaults requires nvcc
, see e.g. https://github.com/SyneRBI/SIRF-SuperBuild/actions/runs/3849192276/jobs/6557897239#step:7:507 and https://github.com/gschramm/parallelproj/issues/22. If no CUDA, we need to build it with SKIP_CUDA_LIB
. Easy?
> ESC[91m/usr/bin/ld: /usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.so: undefined reference to `curl_global_init@CURL_OPENSSL_4'
I guess this is on a local build as I don't see it in any of the logs. What configuration is this? (As a 22.04 VM builds, and @ckolbPTB builds (of course it's the SB master
version on docker 22.04 https://github.com/ckolbPTB/OpenSourceMrRecon/blob/main/Dockerfile)
This might be an Ubuntu (possibly it distributes built curl without openssl) or CMake problem (possibly it doesn't know what libraries it should pass).
Yes it is a local build.
This is a Ubuntu22.04 from a 11.7.1-cudnn8-devel-ubuntu22.04 NVIDIA image.
The build command is
docker build --build-arg BASE_IMAGE=nvidia/cuda:11.7.1-cudnn8-devel-ubuntu22.04 \
--build-arg PYTHON_INSTALL_DIR=/opt/conda --build-arg SIRF_SB_URL="https://github.com/paskino/SIRF-SuperBuild" \
--build-arg SIRF_SB_TAG="fix_prerelease3.4" --build-arg NUM_PARALLEL_BUILDS=1 --target sirf .
With USE_SYSTEM_HDF5=ON
I get the linking error.
I will try to install libcurl4-openssl-dev
although I thought I checked and verified it was already installed.
Probably unrelated, but in STIR GHA tests, I suddenly get
E: Failed to fetch http://azure.archive.ubuntu.com/ubuntu/pool/main/c/curl/libcurl4-openssl-dev_7.81.0-1ubuntu1.6_amd64.deb 404 Not Found [IP: 40.119.46.219 80]
E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?
This library is installed because of HDF5. So maybe they're fixing something. (I had to apt update
first)
Yes it was installed
Step 62/76 : RUN apt-get install -y libcurl4-openssl-dev
---> Running in 5c8a5c97be87
Reading package lists...
Building dependency tree...
Reading state information...
libcurl4-openssl-dev is already the newest version (7.81.0-1ubuntu1.6).
libcurl4-openssl-dev set to manually installed.
0 upgraded, 0 newly installed, 0 to remove and 7 not upgraded.
However, on the ubuntu page the version is 7.81.0-1ubuntu1.7 and it is a security update. I guess we need to run an apt upgrade
before cmaking.
Ah, I understand why I had the 7.81.0-1ubuntu1.6
version instead of 7.81.0-1ubuntu1.7
of libcurl4-openssl-dev
!
This is because I built using cached images. So my image, although it had run apt-get update
, it had done it when the current version was 1.6. Let's hope this fixes the problem.
It seems that we should ~link~ build ISMRMRD adding -lcurl
and it works.
How can I do this?
It seems that we should ~link~ build ISMRMRD adding
-lcurl
and it works.
Only necessary for ISMRMRD and not for SIRF? Weird.
I doubt they use curl directly, so think this is a CMake bug. But I cannot understand why you see this in your docker image and we don't have it elsewhere. Which version of CMake are you using in your docker instance?
How can I do this?
We could force the linker flag CMAKE_EXE_LINKER_FLAGS
(ugly), for HDF5_LIBRARIES
variable (ugly) or use our own clone for ISMRMRD (not nice). But as they don't have this in CI, nor we as far as I understand, I'd want to understand better why you have it...
The problem seems to arise when using the system HDF5 library, not if we build our own HDF5. I just fixed it for ISMRMRD and it appeared for Gadgetron.
My CMake is v3.25.1
The problem seems to arise when using the system HDF5 library, not if we build our own HDF5. I just fixed it for ISMRMRD and it appeared for Gadgetron.
My CMake is v3.25.1
strange. definitely a CMake bug then. Can you post some log when building ISMRMRD after find_package(HDF5)
(I think we have the HDF5_FIND_DEBUG
flag on such that it writes diagnostics, but if not, please run with it
-- SuperBuild - HDF5[OK]
-- HDF5 C compiler wrapper is unable to compile a minimal HDF5 program.
-- Found HDF5: /usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.so (found suitable version "1.10.7", minimum required is "1.8")
-- USING the system HDF5, found HDF5_INCLUDE_DIRS=/usr/include/hdf5/serial, HDF5_C_LIBRARY_hdf5=,HDF5_LIBRARIES=/usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.so
-- HDF5_CMAKE_ARGS=-DHDF5_INCLUDE_DIRS:PATH=/usr/include/hdf5/serial;-DHDF5_LIBRARIES:STRING=/usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.so;-DHDF5_C_LIBRARIES:STRING=;-DHDF5_FIND_DEBUG:BOOL=ON
For ISMRMRD
[ 18%] Performing configure step for 'ISMRMRD'
loading initial cache file /opt/SIRF-SuperBuild/builds/ISMRMRD/tmp/ISMRMRD-cache-Release.cmake
-- The C compiler identification is GNU 11.3.0
-- The CXX compiler identification is GNU 11.3.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/lib/ccache/gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/lib/ccache/g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- HDF5 C compiler wrapper is unable to compile a minimal HDF5 program.
-- Found HDF5: /usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.so;/usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.so (found version "1.10.7") found components: C
-- HDF5_DIR: HDF5_DIR-NOTFOUND
-- HDF5_DEFINITIONS:
-- HDF5_INCLUDE_DIRS: /usr/include/hdf5/serial
-- HDF5_LIBRARIES: /usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.so;/usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.so
-- HDF5_HL_LIBRARIES:
-- HDF5_C_DEFINITIONS:
-- HDF5_C_INCLUDE_DIR: /usr/include/hdf5/serial
-- HDF5_C_INCLUDE_DIRS: /usr/include/hdf5/serial
-- HDF5_C_LIBRARY:
-- HDF5_C_LIBRARIES: /usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.so
-- HDF5_C_HL_LIBRARY:
-- HDF5_C_HL_LIBRARIES:
-- Defined targets (if any):
-- ... hdf5::hdf5
mHDF5 found at: /usr/include/hdf5/serial
HDF5 found at: /usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.so
-- Found Git: /usr/bin/git (found version "2.34.1")
-- Found Doxygen: /usr/bin/doxygen (found version "1.9.1") found components: doxygen dot
Building info application
-- Found Boost: /usr/lib/x86_64-linux-gnu/cmake/Boost-1.74.0/BoostConfig.cmake (found suitable version "1.74.0", minimum required is "1.43") found components: program_options random
-- FFTW3 UNIX libraries: /usr/lib/x86_64-linux-gnu/libfftw3.so/usr/lib/x86_64-linux-gnu/libfftw3f.so
FFTW3 and Boost Found... building utilities
-- Found FFTW3: /usr/lib/x86_64-linux-gnu/libfftw3f.so
CMake Deprecation Warning at examples/c/CMakeLists.txt:1 (cmake_minimum_required):
Compatibility with CMake < 2.8.12 will be removed from a future version of
CMake.
Update the VERSION argument <min> value or use a ...<max> suffix to tell
CMake that the project does not need compatibility with older versions.
-- Found Boost: /usr/lib/x86_64-linux-gnu/cmake/Boost-1.74.0/BoostConfig.cmake (found suitable version "1.74.0", minimum required is "1.43") found components: unit_test_framework
CPACK_PACKAGING_INSTALL_PREFIX: /opt/SIRF-SuperBuild/INSTALL
-- Found CPack generators: DEB
-- Configuring done
-- Generating done
Obviously adding -lcurl
doesn't work on Ubuntu < 22.04 https://github.com/SyneRBI/SIRF-SuperBuild/actions/runs/3874884037/jobs/6606658793#step:7:3764
Please upgrade boost (like in #438). 1.72 gives problems with ISMRMD.
The nvcc fatal : Unsupported gpu architecture 'compute_86+PTX'
issue is not related to parallelproj version. We tried on an Ubuntu 22.04 system with parallelproj 1.2.12. I still think this is a conflict between nvcc versions etc.
Currently we're trying with CUDA toolkit 12.0 from the NVIDIA website. Note that this says to use apt install cuda
, using apt install nvidia-cuda-toolkit
then creates a conflict as it installs nvcc 11.5 back again. Will let you know.
Currently we're trying with CUDA toolkit 12.0 from the NVIDIA website. Note that this says to use
apt install cuda
, usingapt install nvidia-cuda-toolkit
then creates a conflict as it installs nvcc 11.5 back again. Will let you know.
apt install cuda
didn't install nvcc
, and it says you need apt install nvidia-cuda-toolkit
anyway. But that downgraded. Oh well.
Wait @KrisThielemans ! The problem is that the the docker build was using a Ubuntu 20.04 image as base. I changed to Ubuntu 22.04 and it builds. Currently it fails installing jupyter, but I'm probably close to fix it.
I will need to upgrade Boost.
Next problem is installing SIRF-Exercises dependencies when we use conda...
An attempt to build Gadgetron master is https://github.com/paskino/SIRF-SuperBuild/pull/4/
Next problem is installing SIRF-Exercises dependencies when we use conda...
we never had a problem with that in the past, so do we have one now? I guess pip install
should still work?
@paskino what's the situation with HDF5 now? I guess you're building your own? I've checked in my 22.04 VM and spotted the difference that explains why it works on the VM and not for you. On my VM, the CMake output is
irfuser@vagrant:~/devel/buildVM/builds/ISMRMRD/build$ cmake .
-- HDF5: Using hdf5 compiler wrapper to determine C configuration
-- HDF5_DIR: HDF5_DIR-NOTFOUND
-- HDF5_DEFINITIONS:
-- HDF5_INCLUDE_DIRS: /usr/include/hdf5/serial
-- HDF5_LIBRARIES: /usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.so;/usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.so;/usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.so;/usr/lib/x86_64-linux-gnu/libcrypto.so;/usr/lib/x86_64-linux-gnu/libcurl.so;/usr/lib/x86_64-linux-gnu/libpthread.a;/usr/lib/x86_64-linux-gnu/libsz.so;/home/sirfuser/devel/install/lib/libz.a;/usr/lib/x86_64-linux-gnu/libdl.a;/usr/lib/x86_64-linux-gnu/libm.so
...
The first line says that it uses h5cc
to find the options, while yours says it couldn't use the compiler wrapper. Indeed, on my VM I have
$ h5cc -show
gcc -I/usr/include/hdf5/serial -L/usr/lib/x86_64-linux-gnu/hdf5/serial /usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5_hl.a /usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.a -lcrypto -lcurl -lpthread -lsz -lz -ldl -lm -Wl,-rpath -Wl,/usr/lib/x86_64-linux-gnu/hdf5/serial
Checking https://packages.ubuntu.com, it appears that h5cc
sits in hdf5-helpers
. So, I think that if you'd add that to our last of apt packages, you can switch back to using the system libraries.
Next problem is installing SIRF-Exercises dependencies when we use conda...
we never had a problem with that in the past, so do we have one now? I guess
pip install
should still work?
pip install
works but wants to uninstall llmvlite
installed by conda and refuses to do it, so it breaks.
ah well. Probably a few ways around it:
conda install requirements.txt
as opposed to pip install requirements.txt
requirements.yml
for conda
in the SIRF-exercisesMaybe you can show a more complete error log?
@paskino, I suggest that we merge this PR after fixing the following:
hdf5-helpers
i.e. leave Gadgetron master for a separate PR (which you have already anyway).
This PR will get us to a working version on Ubuntu 22.04, which more and more people are trying of course.
ah well. Probably a few ways around it:
- use
conda install requirements.txt
as opposed topip install requirements.txt
- add a specific
requirements.yml
forconda
in the SIRF-exercises- pin version numbers (I guess python people would recommend that, but I generally try to avoid it as these things do get out-dated).
Maybe you can show a more complete error log?
The error log is here
The actual error is CondaValueError: could not parse 'numpy --only-binary=numpy' in: requirements.txt
So, conda install of the requirements of SIRF-Exercises fails because of that --only-binary=numpy
and then everything goes to pip
. Maybe the solution is to remove the --only-binary
stuff from https://github.com/SyneRBI/SIRF-Exercises/blob/master/requirements.txt
Maybe the solution is to remove the
--only-binary
stuff from https://github.com/SyneRBI/SIRF-Exercises/blob/master/requirements.txt
Definitely, already suggested by @ashgillman I believe, probably about 2 years ago.
My current solution works, but the failure is due to a no space left of device
problem on the GPU runner. Apparently the build of the sirf
target and the service
one are not separated in the GPU case.
closes #735 as it installs CMake 3.25
I just noticed that the GPU docker version was built using Ubuntu 22.04 while the non-GPU used Ubuntu 20.04.
I spotted it because the hdf5-helpers
package came from different repositories: Ubuntu 22.04 jammy for gpu and Ubuntu 20.04 focal for non-gpu.
Of course, and unfortunately, the Ubuntu 22.04 failed!
There are network problems? All jobs fail contacting archive.ubuntu.com
:(
I ca't believe it's (almost) all green!
Great!
Only failure is due to #770. Can we set that docker GPU job to "allow failure" for the moment?
Clean-up time! Don't forget to use paralleproj
1.2.13
Please create an issue for the HDF5 problem. I don't understand why it failed (I suspect it's something to do with cached docker files TBH), but we've run out of time to fix that now.
Please create an issue for the HDF5 problem. I don't understand why it failed (I suspect it's something to do with cached docker files TBH), but we've run out of time to fix that now.
I'm pretty sure it happens on my machine too.
Please create an issue for the HDF5 problem. I don't understand why it failed (I suspect it's something to do with cached docker files TBH), but we've run out of time to fix that now.
I'm pretty sure it happens on my machine too.
but not on the VM. Is h5cc
present?
STIR 5.1.0 is out! Please use that.
I think this PR is ready to be merged. I changed the base branch built in docker-compose.yml
to SyneRBI/master
, so it may fail on the docker builds.
I did update that, but didn't add to the commit. I'll merge now.
Final tweaks for release 3.4 of SIRF & Co.
With this PR we aim at updated versions: