SyneRBI / SIRF-SuperBuild

SIRF CMake SuperBuild
http://www.ccpsynerbi.ac.uk
Apache License 2.0
15 stars 18 forks source link

Release 3.4 #787

Closed paskino closed 1 year ago

paskino commented 1 year ago

Final tweaks for release 3.4 of SIRF & Co.

With this PR we aim at updated versions:

paskino commented 1 year ago

Currently failing all pytest unit tests like

 ctest -V -R REG_TESTS_PYTHON
UpdateCTestConfiguration  from :/opt/SIRF-SuperBuild/builds/SIRF/build/DartConfiguration.tcl
UpdateCTestConfiguration  from :/opt/SIRF-SuperBuild/builds/SIRF/build/DartConfiguration.tcl
Test project /opt/SIRF-SuperBuild/builds/SIRF/build
Constructing a list of tests
Done constructing a list of tests
Updating test list for fixtures
Added 0 tests to meet fixture requirements
Checking test dependency graph...
Checking test dependency graph end
test 6
    Start 6: REG_TESTS_PYTHON

6: Test command: /opt/conda/bin/python3 "-m" "pytest" "--cov=sirf.Reg" "--cov-config=.coveragerc-Reg" "src/Registration/pReg"
6: Test timeout computed to be: 10000000
6: ERROR: usage: __main__.py [options] [file_or_dir] [file_or_dir] [...]
6: __main__.py: error: unrecognized arguments: --cov=sirf.Reg --cov-config=.coveragerc-Reg
6:   inifile: None
6:   rootdir: /opt/SIRF-SuperBuild/sources/SIRF
6:
1/1 Test #6: REG_TESTS_PYTHON .................***Failed    0.22 sec
paskino commented 1 year ago

errors are due to the missing pytest-cov package

paskino commented 1 year ago

I moved to Ubuntu 22.04 and stumbled upon some errors with CUDA while building parallelproj.

nvcc fatal   : Unsupported gpu architecture 'compute_86+PTX'

I also upgraded CMake to 3.25.1 because with 3.17 there was the following error:

CMake Error: Error required internal CMake variable not set, cmake may not be built correctly. Missing variable is: CMAKE_CUDA17_EXTENSION_COMPILE_OPTION
KrisThielemans commented 1 year ago

building which project?

paskino commented 1 year ago

building which project?

parallelproj

KrisThielemans commented 1 year ago

Possible @gschramm can help, but a quick look at the internet says this is a conflict between driver and toolkit version

paskino commented 1 year ago

using parallelproj v1.2.11 solved the issue, now the next one is

Linking CXX executable ismrmrd_test_xml
ESC[91m/usr/bin/ld: /usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.so: undefined reference to `curl_global_init@CURL_OPENSSL_4'
ESC[0mESC[91m/usr/bin/ld: /usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.so: undefined reference to `curl_easy_perform@CURL_OPENSSL_4'
ESC[0mESC[91m/usr/bin/ld: /usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.so: undefined reference to `curl_slist_free_all@CURL_OPENSSL_4'
ESC[0mESC[91m/usr/bin/ld: /usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.so: undefined reference to `curl_easy_setopt@CURL_OPENSSL_4'
ESC[0mESC[91m/usr/bin/ld: /usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.so: undefined reference to `curl_easy_init@CURL_OPENSSL_4'
ESC[0mESC[91m/usr/bin/ld: /usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.so: undefined reference to `curl_slist_append@CURL_OPENSSL_4'
ESC[0mESC[91m/usr/bin/ld: /usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.so: undefined reference to `curl_easy_cleanup@CURL_OPENSSL_4'
ESC[0mESC[91m/usr/bin/ld: /usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.so: undefined reference to `curl_global_cleanup@CURL_OPENSSL_4'
KrisThielemans commented 1 year ago

All non-CUDA jobs are failing because parallelproj by defaults requires nvcc, see e.g. https://github.com/SyneRBI/SIRF-SuperBuild/actions/runs/3849192276/jobs/6557897239#step:7:507 and https://github.com/gschramm/parallelproj/issues/22. If no CUDA, we need to build it with SKIP_CUDA_LIB. Easy?

KrisThielemans commented 1 year ago
> ESC[91m/usr/bin/ld: /usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.so: undefined reference to `curl_global_init@CURL_OPENSSL_4'

I guess this is on a local build as I don't see it in any of the logs. What configuration is this? (As a 22.04 VM builds, and @ckolbPTB builds (of course it's the SB master version on docker 22.04 https://github.com/ckolbPTB/OpenSourceMrRecon/blob/main/Dockerfile)

This might be an Ubuntu (possibly it distributes built curl without openssl) or CMake problem (possibly it doesn't know what libraries it should pass).

paskino commented 1 year ago

Yes it is a local build.

This is a Ubuntu22.04 from a 11.7.1-cudnn8-devel-ubuntu22.04 NVIDIA image.

The build command is

docker build --build-arg BASE_IMAGE=nvidia/cuda:11.7.1-cudnn8-devel-ubuntu22.04 \
  --build-arg PYTHON_INSTALL_DIR=/opt/conda --build-arg SIRF_SB_URL="https://github.com/paskino/SIRF-SuperBuild" \
  --build-arg SIRF_SB_TAG="fix_prerelease3.4" --build-arg NUM_PARALLEL_BUILDS=1 --target sirf .

With USE_SYSTEM_HDF5=ON I get the linking error.

I will try to install libcurl4-openssl-dev although I thought I checked and verified it was already installed.

KrisThielemans commented 1 year ago

Probably unrelated, but in STIR GHA tests, I suddenly get

E: Failed to fetch http://azure.archive.ubuntu.com/ubuntu/pool/main/c/curl/libcurl4-openssl-dev_7.81.0-1ubuntu1.6_amd64.deb  404  Not Found [IP: 40.119.46.219 80]
E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?

This library is installed because of HDF5. So maybe they're fixing something. (I had to apt update first)

paskino commented 1 year ago

Yes it was installed

Step 62/76 : RUN apt-get install -y libcurl4-openssl-dev
 ---> Running in 5c8a5c97be87
Reading package lists...
Building dependency tree...
Reading state information...
libcurl4-openssl-dev is already the newest version (7.81.0-1ubuntu1.6).
libcurl4-openssl-dev set to manually installed.
0 upgraded, 0 newly installed, 0 to remove and 7 not upgraded.

However, on the ubuntu page the version is 7.81.0-1ubuntu1.7 and it is a security update. I guess we need to run an apt upgrade before cmaking.

paskino commented 1 year ago

Ah, I understand why I had the 7.81.0-1ubuntu1.6 version instead of 7.81.0-1ubuntu1.7 of libcurl4-openssl-dev!

This is because I built using cached images. So my image, although it had run apt-get update, it had done it when the current version was 1.6. Let's hope this fixes the problem.

paskino commented 1 year ago

It seems that we should ~link~ build ISMRMRD adding -lcurl and it works.

How can I do this?

KrisThielemans commented 1 year ago

It seems that we should ~link~ build ISMRMRD adding -lcurl and it works.

Only necessary for ISMRMRD and not for SIRF? Weird.

I doubt they use curl directly, so think this is a CMake bug. But I cannot understand why you see this in your docker image and we don't have it elsewhere. Which version of CMake are you using in your docker instance?

How can I do this?

We could force the linker flag CMAKE_EXE_LINKER_FLAGS (ugly), for HDF5_LIBRARIES variable (ugly) or use our own clone for ISMRMRD (not nice). But as they don't have this in CI, nor we as far as I understand, I'd want to understand better why you have it...

paskino commented 1 year ago

The problem seems to arise when using the system HDF5 library, not if we build our own HDF5. I just fixed it for ISMRMRD and it appeared for Gadgetron.

My CMake is v3.25.1

KrisThielemans commented 1 year ago

The problem seems to arise when using the system HDF5 library, not if we build our own HDF5. I just fixed it for ISMRMRD and it appeared for Gadgetron.

My CMake is v3.25.1

strange. definitely a CMake bug then. Can you post some log when building ISMRMRD after find_package(HDF5) (I think we have the HDF5_FIND_DEBUG flag on such that it writes diagnostics, but if not, please run with it

paskino commented 1 year ago
-- SuperBuild -     HDF5[OK]
-- HDF5 C compiler wrapper is unable to compile a minimal HDF5 program.
-- Found HDF5: /usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.so (found suitable version "1.10.7", minimum required is "1.8")
-- USING the system HDF5, found HDF5_INCLUDE_DIRS=/usr/include/hdf5/serial, HDF5_C_LIBRARY_hdf5=,HDF5_LIBRARIES=/usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.so
-- HDF5_CMAKE_ARGS=-DHDF5_INCLUDE_DIRS:PATH=/usr/include/hdf5/serial;-DHDF5_LIBRARIES:STRING=/usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.so;-DHDF5_C_LIBRARIES:STRING=;-DHDF5_FIND_DEBUG:BOOL=ON

For ISMRMRD

[ 18%] Performing configure step for 'ISMRMRD'
loading initial cache file /opt/SIRF-SuperBuild/builds/ISMRMRD/tmp/ISMRMRD-cache-Release.cmake
-- The C compiler identification is GNU 11.3.0
-- The CXX compiler identification is GNU 11.3.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/lib/ccache/gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/lib/ccache/g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- HDF5 C compiler wrapper is unable to compile a minimal HDF5 program.
-- Found HDF5: /usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.so;/usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.so (found version "1.10.7") found components: C
-- HDF5_DIR: HDF5_DIR-NOTFOUND
-- HDF5_DEFINITIONS:
-- HDF5_INCLUDE_DIRS: /usr/include/hdf5/serial
-- HDF5_LIBRARIES: /usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.so;/usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.so
-- HDF5_HL_LIBRARIES:
-- HDF5_C_DEFINITIONS:
-- HDF5_C_INCLUDE_DIR: /usr/include/hdf5/serial
-- HDF5_C_INCLUDE_DIRS: /usr/include/hdf5/serial
-- HDF5_C_LIBRARY:
-- HDF5_C_LIBRARIES: /usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.so
-- HDF5_C_HL_LIBRARY:
-- HDF5_C_HL_LIBRARIES:
-- Defined targets (if any):
-- ... hdf5::hdf5
mHDF5 found at: /usr/include/hdf5/serial
HDF5 found at: /usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.so
-- Found Git: /usr/bin/git (found version "2.34.1")
-- Found Doxygen: /usr/bin/doxygen (found version "1.9.1") found components: doxygen dot
   Building info application
-- Found Boost: /usr/lib/x86_64-linux-gnu/cmake/Boost-1.74.0/BoostConfig.cmake (found suitable version "1.74.0", minimum required is "1.43") found components: program_options random
-- FFTW3 UNIX libraries: /usr/lib/x86_64-linux-gnu/libfftw3.so/usr/lib/x86_64-linux-gnu/libfftw3f.so
   FFTW3 and Boost Found... building utilities
-- Found FFTW3: /usr/lib/x86_64-linux-gnu/libfftw3f.so
CMake Deprecation Warning at examples/c/CMakeLists.txt:1 (cmake_minimum_required):
  Compatibility with CMake < 2.8.12 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.

-- Found Boost: /usr/lib/x86_64-linux-gnu/cmake/Boost-1.74.0/BoostConfig.cmake (found suitable version "1.74.0", minimum required is "1.43") found components: unit_test_framework
   CPACK_PACKAGING_INSTALL_PREFIX: /opt/SIRF-SuperBuild/INSTALL
-- Found CPack generators: DEB
-- Configuring done
-- Generating done
paskino commented 1 year ago

Obviously adding -lcurl doesn't work on Ubuntu < 22.04 https://github.com/SyneRBI/SIRF-SuperBuild/actions/runs/3874884037/jobs/6606658793#step:7:3764

KrisThielemans commented 1 year ago

Please upgrade boost (like in #438). 1.72 gives problems with ISMRMD.

KrisThielemans commented 1 year ago

The nvcc fatal : Unsupported gpu architecture 'compute_86+PTX' issue is not related to parallelproj version. We tried on an Ubuntu 22.04 system with parallelproj 1.2.12. I still think this is a conflict between nvcc versions etc.

Currently we're trying with CUDA toolkit 12.0 from the NVIDIA website. Note that this says to use apt install cuda, using apt install nvidia-cuda-toolkit then creates a conflict as it installs nvcc 11.5 back again. Will let you know.

KrisThielemans commented 1 year ago

Currently we're trying with CUDA toolkit 12.0 from the NVIDIA website. Note that this says to use apt install cuda, using apt install nvidia-cuda-toolkit then creates a conflict as it installs nvcc 11.5 back again. Will let you know.

apt install cuda didn't install nvcc, and it says you need apt install nvidia-cuda-toolkit anyway. But that downgraded. Oh well.

paskino commented 1 year ago

Wait @KrisThielemans ! The problem is that the the docker build was using a Ubuntu 20.04 image as base. I changed to Ubuntu 22.04 and it builds. Currently it fails installing jupyter, but I'm probably close to fix it.

I will need to upgrade Boost.

paskino commented 1 year ago

Next problem is installing SIRF-Exercises dependencies when we use conda...

paskino commented 1 year ago

An attempt to build Gadgetron master is https://github.com/paskino/SIRF-SuperBuild/pull/4/

KrisThielemans commented 1 year ago

Next problem is installing SIRF-Exercises dependencies when we use conda...

we never had a problem with that in the past, so do we have one now? I guess pip install should still work?

KrisThielemans commented 1 year ago

@paskino what's the situation with HDF5 now? I guess you're building your own? I've checked in my 22.04 VM and spotted the difference that explains why it works on the VM and not for you. On my VM, the CMake output is

irfuser@vagrant:~/devel/buildVM/builds/ISMRMRD/build$ cmake .
-- HDF5: Using hdf5 compiler wrapper to determine C configuration
-- HDF5_DIR: HDF5_DIR-NOTFOUND
-- HDF5_DEFINITIONS: 
-- HDF5_INCLUDE_DIRS: /usr/include/hdf5/serial
-- HDF5_LIBRARIES: /usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.so;/usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.so;/usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.so;/usr/lib/x86_64-linux-gnu/libcrypto.so;/usr/lib/x86_64-linux-gnu/libcurl.so;/usr/lib/x86_64-linux-gnu/libpthread.a;/usr/lib/x86_64-linux-gnu/libsz.so;/home/sirfuser/devel/install/lib/libz.a;/usr/lib/x86_64-linux-gnu/libdl.a;/usr/lib/x86_64-linux-gnu/libm.so
...

The first line says that it uses h5cc to find the options, while yours says it couldn't use the compiler wrapper. Indeed, on my VM I have

$ h5cc -show
gcc -I/usr/include/hdf5/serial -L/usr/lib/x86_64-linux-gnu/hdf5/serial /usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5_hl.a /usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.a -lcrypto -lcurl -lpthread -lsz -lz -ldl -lm -Wl,-rpath -Wl,/usr/lib/x86_64-linux-gnu/hdf5/serial

Checking https://packages.ubuntu.com, it appears that h5cc sits in hdf5-helpers. So, I think that if you'd add that to our last of apt packages, you can switch back to using the system libraries.

paskino commented 1 year ago

Next problem is installing SIRF-Exercises dependencies when we use conda...

we never had a problem with that in the past, so do we have one now? I guess pip install should still work?

pip install works but wants to uninstall llmvlite installed by conda and refuses to do it, so it breaks.

KrisThielemans commented 1 year ago

ah well. Probably a few ways around it:

Maybe you can show a more complete error log?

KrisThielemans commented 1 year ago

@paskino, I suggest that we merge this PR after fixing the following:

i.e. leave Gadgetron master for a separate PR (which you have already anyway).

This PR will get us to a working version on Ubuntu 22.04, which more and more people are trying of course.

paskino commented 1 year ago

ah well. Probably a few ways around it:

  • use conda install requirements.txt as opposed to pip install requirements.txt
  • add a specific requirements.yml for conda in the SIRF-exercises
  • pin version numbers (I guess python people would recommend that, but I generally try to avoid it as these things do get out-dated).

Maybe you can show a more complete error log?

The error log is here

The actual error is CondaValueError: could not parse 'numpy --only-binary=numpy' in: requirements.txt So, conda install of the requirements of SIRF-Exercises fails because of that --only-binary=numpy and then everything goes to pip. Maybe the solution is to remove the --only-binary stuff from https://github.com/SyneRBI/SIRF-Exercises/blob/master/requirements.txt

KrisThielemans commented 1 year ago

Maybe the solution is to remove the --only-binary stuff from https://github.com/SyneRBI/SIRF-Exercises/blob/master/requirements.txt

Definitely, already suggested by @ashgillman I believe, probably about 2 years ago.

paskino commented 1 year ago

My current solution works, but the failure is due to a no space left of device problem on the GPU runner. Apparently the build of the sirf target and the service one are not separated in the GPU case.

paskino commented 1 year ago

closes #735 as it installs CMake 3.25

paskino commented 1 year ago

I just noticed that the GPU docker version was built using Ubuntu 22.04 while the non-GPU used Ubuntu 20.04.

I spotted it because the hdf5-helpers package came from different repositories: Ubuntu 22.04 jammy for gpu and Ubuntu 20.04 focal for non-gpu.

Of course, and unfortunately, the Ubuntu 22.04 failed!

paskino commented 1 year ago

There are network problems? All jobs fail contacting archive.ubuntu.com :(

paskino commented 1 year ago

Looks like that in Ubuntu22.04 the hdf5-helpers trick does not work.

paskino commented 1 year ago

I ca't believe it's (almost) all green!

KrisThielemans commented 1 year ago

Great!

Only failure is due to #770. Can we set that docker GPU job to "allow failure" for the moment?

Clean-up time! Don't forget to use paralleproj 1.2.13

KrisThielemans commented 1 year ago

Please create an issue for the HDF5 problem. I don't understand why it failed (I suspect it's something to do with cached docker files TBH), but we've run out of time to fix that now.

paskino commented 1 year ago

Please create an issue for the HDF5 problem. I don't understand why it failed (I suspect it's something to do with cached docker files TBH), but we've run out of time to fix that now.

I'm pretty sure it happens on my machine too.

KrisThielemans commented 1 year ago

Please create an issue for the HDF5 problem. I don't understand why it failed (I suspect it's something to do with cached docker files TBH), but we've run out of time to fix that now.

I'm pretty sure it happens on my machine too.

but not on the VM. Is h5cc present?

KrisThielemans commented 1 year ago

STIR 5.1.0 is out! Please use that.

paskino commented 1 year ago

I think this PR is ready to be merged. I changed the base branch built in docker-compose.yml to SyneRBI/master, so it may fail on the docker builds.

paskino commented 1 year ago

I did update that, but didn't add to the commit. I'll merge now.