Closed: eds-slim closed this issue 4 years ago
I have now found out about the --nv option, which is clearly relevant. With this option the pipeline now crashes with
...................Allocated GPU # 0...................
thrust::system_error thrown in CudaVolume::common_assignment_from_newimage_vol after resize() with message: function_attributes(): after cudaFuncGetAttributes: invalid device function
terminate called after throwing an instance of 'thrust::system::system_error'
what(): function_attributes(): after cudaFuncGetAttributes: invalid device function
Aborted (core dumped)
I've never successfully run qsiprep in singularity using the cuda version of eddy.
If using a GPU is very important for you, it's possible to run qsiprep without a container as long as you can install all the dependencies (ANTs, DSI Studio, MRtrix, etc.; it's a pain). You might also have some luck making a dockerfile that starts with one of the ubuntu 16.04 images from https://hub.docker.com/r/nvidia/cuda. In theory, you can just replace the FROM
statement at the beginning of the dockerfile.
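To make the suggestion concrete, the swap is just the first line of the dockerfile. A minimal sketch (the original base tag is an assumption; the replacement image is the one confirmed later in this thread):

```dockerfile
# Original base image of the qsiprep dockerfile (exact tag is an assumption):
# FROM ubuntu:16.04
# Replaced with an NVIDIA CUDA runtime base so eddy_cuda can find the CUDA runtime:
FROM nvidia/cuda:9.1-runtime-ubuntu16.04
```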
If you open a PR I'd be happy to help try to figure it out. For now, the eddy_openmp that comes with the image works slowly but reliably
Great, thanks, I'll try to bootstrap the docker file from one of the CUDA-enabled ubuntu images.
While I'm at it, is there a reason you're using FSL 5.0.11 (which comes with CUDA 7.5), rather than a more recent version?
Would you recommend an upgrade? 0.7 will have an upgraded ANTs, Dipy and MRtrix, so maybe this would be a good time to update FSL also.
Yes, I would recommend updating to FSL-6. In addition to various technical improvements and optimisations, it also includes eddyqc
which is, as far as I can tell, one of the very few automated quality assessment tools for diffusion data.
I bootstrapped the dockerfile FROM nvidia/cuda:9.1-runtime-ubuntu16.04 without problems, and got eddy_cuda9.1 from fsl-6.0.2-centos7_64.tar.gz to run without further modifications. Unfortunately, neither DSI Studio nor Convert3D could be downloaded from the URLs specified in the file, so I haven't been able to test the rest of the pipeline yet.
Given the small change necessary to the code, and the potentially huge speedup, I think it might be worthwhile to include CUDA support in the current, or one of the upcoming releases.
Thanks for your help!
Could you please try the dockerfile from here: https://github.com/PennBBL/qsiprep/blob/fsl6/Dockerfile. This has fixes for the missing downloads
The dockerfile seems to work after replacing the base image with nvidia/cuda:9.1-runtime-ubuntu16.04 and patching qsiprep/interfaces/eddy.py at line 288 with
self._cmd = 'eddy_cuda9.1' if self.inputs.use_cuda else 'eddy_openmp'
rather than
self._cmd = 'eddy_cuda' if self.inputs.use_cuda else 'eddy_openmp'
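The same one-word patch can be applied mechanically before building the image. A hypothetical one-liner, demonstrated here against a sample of the affected line rather than the real file (against the repo you would run sed -i on qsiprep/interfaces/eddy.py instead):

```shell
# Demonstrate the rename from this thread: eddy_cuda -> eddy_cuda9.1.
# Only the quoted binary name changes; 'eddy_openmp' is untouched.
echo "self._cmd = 'eddy_cuda' if self.inputs.use_cuda else 'eddy_openmp'" \
  | sed "s/'eddy_cuda'/'eddy_cuda9.1'/"
```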
The processing pipeline runs to completion and the html report looks reasonable to me.
I'm now running some of the reconstruction pipelines, but they shouldn't be affected by the changes in eddy and fsl, I suppose.
That is awesome!! Can you confirm that it ran more quickly? I am thinking it makes the most sense to build the official qsiprep image using the nvidia/cuda:9.1-runtime-ubuntu16.04
, so those with gpus can use it if they want to and the openmp version should still work. Did you need to do anything special to build the docker image?
It felt much quicker; I'll try and time a few runs over the weekend.
Surprisingly, no further tweaking of the dockerfile or build process was necessary, so including CUDA support in the official image seems the right thing to do.
With FSL upgraded to v6, it might also be a good idea to save the eddy_qc output somewhere and/or include it in the html report, but that's probably not top priority.
I ran a few very non-scientific timing tests with the latest version built from the cuda:9.1 image and including FSL 6. With "use_cuda": false in eddy_params.json, the preprocessing (not including connectome reconstruction) was about 2-3 times slower compared to "use_cuda": true (~70 min vs ~30 min for a single subject, both runs using 8 threads).
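For reference, the switch being toggled in eddy_params.json is just a boolean. A minimal sketch (a real config file will typically contain additional eddy options that are not shown here):

```json
{
  "use_cuda": true
}
```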
This is great news. It looks like the CI tests are all passing with the updated fsl, so I think this is ready to merge. Could you provide an example of the singularity command you used to run this? I'd like to add an example to the documentation.
Absolutely. I used
singularity run -B /mnt/data/HCHS:/work --nv /tmp/test/qsipreptest-2019-11-22-00089af84b60.sif --participant_label 00012 -w /work --eddy_config /work/wdir/eddy_params.json --output-resolution 2 --skip_bids --fs-license-file /work/license.txt /work/wdir/bids50/ /work/wdir/bids50/derivatives/ participant
The key feature is obviously the --nv option to the singularity run command.
Merged and working on docker too!
Hi, I'm currently exploring qsiprep v0.6.4 on Ubuntu 18.04 and encountered a problem with CUDA. Specifically, very early on, the pipeline throws the error

In order to get this far, I had to manually link libcudart.so.7.5 by setting export SINGULARITYENV_LD_LIBRARY_PATH=/libs and specifying -B /usr/local/cuda-7.5/lib64:/libs in the call to singularity. Without that, it wouldn't find the CUDA runtime library and would crash.

On the host I have CUDA 9.1 and NVIDIA driver version 390.132. Running the offending command (with eddy_cuda replaced by eddy_cuda9.1) works well.

Does the singularity container have a CUDA 7.5 dependency built in? And how does this square with the observation that eddy_cuda seems to support only versions 8.0 and 9.1?

Thanks for trying to help figure this out!
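Putting the pieces of the workaround from this comment together, the library shim looks roughly like this (paths are the ones mentioned above; adjust to your host's CUDA install, and note the singularity invocation itself is shown only as a comment):

```shell
# Have Singularity set LD_LIBRARY_PATH inside the container so eddy can
# locate the bound-in CUDA runtime (libcudart.so.*):
export SINGULARITYENV_LD_LIBRARY_PATH=/libs
# Then bind the host CUDA libraries into the container at /libs, e.g.:
#   singularity run --nv -B /usr/local/cuda-7.5/lib64:/libs <image.sif> ...
echo "$SINGULARITYENV_LD_LIBRARY_PATH"
```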