Closed fleurgaudfernau closed 2 years ago
Hi @fleurgaudfernau,
Thanks for your issue! I am currently working on making Deformetrica compatible with our KeOps v2.0 beta version: I will keep you updated on this thread. Out of curiosity, on which computer are you trying to run this experiment? I am currently trying to set things up for Tom Boeken, who has to go through Docker/Singularity on the GPU cluster of the Institut Pasteur. If you have a similar configuration, I may send you a quick guide to deploy Deformetrica + KeOps v2 correctly.
Best regards, Jean
Hello @jeanfeydy
Thank you for your answer! Unfortunately, I think the few Deformetrica users are scattered over different machines. I am using one of the CMAP machines. However, it would be very helpful if you could send me your guide anyway. (I have been using Deformetrica for over a year without encountering any issue, so it is likely that I messed things up with some bad command, which is very frustrating.)
Kind regards, Fleur
Hello @jeanfeydy and @fleurgaudfernau, did you happen to solve this issue? I'm hoping to use Deformetrica with a Singularity container on the GPU cluster at the University of Bern, and I keep running into the same RuntimeError. Best regards, Alexandra
Hi @oswalda-10, You can try this simple solution, which was found by @jeanfeydy and worked for me:
1. install deformetrica in a new environment
2. install pykeops version 2.0b: pip install pykeops==2.0b
Best, Fleur
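After swapping pykeops versions by hand like this, it can help to confirm which versions the active environment actually resolves. The following is a minimal sketch using only the standard library (`importlib.metadata`, so Python 3.8 or later is assumed); the helper name `installed_version` is made up for illustration and is not part of Deformetrica or KeOps.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Package names taken from the thread; adjust to your own environment.
    for name in ("deformetrica", "pykeops", "torch"):
        print(f"{name}: {installed_version(name)}")
```

Note that pip will still report a dependency conflict, since deformetrica 4.3.0 pins pykeops==1.4.1; the workaround relies on ignoring that pin.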
Hi @fleurgaudfernau, thank you for the tip! Did you have to update anything else from your configuration? With this setup the GPU isn't found anymore.
Requirement already satisfied: deformetrica in ./.local/lib/python3.8/site-packages (4.3.0)
Requirement already satisfied: matplotlib>=2.2.2 in ./.local/lib/python3.8/site-packages (from deformetrica) (3.5.2)
Requirement already satisfied: numpy>=1.16.2 in ./.local/lib/python3.8/site-packages (from deformetrica) (1.23.1)
Requirement already satisfied: torchvision==0.7 in ./.local/lib/python3.8/site-packages (from deformetrica) (0.7.0)
Requirement already satisfied: PyQt5 in ./.local/lib/python3.8/site-packages (from deformetrica) (5.15.7)
Requirement already satisfied: pillow>=5.4.1 in ./.local/lib/python3.8/site-packages (from deformetrica) (9.2.0)
Requirement already satisfied: vtk>=8.2.0 in ./.local/lib/python3.8/site-packages (from deformetrica) (9.1.0)
Requirement already satisfied: psutil>=5.4.8 in ./.conda/envs/deformetrica_2/lib/python3.8/site-packages (from deformetrica) (5.9.1)
Requirement already satisfied: torch==1.6 in ./.local/lib/python3.8/site-packages (from deformetrica) (1.6.0)
Requirement already satisfied: scikit-learn>=0.20.3 in ./.local/lib/python3.8/site-packages (from deformetrica) (1.1.1)
Requirement already satisfied: nibabel>=2.3.3 in ./.local/lib/python3.8/site-packages (from deformetrica) (4.0.1)
Requirement already satisfied: GPUtil in ./.local/lib/python3.8/site-packages (from pykeops==1.4.1->deformetrica) (1.4.0)
Requirement already satisfied: future in ./.conda/envs/deformetrica_2/lib/python3.8/site-packages (from torch==1.6->deformetrica) (0.18.2)
Requirement already satisfied: pyparsing>=2.2.1 in ./.conda/envs/deformetrica_2/lib/python3.8/site-packages (from matplotlib>=2.2.2->deformetrica) (3.0.9)
Requirement already satisfied: fonttools>=4.22.0 in ./.local/lib/python3.8/site-packages (from matplotlib>=2.2.2->deformetrica) (4.34.4)
Requirement already satisfied: kiwisolver>=1.0.1 in ./.local/lib/python3.8/site-packages (from matplotlib>=2.2.2->deformetrica) (1.4.4)
Requirement already satisfied: python-dateutil>=2.7 in ./.conda/envs/deformetrica_2/lib/python3.8/site-packages (from matplotlib>=2.2.2->deformetrica) (2.8.2)
Requirement already satisfied: cycler>=0.10 in ./.local/lib/python3.8/site-packages (from matplotlib>=2.2.2->deformetrica) (0.11.0)
Requirement already satisfied: packaging>=20.0 in ./.conda/envs/deformetrica_2/lib/python3.8/site-packages (from matplotlib>=2.2.2->deformetrica) (21.3)
Requirement already satisfied: setuptools in ./.conda/envs/deformetrica_2/lib/python3.8/site-packages (from nibabel>=2.3.3->deformetrica) (61.2.0)
Requirement already satisfied: threadpoolctl>=2.0.0 in ./.conda/envs/deformetrica_2/lib/python3.8/site-packages (from scikit-learn>=0.20.3->deformetrica) (3.1.0)
Requirement already satisfied: joblib>=1.0.0 in ./.local/lib/python3.8/site-packages (from scikit-learn>=0.20.3->deformetrica) (1.1.0)
Requirement already satisfied: scipy>=1.3.2 in ./.local/lib/python3.8/site-packages (from scikit-learn>=0.20.3->deformetrica) (1.8.1)
Requirement already satisfied: wslink>=1.0.4 in ./.local/lib/python3.8/site-packages (from vtk>=8.2.0->deformetrica) (1.6.6)
Requirement already satisfied: PyQt5-Qt5>=5.15.0 in ./.local/lib/python3.8/site-packages (from PyQt5->deformetrica) (5.15.2)
Requirement already satisfied: PyQt5-sip<13,>=12.11 in ./.local/lib/python3.8/site-packages (from PyQt5->deformetrica) (12.11.0)
Requirement already satisfied: six>=1.5 in ./.conda/envs/deformetrica_2/lib/python3.8/site-packages (from python-dateutil>=2.7->matplotlib>=2.2.2->deformetrica) (1.16.0)
Requirement already satisfied: aiohttp<4 in ./.local/lib/python3.8/site-packages (from wslink>=1.0.4->vtk>=8.2.0->deformetrica) (3.8.1)
Requirement already satisfied: frozenlist>=1.1.1 in ./.local/lib/python3.8/site-packages (from aiohttp<4->wslink>=1.0.4->vtk>=8.2.0->deformetrica) (1.3.0)
Requirement already satisfied: cmake==3.16.3 in ./.conda/envs/deformetrica_2/lib/python3.8/site-packages (3.16.3)
Collecting pykeops==2.0b
Using cached pykeops-2.0b0-py3-none-any.whl
Requirement already satisfied: numpy in ./.local/lib/python3.8/site-packages (from pykeops==2.0b) (1.23.1)
Requirement already satisfied: pybind11 in ./.conda/envs/deformetrica_2/lib/python3.8/site-packages (from pykeops==2.0b) (2.10.0)
Installing collected packages: pykeops
Attempting uninstall: pykeops
Found existing installation: pykeops 1.4.1
Uninstalling pykeops-1.4.1:
Successfully uninstalled pykeops-1.4.1
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
deformetrica 4.3.0 requires pykeops==1.4.1, but you have pykeops 2.0b0 which is incompatible.
Successfully installed pykeops-2.0b0
INFO:__main__:Setting output directory to: /storage/homefs/ao11a087/control_output/IN905R_Scapula
[KeOps] Warning : Cuda libraries were not detected on the system ; using cpu only mode
Logger has been set to: INFO
>> No initial CP spacing given: using diffeo kernel width of 15.0
OMP_NUM_THREADS found in environment variables. Using value OMP_NUM_THREADS=1
>> No specified state-file. By default, Deformetrica state will by saved in file: /storage/homefs/ao11a087/control_output/IN905R_Scapula/deformetrica-state.p.
>> Using a Sobolev gradient for the template data with the ScipyLBFGS estimator memory length being larger than 1. Beware: that can be tricky.
[pyKeOps] Warning : keyword argument cuda_type in Genred is deprecated ; argument is ignored.
>> Set of 385 control points defined.
>> Momenta initialized to zero, for 16 subjects.
>> Started estimator: ScipyOptimize
>> Scipy optimization method: L-BFGS-B<<
------------------------------------- Iteration: 1 -------------------------------------
which then crashes because of `KeyError: 'nvrtc'`
You're welcome! Could you try using the GradientAscent estimator instead? I think this error only occurs with the L-BFGS-B estimator (which I don't use). Fleur
I tried the GradientAscent estimator, but the error is the same. I still think there is a compatibility issue somewhere between my versions of CUDA, pykeops, g++ or something else that I haven't found yet. Unfortunately, I don't have an NVIDIA graphics card locally, which makes it hard to debug on the cluster, where I have to use the Singularity container.
Hi @oswalda-10,
Thanks for your interest in the library! In order to get your setup working, we should proceed in two steps:
With respect to point 1, I have added instructions for Singularity in the documentation.
The simplest option is simply to clone our official image on DockerHub, but you can also build your own by looking at our Dockerfile if you need something specific.
In order to make sure that KeOps can find your CUDA installation, you may need to set the CUDA_PATH environment variable with export CUDA_PATH=... as in e.g. our Dockerfile. This is often required on shared scientific clusters that host several CUDA installations in non-standard folders.
Also: don't forget to launch your Singularity container with the --nv option, or the GPU won't be available.
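A quick way to tell whether the --nv option took effect is to ask the NVIDIA driver tool from inside the container. The sketch below assumes that `nvidia-smi` is exposed inside the container when --nv is used and that `nvidia-smi -L` prints one line per GPU starting with "GPU "; the helper name `visible_gpus` is hypothetical.

```python
import shutil
import subprocess


def visible_gpus():
    """List the GPUs reported by `nvidia-smi -L`; empty list if none are visible."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # The tool is not on PATH: the container was probably started without --nv.
        return []
    try:
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    return [
        line.strip()
        for line in result.stdout.splitlines()
        if line.startswith("GPU ")
    ]


if __name__ == "__main__":
    gpus = visible_gpus()
    print(gpus if gpus else "No GPU visible from this shell")
```

An empty list here points to a container or driver problem rather than a KeOps one, which narrows down where to look next.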
With respect to point 2, the core problem is that Deformetrica hasn't been maintained for more than a year now. (The Aramis Inria team has shifted interests from computational anatomy to longitudinal statistics.) I am going to get it back up with a full-time engineer from February 2023 onwards - but currently, we have to rely on workarounds. This is what was described by @fleurgaudfernau (thanks!).
What do you think? Best regards, Jean
Hi Jean, thank you so much for your help. I loaded the official image onto the cluster, but I'm still having a hard time getting pykeops to find CUDA. Within my newly created conda environment, loaded from Singularity, if I run:
import pykeops
I get the warning:
[KeOps] Warning : Cuda libraries were not detected on the system ; using cpu only mode
However, I did set both CUDA_PATH and LD_LIBRARY_PATH:
print(os.environ.get('CUDA_PATH'))
/software.el7/software/CUDA/11.3.1/bin
print(os.environ.get('LD_LIBRARY_PATH'))
/software.el7/software/CUDA/11.3.1/lib64:/software.el7/software/CUDA/11.3.1/lib64
I also think there might be a problem with the C++ compiler: when I try to run pykeops.test_torch_bindings(), I get the error:
software.el7/software/binutils/2.36.1-GCCcore-10.3.0/bin/ld.gold: /lib64/libstdc++.so.6: version 'GLIBCXX_3.4.21' not found (required by /software.el7/software/binutils/2.36.1-GCCcore-10.3.0/bin/ld.gold)
/software.el7/software/binutils/2.36.1-GCCcore-10.3.0/bin/ld.gold: /lib64/libstdc++.so.6: version 'CXXABI_1.3.9' not found (required by /software.el7/software/binutils/2.36.1-GCCcore-10.3.0/bin/ld.gold)
/software.el7/software/binutils/2.36.1-GCCcore-10.3.0/bin/ld.gold: /lib64/libstdc++.so.6: version 'GLIBCXX_3.4.20' not found (required by /software.el7/software/binutils/2.36.1-GCCcore-10.3.0/bin/ld.gold)
/software.el7/software/binutils/2.36.1-GCCcore-10.3.0/bin/ld.gold: /lib64/libstdc++.so.6: version 'CXXABI_1.3.8' not found (required by /software.el7/software/binutils/2.36.1-GCCcore-10.3.0/bin/ld.gold)
collect2: error: ld returned 1 exit status
ModuleNotFoundError: No module named 'pykeops_cpp_c66979fc2a'
Have you seen this before? Best, Alexandra
Hi @oswalda-10,
You're very welcome!
To be clear: the official "keops-full" Docker image contains everything you need to run KeOps.
It includes a separate copy of CUDA and PyTorch (which is why the image weighs 5-6 GB) in the /opt/conda folder.
Normally, you should be able to log into your Docker/Singularity container with something like:
singularity shell -H ~/singularity_homes/keops-full/:/home ~/containers/keops-full.sif
and simply type python to open an interactive session where import pykeops; pykeops.test_torch_bindings() works.
You don't have to re-set the CUDA_PATH by hand: this is useful if you are working with your own configuration, but with the official Docker image, everything has already been taken care of. This should also fix the compiler issue.
Finally, please note that if you want to use a custom Singularity file instead of our official image (which is fine, of course), CUDA_PATH should be such that $CUDA_PATH/include/cuda.h and $CUDA_PATH/include/nvrtc.h exist. So on your cluster, it should be something closer to /software.el7/software/CUDA/11.3.1/, without the /bin suffix. Of course, you should check that these files are accessible from your Singularity shell, and possibly use the --bind option to link them from the host filesystem to the Singularity container.
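The CUDA_PATH condition described above can be checked from Python before importing pykeops. This is a minimal sketch: the two header paths come from the message above, while the helper name `missing_cuda_headers` is made up for illustration.

```python
import os
from pathlib import Path

# Headers that should exist under CUDA_PATH, as stated in the thread.
REQUIRED_HEADERS = ("include/cuda.h", "include/nvrtc.h")


def missing_cuda_headers(cuda_path):
    """Return the required headers that are NOT present under `cuda_path`."""
    root = Path(cuda_path)
    return [rel for rel in REQUIRED_HEADERS if not (root / rel).is_file()]


if __name__ == "__main__":
    cuda_path = os.environ.get("CUDA_PATH")
    if cuda_path is None:
        print("CUDA_PATH is not set")
    else:
        missing = missing_cuda_headers(cuda_path)
        if missing:
            print(f"Missing under {cuda_path}: {missing}")
        else:
            print(f"{cuda_path} looks like a valid CUDA root")
```

Run from inside the Singularity shell, a CUDA_PATH that ends in /bin would report both headers as missing, whereas the parent folder should report none if the files are visible from the container.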
What do you think?
I think it is almost working! At least, I did manage to get pykeops.test_torch_bindings() to work with the GPU. I also managed to install Deformetrica. For my project, I need to use the varifold distance between point clouds, and when I try to run this specific code, there is a TypeError (I tried with KeOps 2.0b as well as KeOps 2.1):
File "/storage/homefs/ao11a087/.local/lib/python3.8/site-packages/deformetrica/core/model_tools/attachments/multi_object_attachment.py", line 138, in varifold_scalar_product
return torch.dot(areaa.view(-1), kernel.convolve((x, nalpha), (y, nbeta), areab.view(-1, 1), mode='varifold').view(-1))
File "/storage/homefs/ao11a087/.local/lib/python3.8/site-packages/deformetrica/support/kernels/keops_kernel.py", line 123, in convolve
res = self.varifold_convolve[d - 2](gamma, x.contiguous(), y.contiguous(), nx.contiguous(), ny.contiguous(), p.contiguous(), device_id=device_id)
File "/storage/homefs/ao11a087/.local/lib/python3.8/site-packages/pykeops/torch/generic/generic_red.py", line 624, in __call__
out = GenredAutograd.apply(
File "/storage/homefs/ao11a087/.local/lib/python3.8/site-packages/pykeops/torch/generic/generic_red.py", line 78, in forward
myconv = keops_binder["nvrtc" if tagCPUGPU else "cpp"](
File "/opt/conda/lib/python3.8/site-packages/keopscore/utils/Cache.py", line 68, in __call__
obj = self.cls(*args)
File "/storage/homefs/ao11a087/.local/lib/python3.8/site-packages/pykeops/common/keops_io/LoadKeOps_nvrtc.py", line 15, in __init__
super().__init__(*args, fast_init=fast_init)
File "/storage/homefs/ao11a087/.local/lib/python3.8/site-packages/pykeops/common/keops_io/LoadKeOps.py", line 18, in __init__
self.init(*args)
File "/storage/homefs/ao11a087/.local/lib/python3.8/site-packages/pykeops/common/keops_io/LoadKeOps.py", line 126, in init
) = get_keops_dll(
File "/opt/conda/lib/python3.8/site-packages/keopscore/utils/Cache.py", line 27, in __call__
self.library[str_id] = self.fun(*args)
File "/opt/conda/lib/python3.8/site-packages/keopscore/get_keops_dll.py", line 93, in get_keops_dll_impl
red_formula = GetReduction(red_formula_string, aliases)
File "/opt/conda/lib/python3.8/site-packages/keopscore/formulas/GetReduction.py", line 27, in __new__
reduction = eval(red_formula_string, globals(), aliases_dict)
File "<string>", line 1, in <module>
TypeError: __new__() takes 3 positional arguments but 4 were given
Have you seen this one before? Thanks again.
Hi @oswalda-10,
Thanks for your report: there was a big bug in the definition of the WeightedSqDist operator that we introduced during the switch to KeOps 2.0!
I have just fixed it, and the update will be included in future releases of KeOps. I cannot push a new v2.1.1 to PyPI or publish a new official Docker image right now (@bcharlier has the password for the repo and I don't). So until September, you can get a working configuration by cloning the repository:
git clone --recursive https://github.com/getkeops/keops.git ~/keops
and mounting the ~/keops folder as /opt/keops inside your Singularity image. The command you use to log into your Singularity container should look like:
singularity exec \
-H ~/singularity_homes/keops-test/:/home \
--bind ~/keops:/opt/keops \
--nv \
keops-full.sif \
bash
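Once inside the container, it may be worth confirming which copy of KeOps Python actually imports, since the traceback earlier in this thread shows pykeops under ~/.local and keopscore under /opt/conda. The sketch below uses only the standard library; the helper name `module_origin` is hypothetical, and whether the bound /opt/keops checkout takes precedence depends on how the image sets up its Python path, which this thread does not spell out.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # Module names taken from the traceback in this thread.
    for name in ("pykeops", "keopscore"):
        print(f"{name}: {module_origin(name)}")
```

If the printed paths still point at an older copy in ~/.local or /opt/conda, the fixed checkout is not the one being used.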
What do you think?
Hi @jeanfeydy, Thank you so much for all your help! It's working on the cluster now. Best, Alex
Perfect, you're very welcome! Good luck for your project then, and feel free to re-open an issue if needed. Best, Jean
Dear all,
I have been encountering the following error when using PyKeops:
This error seems quite common, but I could not manage to fix it by following the advice in the other issues. It is important to note that I use the Deformetrica software, so I need to use specific package versions. To solve this problem, I followed these instructions: https://gitlab.com/icm-institute/aramislab/deformetrica/-/issues/72
This temporarily solved the problem, which then came back (sorry I cannot be more specific, but I have no idea what triggered this issue).
Configuration:
gcc: 7.5.0
g++: 7.5.0
cmake: 3.10.2
Nvidia Driver: 455.32.00
CUDA Version: 11.1 (but I use version 10 using the command "module load cuda/10" in my environment before launching Deformetrica because apparently the keops version I use is not compatible with cuda 11+)

Deformetrica 4.3.0
Python 3.6.8
GPUtil 1.4.0
pip 21.3.1
pykeops 1.4.1
PyQt5 5.15.6
PyQt5-Qt5 5.15.2
PyQt5-sip 12.9.1
torch 1.6.0
torchvision 0.7.0
vtk 9.1.0
Code:
Thank you in advance for your help!