getkeops / keops

KErnel OPerationS, on CPUs and GPUs, with autodiff and without memory overflows
https://www.kernel-operations.io
MIT License

Extend setup recommendations #250

Closed gabrieldernbach closed 2 years ago

gabrieldernbach commented 2 years ago

I wanted to quickly try out the GMM tutorial from https://www.kernel-operations.io/keops/_auto_tutorials/gaussian_mixture/plot_gaussian_mixture.html but I am having trouble creating a working environment.

The standard PyTorch Docker container ships without g++, and even after installing it, cuda.h is still missing. I ended up with the rather large NVIDIA development container (docker pull nvidia/cuda:11.0.3-devel-ubuntu20.04), which at least imports pykeops without errors.
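The two missing pieces mentioned above (a C++ compiler and the cuda.h header) can be checked before importing pykeops. The following is a minimal sketch, not how KeOps itself locates its toolchain: the helper names `find_compiler` and `find_cuda_header` are hypothetical, and the list of searched directories is an assumption about common CUDA install locations.

```python
import os
import shutil


def find_compiler(names=("g++", "c++", "clang++")):
    """Return the path of the first C++ compiler found on PATH, or None."""
    for name in names:
        path = shutil.which(name)
        if path:
            return path
    return None


def find_cuda_header(extra_roots=()):
    """Return the first existing <root>/include/cuda.h, or None.

    The roots searched here are assumptions about typical installs,
    not the lookup logic used by KeOps.
    """
    roots = list(extra_roots)
    for var in ("CUDA_HOME", "CUDA_PATH"):
        if os.environ.get(var):
            roots.append(os.environ[var])
    roots += ["/usr/local/cuda", "/usr", "/opt/cuda"]
    for root in roots:
        candidate = os.path.join(root, "include", "cuda.h")
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    print("C++ compiler:", find_compiler() or "not found")
    print("cuda.h:", find_cuda_header() or "not found")
```

Running this inside a candidate image gives a quick indication of whether a runtime-only image needs to be swapped for a development image.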

With that setup, I then get a segmentation fault when running the GMM example:

[KeOps] Generating code for formula Max_SumShiftExpWeight_Reduction(Concat(-Sum(Var(2,4,1)*TensorProd(Var(0,2,0)-Var(1,2,1),Var(0,2,0)-Var(1,2,1))),Var(3,1,1)),0) ... Segmentation fault (core dumped)
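To check results independently of the compiled KeOps routine, the reduction named in the log can be mimicked on small inputs with plain NumPy. This is a sketch under one reading of the formula string, which is an assumption: Var(0) is a 2D point x_i, Var(1) a 2D point y_j, Var(2) a flattened 2x2 matrix A_j, Var(3) a scalar weight w_j, and the reduction is a weighted log-sum-exp over j. The function name `weighted_logsumexp_reference` is hypothetical.

```python
import numpy as np


def weighted_logsumexp_reference(x, y, A, w):
    """Dense NumPy reference for log(sum_j w_j * exp(-(x_i - y_j)^T A_j (x_i - y_j))).

    x: (N, 2) points, y: (M, 2) points,
    A: (M, 4) flattened 2x2 matrices, w: (M,) positive weights.
    Returns an (N, 1) array.
    """
    d = x[:, None, :] - y[None, :, :]             # (N, M, 2) pairwise differences
    outer = d[..., :, None] * d[..., None, :]     # (N, M, 2, 2) tensor products
    outer = outer.reshape(len(x), len(y), 4)      # flatten to match A
    f = -(outer * A[None, :, :]).sum(axis=-1)     # (N, M) quadratic forms
    m = f.max(axis=1, keepdims=True)              # shift by the max for stability
    s = (np.exp(f - m) * w[None, :]).sum(axis=1, keepdims=True)
    return m + np.log(s)


if __name__ == "__main__":
    x = np.array([[0.0, 0.0]])
    y = np.array([[1.0, 2.0]])
    A = np.array([[1.0, 0.0, 0.0, 1.0]])  # identity matrix, flattened
    w = np.array([1.0])
    print(weighted_logsumexp_reference(x, y, A, w))
```

Because it materializes all N x M pairs, this reference is only suitable for small inputs, which is exactly the memory cost that KeOps is designed to avoid.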

Could you expand on how to set up an environment that runs pykeops properly? This is the Dockerfile I used:

FROM nvidia/cuda:11.0.3-devel-ubuntu20.04
ENV DEBIAN_FRONTEND=noninteractive

RUN apt update; apt install python3-pip vim -y
RUN pip install numpy scipy scikit-learn scikit-image pandas jupyter matplotlib pykeops
RUN pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113

jeanfeydy commented 2 years ago

Hi @gabrieldernbach,

Thanks for your interest in this library! We were just discussing including a Dockerfile and a Singularity definition file yesterday, so you're on point :-)

As a quick fix for you, I have just uploaded a trimmed-down version of the Singularity definition file that I use for my daily research work: Singularity.def. (It doesn't include a pip install pykeops since I usually work with a local copy of the keops repository.) You may copy-paste the lines that make sense for you and, e.g., skip everything related to R. For the sake of reproducibility, this image starts from a fresh Ubuntu install (22.04 now, but everything worked great with 18.04 too) and installs conda, PyTorch and CUDA by hand. I relied heavily on the official PyTorch Dockerfile.

I will clean up this environment, check that the documentation renders correctly on an AWS instance, and translate it to a Dockerfile tomorrow.

I'm also quite surprised by your segmentation fault: this may be due to a bug in CUDA 11.0 that has been fixed in later revisions.
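Since the Dockerfile above pairs a CUDA 11.0.3 base image with PyTorch wheels from the cu113 index, it can help to print which versions are actually in play before digging further. The sketch below is only a diagnostic aid under stated assumptions: the helper names `report_versions` and `run_self_tests` are hypothetical, and whether a version mismatch is the actual cause of the segmentation fault is not established here. It relies on `torch.version.cuda`, `torch.cuda.is_available()`, and the self-tests `pykeops.test_numpy_bindings()` and `pykeops.test_torch_bindings()`.

```python
import importlib.util


def report_versions():
    """Collect library and CUDA version information, skipping missing packages."""
    info = {"torch": None, "torch_cuda": None, "cuda_available": None, "pykeops": None}
    if importlib.util.find_spec("torch") is not None:
        import torch
        info["torch"] = torch.__version__
        info["torch_cuda"] = torch.version.cuda      # CUDA version the wheel was built against
        info["cuda_available"] = torch.cuda.is_available()
    if importlib.util.find_spec("pykeops") is not None:
        import pykeops
        info["pykeops"] = pykeops.__version__
    return info


def run_self_tests():
    """Run the pykeops self-tests; compiles small formulas, so it can be slow."""
    import pykeops
    pykeops.test_numpy_bindings()
    pykeops.test_torch_bindings()


if __name__ == "__main__":
    for key, value in report_versions().items():
        print(f"{key}: {value}")
```

Calling `run_self_tests()` inside the container after the report is a small way to see whether compilation works at all before running a full tutorial.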

Best regards, Jean

jeanfeydy commented 2 years ago

Hi @gabrieldernbach,

As an update: please note that we now provide a reference image on DockerHub and explain how to use it in the documentation. If you need anything else, feel free to re-open the issue!

Best regards, Jean