kaczmarj closed this pull request 2 years ago
i uploaded the docker image and i am trying to test it now:
docker pull kaczmarj/nitorch:add-dockerfile
one major issue... to compile gpu-enabled pytorch extensions, an nvidia gpu is required when building the image. at the moment, this dockerfile compiles the c++ code, but only for cpu.
Hi Jakub,
That seems odd to me. In the past, I have successfully compiled the GPU version using GitHub Actions, and I doubt there is a GPU available on the runners provided by GH.
What if we specify the list of architectures to compile for using the env variable TORCH_CUDA_ARCH_LIST=all (see here)? When that variable's value is "mine" (the default), I use torch to guess the compute capability of the current GPU (see here). I should probably change the behavior so that "all" is used when no GPU is available.
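The fallback described above could look roughly like this. This is only a sketch, not nitorch's actual code; `resolve_cuda_arch_list` is a hypothetical helper name, and the "mine"/"all" semantics are taken from the comment above:

```python
import os


def resolve_cuda_arch_list(gpu_available: bool) -> str:
    """Decide which CUDA architectures to compile for.

    Hypothetical sketch: respects an explicit TORCH_CUDA_ARCH_LIST,
    and falls back from "mine" to "all" when no GPU is visible at
    build time (e.g. inside `docker build`).
    """
    requested = os.environ.get("TORCH_CUDA_ARCH_LIST", "mine")
    if requested == "mine" and not gpu_available:
        # No GPU to probe, so compile for all supported architectures.
        return "all"
    return requested
```

With this behavior, `docker build` without a GPU would transparently compile for all architectures instead of failing or silently producing a CPU-only build.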
Thanks a lot for your help!
Yael
@balbasty - i agree with you. i added a commit that sets TORCH_CUDA_ARCH_LIST. i use the same values as in the official pytorch docker image.
i also had to replace torch.cuda.is_available() in setup_cext.py, because that would return False when no gpu is detected.
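A common build-time workaround (a sketch under my assumptions, not the actual setup_cext.py change; `cuda_toolkit_available` is a hypothetical helper) is to detect the CUDA *toolkit* rather than a GPU, since `torch.cuda.is_available()` requires a visible device while compilation only needs nvcc:

```python
import os
import shutil


def cuda_toolkit_available() -> bool:
    # Build-time check: look for CUDA_HOME/CUDA_PATH or the nvcc
    # compiler on PATH instead of torch.cuda.is_available(), which
    # returns False when no GPU is attached (e.g. during docker build).
    if os.environ.get("CUDA_HOME") or os.environ.get("CUDA_PATH"):
        return True
    return shutil.which("nvcc") is not None
```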
@balbasty - this PR is ready for review now. i tried one of the demo notebooks on a gpu server (quadro rtx 8000 gpus) and the notebook ran successfully.
@kaczmarj Thanks so much!
One small question, what's the current best practice for nobrainer? Should I tag a specific commit?
@balbasty - yes tag a specific commit.
This Dockerfile builds nitorch and compiles its extensions. The first stage builds the (compiled) wheel, and the second stage installs that wheel. The Dockerfile is based on PyTorch's official Docker images: the first stage uses their "devel" image and the second stage uses "runtime".
We should test that nitorch is usable and actually uses the compiled extensions. It is possible we will need to update the LD_LIBRARY_PATH in the second build stage.
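A quick smoke test along these lines could be run inside the final image. The module name to probe is an assumption on my part (I haven't confirmed what nitorch's compiled extension is called); `extension_importable` is a hypothetical helper:

```python
import importlib


def extension_importable(module_name: str) -> bool:
    # Returns True if the module (e.g. a compiled C++/CUDA extension)
    # can be found and imported. A shared library missing from
    # LD_LIBRARY_PATH typically surfaces here as an ImportError.
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False
```

Something like `docker run --rm <image> python -c "..."` calling this on the compiled extension module would catch a broken LD_LIBRARY_PATH in the runtime stage before users hit it.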