Open olevski opened 1 year ago
Just one thing to be aware of here: `torch` brings its own cuda, and we have seen issues in the past in which a cuda version provided in a container image conflicted with `torch`'s cuda version. In this case Till was actually able to run his job on a container image which had no cuda support (as `pip install torch` installed all the cuda stuff he required), but when he tried an image which had pre-baked cuda, it did not work because the preinstalled cuda conflicted with the one `torch` installed.
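For anyone debugging a similar conflict, a quick diagnostic along these lines may help. It is only a rough sketch: `describe_cuda_setup` and `major_versions_differ` are hypothetical helper names, and it assumes the image advertises its CUDA version through a `CUDA_VERSION` environment variable, as the official `nvidia/cuda` base images do. A differing major version is a hint worth looking at, not proof of the conflict described above.

```python
import os


def describe_cuda_setup(env=None):
    """Report the CUDA version torch was built against next to the image's one."""
    env = os.environ if env is None else env
    report = {
        # Assumption: the image exposes its CUDA version via CUDA_VERSION.
        "image_cuda": env.get("CUDA_VERSION"),
        "torch_installed": False,
        "torch_cuda": None,
        "gpu_visible": False,
    }
    try:
        import torch
    except (ImportError, OSError):
        # torch is missing or failed to load; leave the defaults in place.
        return report
    report["torch_installed"] = True
    # torch.version.cuda is None for CPU-only builds of torch.
    report["torch_cuda"] = torch.version.cuda
    report["gpu_visible"] = torch.cuda.is_available()
    return report


def major_versions_differ(image_cuda, torch_cuda):
    """Return True only when both versions are known and their majors differ."""
    if not image_cuda or not torch_cuda:
        return False
    return image_cuda.split(".")[0] != torch_cuda.split(".")[0]


if __name__ == "__main__":
    info = describe_cuda_setup()
    print(info)
    if major_versions_differ(info["image_cuda"], info["torch_cuda"]):
        print("Image CUDA and torch CUDA have different major versions.")
```

Running this inside the session container shows whether `torch` sees a GPU and which CUDA build it was installed with, which should make it easier to tell a plain image from one with pre-baked cuda.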
@seanrmurphy this is good to know. If we can fully retire these cuda images I will be really happy. Firat asked for this so I will let him know. I thought it was impossible for torch to pull in all the requirements simply by doing `pip install torch`.
Torch does this but tensorflow does not, afaik. Imo, this is one of those cases where older stuff needs the cuda images but newer stuff does not.
Currently we have no templates with CUDA images. But we build CUDA images and publish them.
This makes using GPUs on renku complicated. The main reason is that when you want to use a GPU you have to:
It also makes updating the image a problem because you have to repeat the whole process.
If we publish project templates with CUDA a lot of these issues go away.