coiled / feedback

A place to provide Coiled feedback

CUDA Driver version #149

Closed mrocklin closed 2 years ago

mrocklin commented 3 years ago

So I'm playing with RAPIDS 2021.06, which requires a fairly recent CUDA driver. I create a software environment as follows:

import coiled

# Create a software environment with GPU accelerated libraries
# and CUDA drivers installed
coiled.create_software_environment(
    name="rapids",
    container="gpuci/miniconda-cuda:11.2-runtime-ubuntu20.04",
    conda={
        "channels": [
            "rapidsai",
            "nvidia",
            "conda-forge",
            "defaults",
        ],
        "dependencies": [
            "rapids=21.06",
            "cudatoolkit=11.2",
            "cupy",
            "python=3.8",
        ],
    },
    pip=["afar"],
)

I'm finding that things work, but oddly...

I'm curious if there is maybe a driver mismatch. Do we have to match anything on the VM to the image?

necaris commented 3 years ago

I believe the VM needs the underlying CUDA drivers to match, yes. I seem to recall the base Ubuntu VMs we're using only have CUDA 10? @selshowk may know more.
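One way to confirm what the VM actually has is to ask `nvidia-smi` for the driver version on each worker. The following is a minimal sketch, assuming `nvidia-smi` is on the worker's PATH; the helper names `parse_driver_version` and `query_driver_version` are made up for illustration and are not part of Coiled or Dask.

```python
import subprocess


def parse_driver_version(text):
    """Turn nvidia-smi output such as '460.32.03' into a tuple of ints.

    With several GPUs nvidia-smi prints one line per GPU; only the first
    line is used here, assuming all GPUs on a host share one driver.
    """
    first_line = text.strip().splitlines()[0]
    return tuple(int(part) for part in first_line.strip().split("."))


def query_driver_version():
    """Run nvidia-smi on the current host and return its driver version."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,  # raise if nvidia-smi is missing or fails
    ).stdout
    return parse_driver_version(out)
```

With a Dask `Client` connected to the cluster, `client.run(query_driver_version)` should return a dict mapping each worker address to the driver version found on that worker.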

mrocklin commented 3 years ago

If so, would it be easy to change this to CUDA 11? Would it be easy to specify dynamically?

selshowk commented 3 years ago

Yes, right now we're using cudatoolkit=10.2 (I think!). Pretty sure we can switch it to 11 by installing different packages on the VM. Making it dynamic is trickier because we don't have a way to specify that in the API right now, and because the CUDA version is baked into the AMIs we build. If we add some semantics in the API for multiple CUDA versions (e.g. tied to the gpu flag) then we could, in principle, build multiple AMIs to support different CUDA versions.

mrocklin commented 3 years ago

Short-term I would welcome VMs with version 11.

ntabris commented 2 years ago

The GPU support I'm adding now will use an AMI with newer drivers and CUDA (maybe CUDA 11.7, which just came out; this should work for any code built against any 11.x).
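As a rough illustration of why an 11.7 driver covers the `cudatoolkit=11.2` environment above while a 10.2 driver does not, here is a simplified sketch of the compatibility rule. The function name `toolkit_runs_on_driver` is hypothetical, and the rule is deliberately coarse: it ignores the minimum driver version required for CUDA 11 minor-version compatibility and the caveats around features that need a newer driver.

```python
def toolkit_runs_on_driver(toolkit, driver_cuda):
    """Simplified check: can code built against `toolkit` run on this driver?

    Both arguments are (major, minor) tuples. `driver_cuda` is the highest
    CUDA version the installed driver supports (the "CUDA Version" that
    nvidia-smi reports), not the driver's own version number.
    """
    t_major, t_minor = toolkit
    d_major, d_minor = driver_cuda
    if t_major < d_major:
        # Newer drivers stay backward compatible with older toolkits.
        return True
    if t_major > d_major:
        # A driver from an older major release cannot run newer code.
        return False
    if t_major >= 11:
        # From CUDA 11 on, minor versions within a major release are
        # meant to be compatible with each other.
        return True
    # Before CUDA 11 the driver must be at least as new as the toolkit.
    return t_minor <= d_minor
```

For the versions discussed in this thread, `toolkit_runs_on_driver((11, 2), (10, 2))` is `False` and `toolkit_runs_on_driver((11, 2), (11, 7))` is `True`.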