Rust-GPU / Rust-CUDA

Ecosystem of libraries and tools for writing and executing fast GPU code fully in Rust.
Apache License 2.0

`Error: NotSupported` for `add` example inside docker container #74

Open jac-cbi opened 2 years ago

jac-cbi commented 2 years ago

All,

Today I followed the instructions at https://github.com/Rust-GPU/Rust-CUDA/blob/master/guide/src/guide/getting_started.md#docker and I appear to have a successful, running docker container for building Rust-CUDA.

The NVIDIA tools seem to detect the GPU correctly:

root@ad244cfbfe70:~/rust-cuda/examples/cuda/cpu/add# nvidia-smi
Mon Jun 13 19:34:30 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.73.05    Driver Version: 510.73.05    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA T1000        Off  | 00000000:06:00.0 Off |                  N/A |
| 82%   63C    P0    N/A /  50W |      0MiB /  4096MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
root@ad244cfbfe70:~/rust-cuda/examples/cuda/cpu/add#

The host OS is Gentoo on x86_64. I installed nvidia-container-runtime by following this forum post:

https://forums.gentoo.org/viewtopic-p-8469852.html?sid=2f635b28a650993b900c03245ade9029#8469852

IIUC, I've set the environment up correctly. However, I get the following when I try to run the add example:

$ docker run -it --gpus all -v $RUST_CUDA:/root/rust-cuda --entrypoint /bin/bash rust-cuda
root@ad244cfbfe70:/# cd ~/rust-cuda/examples/cuda/cpu/add/
root@ad244cfbfe70:~/rust-cuda/examples/cuda/cpu/add# cargo run
info: syncing channel updates for 'nightly-2021-12-04-x86_64-unknown-linux-gnu'
info: latest update on 2021-12-04, rust version 1.59.0-nightly (532d2b14c 2021-12-03)
info: downloading component 'cargo'
info: downloading component 'clippy'
info: downloading component 'llvm-tools-preview'
info: downloading component 'rust-docs'
info: downloading component 'rust-src'
info: downloading component 'rust-std'
info: downloading component 'rustc'
info: downloading component 'rustc-dev'
info: downloading component 'rustfmt'
info: installing component 'cargo'
info: installing component 'clippy'
info: installing component 'llvm-tools-preview'
info: installing component 'rust-docs'
info: installing component 'rust-src'
info: installing component 'rust-std'
info: installing component 'rustc'
info: installing component 'rustc-dev'
info: installing component 'rustfmt'
    Updating crates.io index
/**** SNIP ****/
  Downloaded 62 crates (9.8 MB) in 2.65s (largest was `curl-sys` at 3.0 MB)
   Compiling curl-sys v0.4.55+curl-7.83.1
   Compiling curl v0.4.43
   Compiling rustc_codegen_nvvm v0.3.0 (/root/rust-cuda/crates/rustc_codegen_nvvm)
   Compiling cuda_builder v0.3.0 (/root/rust-cuda/crates/cuda_builder)
   Compiling add v0.1.0 (/root/rust-cuda/examples/cuda/cpu/add)
    Finished dev [unoptimized + debuginfo] target(s) in 3m 49s
     Running `/root/rust-cuda/target/debug/add`
cust::quick_init(): NotSupported
Error: NotSupported
root@ad244cfbfe70:~/rust-cuda/examples/cuda/cpu/add#

The third-to-last line is an error message I added, since `Error: NotSupported` on its own isn't very helpful:

$ git diff
diff --git a/examples/cuda/cpu/add/src/main.rs b/examples/cuda/cpu/add/src/main.rs
index 8ced6476e9ba..fb52be41ba67 100644
--- a/examples/cuda/cpu/add/src/main.rs
+++ b/examples/cuda/cpu/add/src/main.rs
@@ -18,7 +18,13 @@ fn main() -> Result<(), Box<dyn Error>> {
     // initialize CUDA, this will pick the first available device and will
     // make a CUDA context from it.
     // We don't need the context for anything but it must be kept alive.
-    let _ctx = cust::quick_init()?;
+    let _ctx = match cust::quick_init() {
+        Ok(c) => c,
+        Err(e) => {
+            println!("cust::quick_init(): {:?}", e);
+            return Err(Box::new(e));
+        }
+    };

     // Make the CUDA module, modules just house the GPU code for the kernels we created.
     // they can be made from PTX code, cubins, or fatbins.

Is there anything I've missed?

jac-cbi commented 2 years ago

OK, I'm quickly becoming less of a fan of `?` error handling. In `cust::quick_init()`, which line would return `NotSupported`?

#[must_use = "The CUDA Context must be kept alive or errors will be issued for any CUDA function that is run"]
pub fn quick_init() -> CudaResult<Context> {
    init(CudaFlags::empty())?;
    let device = Device::get_device(0)?;
    let ctx = Context::new(device)?;
    ctx.set_flags(ContextFlags::SCHED_AUTO)?;
    Ok(ctx)
}
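One way to narrow this down without giving up `?` is to tag each call with its step name before propagating the error. The sketch below is only an illustration: `CudaError`, `StepError`, the `Step` trait, `fake_call` and `quick_init_traced` are stand-ins written for this example, not the real cust API, and the failing step is chosen by the caller rather than taken from the log above. Against the real crate, the same `map_err` idea would be applied to the four calls in `quick_init`.

```rust
use std::fmt;

// Stand-in for cust's error type; only the variant seen in the log is modeled.
#[derive(Debug, Clone, Copy, PartialEq)]
enum CudaError {
    NotSupported,
}

// Wrapper that remembers which step produced the error.
#[derive(Debug, PartialEq)]
struct StepError {
    step: &'static str,
    cause: CudaError,
}

impl fmt::Display for StepError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} failed: {:?}", self.step, self.cause)
    }
}

impl std::error::Error for StepError {}

// Extension trait (hypothetical): `.step("name")` labels an error so that
// `?` can still be used on the result.
trait Step<T> {
    fn step(self, step: &'static str) -> Result<T, StepError>;
}

impl<T> Step<T> for Result<T, CudaError> {
    fn step(self, step: &'static str) -> Result<T, StepError> {
        self.map_err(|cause| StepError { step, cause })
    }
}

// Fake CUDA call: fails only when its name matches `fail_at`.
fn fake_call(name: &'static str, fail_at: &str) -> Result<(), CudaError> {
    if name == fail_at {
        Err(CudaError::NotSupported)
    } else {
        Ok(())
    }
}

// Mirrors the four steps of quick_init, with each `?` labeled.
fn quick_init_traced(fail_at: &str) -> Result<(), StepError> {
    fake_call("init", fail_at).step("init")?;
    fake_call("Device::get_device", fail_at).step("Device::get_device")?;
    fake_call("Context::new", fail_at).step("Context::new")?;
    fake_call("Context::set_flags", fail_at).step("Context::set_flags")?;
    Ok(())
}

fn main() {
    // The failing step here is an arbitrary choice for the demo.
    match quick_init_traced("Device::get_device") {
        Ok(()) => println!("initialized"),
        Err(e) => println!("{}", e),
    }
}
```

With labels like these, the error message names the failing call, so there is no need to replace every `?` with a full `match`.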