coreylowman / cudarc

Safe rust wrapper around CUDA toolkit
Apache License 2.0
483 stars 65 forks

Safe API call to get device name and compute capability? #248

Closed polarathene closed 2 weeks ago

polarathene commented 3 weeks ago

llama.cpp prints information that identifies my device (NVIDIA GeForce RTX 4060) and its compute capability (8.9):

$ llama-bench -m Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf  \
  -r 1 -p 512 -n 0 -b 512 -pg 0,0 -fa 0,1

ggml_cuda_init: GGML_CUDA_FORCE_MMQ:   no
ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4060 Laptop GPU, compute capability 8.9, VMM: yes
| model                          |       size |     params | backend    | ngl |    n_batch |         fa |          test |              t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ---------: | ---------: | ------------: | ---------------: |
| llama 7B Q4_K - Medium         |   4.07 GiB |     7.24 B | CUDA       |  99 |        512 |          0 |         pp512 |   1313.10 ± 0.00 |
| llama 7B Q4_K - Medium         |   4.07 GiB |     7.24 B | CUDA       |  99 |        512 |          1 |         pp512 |   1344.06 ± 0.00 |

Is this something the crate could provide a safe API for? Presently it looks like I'd have to delve into the unsafe API, but this is foreign to me.


The compute capability can be accessed easily enough through CudaDevice::attribute() (upstream cuDeviceGetAttribute) via these CUdevice_attribute_enum variants:

  • CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR: Major compute capability version number
  • CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR: Minor compute capability version number
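
Once both values are read, they still need to be combined into something like "8.9". Below is a minimal sketch of one way to do that, assuming the attributes arrive as plain i32 values. The `ComputeCapability` type and `from_raw` helper are hypothetical names for illustration and are not part of cudarc; the sketch uses only the standard library and makes no CUDA calls.

```rust
use std::fmt;

/// Hypothetical helper type (not part of cudarc): a major/minor pair.
/// Deriving `Ord` compares `major` first, then `minor`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct ComputeCapability {
    major: u8,
    minor: u8,
}

impl ComputeCapability {
    /// Build from raw attribute values, rejecting anything outside 0..=255
    /// instead of silently truncating with `as u8`.
    fn from_raw(major: i32, minor: i32) -> Option<Self> {
        Some(Self {
            major: u8::try_from(major).ok()?,
            minor: u8::try_from(minor).ok()?,
        })
    }
}

impl fmt::Display for ComputeCapability {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}.{}", self.major, self.minor)
    }
}

fn main() {
    // Stand-in values; in practice these would come from the two attribute queries.
    let cc = ComputeCapability::from_raw(8, 9).expect("values fit in u8");
    println!("compute capability {cc}");

    // The derived ordering allows simple minimum-version checks.
    let minimum = ComputeCapability { major: 8, minor: 0 };
    println!("meets minimum {minimum}: {}", cc >= minimum);
}
```

Keeping the pair in a small type rather than a bare tuple makes version comparisons and formatting explicit, at the cost of a little extra code.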

I assume the relevant call for the device name is cuDeviceGetName?

CUresult cuDeviceGetName ( char* name, int len, CUdevice dev ) Returns an identifier string for the device.

It seems I should use cu_device()?

pub fn cu_device(&self) -> &CUdevice

Get the underlying sys::CUdevice of this CudaDevice.

Safety

While this function is marked as safe, actually using the returned object is unsafe.

You must not free/release the device pointer, as it is still owned by the CudaDevice.

Then I'd somehow need to figure out how to call sys::Lib::cuDeviceGetName() and what parameters it wants:

pub unsafe fn cuDeviceGetName(
    &self,
    name: *mut c_char,
    len: c_int,
    dev: CUdevice
) -> CUresult
polarathene commented 3 weeks ago

With help from the Rust community on Discord, I got guidance on how to use the FFI call lib().cuDeviceGetName():

use cudarc::driver::CudaDevice;
use cudarc::driver::sys::{
    lib,
    cudaError_enum,
    CUdevice_attribute_enum as Attribute,
};
// The unsafe call needs `CStr` to convert the buffer into a native `String`:
use std::ffi::CStr;
// Simplify error handling:
use anyhow::Result;

fn main() -> Result<()> {
    let dev = CudaDevice::new(0)?;

    let device_index = dev.ordinal();
    let device_name = get_device_name(&dev)?;
    let (major, minor) = get_compute_capability(&dev)?;
    let supports_vmm = has_vmm_support(&dev)?;

    // Device 0: NVIDIA GeForce RTX 4060 Laptop GPU, compute capability: 8.9, VMM: true
    println!("Device {device_index}: {device_name}, compute capability: {major}.{minor}, VMM: {supports_vmm}");

    Ok(())
}

fn get_device_name(dev: &CudaDevice) -> Result<String> {
    // A buffer with sufficient size to store the string
    let mut buffer = vec![0u8; 64];

    // These unsafe methods require the `Lib` struct, get a static ref via `sys::lib()`:
    let result = unsafe { lib().cuDeviceGetName(
        buffer.as_mut_ptr().cast(), // <-- `name` expects a mutable `c_char` pointer to `buffer` (`c_char` is not `i8` on every platform)
        buffer.len() as i32, // <-- `len` expects the length of the `buffer` (capacity may exceed it)
        *dev.cu_device() // <-- `dev` requires to deref the returned `&CUdevice`
    )};

    // `CUresult` enum returned, verify operation was successful
    // and then return a `String` (_requires converting `buffer` to `CStr` => `&str` => `String`_)
    match result {
        cudaError_enum::CUDA_SUCCESS => {
            let device_name: String = CStr::from_bytes_until_nul(buffer.as_slice())?
                .to_str()?
                .to_owned();

            Ok(device_name)
        }
        _ => anyhow::bail!("Failed to query name of device: {}", dev.ordinal()),
    }
}

fn get_compute_capability(dev: &CudaDevice) -> Result<(u8, u8)> {
    let attr_major = Attribute::CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR;
    let attr_minor = Attribute::CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR;

    match (dev.attribute(attr_major), dev.attribute(attr_minor)) {
        (Ok(major), Ok(minor)) => Ok((major as u8, minor as u8)),
        _ => anyhow::bail!("Failed to query compute capability of device: {}", dev.ordinal()),
    }
}

fn has_vmm_support(dev: &CudaDevice) -> Result<bool> {
    let attr_vmm = Attribute::CU_DEVICE_ATTRIBUTE_VIRTUAL_ADDRESS_MANAGEMENT_SUPPORTED;

    // i32 result, treat any non-zero value as `true`:
    Ok(dev.attribute(attr_vmm)? != 0)
}
[package]
name = "example-device-info"
version = "0.1.0"
edition = "2021"

[dependencies]
anyhow = "1.0.86"
cudarc = { version = "0.11.4", features = ["cuda-12040"] }
$ cargo run
Device 0: NVIDIA GeForce RTX 4060 Laptop GPU, compute capability: 8.9, VMM: true
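
The buffer-to-String step in get_device_name can be tried without a GPU. The sketch below pulls it out into a hypothetical `decode_name` helper (not part of cudarc) and fills the 64-byte buffer by hand to stand in for what the driver call would write. It shows that `CStr::from_bytes_until_nul` stops at the first NUL byte and fails when the buffer has no terminator.

```rust
use std::ffi::CStr;

/// Hypothetical helper: decode a NUL-terminated UTF-8 name from a fixed-size buffer.
/// Returns `None` if there is no NUL byte or the bytes are not valid UTF-8.
fn decode_name(buffer: &[u8]) -> Option<String> {
    let cstr = CStr::from_bytes_until_nul(buffer).ok()?;
    Some(cstr.to_str().ok()?.to_owned())
}

fn main() {
    // Simulate a driver-filled buffer: name bytes followed by zero padding.
    let mut buffer = vec![0u8; 64];
    let name = b"NVIDIA GeForce RTX 4060 Laptop GPU";
    buffer[..name.len()].copy_from_slice(name);

    // Everything after the first NUL byte is ignored.
    match decode_name(&buffer) {
        Some(decoded) => println!("decoded: {decoded}"),
        None => println!("buffer did not contain a valid name"),
    }

    // A buffer with no NUL terminator is rejected rather than read past its end.
    let unterminated = [b'A'; 4];
    println!("unterminated buffer decodes: {}", decode_name(&unterminated).is_some());
}
```

Requires Rust 1.69 or newer for `CStr::from_bytes_until_nul`.
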
polarathene commented 3 weeks ago

For reference, ArrayFire has a similar set of API calls, although they are a bit opinionated. They also appear to have chosen a buffer length of 64 bytes for the device name 👍

I suppose ArrayFire may be more similar to Candle, so let me know if I should instead raise a request there for their CUDA backend to implement something similar to the above.