Syllo / nvtop

GPU & Accelerator process monitoring for AMD, Apple, Huawei, Intel, NVIDIA and Qualcomm

[CUDA] Wrong per-process memory usage with managed memory #213

Open augustin64 opened 1 year ago

augustin64 commented 1 year ago

Hello. Thanks for this very useful tool. Here is the issue I encounter when running CUDA code with managed memory: the per-process list does not seem to take managed memory into account. Here is a screenshot of what I get when running the minimal example below:

[Screenshot: 20230527_1443-7781]

#include <stdlib.h>
#include <unistd.h>

#define NUM_ALLOCS (1024*1024)

__global__ void access_memory(int** buf) {
    int idx = threadIdx.x + blockDim.x*blockIdx.x;

    if (idx >= NUM_ALLOCS)
        return;

    buf[idx][0] = 0; // Writing triggers migration of the managed page to GPU memory
}

int main() {
    int** buffer;

    // Allocate managed memory: NUM_ALLOCS pointers, then one block per pointer
    cudaMallocManaged(&buffer, NUM_ALLOCS * sizeof(int*), cudaMemAttachHost);
    for (int i = 0; i < NUM_ALLOCS; i++) {
        cudaMallocManaged(&(buffer[i]), 49152, cudaMemAttachHost);
    }

    // Access memory from the GPU
    dim3 gridSize(NUM_ALLOCS / 1024 + 1);
    dim3 blockSize(1024);

    access_memory<<<gridSize, blockSize>>>(buffer);
    cudaDeviceSynchronize(); // Make sure the kernel has touched the memory before sleeping

    sleep(10); // Keep the process alive so the usage is observable in nvtop
    // Free memory
    for (int i=0; i < NUM_ALLOCS; i++) {
        cudaFree(buffer[i]);
    }
    cudaFree(buffer);

    return 0;
}

Does the NVIDIA driver expose such information? Would it be possible to use it?
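
For reference, here is a minimal sketch of how the per-process counter can be queried directly through NVML, to compare against what the tool displays. This assumes a single GPU at index 0 and that the driver's `nvmlDeviceGetComputeRunningProcesses` counter is the relevant source; whether `usedGpuMemory` accounts for managed allocations is exactly the open question here.

```c
// Sketch: read NVML's per-process memory counter directly (assumes GPU 0).
// Build with something like: gcc nvml_procs.c -lnvidia-ml
#include <stdio.h>
#include <nvml.h>

int main(void) {
    nvmlReturn_t ret = nvmlInit_v2();
    if (ret != NVML_SUCCESS) {
        fprintf(stderr, "nvmlInit failed: %s\n", nvmlErrorString(ret));
        return 1;
    }

    nvmlDevice_t dev;
    nvmlDeviceGetHandleByIndex_v2(0, &dev);

    unsigned int count = 64;
    nvmlProcessInfo_t procs[64];
    ret = nvmlDeviceGetComputeRunningProcesses(dev, &count, procs);
    if (ret == NVML_SUCCESS) {
        for (unsigned int i = 0; i < count; i++) {
            // usedGpuMemory is what the driver reports per process; it is
            // unclear whether managed (cudaMallocManaged) pages resident on
            // the device are included in this value.
            printf("pid %u: %llu bytes\n", procs[i].pid,
                   (unsigned long long)procs[i].usedGpuMemory);
        }
    }

    nvmlShutdown();
    return 0;
}
```

Running this while the managed-memory example above is in its `sleep(10)` window would show whether the driver itself under-reports, or whether the information is lost somewhere downstream.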

Syllo commented 1 year ago

I'll take a look at the documentation. Thanks for pointing that out.