NVlabs / NVBit

Support for CUDA dynamic parallelism #120

Open nayakajay opened 1 year ago

nayakajay commented 1 year ago

I was trying to run some dynamic parallelism programs with the mem_trace tool (an example is given below). The tool appears to ignore the memory accesses that happen in the child kernel. Does NVBit support dynamic parallelism? If so, what changes does the mem_trace tool need for this to work?

Providing a dummy example below:

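// Child kernel: each of the 32 threads writes one element of a (a global-memory store).
__global__ void child(int *a) {
    a[threadIdx.x] = (int)clock();
}

// Parent kernel: launches the child from the device (dynamic parallelism).
__global__ void parent(int *a) {
    child<<<1,32>>>(a);
}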

Here, mem_trace does not print the memory accesses performed by the child kernel.
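
For anyone trying to reproduce this, here is a self-contained version of the example as a rough sketch. The host code (`main`, the buffer names `d_a` and `h_a`, and the single-thread parent launch) is my own assumption and was not in my original test; only the two kernels come from the snippet above. Dynamic parallelism needs relocatable device code, so I would expect it to be built with something like `nvcc -rdc=true repro.cu -o repro -lcudadevrt` (the file name is hypothetical; add an `-arch` flag matching your GPU as needed).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Child kernel: each thread writes one element of a (a global-memory store).
__global__ void child(int *a) {
    a[threadIdx.x] = (int)clock();
}

// Parent kernel: launches the child grid from the device.
__global__ void parent(int *a) {
    child<<<1, 32>>>(a);
}

int main() {
    const int n = 32;          // matches the child's block size
    int *d_a = nullptr;
    int h_a[n] = {0};

    cudaError_t err = cudaMalloc(&d_a, n * sizeof(int));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // One parent thread is enough to trigger a single child launch.
    parent<<<1, 1>>>(d_a);

    // The parent grid is not complete until its child grid has finished,
    // so a host-side synchronize covers both kernels.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(d_a);
        return 1;
    }

    err = cudaMemcpy(h_a, d_a, n * sizeof(int), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemcpy failed: %s\n", cudaGetErrorString(err));
        cudaFree(d_a);
        return 1;
    }

    // Print a couple of values to confirm the child actually ran.
    printf("a[0] = %d, a[31] = %d\n", h_a[0], h_a[n - 1]);

    cudaFree(d_a);
    return 0;
}
```

Running the binary under the tool in the usual way, for example `LD_PRELOAD=./tools/mem_trace/mem_trace.so ./repro` (the path assumes the default layout of the NVBit release), should show whether any stores from `child` appear in the trace. The printed values only confirm that the child kernel executed and wrote to `a`.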