NVlabs / NVBit

Support for CUDA dynamic parallelism #120

Open nayakajay opened 10 months ago

nayakajay commented 10 months ago

I was trying to run some programs that use CUDA dynamic parallelism under the mem_trace tool (a program example is given below). The tool appears to ignore the memory accesses that happen in the child kernel. Does NVBit support dynamic parallelism? If yes, what changes are needed in the mem_trace tool to make it work?

Providing a dummy example below:

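// Child kernel: each of the 32 threads stores a clock value to global memory.
__global__ void child(int *a) {
    a[threadIdx.x] = (int)clock();
}

// Parent kernel: launches the child grid from the device (dynamic parallelism).
__global__ void parent(int *a) {
    child<<<1, 32>>>(a);
}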

Here, the global-memory stores performed by the child kernel do not appear in the mem_trace output.
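
For anyone trying to reproduce this, a self-contained version of the example might look like the sketch below. The host code, the buffer size of 32 ints, the `<<<1, 1>>>` launch of `parent`, and the file name `dp_example.cu` are assumptions added for illustration and were not part of my original test.

```cuda
// dp_example.cu: minimal dynamic-parallelism reproducer (illustrative sketch).
#include <cstdio>
#include <cuda_runtime.h>

// Child kernel: each thread stores a clock value to global memory.
__global__ void child(int *a) {
    a[threadIdx.x] = (int)clock();
}

// Parent kernel: launches the child grid from the device.
__global__ void parent(int *a) {
    child<<<1, 32>>>(a);
}

int main() {
    const int n = 32;  // assumed size, matches the 32 child threads
    int *d_a = nullptr;
    if (cudaMalloc(&d_a, n * sizeof(int)) != cudaSuccess) {
        printf("cudaMalloc failed\n");
        return 1;
    }

    // A single parent thread is enough to trigger one child launch.
    parent<<<1, 1>>>(d_a);

    // The parent grid completes only after its child grid has completed.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        cudaFree(d_a);
        return 1;
    }

    int h_a[n];
    cudaMemcpy(h_a, d_a, n * sizeof(int), cudaMemcpyDeviceToHost);
    printf("a[0] = %d\n", h_a[0]);

    cudaFree(d_a);
    return 0;
}
```

Device-side launches require relocatable device code and the device runtime library, so a build would be along the lines of `nvcc -rdc=true -o dp_example dp_example.cu -lcudadevrt`. The binary can then be run under the tool with `LD_PRELOAD` pointing at the built `mem_trace.so` (the exact path depends on where NVBit was built).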