Closed xinyi-li7 closed 2 years ago
The mechanism you describe should work. At which point are you allocating the device memory? If you attach a very small example reproducing the error, I can take a look at it.
Hi Oriste, Thanks for your response! Sure, I just attached the modified program here. opcode_hist.zip.
For your convenience, I posted a diff gist of opcode_hist.cu.
In inject_funcs.cu (gist), if I print this pointer, it is 0, i.e. null, and I cannot access the memory it points to.
I hope this information can help! Thank you so much!
Because you should be passing:
nvbit_add_call_arg_const_val64(i, (uint64_t)d_histogram);
and not
nvbit_add_call_arg_const_val64(i, (uint64_t)*d_histogram);
Let me know if that fixes it.
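To make the difference between the two casts concrete, here is a minimal host-only sketch. It does not call NVBit or CUDA; the helper names (`pack_pointer`, `unpack_pointer`, `pack_first_element`) are hypothetical, and an ordinary host array stands in for the device buffer. It assumes a 64-bit platform where a pointer fits in `uint64_t`.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; none of them are NVBit APIs.

// Equivalent of `(uint64_t)d_histogram`: the address itself, packed into
// a 64-bit integer so it can travel as a plain argument.
inline uint64_t pack_pointer(const uint64_t* p) {
    return reinterpret_cast<uint64_t>(p);
}

// What the receiving function does with that argument: turn the 64-bit
// value back into a pointer it can index.
inline uint64_t* unpack_pointer(uint64_t arg) {
    return reinterpret_cast<uint64_t*>(arg);
}

// Equivalent of `(uint64_t)*d_histogram`: this reads the first element and
// returns its value, which is not an address. With a real device pointer the
// read itself happens on the host, which is why it can crash there.
inline uint64_t pack_first_element(const uint64_t* p) {
    return *p;
}
```

With a host array `buf`, `unpack_pointer(pack_pointer(buf))` gives back `buf`, whereas `pack_first_element(buf)` only yields whatever is stored in `buf[0]`.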
Oops, I forgot to update that snippet; what I actually tested (ran on my computer) is nvbit_add_call_arg_const_val64(i, (uint64_t)d_histogram);. The result I described before (the pointer is 0) is for the former one; the latter one (with *) just prints a segmentation fault.
Sorry for this typo.
I attached the correct version here. opcode_hist.zip
You are allocating and freeing inside the launch, so that pointer changes all the time, whereas the instrumentation passes a constant value fixed at the moment of instrumentation. Moving cudaMalloc/cudaFree into nvbit_at_ctx_init/nvbit_at_ctx_term respectively should solve your problem. If you really need to allocate and free memory at launch time, it is more complicated, but it can be done.
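The mismatch described here can be modelled on the host without NVBit. The sketch below is only an illustration under assumptions: `InstrumentedKernel`, `instrument_once`, and `launch_sees_current_table` are hypothetical names, and a `std::vector` stands in for a device allocation. It assumes, as in the reply above, that the argument is captured once when the kernel is instrumented.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical host-only model; none of these names are NVBit APIs.
struct InstrumentedKernel {
    uint64_t const_arg = 0;     // value baked in at instrumentation time
    bool instrumented = false;  // instrumentation happens only once
};

// Capture the table address the first time the kernel is seen.
// Later calls change nothing, which is what makes the argument "constant".
inline void instrument_once(InstrumentedKernel& k, const uint64_t* table) {
    if (k.instrumented) return;
    k.const_arg = reinterpret_cast<uint64_t>(table);
    k.instrumented = true;
}

// Simulate one launch that uses `table` as its buffer. Returns true only if
// the captured constant still points at the buffer this launch is using.
inline bool launch_sees_current_table(InstrumentedKernel& k,
                                      const std::vector<uint64_t>& table) {
    instrument_once(k, table.data());
    return k.const_arg == reinterpret_cast<uint64_t>(table.data());
}
```

In this model the first launch sees its own table, but a later launch with a different buffer still receives the address captured the first time. Allocating one buffer for the whole context lifetime avoids the mismatch because the captured address never goes stale.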
Ah, gotcha!
Can you give a hint on how to do it inside the kernel?
I will try to modify my algorithm so that I can keep a global table to record the data throughout the whole program. But my initial idea was to keep one table for each kernel so that I can have a fixed-size table in global memory. Just in case I still need a kernel-scope table :-). Thanks!
Look at the mem_trace.cu example, in particular at `nvbit_add_call_arg_launch_val64(instr, 0);` and `nvbit_set_at_launch(ctx, p->f, (uint64_t)&grid_launch_id);`. Good luck, closing issue (as non-issue).
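As a rough picture of why a launch-time argument helps with a per-kernel table, here is a host-only model. It is a sketch under assumptions, not the mem_trace.cu code: `KernelModel`, `model_set_at_launch`, `model_injected`, and `model_launch` are hypothetical names, no NVBit function is called, and a `std::vector` stands in for a table allocated for each launch.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical host-only model; none of these names are NVBit APIs.
struct KernelModel {
    uint64_t launch_arg = 0;  // slot that is refreshed before every launch
};

// Publish the value that the next launch should observe.
inline void model_set_at_launch(KernelModel& k, uint64_t value) {
    k.launch_arg = value;
}

// Stand-in for the injected function: it reads the current launch value,
// treats it as the table address, and bumps one counter.
inline void model_injected(const KernelModel& k, int opcode_id) {
    reinterpret_cast<uint64_t*>(k.launch_arg)[opcode_id] += 1;
}

// One simulated launch with its own table: publish the address, then run.
inline void model_launch(KernelModel& k, std::vector<uint64_t>& table,
                         int opcode_id) {
    model_set_at_launch(k, reinterpret_cast<uint64_t>(table.data()));
    model_injected(k, opcode_id);
}
```

Because the slot is rewritten before each simulated launch, every launch updates the table that belongs to it, even though the instrumented code itself never changes.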
In your NVBit tool examples, you always store the statistical data in unified memory. For example, in the opcode_hist tool, you are using __managed__ uint64_t histogram[MAX_OPCODES] to count the opcodes and pass it to the inject function through nvbit_add_call_arg_const_val64(i, (uint64_t)histogram). Since unified memory is expensive, I was wondering if we could allocate device memory (global memory, since shared/constant memory is not available) before a kernel is launched.
Now I am trying to allocate a "d_histogram" variable, and pass it to the injection function through the function nvbit_add_call_arg_const_val64(i, (uint64_t)d_histogram). But it doesn't seem to work: in the injection function, it reports "an illegal memory access."
Could you please confirm whether I can do that in order to debug my program?
Thank you in advance!