jonmdev closed 2 months ago
Hi @jonmdev,
I can give you some more debugging tricks on hand:
- Use `if(get_global_id(0)==0) printf("...", ...);` in the OpenCL kernel to see what value a variable has. See here for printf formatting.
- Comment out the last half of the kernel with a `/*...*/` multiline comment. Still crashing? --> Comment out the last 3/4 of the kernel. Repeat the subdivision until you find the one faulty line.
- A very nasty bug is when data types mismatch between `Memory<type>` in host code and `kernel(type* ...)` on the device side. I've already built in a check that will error for mismatching type sizes, like `float`/`double`, and for mismatching number/order of kernel parameters between host/device side. But it can't detect mismatching types of the same size, such as `uint`/`float`.
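To see why the same-size case is so nasty, here is a minimal host-side sketch in plain C++. It does not use the wrapper's API, and the helper name `reinterpret_as_float` is made up for illustration. It mimics what a kernel effectively sees when a buffer filled as `uint` on the host is declared as `float*` on the device: the bits are unchanged and the sizes agree, but the values are silently wrong.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: read the 4 bytes of a uint as if they were a float,
// which is what happens when host and device disagree on the element type.
float reinterpret_as_float(std::uint32_t bits) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "sizes match, so a size-based check cannot catch this mismatch");
    float f;
    std::memcpy(&f, &bits, sizeof f); // well-defined way to copy the raw bit pattern
    return f;
}
```

For example, `reinterpret_as_float(1u)` is a tiny denormal number rather than `1.0f`, while the bit pattern `0x3F800000u` is the one that actually encodes `1.0f`.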
Kind regards, Moritz
Again, I can't thank you enough for this project and all your many replies and posts on StackExchange and even Reddit which I have found while researching how to do basic things. I now have a working implementation of my project, and only 8 days after I first started. To get up and running with OpenCL and convert a project over in only 8 days is a testament to your good design and explanations.
I just have one more question.
It is challenging to see what is happening inside the Kernels. For example, if you access `whatever[i]` and `[i]` is not in range, you will typically get errors in Visual Studio, but the Kernel says nothing if you do this inside it.
It is also hard to see which points, `if` branches, etc. are being hit. My best idea was the following:
1) Create `Memory<char>` & `Memory<float>` objects and pass into Kernel:
For example, in Kernel design, add the following parameters:
Here `debugChar` and `debugFloat` are `Memory<char>(device, maxDebugChar)` & `Memory<float>(device, maxDebugChar)`. `dbgIndexC` and `dbgIndexF` are `Memory<int>(device, 1)` and `Memory<int>(device, 1)`, used as indexes each initialized to `0` so you can globally increment an index with each new addition per kernel run.
2) Use inside Kernel:
I have found `debugFloat` the most helpful: using just one buffer maintains chronology, and putting in strings as `char` is too hard. Putting in floats or ints as `char` is also too hard.
So for example, you can do:
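A minimal sketch of this pattern, written as plain C++ that stands in for the kernel body so it can run on the CPU. The function names `kernel_body` and `run_and_collect`, the marker value `-1.0f`, and the example computation are all assumptions for illustration; in the real kernel, `debugFloat` and `dbgIndexF` would be the `Memory<float>` and `Memory<int>` buffers described above.

```cpp
#include <vector>

// Hypothetical stand-in for one work-item of a kernel. debugFloat and dbgIndexF
// play the role of the debug buffers passed in as kernel parameters.
void kernel_body(int id, const float* input, float* debugFloat, int* dbgIndexF, int maxDebug) {
    const float value = input[id] * 2.0f;
    // Log only if the branch is taken AND two more entries still fit in the buffer.
    if (value > 5.0f && dbgIndexF[0] + 2 <= maxDebug) {
        debugFloat[dbgIndexF[0]++] = -1.0f; // marker: this branch was hit
        debugFloat[dbgIndexF[0]++] = value; // the value seen at that point
    }
}

// Runs the stand-in sequentially for every id and returns the used part of the debug buffer.
std::vector<float> run_and_collect(const std::vector<float>& input, int maxDebug) {
    std::vector<float> debugFloat(maxDebug, 0.0f);
    int dbgIndexF = 0; // initialized to 0 before the run, as described above
    for (int id = 0; id < (int)input.size(); id++) {
        kernel_body(id, input.data(), debugFloat.data(), &dbgIndexF, maxDebug);
    }
    debugFloat.resize(dbgIndexF);
    return debugFloat;
}
```

Because markers and values share one buffer, the order of entries reflects the order in which they were written. That only holds here because the stand-in runs the work-items one after another.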
Or alternatively, you can try for char, but this is very tedious, and since you can't add floats/doubles/ints into the char array easily it is less useful:
3) Get and print out the Debug Info:
After the Kernel runs, run a function to process and print out the debug info in whatever way the system needs. Like for example:
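A minimal sketch of such a function, assuming the debug buffer and its index have already been copied back to the host into a `std::vector<float>` and an `int`. The name `format_debug_floats` and the output format are hypothetical; the clamping is there so that a corrupted or overflowed index cannot make the host read past the end of the buffer.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: turns the used part of the float debug buffer into text.
// `count` is the value read back from the index buffer after the kernel ran.
std::string format_debug_floats(const std::vector<float>& debugFloat, int count) {
    // Clamp the index to the valid range [0, buffer size].
    const int n = std::max(0, std::min(count, (int)debugFloat.size()));
    std::ostringstream out;
    for (int i = 0; i < n; i++) {
        out << "debugFloat[" << i << "] = " << debugFloat[i] << "\n";
    }
    if (count > n) {
        out << "(index reports " << (count - n) << " more entries than the buffer holds)\n";
    }
    return out.str();
}
```

The returned string can then be printed with whatever the host application uses for logging.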
Ideas?
That was my best idea and it works at least okay. Without it I could never have figured out how to use the kernels or how they were allocating into workgroups etc.
However, it can also crash the Kernel, causing it to hang for 400 ms (which I presume is the device timeout), after which OpenCL just stops responding to future requests. I.e. this is not "workgroup safe".
I presume this is being triggered when multiple workgroups all try to write to the same debug index/array at once. So this method is not exactly good or needs to be improved though it is at least somewhat useful.
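One possible improvement is to reserve a slot with an atomic increment instead of a plain `index++`, and to bounds-check the reserved slot before writing. The sketch below is plain C++ with `std::atomic` standing in for the device-side index; the name `debug_append` is hypothetical. In an OpenCL C kernel the analogous built-in would be `atomic_inc` on the `__global int` index, which also returns the old value. Whether the hang described above is really caused by concurrent writes, rather than by the index running past the end of the buffer, is an assumption that this sketch does not verify; it guards against both.

```cpp
#include <atomic>

// Hypothetical append: atomically reserves a unique slot, then writes only if it is in range.
// Returns true if the value was stored, false if the buffer was already full.
bool debug_append(float* debugFloat, std::atomic<int>& dbgIndexF, int maxDebug, float value) {
    const int slot = dbgIndexF.fetch_add(1); // returns the old index, so no two callers share a slot
    if (slot < 0 || slot >= maxDebug) {
        return false; // drop the entry instead of writing out of range
    }
    debugFloat[slot] = value;
    return true;
}
```

With this scheme the index keeps counting attempted writes even after the buffer is full, so the host can tell how many entries were dropped. The trade-off is that entries from different work-items appear in whatever order they won the increment, so strict chronology across work-items is not guaranteed.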
Additionally, besides compilation errors, there are still no obvious good ways I can think of to be alerted if you do something wrong, like out-of-range attempts to read something, and it is hard to find Kernel code mistakes. E.g. trying to read `dbgIndexF[-1]` (which doesn't exist) inside the kernel creates no error. Interestingly, this returns `0` for me when I try to debug out the value using the method above, i.e.:
However, I presume this is just "undefined behavior". I only caught some mistakes I made by copying my kernel out and rephrasing it into regular code and running it on the CPU to see what would happen.
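The "rephrase the kernel as regular code" approach can be made to report out-of-range accesses directly. A minimal sketch, assuming the kernel body is copied into a plain C++ function (here the hypothetical `kernel_body_cpu`, with a made-up computation) and the buffers become `std::vector`, whose `at()` throws `std::out_of_range` instead of silently returning whatever happens to be in memory:

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical CPU copy of a kernel body: each work-item adds its left neighbour to itself.
// For id == 0 this reads index -1, the kind of bug that produces no error on the device.
void kernel_body_cpu(int id, const std::vector<float>& in, std::vector<float>& out) {
    out.at(id) = in.at(id - 1) + in.at(id); // at() checks the index; operator[] would not
}

// Runs every work-item sequentially and returns a message for the first bad access,
// or an empty string if all accesses were in range.
std::string run_on_cpu(const std::vector<float>& in, std::vector<float>& out) {
    for (int id = 0; id < (int)in.size(); id++) {
        try {
            kernel_body_cpu(id, in, out);
        } catch (const std::out_of_range&) {
            return "out-of-range access in work-item " + std::to_string(id);
        }
    }
    return "";
}
```

This only checks the indexing logic, not device-specific behaviour, but it turns a silent wrong read into an error that names the offending work-item.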
You have obviously been at this longer than me and understand the system better.
I am just wondering if you have come up with any different or better methods for (1) Debugging things out, and (2) Catching Kernel code errors.
Thanks for any thoughts as usual, and thanks again for letting me get into GPU work so quickly and (relatively) painlessly. 🙂