Open goofyseeker311 opened 3 days ago
A second question: why are simple math calculations in Java LWJGL OpenCL much slower, by about 2-4x, on NVIDIA discrete GPUs than even on AMD CPUs and iGPUs (multiplication and float4[] matrix multiplication)? On NVIDIA's OpenCL (CUDA) platform, the same OpenCL program is almost as fast as auto-vectorized Java code on the CPU. I am only counting the time taken to run clEnqueueNDRangeKernel() and clFinish(); all data is pre-uploaded, and clFinish() is called before the benchmark run starts.
What is wrong with OpenCL/CUDA? It gets about 1/1000 of the floating-point throughput it should be getting: say 2 GFLOPS instead of 0.7-2 TFLOPS for a CPU, and 20 GFLOPS instead of 20 TFLOPS for a GPU. The benchmark does plain C=A*B float multiplications over arrays, or float4 array multiplications with a matrix-shaped array.
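One factor that may be worth ruling out (it is not confirmed anywhere in this thread): a plain C=A*B multiplication performs one floating-point operation per 12 bytes of memory traffic, so it tends to be limited by memory bandwidth rather than by peak FLOPS. The sketch below, in plain Java with no OpenCL calls, works through that arithmetic. The class name, method names, and the 500 GB/s bandwidth figure are hypothetical and chosen only for illustration.

```java
final class ElementwiseRoofline {
    // Bytes moved per element for C[i] = A[i] * B[i] with 32-bit floats:
    // two 4-byte loads plus one 4-byte store.
    public static final double BYTES_PER_ELEMENT = 3 * Float.BYTES;

    /** Upper bound on GFLOP/s if memory bandwidth (in GB/s) were the only limit. */
    public static double bandwidthBoundGflops(double bandwidthGBps) {
        // One multiplication per element, so FLOP/s equals elements/s.
        return bandwidthGBps / BYTES_PER_ELEMENT;
    }

    /** Achieved GFLOP/s for one multiplication per element over the given time. */
    public static double achievedGflops(long elements, long nanos) {
        // FLOP per nanosecond is numerically the same as GFLOP per second.
        return (double) elements / nanos;
    }

    public static void main(String[] args) {
        double assumedBandwidthGBps = 500.0; // hypothetical GPU memory bandwidth
        long elements = 1L << 24;            // hypothetical array length
        long measuredNanos = 1_000_000L;     // hypothetical kernel time (1 ms)
        System.out.println("bandwidth-bound ceiling (GFLOP/s): "
                + bandwidthBoundGflops(assumedBandwidthGBps));
        System.out.println("achieved (GFLOP/s): "
                + achievedGflops(elements, measuredNanos));
    }
}
```

With the assumed 500 GB/s, the ceiling comes out near 42 GFLOP/s, far below a multi-TFLOPS peak. This reasoning covers the element-wise case only; a matrix multiplication reuses loaded data and has a different ratio of arithmetic to memory traffic.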
How can you get a long-typed event from a PointerBuffer, to use for event profiling of a kernel enqueued with clEnqueueNDRangeKernel? There is no overload of clGetEventProfilingInfo that takes a PointerBuffer, only ones that take a long event. Meanwhile, clEnqueueNDRangeKernel only accepts PointerBuffer events, not long events.
In other words, how can you profile a kernel's start and end times from LWJGL?
Hey @goofyseeker311,
The cl_event * event parameter of clEnqueueNDRangeKernel is an output parameter. If you pass a PointerBuffer there, when the call returns a cl_event value will have been written to it. Example code:
PointerBuffer pe = ...; // cl_event *
clEnqueueNDRangeKernel(..., pe);
long e = pe.get(0); // cl_event
clGetEventProfilingInfo(e, ...);
Yes. (So the question was how to get the profiling start/end times out of the event, instead of using the code below.)
Never mind. Somehow I was not able to get the pe.get(0) approach working before; I'm not sure what I did wrong.
Previous code looked like this:
PointerBuffer event = clStack.mallocPointer(1);
if (CL12.clEnqueueNDRangeKernel(clQueue, clKernel, dimensions, null, globalWorkSize, null, null, event)==CL12.CL_SUCCESS) {
    // Host wall-clock timing: measures how long the host waits, not the kernel's device time.
    long ctimestart = System.nanoTime();
    CL12.clWaitForEvents(event);
    long ctimeend = System.nanoTime();
    float ctimedif = (ctimeend-ctimestart)/1000000.0f; // milliseconds
}
Edit: new code looks like this:
PointerBuffer event = clStack.mallocPointer(1);
if (CL12.clEnqueueNDRangeKernel(clQueue, clKernel, dimensions, null, globalWorkSize, null, null, event)==CL12.CL_SUCCESS) {
    CL12.clWaitForEvents(event);
    long eventLong = event.get(0); // cl_event handle written by the enqueue call
    long[] ctimestart = {0};
    long[] ctimeend = {0};
    // Device timestamps in nanoseconds; the queue must be created with CL_QUEUE_PROFILING_ENABLE.
    CL12.clGetEventProfilingInfo(eventLong, CL12.CL_PROFILING_COMMAND_START, ctimestart, (PointerBuffer)null);
    CL12.clGetEventProfilingInfo(eventLong, CL12.CL_PROFILING_COMMAND_END, ctimeend, (PointerBuffer)null);
    float ctimedif = (ctimeend[0]-ctimestart[0])/1000000.0f; // milliseconds
}
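The values returned by clGetEventProfilingInfo are device timestamps in nanoseconds, and they are only available when the command queue was created with CL_QUEUE_PROFILING_ENABLE. Besides CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END, OpenCL also defines CL_PROFILING_COMMAND_QUEUED and CL_PROFILING_COMMAND_SUBMIT, which can help separate launch latency from execution time. The following is a minimal plain-Java sketch of that arithmetic; the KernelTiming class and its method names are hypothetical, and the timestamps in main are made-up values rather than measurements.

```java
final class KernelTiming {
    private final long queuedNs;
    private final long startNs;
    private final long endNs;

    // Arguments are the raw values read for CL_PROFILING_COMMAND_QUEUED,
    // CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END.
    public KernelTiming(long queuedNs, long startNs, long endNs) {
        this.queuedNs = queuedNs;
        this.startNs = startNs;
        this.endNs = endNs;
    }

    /** Time the kernel spent executing on the device, in milliseconds. */
    public double executionMillis() {
        // Subtract in long first, then convert, to keep full precision.
        return (endNs - startNs) / 1e6;
    }

    /** Time between enqueueing the command and the device starting it, in milliseconds. */
    public double launchLatencyMillis() {
        return (startNs - queuedNs) / 1e6;
    }

    public static void main(String[] args) {
        // Made-up timestamps for illustration only.
        KernelTiming t = new KernelTiming(1_000L, 1_001_000L, 3_001_000L);
        System.out.println("execution ms: " + t.executionMillis());
        System.out.println("launch latency ms: " + t.launchLatencyMillis());
    }
}
```

Comparing the two numbers shows whether a short kernel is dominated by launch overhead or by its actual run time on the device.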
Question
Why do AMD Ryzen 5000 series CPUs not show up as OpenCL devices on Windows 11? NVIDIA GPUs and AMD iGPUs show up just fine in the CLDemo Java program. Where is the issue?
Self-answer: downloading and installing the Intel OpenCL runtime for CPU works for AMD CPUs too. https://www.intel.com/content/www/us/en/developer/articles/technical/intel-cpu-runtime-for-opencl-applications-with-sycl-support.html