clMathLibraries / clBLAS

a software library containing BLAS functions written in OpenCL
Apache License 2.0

Extend the clBLAS tuning executable to read a log file #2

Open kknox opened 11 years ago

kknox commented 11 years ago

In combination with issue #1, extend the tuning executable to read and parse the log file that is generated. The log file records every clBLAS function, with its parameters, that a particular application called. This allows the tuning executable to create a kernel database (.kdb) file that is optimized specifically for that application.

kknox commented 11 years ago

It should be the responsibility of the tuning program to skip tuning for redundant API calls and to ignore any warnings, errors, and optimization hints that appear in the log.
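As a rough illustration of that filtering step, here is a minimal Python sketch. The log format is not specified in this thread, so the line layout (`CALL <function> key=value ...`, with `WARNING`, `ERROR`, and `HINT` prefixes on other lines) and the helper name `unique_calls` are assumptions made purely for illustration. They are not the actual clBLAS log format.

```python
def unique_calls(log_lines):
    """Return the distinct (function, params) calls in first-seen order.

    Assumes a hypothetical log layout: 'CALL <name> key=value key=value ...'.
    Lines with any other prefix (WARNING, ERROR, HINT, blank) are skipped.
    """
    seen = set()
    calls = []
    for line in log_lines:
        tokens = line.split()
        if len(tokens) < 2 or tokens[0] != "CALL":
            continue  # ignore warnings, errors, hints, and malformed lines
        name = tokens[1]
        # Sort the parameters so that argument order does not defeat deduplication.
        params = tuple(sorted(tuple(t.split("=", 1)) for t in tokens[2:] if "=" in t))
        key = (name, params)
        if key not in seen:
            seen.add(key)
            calls.append(key)
    return calls


# Hypothetical log contents, invented for this example.
sample_log = [
    "CALL sgemm M=1024 N=1024 K=512 transA=N transB=T",
    "WARNING: buffer not aligned",
    "CALL sgemm M=1024 N=1024 K=512 transA=N transB=T",
    "HINT: consider column-major layout",
    "CALL dgemv M=2048 N=2048 transA=N",
    "ERROR: invalid leading dimension",
]
calls = unique_calls(sample_log)
for name, params in calls:
    print(name, dict(params))
```

Each surviving entry would then be tuned once, no matter how often the application issued that call.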

fommil commented 10 years ago

@kknox hi, what are the tunings for? I'm finding it hard to get any documentation on this.

Are the tunings for GPU settings only, or is this to make decisions between GPU/CPU?

If the latter, I fear my project may be duplicating your work: https://github.com/fommil/multiblas and all that is really needed is a true CBLAS API into clBLAS.

kknox commented 10 years ago

Yes, the tuning executable tunes the clBLAS kernels only for GPU devices. At this point in time, we view the CPU OpenCL device more as a debug or development vehicle.

Most of the compute-heavy OpenCL kernels (e.g. BLAS L3) in clBLAS are dynamically generated, meaning that you will not find the kernels in the source. When the user calls a GEMM routine for the first time in a given process, an OpenCL kernel is stitched together and compiled on the fly. The generation of these kernels is parameterized, and the search space can be quite large. So, rather than searching that space at run time, the clBLAS runtime makes a 'best guess' at an optimal kernel on the fly, based on baked-in parameters.

The tuning tool is a brute-force method to find the optimal parameters for a given device in a user's machine, and to save those parameters to disk so that they can be reused later. The database on disk is a .kdb file. Running the tuning tool can take a long time, so it is best run offline while the machine is not doing much else.
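The brute-force approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the parameter names are hypothetical, `toy_cost` is a made-up cost model standing in for timing a real kernel on the device, and a JSON file stands in for the .kdb database, whose actual format is not described in this thread.

```python
import itertools
import json
import os
import tempfile


def tune(space, measure):
    """Try every parameter combination and return the cheapest one with its cost."""
    names = list(space)
    best_params, best_cost = None, float("inf")
    for combo in itertools.product(*(space[name] for name in names)):
        params = dict(zip(names, combo))
        cost = measure(params)
        if cost < best_cost:
            best_params, best_cost = params, cost
    return best_params, best_cost


def toy_cost(params):
    # Made-up cost model: pretends 32x32 tiles with vector width 4 run fastest.
    # A real tuner would compile and time the kernel on the target device instead.
    return (abs(params["tile_m"] - 32)
            + abs(params["tile_n"] - 32)
            + (4 - params["vector_width"]))


# Hypothetical search space, invented for this example.
space = {"tile_m": [16, 32, 64], "tile_n": [16, 32, 64], "vector_width": [1, 2, 4]}
best, cost = tune(space, toy_cost)

# Persist the result so that a later run can skip the search (JSON stand-in for a .kdb file).
db_path = os.path.join(tempfile.mkdtemp(), "tuned_params.json")
with open(db_path, "w") as f:
    json.dump({"sgemm": best}, f)

with open(db_path) as f:
    reloaded = json.load(f)
print(reloaded["sgemm"])
```

The search cost grows with the product of all the option counts, which matches the advice to run the tuning offline.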

The tuning tool also offers the option to save the binary compiled kernels to disk, with the --store-kernels command line option. This will pre-compile the OpenCL kernels at tune time instead of at run-time, thereby saving the cost at runtime for a JIT compile of the OpenCL kernels.

byzhang commented 10 years ago

@kknox, how do we use the tuned kernels?

mikyoreyes commented 7 years ago

I would also like to know how to use the .kdb file.