diverger opened this issue 4 months ago
There are a few ways.
First of all, you could modify the tuner's source file, e.g. `CLBlast/src/tuning/kernels/xgemm.hpp`, and reduce the number of values in `settings.parameters` in multiple places, for example by changing `{16, 32, 64}` into `{16, 32}`. Every value you remove cuts down the number of parameter combinations the tuner has to try.
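To see why trimming the value lists helps so much: the tuner explores combinations of all parameter values, so the search space grows as the product of the list lengths. The C++ sketch below just does that arithmetic. The `ParameterSpace` type and `CountConfigurations` function are hypothetical, the three-parameter example is illustrative rather than CLBlast's real parameter set, and the count ignores any validity constraints the tuner may apply to filter combinations.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tuning space: one list of candidate values per parameter.
using ParameterSpace = std::vector<std::vector<std::size_t>>;

// Number of combinations in the Cartesian product of all value lists,
// i.e. the upper bound on configurations before any constraints are applied.
std::size_t CountConfigurations(const ParameterSpace& space) {
  std::size_t total = 1;
  for (const auto& values : space) {
    assert(!values.empty());  // an empty list would leave nothing to tune
    total *= values.size();
  }
  return total;
}
```

With three parameters that each take `{16, 32, 64}` this gives 3 × 3 × 3 = 27 combinations; shrinking each list to `{16, 32}` leaves 2 × 2 × 2 = 8, and the savings compound with every parameter you trim.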
Secondly, you could set the `--fraction` command-line argument (of e.g. `clblast_tuner_xgemm`) to a value below 1.0 so that only part of the configurations is tested instead of all of them.
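Conceptually, a fraction below 1.0 means benchmarking a random subset of about `fraction × N` configurations out of `N`. The C++ sketch below illustrates that idea only; `SampleConfigurations` is a hypothetical helper, not CLBlast's actual implementation, and it assumes sampling without replacement with a fixed seed.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Returns the indices of roughly fraction * num_configs configurations,
// chosen uniformly at random without replacement.
std::vector<std::size_t> SampleConfigurations(std::size_t num_configs,
                                              double fraction,
                                              unsigned seed) {
  assert(fraction > 0.0 && fraction <= 1.0);
  std::vector<std::size_t> indices(num_configs);
  std::iota(indices.begin(), indices.end(), std::size_t{0});
  std::mt19937 rng(seed);
  std::shuffle(indices.begin(), indices.end(), rng);
  // Round up so that a small fraction still tests at least one configuration.
  const auto keep = static_cast<std::size_t>(std::ceil(fraction * num_configs));
  indices.resize(std::min(keep, num_configs));
  std::sort(indices.begin(), indices.end());  // benchmark in original order
  return indices;
}
```

For example, `SampleConfigurations(1000, 0.25, 42)` keeps 250 of 1000 configurations, so the run takes roughly a quarter of the time, at the risk of missing the best one.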
Thirdly, you could tune only for the precision you need, e.g. single precision (`32`) only, and skip the other tuners. Basically, `make alltuners` first compiles everything and then runs all the tuners (e.g. `./clblast_tuner_xgemm --precision 32`) for all precisions, one after another. Running only the tuner and precision you need by hand avoids the time spent on all the others.
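A plain shell command does the job, but if you want to script a reduced run, the C++ sketch below builds and runs one tuner command, optionally combined with the `--fraction` option from above. `BuildTunerCommand` and `RunTuner` are hypothetical helpers; the binary name and options are the ones mentioned in this thread, and it assumes the program is started from the build directory that contains the tuner binaries.

```cpp
#include <cassert>
#include <cstdlib>
#include <sstream>
#include <string>

// Builds e.g. "./clblast_tuner_xgemm --precision 32 --fraction 0.25".
std::string BuildTunerCommand(const std::string& kernel, int precision,
                              double fraction) {
  assert(fraction > 0.0 && fraction <= 1.0);
  std::ostringstream cmd;
  cmd << "./clblast_tuner_" << kernel << " --precision " << precision;
  if (fraction < 1.0) {
    cmd << " --fraction " << fraction;  // only pass it when sampling
  }
  return cmd.str();
}

// Runs a single tuner through the shell and returns std::system's status.
int RunTuner(const std::string& kernel, int precision, double fraction) {
  const std::string cmd = BuildTunerCommand(kernel, precision, fraction);
  return std::system(cmd.c_str());
}
```

Calling `RunTuner("xgemm", 32, 1.0)` runs only the single-precision GEMM tuner instead of everything that `make alltuners` would run.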
Lastly, for GEMM specifically there are 4 parts being tuned (from `CLBlast/src/tuning/kernels/xgemm.cpp`):
```cpp
printf("* (1/4) Tuning main GEMM kernel (GEMMK == 0) for fixed set of parameters\n\n");
StartVariation<1>(argc, argv);
printf("* (2/4) Tuning main GEMM kernel (GEMMK == 0) for random parameters out of larger set\n\n");
StartVariation<2>(argc, argv);
printf("* (3/4) Tuning secondary GEMM kernel (GEMMK == 1) for fixed set of parameters\n\n");
StartVariation<11>(argc, argv);
printf("* (4/4) Tuning secondary GEMM kernel (GEMMK == 1) for random parameters out of larger set\n\n");
StartVariation<12>(argc, argv);
```
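You could skip steps 2/4 and 4/4 (the searches over random parameters from the larger sets) to save time, by removing or commenting out the corresponding `StartVariation<2>` and `StartVariation<12>` calls and rebuilding the tuner.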
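A minimal sketch of that edit, assuming you keep the rest of `xgemm.cpp` unchanged. So that it compiles on its own, `StartVariation` here is a hypothetical stub that only records which variation was started, and `RunGemmTuning` stands in for the function that contains these calls in the real file; in CLBlast the real `StartVariation` runs the tuner.

```cpp
#include <cstdio>
#include <vector>

// Hypothetical stub replacing CLBlast's StartVariation<V>(argc, argv):
// it only records which tuning variation would have been started.
std::vector<int> g_variations_run;
template <int V>
void StartVariation(int /*argc*/, char* /*argv*/[]) {
  g_variations_run.push_back(V);
}

// Trimmed tuning sequence: only the fixed parameter sets are tuned.
int RunGemmTuning(int argc, char* argv[]) {
  std::printf("* (1/4) Tuning main GEMM kernel (GEMMK == 0) for fixed set of parameters\n\n");
  StartVariation<1>(argc, argv);
  // StartVariation<2>(argc, argv);   // skipped: random parameters, GEMMK == 0
  std::printf("* (3/4) Tuning secondary GEMM kernel (GEMMK == 1) for fixed set of parameters\n\n");
  StartVariation<11>(argc, argv);
  // StartVariation<12>(argc, argv);  // skipped: random parameters, GEMMK == 1
  return 0;
}
```

Only variations 1 and 11 run, so the tuner still covers both GEMM kernels but without the larger random searches.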
Can I achieve these by modifying the `CMakeLists.txt`?
No, I don't think so.
Hi, when running `make alltuners` on a Mali GPU, some tuners run for hours, and eventually it gets stuck and never returns. Is there any way to speed this up?