Open hibagus opened 2 years ago
Hey @hibagus, thanks for your interest in NVBench and for reaching out! We'll be happy to help.
In your template:
template<typename Gemm, typename scalePrecision, typename mulPrecision, typename accPrecision>
int gemm_cutlass_launch_int(nvbench::state& state)
Are the Gemm, scalePrecision, mulPrecision, accPrecision template parameters things you hope to sweep across a variety of types using nvbench? Or will these be fixed for a particular benchmark invocation?
Hi @jrhemstad, thanks for the reply.
Currently, my implementation will not use the type sweep on NVBench (i.e., not using NVBENCH_BENCH_TYPES) so the template parameters will be fixed for a particular benchmark invocation.
Actually, I have tried using NVBENCH_BENCH_TYPES to invoke the benchmark as follows:
template<typename Gemm, typename scalePrecision, typename mulPrecision, typename accPrecision>
int gemm_cutlass_launch_int(nvbench::state& state, nvbench::type_list<Gemm, scalePrecision, mulPrecision, accPrecision>)
Then:
using gemm_types = nvbench::type_list<Gemm>;
using scalar_types = nvbench::type_list<scalePrecision>;
using multiply_types = nvbench::type_list<mulPrecision>;
using accumulation_types = nvbench::type_list<accPrecision>;
NVBENCH_BENCH_TYPES(gemm_cutlass_launch_int, NVBENCH_TYPE_AXES(gemm_types, scalar_types, multiply_types, accumulation_types));
NVBENCH_MAIN_BODY(gargc_nvbench, gargv_nvbench);
I am not sure whether I did it correctly. The compilation fails with the following error message:
error: a template declaration is not allowed here
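Okay, that makes sense. Then yeah, what you originally had won't work because of how the C/C++ preprocessor handles macro arguments: it splits them at every comma that is not inside parentheses, and the angle brackets of a template argument list don't count as parentheses.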
The good news is that there's an easy workaround. You can just wrap your template instantiation in an extra set of parentheses:
NVBENCH_BENCH( (gemm_cutlass_launch_int<Gemm, scalePrecision, mulPrecision, accPrecision>) );
Example: https://godbolt.org/z/xzocY9Ex3
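To see the comma problem in isolation, here is a minimal sketch using a made-up single-argument macro, `CALL_ONCE`, and a made-up template, `pair_size`; neither is part of nvbench. It only illustrates how the preprocessor splits arguments, and it assumes the macro does nothing with its argument other than use it as an expression.

```cpp
#include <cassert>

// Hypothetical single-argument macro (not an nvbench macro): calls its argument.
#define CALL_ONCE(fn) fn()

// Hypothetical template whose argument list contains a comma.
template <typename A, typename B>
int pair_size() {
  return static_cast<int>(sizeof(A) + sizeof(B));
}

int demo_parenthesized_call() {
  // CALL_ONCE(pair_size<char, char>) fails to compile: the preprocessor sees
  // two arguments, `pair_size<char` and `char>`, because angle brackets do
  // not protect the comma.
  // The extra parentheses make the whole expression a single macro argument.
  return CALL_ONCE((pair_size<char, char>));
}
```

Calling `demo_parenthesized_call()` runs the parenthesized instantiation through the macro. A macro that pastes or stringizes its argument may still reject the parenthesized form.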
Hi @jrhemstad
I have tried that before by adding the enclosing parentheses, but it still gives me the following error:
In file included from /home/bagus/CUDA_Bench/libs/nvbench/include/nvbench/nvbench.cuh:24,
from /home/bagus/CUDA_Bench/include/CUDA_Bench/gemm/gemm_cutlass_launch_int.cuh:12,
from /home/bagus/CUDA_Bench/src/gemm/gemm_cutlass_launch_int.cu:1:
/home/bagus/CUDA_Bench/src/gemm/gemm_cutlass_launch_int.cu:324:110: error: pasting ")" and "_line_" does not give a valid preprocessing token
324 | NVBENCH_BENCH( (gemm_cutlass_launch_int<gemm_types, scalar_types, multiply_types, accumulation_types>) );
| ^
/home/bagus/CUDA_Bench/libs/nvbench/include/nvbench/callable.cuh:58:60: note: in definition of macro ‘NVBENCH_UNIQUE_IDENTIFIER_IMPL2’
58 | #define NVBENCH_UNIQUE_IDENTIFIER_IMPL2(prefix, unique_id) prefix##_line_##unique_id
| ^~~~~~
/home/bagus/CUDA_Bench/libs/nvbench/include/nvbench/callable.cuh:55:43: note: in expansion of macro ‘NVBENCH_UNIQUE_IDENTIFIER_IMPL1’
55 | #define NVBENCH_UNIQUE_IDENTIFIER(prefix) NVBENCH_UNIQUE_IDENTIFIER_IMPL1(prefix, __LINE__)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/bagus/CUDA_Bench/libs/nvbench/include/nvbench/callable.cuh:34:37: note: in expansion of macro ‘NVBENCH_UNIQUE_IDENTIFIER’
34 | NVBENCH_DEFINE_CALLABLE(function, NVBENCH_UNIQUE_IDENTIFIER(function))
| ^~~~~~~~~~~~~~~~~~~~~~~~~
/home/bagus/CUDA_Bench/libs/nvbench/include/nvbench/create.cuh:31:3: note: in expansion of macro ‘NVBENCH_DEFINE_UNIQUE_CALLABLE’
31 | NVBENCH_DEFINE_UNIQUE_CALLABLE(KernelGenerator); \
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/bagus/CUDA_Bench/src/gemm/gemm_cutlass_launch_int.cu:324:9: note: in expansion of macro ‘NVBENCH_BENCH’
324 | NVBENCH_BENCH( (gemm_cutlass_launch_int<gemm_types, scalar_types, multiply_types, accumulation_types>) );
Any suggestions?
Ah, this looks to be unique to some of nvbench's internal macro shenanigans underlying NVBENCH_BENCH.
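The error log above points at the `prefix##_line_##unique_id` token paste. Here is a minimal sketch of why a parenthesized argument breaks that kind of macro, using hypothetical `UNIQUE_ID_*` macros modeled on the shape shown in the log (an illustration, not nvbench's actual code):

```cpp
#include <cassert>

// Hypothetical macros modeled on the shape quoted in the error log.
#define UNIQUE_ID_IMPL2(prefix, id) prefix##_line_##id
// The extra level lets an argument such as __LINE__ expand before pasting.
#define UNIQUE_ID_IMPL1(prefix, id) UNIQUE_ID_IMPL2(prefix, id)

int plain_function() { return 7; }

// A plain identifier pastes cleanly: this declares `plain_function_line_42`.
int UNIQUE_ID_IMPL1(plain_function, 42) = plain_function();

// A parenthesized argument such as (foo<A, B>) ends in `)`, and pasting
// `)` with `_line_` does not form a valid token, so this would not compile:
// int UNIQUE_ID_IMPL1((plain_function), 42) = 0;
```

The fixed id `42` stands in for `__LINE__` so that the generated name is predictable in this sketch.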
I don't think using NVBENCH_TYPE_AXES when you don't intend to sweep over those parameters is going to be the right solution. That said, I don't know why the example you showed doesn't work. It seems like it should.
One approach that likely isn't very satisfying would be to wrap invoking your template instantiation in another function that isn't a template:
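// Non-template wrapper: its name is a single identifier, so the macro can use it.
// Gemm, scalePrecision, mulPrecision, accPrecision must name concrete types here.
int do_benchmark(nvbench::state& state) {
  return gemm_cutlass_launch_int<Gemm, scalePrecision, mulPrecision, accPrecision>(state);
}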
NVBENCH_BENCH(do_benchmark);
I'd have to defer to @allisonvacanti for a more clever solution than this.
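A self-contained sketch of why the non-template wrapper sidesteps the problem: its name is a single identifier, so a macro that pastes tokens onto it has nothing to trip over. `DEFINE_RUNNER`, `scaled_sum`, and `scaled_sum_char_char` are hypothetical names for illustration, not nvbench APIs.

```cpp
#include <cassert>

// Hypothetical registration-style macro: builds a new identifier by pasting
// a suffix onto its argument, so the argument must be a plain identifier.
#define DEFINE_RUNNER(fn) int fn##_runner() { return fn(); }

// Hypothetical template with a comma in its argument list.
template <typename A, typename B>
int scaled_sum() {
  return static_cast<int>(sizeof(A)) * 10 + static_cast<int>(sizeof(B));
}

// Non-template wrapper that fixes the template arguments once.
int scaled_sum_char_char() { return scaled_sum<char, char>(); }

// Expands to: int scaled_sum_char_char_runner() { return scaled_sum_char_char(); }
DEFINE_RUNNER(scaled_sum_char_char)
```

The cost is one wrapper per instantiation, which is why this workaround gets tedious when there are many type combinations.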
Thanks, @jrhemstad . I use that workaround for now, although it is not that convenient :)
Using single-element typelists should work. Can you share the full test case? It sounds like something odd is going on:
NVBENCH_BENCH_TYPES(gemm_cutlass_launch_int, NVBENCH_TYPE_AXES(gemm_types, scalar_types, multiply_types, accumulation_types));
NVBENCH_MAIN_BODY(gargc_nvbench, gargv_nvbench);
These two macros shouldn't be used from the same scope. NVBENCH_BENCH_TYPES should be used from global scope, while NVBENCH_MAIN_BODY should be used from function scope. Maybe you wanted NVBENCH_MAIN instead?
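For context, the reason a type-list overload avoids the comma problem can be sketched with a tiny stand-in `type_list`. Everything here (`type_list`, `bytes_per_pair`, `pair_types`, `RUN_WITH_TYPES`) is a hypothetical illustration of tag dispatch, not nvbench's implementation: the macro receives only the function template's bare name and a type-list alias, and the types travel in a tag argument.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a type list; carries types, holds no data.
template <typename... Ts>
struct type_list {};

// The template parameters are deduced from the tag argument, so callers
// never have to spell out `bytes_per_pair<A, B>`.
template <typename A, typename B>
std::size_t bytes_per_pair(type_list<A, B>) {
  return sizeof(A) + sizeof(B);
}

// The alias hides the comma-separated types behind a single identifier.
using pair_types = type_list<std::int8_t, std::int32_t>;

// Hypothetical macro: both arguments are plain identifiers.
#define RUN_WITH_TYPES(fn, list) fn(list{})

std::size_t demo_type_list_dispatch() {
  return RUN_WITH_TYPES(bytes_per_pair, pair_types);
}
```

This sketch passes one list straight through; it does not model how a real sweep combines several type axes.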
Hi @allisonvacanti
Our project is accessible on GitHub. This is how we plan to integrate nvbench into our project. When I replace NVBENCH_BENCH with NVBENCH_BENCH_TYPES, I get the following error message: error: a template declaration is not allowed here
That's why I use NVBENCH_MAIN_BODY alongside NVBENCH_BENCH, since I would like to use it in function scope.
Ah, ok. That's not how these macros are intended to be used -- I'm honestly surprised that this pattern works with NVBENCH_BENCH :-)
Take a look through the examples. The NVBENCH_BENCH* macros should be used at global scope; defining the benchmarks inside a function is not supported.
You can restrict the benchmarks that are executed at runtime by configuring argc and argv with the relevant -b and -a options and then calling NVBENCH_MAIN_BODY(argc, argv).
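The pattern described here, registering at global scope and then selecting at run time inside a function, can be sketched without nvbench. The registry, `REGISTER_BENCH`, and `run_selected` below are hypothetical stand-ins, and the `-b <name>` handling only loosely imitates the option mentioned above; it is not nvbench's parser.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical registry entry: a benchmark name and the function to run.
struct benchmark_entry {
  std::string name;
  std::function<int()> run;
};

// Function-local static avoids static-initialization-order problems.
std::vector<benchmark_entry>& registry() {
  static std::vector<benchmark_entry> entries;
  return entries;
}

// Constructing one of these at global scope registers a benchmark before main.
struct registrar {
  registrar(std::string name, std::function<int()> fn) {
    registry().push_back({std::move(name), std::move(fn)});
  }
};

// Hypothetical registration macro; meant to be used at global scope only.
#define REGISTER_BENCH(fn) static registrar fn##_registrar{#fn, fn}

int bench_a() { return 1; }
int bench_b() { return 2; }

REGISTER_BENCH(bench_a);
REGISTER_BENCH(bench_b);

// Runs the benchmarks named after "-b" flags (all of them if none is given)
// and returns the sum of their results.
int run_selected(const std::vector<std::string>& args) {
  std::vector<std::string> wanted;
  for (std::size_t i = 0; i + 1 < args.size(); ++i) {
    if (args[i] == "-b") wanted.push_back(args[i + 1]);
  }
  int total = 0;
  for (const auto& entry : registry()) {
    bool selected = wanted.empty();
    for (const auto& name : wanted) selected = selected || (entry.name == name);
    if (selected) total += entry.run();
  }
  return total;
}
```

An application would build the argument list itself (for example `{"-b", "bench_b"}`) and hand it to the runner from whatever function needs the measurement, leaving all registrations at global scope.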
Hi there,
I would like to integrate nvbench into my C++ app. The method that runs the GPU kernel is a template method, as follows.
Then, I pass the method with its template arguments to NVBENCH_BENCH as follows.
It gives me an error as follows:
It seems the macro does not like the commas in the template arguments. I've read this and this, but neither of them works.
Any help would be highly appreciated.
Thanks!