Currently the instrumentation process changes kernel arguments and adds initialization code to the start of each kernel function. If we simply allowed kernel functions to be called from kernel code, we would need to make big changes to the instrumentation code.
One option is to split each kernel into multiple parts: an internal part that can be called like a helper function, and a host entry point, so that the initialization runs only when the kernel is called from host code.
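A minimal sketch of that split, written as plain C so it can be compiled outside an OpenCL runtime. Everything in it is an assumption made for illustration: the names fill, fill_internal, fill_twice and instrumentation_init are hypothetical, the extra size arguments only stand in for "changed kernel arguments", and the loop replaces what would normally be per-work-item indexing. It is not the actual output of the instrumentation.

```c
#include <stddef.h>

/* Empty stand-ins so the sketch compiles as plain C; in OpenCL C these
   are the built-in qualifiers. */
#define kernel
#define global

/* Counts how often the (hypothetical) initialization code ran. */
static int init_runs = 0;

/* Hypothetical initialization that the instrumentation would prepend. */
static void instrumentation_init(void) { ++init_runs; }

/* Internal version: callable from other kernels, does no initialization. */
static void fill_internal(global int *out, size_t out_size, int value) {
    for (size_t i = 0; i < out_size; ++i) {
        out[i] = value;
    }
}

/* Host entry point: carries the added out_size argument (assumed) and
   is the only place where the initialization runs. */
kernel void fill(global int *out, size_t out_size, int value) {
    instrumentation_init();
    fill_internal(out, out_size, value);
}

/* A second kernel that reuses fill's body through the internal version,
   so initialization still happens exactly once per host launch. */
kernel void fill_twice(global int *a, size_t a_size,
                       global int *b, size_t b_size) {
    instrumentation_init();
    fill_internal(a, a_size, 1);
    fill_internal(b, b_size, 2);
}
```

Calling fill_twice once from the host side runs the initialization a single time, even though the body of fill is executed twice through fill_internal.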
In addition, a kernel may contain local address space memory allocations, and in that case the OpenCL specification says the behavior is implementation-defined when the kernel is called by another kernel.
This basically means we would need to normalize local memory allocation so that local memory is allocated only when a kernel is called from host code, and the allocated memory is then passed on to the internally callable version of the kernel.
I.e. we would need to write a normalization pass that splits each kernel into two implementations, depending on whether it is called from the host or internally. For example:
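    kernel void bar(local float* out_local) {
        local int local_bar;   /* local allocation inside bar */
        local_bar = 1;
    }
    kernel void foo() {
        local float local_foo; /* local allocation inside foo */
        bar(&local_foo);       /* kernel called from a kernel */
    }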
would need to be modified to something like:
    /* Internal versions: plain functions, local memory is passed in. */
    void bar_internal(local float* out_local, local int* local_bar) {
        *local_bar = 1;
    }
    void foo_internal(local float* local_foo, local int* local_bar) {
        bar_internal(local_foo, local_bar);
    }
    /* Host entry points: the only places where local memory is allocated. */
    kernel void bar(local float* out_local) {
        local int local_bar;
        bar_internal(out_local, &local_bar);
    }
    kernel void foo() {
        local float local_foo;
        local int local_bar;   /* hoisted from bar */
        foo_internal(&local_foo, &local_bar);
    }
This feature would take 2-4 weeks to implement, so we can't have it ready by the end of January.