jeffdaily closed this 4 years ago
Since this PR also reduces the asyncops size, it could replace #1261.
Justification for all the numbers is needed.
@emankov
__global__ function parameters are passed to the device via constant memory and are limited to 4 KB.
From https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#function-parameters
@emankov The most important change in this PR is the increase in the default kernarg buffer size. If needed, would such a change be acceptable without the other changes?
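For illustration only, here is a minimal sketch of how the 4 KB parameter limit quoted above could be checked at compile time. Every name in it (`kKernelParamLimitBytes`, `fits_kernel_param_limit`, `SmallArgs`, `LargeArgs`) is hypothetical and not taken from HCC or CUDA. It only assumes that a kernel's arguments are packed into a single struct.

```cpp
#include <cstddef>

// Limit quoted from the CUDA C Programming Guide in this thread: 4 KB.
constexpr std::size_t kKernelParamLimitBytes = 4 * 1024;

// True when an argument struct of type Args fits within the limit.
template <typename Args>
constexpr bool fits_kernel_param_limit() {
    return sizeof(Args) <= kKernelParamLimitBytes;
}

// Hypothetical argument bundles, for illustration only.
struct SmallArgs {
    float* data;   // pointer to device memory
    int    count;  // number of elements
};

struct LargeArgs {
    char payload[8192];  // 8 KB passed by value, twice the limit
};

// Fail the build early instead of failing at kernel launch.
static_assert(fits_kernel_param_limit<SmallArgs>(),
              "SmallArgs must fit in 4 KB");
static_assert(!fits_kernel_param_limit<LargeArgs>(),
              "LargeArgs is expected to exceed 4 KB");
```

A kernarg buffer sized to this limit would hold any argument struct that passes the check.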
@jeffdaily, thank you for the explanation. Could you please add a few words in the comments?
@emankov comments added in commit https://github.com/RadeonOpenCompute/hcc/pull/1377/commits/f0e2b40f13086f73565de6142a792f889f29b7b9.
Decrease HCC_ASYNCOPS_SIZE from 16k to 1k. HCC_KERNARG_BUFFER_SIZE is now an environment variable. HCC_KERNARG_POOL_SIZE is now an environment variable.
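A rough sketch of how a size setting like these might be read from the environment with a fallback default. This is not the code from the PR: the helper name `size_from_env` and its parsing rules are assumptions made for illustration.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: returns the environment variable `name` parsed as a
// base-10 non-negative integer. Returns `fallback` when the variable is
// unset, empty, not purely numeric, or too large for unsigned long long.
inline std::size_t size_from_env(const char* name, std::size_t fallback) {
    assert(name != nullptr);
    const char* text = std::getenv(name);
    if (text == nullptr || *text == '\0') return fallback;

    // Reject signs and leading whitespace, which strtoull would accept.
    if (*text < '0' || *text > '9') return fallback;

    char* end = nullptr;
    errno = 0;
    const unsigned long long value = std::strtoull(text, &end, 10);
    if (errno == ERANGE || *end != '\0') return fallback;

    return static_cast<std::size_t>(value);
}
```

A call such as `size_from_env("HCC_ASYNCOPS_SIZE", 1024)` would then pick up an override when one is set. The 1024 here reads the "1k" above as 1024 entries, which is an assumption.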