fthaler closed this pull request 1 week ago.
All tests passed apart from ault/HIP, which is offline.
Implements loop blocking for the GPU fn backend. Thread block size (that is, CUDA/HIP threads per block) and loop block size (that is, loop iterations per CUDA/HIP thread) can now be specified as template parameters.
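The interaction of the two template parameters can be illustrated with a small host-side emulation. This is a minimal sketch under stated assumptions, not the GridTools implementation. The names `loop_blocked_launch` and `coverage` are hypothetical, and it assumes each thread processes `LoopBlockSize` *consecutive* points. The real backend may assign iterations differently, for example with a stride to keep memory accesses coalesced. Because `ThreadBlockSize` is a compile-time constant, a real CUDA/HIP kernel built this way can also carry `__launch_bounds__(ThreadBlockSize)`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch (not GridTools API): emulates on the host how a GPU
// kernel with ThreadBlockSize threads per block and LoopBlockSize loop
// iterations per thread covers a 1D domain of n points.
template <int ThreadBlockSize, int LoopBlockSize>
struct loop_blocked_launch {
    static_assert(ThreadBlockSize > 0 && LoopBlockSize > 0, "sizes must be positive");

    // Points handled by one thread block.
    static constexpr std::size_t points_per_block =
        std::size_t(ThreadBlockSize) * std::size_t(LoopBlockSize);

    // Number of thread blocks (grid size) needed to cover n points.
    static std::size_t num_blocks(std::size_t n) {
        return (n + points_per_block - 1) / points_per_block;
    }

    // Work of a single "thread": LoopBlockSize consecutive points
    // (assumption: contiguous chunks rather than a strided assignment).
    template <class F>
    static void thread_body(std::size_t block, std::size_t thread, std::size_t n, F&& f) {
        std::size_t first = (block * std::size_t(ThreadBlockSize) + thread) * std::size_t(LoopBlockSize);
        for (std::size_t i = 0; i < std::size_t(LoopBlockSize); ++i) {
            std::size_t idx = first + i;
            if (idx < n) f(idx);  // guard the partially filled last block
        }
    }

    // Sequentially executes every thread of every block.
    template <class F>
    static void run(std::size_t n, F&& f) {
        for (std::size_t b = 0; b < num_blocks(n); ++b)
            for (std::size_t t = 0; t < std::size_t(ThreadBlockSize); ++t)
                thread_body(b, t, n, f);
    }
};

// Counts how often each point is visited; every entry should be exactly 1.
template <int T, int L>
std::vector<int> coverage(std::size_t n) {
    std::vector<int> hits(n, 0);
    loop_blocked_launch<T, L>::run(n, [&](std::size_t i) { ++hits[i]; });
    return hits;
}
```

For example, with 128 threads per block and 4 iterations per thread, each block covers 512 points, so a domain of 1000 points needs 2 blocks. A larger loop block size means fewer threads overall and more work per thread. This is the trade-off the two parameters expose.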
Further changes:
- Adds `__launch_bounds__` to the fn GPU kernel, based on the thread block size.
- `GT_PROMISE`.

Performance changes:
- Adding `__launch_bounds__` significantly affects the performance of the `fn_cartesian_vertical_advection` benchmark, positively or negatively depending on the domain size.