cp2k/dbcsr
DBCSR: Distributed Block Compressed Sparse Row matrix library
https://cp2k.github.io/dbcsr/
GNU General Public License v2.0
ocl: revised device-split, additional tuning param, and other improvements
#718
Closed (hfp closed 9 months ago)
hfp commented 9 months ago
Split into the maximum number of sub-devices if ACC_OPENCL_DEVSPLIT=1.
If ACC_OPENCL_DEVSPLIT>1, always split according to CL_DEVICE_PARTITION_EQUALLY.
Prioritize CL_DEVICE_AFFINITY_DOMAIN_NUMA if ACC_OPENCL_DEVSPLIT is 0 or 1.
Experimental support for XF. Remove XF=0 when storing JSON.
Handle the extension flag (XF), which currently has only one state bit.
Other improvements/changes
Check the final tuned kernel even when handling a signal (recursion).
Adjusted built-in default (OPENCL_LIBSMM_SMM_AL).
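The device-split rules listed above can be read as a small decision function. The following is only a sketch of that reading, not code from DBCSR: the names `partition_kind`, `PARTITION_BY_NUMA`, `PARTITION_EQUALLY`, and `preferred_partition` are hypothetical, and the fallback to equal partitioning when NUMA partitioning is unavailable is an assumption. It models which partition scheme is preferred for a given ACC_OPENCL_DEVSPLIT value, not whether or how many sub-devices are actually created.

```c
/* Hypothetical names for illustration only; not taken from DBCSR. */
typedef enum {
  PARTITION_BY_NUMA, /* stands in for CL_DEVICE_AFFINITY_DOMAIN_NUMA */
  PARTITION_EQUALLY  /* stands in for CL_DEVICE_PARTITION_EQUALLY */
} partition_kind;

/* Return the preferred partition scheme for a given ACC_OPENCL_DEVSPLIT value.
 * numa_supported: nonzero if the device can be partitioned by NUMA domain.
 * Assumption: if NUMA partitioning is unavailable, equal partitioning is used.
 * Assumption: values below 0 are treated like 0 (not covered by the text). */
static partition_kind preferred_partition(int devsplit, int numa_supported) {
  if (1 < devsplit) {
    /* Values above 1 always select equal partitioning. */
    return PARTITION_EQUALLY;
  }
  /* For 0 or 1, NUMA affinity takes priority when the device offers it. */
  return numa_supported ? PARTITION_BY_NUMA : PARTITION_EQUALLY;
}
```

In a real OpenCL program, the chosen scheme would end up in the property list passed to `clCreateSubDevices`, and the integer could come from reading the ACC_OPENCL_DEVSPLIT environment variable.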