Hotfix that improves the round-robin factorization in TileBlockDecomposition when nproc has odd prime factors (and makes the code a lot easier to read and understand).
Old behavior for nproc 64000: nproc_x 2560 nproc_y 5 nproc_z 5
New behavior for nproc 64000: nproc_x 40 nproc_y 40 nproc_z 40
(As you might imagine, I learned this the hard way on Frontier, but luckily I was watching and killed the job relatively quickly.)
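For illustration, here is a minimal Python sketch of a round-robin factorization that reproduces the new numbers above. It is not the code from this commit: the function names `prime_factors` and `round_robin_decomposition` are hypothetical, and dealing the prime factors out largest-first is an assumption about how the balancing could be done.

```python
def prime_factors(n):
    """Return the prime factors of n (with multiplicity) in ascending order."""
    factors = []
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        # Whatever remains is a single prime larger than sqrt(original n).
        factors.append(n)
    return factors


def round_robin_decomposition(nproc):
    """Split nproc into (nproc_x, nproc_y, nproc_z) by dealing prime factors
    out to the three dimensions in turn, largest factors first.

    Hypothetical helper for illustration only; assumes nproc >= 1.
    """
    if nproc < 1:
        raise ValueError("nproc must be a positive integer")
    dims = [1, 1, 1]
    for i, factor in enumerate(sorted(prime_factors(nproc), reverse=True)):
        # Cycle x -> y -> z -> x ... so odd primes are spread across dimensions.
        dims[i % 3] *= factor
    return tuple(dims)


if __name__ == "__main__":
    nx, ny, nz = round_robin_decomposition(64000)
    print(f"nproc_x {nx} nproc_y {ny} nproc_z {nz}")
```

Dealing the large factors out first means the three 5s in 64000 = 2^9 * 5^3 land in different dimensions before the 2s are distributed, which is what gives the balanced 40 x 40 x 40 layout in this sketch.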
The other commit adds
export ROCFFT_RTC_CACHE_PATH=/dev/null
according to: https://docs.olcf.ornl.gov/systems/frontier_user_guide.html#olcfdev-1479-rocfft-default-cache-file-is-in-nfs-shared-directory-scalability-issue
TIO link for playing with the new function