Closed by al42and 6 months ago
Thanks!
BTW, do you mind sharing how close the ambitious multi-GPU job splitting is? Are you considering only intra-node, or also multi-node decomposition?
Not close. It is not the goal of my PhD, so I haven't spent any time on it. The decomposition techniques should be similar to the single-GPU case, and I have a rough idea of how the data movement needs to be handled, with a level structure along the lines of shared-global-intranode-multinode. It will just take a lot of effort and time to understand how GPU-to-GPU communication works best for all vendors (unless it is done through plain MPI with a host copy, but that is surely bad).
There are also many design questions that I am unsure about, like how data needs to be distributed between nodes before and after the transform. It would have been more productive if it was a more defined request (coming from a vendor, for example) and not just an exploration.
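To make the "distribution before and after the transform" question concrete, here is a minimal sketch of a textbook slab decomposition for a 2D transform, simulated on the CPU. It is an illustration under assumptions, not the design discussed in this thread: the names `dft`, `slab_fft2`, and `gather_transposed` are hypothetical, the "ranks" are plain Python lists standing in for GPUs or nodes, and a naive DFT stands in for the per-device FFT kernel.

```python
import cmath


def dft(x):
    """Naive O(n^2) 1D DFT; a stand-in for a per-device FFT kernel."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def slab_fft2(grid, ranks):
    """2D DFT of an n x n grid split into row slabs over `ranks` simulated devices.

    Returns one slab per rank; each slab holds whole *columns* of the result,
    so the output layout is transposed relative to the input layout.
    """
    n = len(grid)
    assert n % ranks == 0 and all(len(row) == n for row in grid)
    rows = n // ranks

    # Before the transform: rank r owns rows [r*rows, (r+1)*rows).
    slabs = [grid[r * rows:(r + 1) * rows] for r in range(ranks)]

    # Step 1: every rank transforms its own rows; no communication needed.
    slabs = [[dft(row) for row in slab] for slab in slabs]

    # Step 2: all-to-all exchange. Rank d collects columns
    # [d*rows, (d+1)*rows) from every rank and stores each one contiguously.
    cols = [
        [
            [slabs[s][i][d * rows + c] for s in range(ranks) for i in range(rows)]
            for c in range(rows)
        ]
        for d in range(ranks)
    ]

    # Step 3: every rank transforms the columns it now holds locally.
    return [[dft(col) for col in slab] for slab in cols]


def gather_transposed(out):
    """Collect the column-distributed result into a row-major n x n list."""
    cols = [col for slab in out for col in slab]  # cols[k2][k1]
    n = len(cols)
    return [[cols[k2][k1] for k2 in range(n)] for k1 in range(n)]
```

In this sketch the input is distributed by rows and the output by columns. Returning to the original layout would cost a second all-to-all, which is one of the open design choices mentioned above. On real hardware step 2 is where the GPU-to-GPU or MPI traffic would occur.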
Thanks for the explanation. We (GROMACS) are interested in multi-GPU R2C FFT, but sadly we don't have the resources to actively drive the project.
Intel's default installation on Ubuntu puts the LevelZero headers in /usr/include/level_zero/ze_api.h.