eth-cscs / DLA-Future

DLA-Future
https://eth-cscs.github.io/DLA-Future/master/
BSD 3-Clause "New" or "Revised" License

miniapp communication #1105

Closed rasolca closed 6 months ago

rasolca commented 7 months ago
msimberg commented 7 months ago

@rasolca looks reasonable to me. Just to check my understanding, what you're benchmarking is a broadcast from rank 0 to all other ranks:

* with the default options for contiguous/non-contiguous

* with all the combinations of CPU/GPU memory, contiguous/non-contiguous send/recv

Probably my ignorance, but do you need to ensure that each rank does as many receives as the root rank does sends? Or, since it's a broadcast, do you not need to do a receive on every rank?

rasolca commented 7 months ago

> @rasolca looks reasonable to me. Just to check my understanding, what you're benchmarking is a broadcast from rank 0 to all other ranks:
>
> * with the default options for contiguous/non-contiguous
>
> * with all the combinations of CPU/GPU memory, contiguous/non-contiguous send/recv

Exactly. For the given backend B (which determines where the data is located), I try all the available combinations.
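As a rough illustration of what "all the available combinations" could look like, here is a minimal sketch. It is not the miniapp's code: the names `Memory`, `Layout`, `Buffer`, `Combination` and `combinations` are hypothetical, and it assumes that a CPU-only backend uses only CPU buffers while a GPU backend may use either CPU or GPU buffers on each side.

```cpp
#include <vector>

// Hypothetical types for illustration only; not DLA-Future API.
enum class Memory { CPU, GPU };
enum class Layout { Contiguous, NonContiguous };

struct Buffer {
  Memory memory;
  Layout layout;
};

struct Combination {
  Buffer send;  // buffer kind used on the root (sending) rank
  Buffer recv;  // buffer kind used on the receiving ranks
};

// Enumerate every send/recv pairing of memory kind and layout.
// Assumption: GPU memory is only available when gpu_backend is true.
std::vector<Combination> combinations(bool gpu_backend) {
  std::vector<Memory> memories{Memory::CPU};
  if (gpu_backend)
    memories.push_back(Memory::GPU);
  const std::vector<Layout> layouts{Layout::Contiguous, Layout::NonContiguous};

  // All buffer kinds available to one side of the broadcast.
  std::vector<Buffer> buffers;
  for (Memory m : memories)
    for (Layout l : layouts)
      buffers.push_back({m, l});

  // Cartesian product: every send kind paired with every recv kind.
  std::vector<Combination> all;
  for (const Buffer& s : buffers)
    for (const Buffer& r : buffers)
      all.push_back({s, r});
  return all;
}
```

Under these assumptions, `combinations(false)` yields 4 cases and `combinations(true)` yields 16; the real miniapp may restrict or extend this set.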

> Probably my ignorance, but do you need to ensure that each rank does as many receives as the root rank does sends? Or, since it's a broadcast, do you not need to do a receive on every rank?

The idea is to allocate a local matrix with the same size on all the ranks. I still have a couple of open TODOs in the matrix creation. It might be that I'm still using a distributed matrix by mistake.
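To illustrate why an accidentally distributed matrix could matter here, the sketch below compares local sizes across ranks. It is not taken from the miniapp: `local_rows` and `uniform_local_rows` are hypothetical helpers, and a 1D block-cyclic layout starting at rank 0 is assumed for simplicity. A broadcast needs buffers of the same size on every rank, which a plain local matrix gives by construction, whereas the local part of a block-cyclic matrix can differ from rank to rank.

```cpp
#include <cassert>

// Rows owned by `rank` when `n` rows are cut into blocks of `nb` rows and
// dealt round-robin to `nranks` ranks, starting from rank 0.
// Hypothetical helper; same rule as ScaLAPACK's NUMROC with source rank 0.
int local_rows(int n, int nb, int rank, int nranks) {
  assert(n >= 0 && nb > 0 && nranks > 0 && rank >= 0 && rank < nranks);
  const int nblocks = n / nb;             // number of full blocks
  int rows = (nblocks / nranks) * nb;     // complete rounds, same for all ranks
  const int extra = nblocks % nranks;     // full blocks left after those rounds
  if (rank < extra)
    rows += nb;                           // this rank gets one more full block
  else if (rank == extra)
    rows += n % nb;                       // this rank gets the trailing partial block
  return rows;
}

// True if every rank owns the same number of rows, i.e. the local parts
// could be used directly as equally sized broadcast buffers.
bool uniform_local_rows(int n, int nb, int nranks) {
  const int first = local_rows(n, nb, 0, nranks);
  for (int rank = 1; rank < nranks; ++rank)
    if (local_rows(n, nb, rank, nranks) != first)
      return false;
  return true;
}
```

For example, with 10 rows, blocks of 3 and 2 ranks, rank 0 would own 6 rows and rank 1 would own 4, so the local parts would not match; with 12 rows they would both own 6.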

rasolca commented 7 months ago

Summary of the first benchmarks (A100):

rasolca commented 7 months ago

Note: Compilation of the new miniapp is very slow.

rasolca commented 7 months ago

cscs-ci run

rasolca commented 7 months ago

cscs-ci run

msimberg commented 7 months ago

> Note: Compilation of the new miniapp is very slow.

Likely for the same reason as #1013. I have been getting increasingly annoyed by the compile times of the other miniapps recently as well, though I don't think they have necessarily gotten worse. It may be worth bumping this up my to-do list, or if someone else feels motivated, they could look into it.

rasolca commented 6 months ago

cscs-ci run