eth-cscs / conflux

Distributed Communication-Optimal LU-factorization Algorithm
BSD 3-Clause "New" or "Revised" License

Smaller Broadcast of A00 #24

Closed saethrej closed 3 years ago

saethrej commented 3 years ago

We noticed that the code spends a lot of time broadcasting the new A00, especially for large values of P such as 1024. In a given iteration, a processor needs A00 only if it owns tiles in A10 that are to be updated. For such large P, however, a large fraction of processors often have no local tiles in A10, so broadcasting A00 to them is unnecessary.

This PR therefore introduces new functionality: communicators are created at initialization time, one for every power of two, each corresponding to a depth of the broadcast tree.
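To make the idea concrete, here is a minimal sketch of the size bookkeeping only. It is not the code in this PR. It assumes, purely for illustration, that the ranks owning A10 tiles are numbered 0..numActive-1. The names `powerOfTwoGroupSizes` and `smallestCoveringGroup` are hypothetical. With MPI, the groups themselves could be built once at initialization, for example with `MPI_Comm_split`, and the broadcast would then run on the selected communicator instead of the full one.

```cpp
#include <cstddef>
#include <vector>

// Sizes of the rank groups for P ranks: 1, 2, 4, ... for every power of two
// below P, followed by P itself so that the last group spans all ranks.
// Returns an empty vector for P <= 0.
std::vector<int> powerOfTwoGroupSizes(int P) {
    std::vector<int> sizes;
    if (P <= 0) return sizes;
    // long long avoids overflow when doubling close to INT_MAX.
    for (long long s = 1; s < P; s *= 2) {
        sizes.push_back(static_cast<int>(s));
    }
    sizes.push_back(P);
    return sizes;
}

// Index of the smallest group that contains ranks 0..numActive-1.
// ASSUMPTION (illustrative only): the ranks that own tiles in A10 form a
// contiguous prefix of the rank numbering. Falls back to the largest group
// if numActive exceeds every size. Precondition: sizes is non-empty.
std::size_t smallestCoveringGroup(const std::vector<int>& sizes, int numActive) {
    for (std::size_t i = 0; i < sizes.size(); ++i) {
        if (sizes[i] >= numActive) return i;
    }
    return sizes.size() - 1;
}
```

Under these assumptions, with P = 1024 and 100 ranks owning A10 tiles, the helper selects the 128-rank group, so the remaining 896 ranks would not take part in the broadcast of A00.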