Follow-up to #3 (in particular f6f96894b0ad114222f80467ab89b3562adcfb95)
This is motivated by a few observations:
- With AMR we spend 20-30% of our time communicating with the right ventricle model
- Without AMR we spend about 10% of our time communicating with the same model
- Even with just a basic NSE solver and 2 processors we still spend 10% of our time communicating
Most of this problem can be fixed by generating better-balanced parallel data distributions (we still have large workload imbalances with AMR). Ultimately, it would be great to devise new load-balancing algorithms that solve an integer program to distribute cells evenly across processors while minimizing the total amount of ghost data. In the meantime, though, a fairly easy win is to make communication about twice as fast by getting rid of extra copies. This bit of ArrayData.C is revealing:
https://github.com/IBAMR/samrai-2.4.4/blob/787ab77ebb0c59463deced8ef693df56ad935820/source/patchdata/array/ArrayData.C#L457-L486
i.e., we do an extra copy to avoid a virtual function call, which is not a good performance tradeoff these days. We could also use templates more cleverly to get rid of some virtual functions, but for the most part we should just copy directly into buffers when possible.
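To make the extra copy concrete, here is a minimal, self-contained sketch of the two packing strategies. It does not reproduce the linked SAMRAI code: `ByteStream`, `Box`, `pack_staged`, and `pack_direct` are hypothetical names invented for illustration, and the 2D row-major layout is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical message buffer (an assumption, not SAMRAI's stream class).
class ByteStream
{
public:
  // Append n bytes copied from src.
  void pack(const void *src, std::size_t n)
  {
    if (n == 0) return;
    std::memcpy(reserve(n), src, n);
  }

  // Grow the buffer by n bytes and return a pointer to the new region so
  // that the caller can write into it in place.
  char *reserve(std::size_t n)
  {
    const std::size_t old_size = d_bytes.size();
    d_bytes.resize(old_size + n);
    return d_bytes.data() + old_size;
  }

  const std::vector<char> &bytes() const { return d_bytes; }

private:
  std::vector<char> d_bytes;
};

// Index box [ilo, ihi) x [jlo, jhi) in a row-major array with rows of length nx.
struct Box
{
  std::size_t ilo, ihi, jlo, jhi;
};

// Two copies: gather the box into a staging buffer, then copy that buffer
// into the stream.
void pack_staged(const std::vector<double> &data, std::size_t nx,
                 const Box &box, ByteStream &stream)
{
  assert(box.ilo <= box.ihi && box.ihi <= nx);
  assert(box.jlo <= box.jhi && box.jhi * nx <= data.size());
  std::vector<double> staging;
  staging.reserve((box.ihi - box.ilo) * (box.jhi - box.jlo));
  for (std::size_t j = box.jlo; j < box.jhi; ++j)
    for (std::size_t i = box.ilo; i < box.ihi; ++i)
      staging.push_back(data[j * nx + i]);
  stream.pack(staging.data(), staging.size() * sizeof(double));
}

// One copy: reserve space in the stream once, then copy each contiguous row
// of the box straight into it.
void pack_direct(const std::vector<double> &data, std::size_t nx,
                 const Box &box, ByteStream &stream)
{
  assert(box.ilo <= box.ihi && box.ihi <= nx);
  assert(box.jlo <= box.jhi && box.jhi * nx <= data.size());
  const std::size_t row_bytes = (box.ihi - box.ilo) * sizeof(double);
  char *dst = stream.reserve(row_bytes * (box.jhi - box.jlo));
  if (row_bytes == 0) return;
  for (std::size_t j = box.jlo; j < box.jhi; ++j)
  {
    std::memcpy(dst, data.data() + j * nx + box.ilo, row_bytes);
    dst += row_bytes;
  }
}
```

Both functions leave the same bytes in the stream. The direct version touches the stream once per message (a single `reserve` call) rather than once per element, so any per-call overhead stays small even without a staging buffer.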