IBAMR / IBSAMRAI2

SAMRAI 2.4.4 fork and associated patches.

Get rid of extra copies in Schedule.C #8

Closed drwells closed 1 year ago

drwells commented 1 year ago

Follow-up to #3 (in particular f6f96894b0ad114222f80467ab89b3562adcfb95)

This is motivated by a few observations:

  1. With AMR, we spend 20-30% of our run time on communication in the right ventricle model
  2. Without AMR, we spend about 10% of our run time on communication in the same model
  3. Even with just a basic NSE solver and 2 processors, we still spend 10% of our run time on communication

The majority of this problem can be fixed by generating better parallel data distributions (we still have large workload imbalances with AMR). Ultimately, it would be great to devise some new load-balancing algorithms that solve an integer program to distribute cells evenly across processors while minimizing the total amount of ghost data. In the meantime, though, a fairly easy win is to make communication about twice as fast by getting rid of extra copies. This bit of ArrayData.C is revealing:

https://github.com/IBAMR/samrai-2.4.4/blob/787ab77ebb0c59463deced8ef693df56ad935820/source/patchdata/array/ArrayData.C#L457-L486

i.e., we do an extra copy to avoid a virtual function call, which is not a good performance tradeoff these days. We could also use templates in a smarter way to get rid of some virtual functions, but for the most part we should just copy directly into buffers when possible.
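To make the tradeoff concrete, here is a minimal sketch of the two packing strategies. It is not the SAMRAI code: `ByteStream`, `VectorStream`, `Box`, `pack_via_scratch`, and `pack_direct` are hypothetical names invented for illustration, and the array is assumed to be a row-major 2D block of doubles.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a message stream; NOT the SAMRAI interface.
class ByteStream {
public:
    virtual ~ByteStream() = default;
    // Append n doubles from src (one virtual call per pack).
    virtual void pack(const double* src, std::size_t n) = 0;
};

class VectorStream : public ByteStream {
public:
    void pack(const double* src, std::size_t n) override {
        data.insert(data.end(), src, src + n);
    }
    // Grow the stream by n doubles and return a pointer to the new space,
    // so callers can write into the message storage directly.
    double* reserve(std::size_t n) {
        const std::size_t old_size = data.size();
        data.resize(old_size + n);
        return data.data() + old_size;
    }
    std::vector<double> data;
};

// Half-open index box [i0, i1) x [j0, j1) inside a row-major array.
struct Box {
    std::size_t i0, i1, j0, j1;
};

// Two copies: gather the box into a scratch buffer, then hand the scratch
// buffer to the stream through a single virtual call.
void pack_via_scratch(const std::vector<double>& array, std::size_t width,
                      const Box& box, ByteStream& stream) {
    assert(box.i0 <= box.i1 && box.i1 <= width);
    assert(box.j0 <= box.j1 && box.j1 * width <= array.size());
    std::vector<double> scratch;
    scratch.reserve((box.i1 - box.i0) * (box.j1 - box.j0));
    for (std::size_t j = box.j0; j < box.j1; ++j)
        for (std::size_t i = box.i0; i < box.i1; ++i)
            scratch.push_back(array[j * width + i]);
    stream.pack(scratch.data(), scratch.size());
}

// One copy: reserve space in the stream and write each row straight into it.
void pack_direct(const std::vector<double>& array, std::size_t width,
                 const Box& box, VectorStream& stream) {
    assert(box.i0 <= box.i1 && box.i1 <= width);
    assert(box.j0 <= box.j1 && box.j1 * width <= array.size());
    const std::size_t nx = box.i1 - box.i0;
    const std::size_t ny = box.j1 - box.j0;
    double* dst = stream.reserve(nx * ny);
    for (std::size_t j = box.j0; j < box.j1; ++j) {
        std::copy_n(array.data() + j * width + box.i0, nx, dst);
        dst += nx;
    }
}
```

Both functions leave the same bytes in the stream; the direct version simply skips the intermediate buffer, so each value is read and written once instead of twice. Whether that halves the real packing cost depends on the actual stream implementation, which this sketch does not model.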

drwells commented 1 year ago

It might also be more efficient to get rid of the specialized local copies function and just route everything through MPI.

boyceg commented 1 year ago

New MPIs should optimize for this kind of thing.

boyceg commented 1 year ago

See #10.

drwells commented 1 year ago

Fixed by #10.