Open JaredCrean2 opened 7 years ago
Do you have a sense of how many unnecessary communications there might be? Is the effort of developing a cache invalidation mechanism worth it?
There will be 2 more parallel communications than necessary in each predictor loop iteration. Whether that is a lot, I'm not sure. There is one parallel communication per Newton iteration (corrector step), so if Newton typically takes 2 iterations, it's a lot, but if Newton typically takes 10, then it's not a big deal. I was hoping the cache invalidation would be really easy, but that was before I realized it had to be globally consistent.
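A back-of-the-envelope sketch of that overhead, assuming one necessary communication per Newton iteration and exactly 2 redundant ones per predictor iteration (the function name is made up for illustration):

```python
def extra_comm_fraction(newton_iters: int, extra_per_step: int = 2) -> float:
    """Redundant communications relative to necessary ones, per predictor step.

    Assumes one necessary communication per Newton (corrector) iteration
    and `extra_per_step` redundant ones per predictor iteration.
    """
    return extra_per_step / newton_iters


for k in (2, 10):
    print(f"{k} Newton iterations: {extra_comm_fraction(k):.0%} extra communication")
# 2 Newton iterations: 100% extra communication
# 10 Newton iterations: 20% extra communication
```

So the redundant work doubles the communication when Newton converges in 2 iterations, but adds only 20% when it takes 10.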
I don't think there is a way to do this automatically. We could have a flag, set manually, that tells the parallel code there is no need to actually do the communication. This approach is potentially error-prone.
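A minimal sketch of the manual-flag idea, and of why it is error-prone. `FaceBuffers`, `skip_comm`, and the toy `calc_residual` are hypothetical stand-ins, not the actual PDESolver types; the "exchange" just copies data instead of doing real parallel communication:

```python
class FaceBuffers:
    """Hypothetical stand-in for eqn.q_face_send / eqn.q_face_recv."""

    def __init__(self):
        self.skip_comm = False  # manual flag: caller asserts q_vec is unchanged
        self.q_face_send = []
        self.q_face_recv = []
        self.exchanges = 0  # counts how many exchanges actually happened

    def exchange(self, q_vec):
        if self.skip_comm:
            return  # trust the caller; nothing checks that q_vec really is unchanged
        self.q_face_send = list(q_vec)
        self.q_face_recv = list(q_vec)  # placeholder for the real parallel exchange
        self.exchanges += 1


def calc_residual(buffers, q_vec):
    """Toy residual that depends on the received face data (hypothetical)."""
    buffers.exchange(q_vec)
    return sum(buffers.q_face_recv)


buf = FaceBuffers()
calc_residual(buf, [1.0, 2.0])  # communicates
buf.skip_comm = True
calc_residual(buf, [1.0, 2.0])  # skips the exchange, still correct
calc_residual(buf, [5.0, 2.0])  # flag never cleared: uses stale buffers
```

The last call silently returns a residual computed from the old face data. In a real parallel run the flag would also have to be set identically on every process, since a process that skips its sends would leave its neighbors waiting on messages that never arrive.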
Working on the predictor-corrector globalization, I noticed that the algorithm does parallel communication more often than necessary because of repeated calls to `calcResidual`. Basically, the problem is that `calcResidual` doesn't know that `eqn.q_vec` hasn't been updated since the last time it was called, so it does parallel communication to make sure `eqn.q_face_send` and `eqn.q_face_recv` are up to date. Some kind of cache invalidation mechanism for the send and receive buffers would avoid the extra parallel communication. The cache invalidation would have to be consistent across all processes. Ick.
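One way the globally consistent invalidation could look, sketched under stated assumptions: each process keeps a version counter that is bumped on every write to its `q_vec`, the buffers remember which version they were filled from, and a logical-OR reduction decides whether *all* processes exchange or none do. Everything here (`Rank`, `set_q`, `exchange_if_needed`) is hypothetical, and the processes are simulated in one Python process; with MPI the reduction would be an allreduce using the logical-OR operation (`MPI_LOR`):

```python
from dataclasses import dataclass


@dataclass
class Rank:
    """One simulated process (hypothetical; not the PDESolver data layout)."""
    q_vec: list
    q_vec_version: int = 0    # bumped on every write to q_vec
    buffer_version: int = -1  # version of q_vec the face buffers were filled from
    exchanges: int = 0

    def set_q(self, new_q):
        self.q_vec = list(new_q)
        self.q_vec_version += 1

    def locally_stale(self):
        return self.buffer_version != self.q_vec_version


def allreduce_lor(flags):
    """Simulated logical-OR allreduce: every process gets the same answer."""
    return any(flags)


def exchange_if_needed(ranks):
    """Do the face exchange on all processes or on none of them."""
    need = allreduce_lor([r.locally_stale() for r in ranks])
    if need:
        for r in ranks:
            # placeholder for filling q_face_send and receiving q_face_recv
            r.buffer_version = r.q_vec_version
            r.exchanges += 1
    return need


ranks = [Rank([1.0, 2.0]), Rank([3.0, 4.0])]
exchange_if_needed(ranks)      # True: buffers start out stale
exchange_if_needed(ranks)      # False: nothing changed anywhere
ranks[1].set_q([5.0, 6.0])
exchange_if_needed(ranks)      # True on both ranks, not just rank 1
```

Rank 0 has to exchange even though its own `q_vec` did not change, because it still needs rank 1's new face data; that is why a purely local check is not enough. The reduction is itself a collective operation, so this trades a full face exchange for a single-boolean reduction rather than eliminating communication entirely.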