NCAR / icar

The Intermediate Complexity Atmospheric Research model (ICAR)
MIT License
72 stars 53 forks source link

enhancement: better performance for wind 3 u,v exchange #164

Closed scrasmussen closed 1 year ago

scrasmussen commented 1 year ago

TYPE: enhancement

KEYWORDS: halo exchange, wind 3 option

SOURCE: Soren Rasmussen's implementation and testing, Ethan's idea

DESCRIPTION OF CHANGES: More efficient halo communication for the u,v exchange. North/south communication and then east/west communication is performed instead of doing all u,v halo communication twice. The last PR #163 switched to performaing u,v halo communication twice to fix a bug. This had a hit on performance, while for the new PR the performance difference in negligible.

TESTS CONDUCTED: Changes tested on the wind 3 option. When the two "smooth_array" lines are commented out the output of 32 vs 36 num processors is the same. I took a closer look at outputs at one of the corners and it looked correct.

The "second PR" in the table is #163, the "original 1" is the develop branch before #163 was applied. The "new PR" is this PR. This shows that the new u,v enhancement fixes the bug and is a lot quicker than the old commit since the amount of data movement done is greatly reduced.

uv_exchange_improvement

Checklist

Merging the PR depends on following checklist being completed. Add X between each of the square brackets if they are completed in the PR itself. If a bullet is not relevant to you, please comment on why below the bullet.

gutmann commented 1 year ago

Thanks Soren, did you confirm that this approach also fixes the bug?

 Ethan

scrasmussen commented 1 year ago

Thanks Soren, did you confirm that this approach also fixes the bug? Ethan

I did confirm that. I did that by commenting out the smooth_array lines from here and here. I then ran the model for 25 hours and compared the output of u and v for 32np and 36np. The output was identical.

I then did two pairs of runs with smooth_array present and then commented out. The executables ran were this PR and the last PR, so as to see if the differing fix caused any changes in output. It was the same.