chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org

investigate why FAMOs are faster than AMOs on EFA #19204

Open jhh67 opened 2 years ago

jhh67 commented 2 years ago

@ronawho measured the performance of fetching vs. non-fetching AMOs on EFA and got the following results.

01/31/22        3.82428 0.00261487      SUCCESS # 19001 non-fetching w/ CHPL_COMM_FAMO_AM
01/31/22        2.8571  0.00350005      SUCCESS # 19001     fetching w/ CHPL_COMM_FAMO_AM
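To put a number on the gap, here is a small sketch that computes the relative difference between the two runs. It assumes the first numeric column is an elapsed time (lower is better), which matches the issue title but is not stated in the log itself; the function and constant names are made up for illustration.

```python
def relative_slowdown(slower: float, faster: float) -> float:
    """Return how much larger `slower` is than `faster`, as a fraction of `faster`."""
    return slower / faster - 1.0


# Values copied from the two result lines above. Treating them as
# elapsed times is an assumption, not something the log states.
NON_FETCHING = 3.82428
FETCHING = 2.8571

if __name__ == "__main__":
    gap = relative_slowdown(NON_FETCHING, FETCHING)
    print(f"non-fetching is {gap:.1%} slower than fetching")
```

Under that assumption the non-fetching case takes roughly a third longer than the fetching case.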

Both types of AMOs on EFA are implemented using AMs. In the FAMO case, the result is returned by injecting a result AM that writes the result and sets the "done" flag. That looks like at least as much work as the non-fetching case, so it doesn't make sense that a FAMO would be faster than an AMO.

jhh67 commented 2 years ago

After giving this some thought, the difference between a FAMO implemented with an AM and an AMO is that the AMO sets the "done" flag via a PUT, whereas the FAMO sets it via the result AM. I ran an experiment in which the AMO "done" flag is also set via an AM, and bingo, the performance of FAMO and AMO is pretty much the same:

FAMO  3.03702    0.00329271
AMO   3.03908    0.00329047

This was hacked up for the experiment; I'll do a proper implementation.
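For readers less familiar with the two completion paths, the following is a toy single-process model of what is described above. It is only a sketch: `Initiator`, `Target`, `put_done`, `result_am`, `amo_add`, and `famo_add` are hypothetical names, not functions from the Chapel runtime, and the model says nothing about timing or about why a PUT would be slower on EFA. It only shows which mechanism sets the "done" flag in each case.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Initiator:
    """State on the node that issued the atomic operation (hypothetical)."""
    done: bool = False
    result: Optional[int] = None
    trace: List[str] = field(default_factory=list)


@dataclass
class Target:
    """Node that owns the atomic variable (hypothetical)."""
    value: int = 0


def put_done(init: Initiator) -> None:
    # Models a one-sided PUT that writes the initiator's "done" flag.
    init.trace.append("PUT done")
    init.done = True


def result_am(init: Initiator, result: Optional[int]) -> None:
    # Models a reply AM whose handler writes the result (if any)
    # and sets the "done" flag.
    init.trace.append("AM reply")
    init.result = result
    init.done = True


def amo_add(init: Initiator, target: Target, operand: int,
            done_via: str = "put") -> None:
    """Non-fetching add: request AM, then completion by PUT or by reply AM."""
    init.trace.append("AM request")
    target.value += operand
    if done_via == "put":
        put_done(init)          # original behavior described in the thread
    else:
        result_am(init, None)   # the experiment: complete via an AM instead


def famo_add(init: Initiator, target: Target, operand: int) -> None:
    """Fetching add: request AM, then a reply AM carries the old value back."""
    init.trace.append("AM request")
    old = target.value
    target.value += operand
    result_am(init, old)


if __name__ == "__main__":
    target = Target(value=5)
    for label, op in [
        ("FAMO", lambda i: famo_add(i, target, 1)),
        ("AMO (PUT)", lambda i: amo_add(i, target, 1, done_via="put")),
        ("AMO (AM)", lambda i: amo_add(i, target, 1, done_via="am")),
    ]:
        init = Initiator()
        op(init)
        print(label, init.trace)
```

In this model the FAMO and the experimental AMO produce the same message sequence, and only the original AMO path involves a PUT.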