chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org

Improve performance of ISx #9622

Open ronawho opened 6 years ago

ronawho commented 6 years ago

ISx scalability pretty closely tracks the reference SHMEM version up to 256 locales, but raw performance is still ~40% behind:

[Figure: isx-time]

I believe this is partly due to the overhead of full dynamic array registration, and it could also be a result of using only FMA under ugni instead of BTE. It's also possible that we do extra communication compared to the reference version.

TODOs:

ronawho commented 6 years ago

An experimental branch that uses BTE for puts and forces heap-extensions (https://github.com/ronawho/chapel/tree/isx-perf) has promising performance that's on par with the reference at 256 locales:

[Figure: isx-train]

bradcray commented 6 years ago

@ronawho: Can this be closed now, or do you want to keep it open to track other potential improvements to ISx?

ronawho commented 6 years ago

I want to keep it open. I think there's more we can do (noinit on local arrays, minimizing comm counts in the exchange, etc.). Given that performance is competitive with SHMEM, I don't think it's important to look into those in the near future, but I don't want to lose track of these ideas.