Open ronawho opened 6 years ago
An experimental branch that uses BTE for puts and forces heap-extensions (https://github.com/ronawho/chapel/tree/isx-perf) has promising performance that's on par with the reference at 256 locales:
@ronawho: Can this be closed now, or do you want to keep it open to track other potential improvements to ISx?
I want to keep it open. I think there's more we can do (noinit on local arrays, minimizing comm counts in the exchange, etc.). Given that performance is competitive with SHMEM, I don't think it's important to look into those in the near future, but I don't want to lose track of these ideas.
ISx scalability pretty closely tracks the reference SHMEM version up to 256 locales, but raw performance is still ~40% behind:
I believe this is partially due to the overhead of full dynamic array registration, and it could also be a result of only using FMA under ugni instead of BTE. It's also possible we have extra comm compared to the reference version.
TODOs: