jeffhammond opened 8 years ago
One of our team members is actively working on porting some PRKs (along with a few other non-PRK tasks) and understanding performance gaps. Most of his work so far has been on stencil (shared-memory only, I think), though I believe he also started on transpose this week. Things have been clear enough on the PRK side so far, and we won't hesitate to contact you, Rob, and Ulf when questions come up.
In case anyone from the internet comes along and finds this issue, please see https://github.com/chapel-lang/chapel/tree/master/test/studies/prk for now.
Someone should bug the authors of this paper: Comparative Performance and Optimization of Chapel in Modern Manycore Architectures
They should have a few PRKs, but I don't know where they are.
@ian-bertolacci : We've been (slowly) working with Engin to get copies of his PRKs, review them, and add them to the test/studies directory Jeff mentions above.
https://github.com/e-kayrakli/chapel_prk are the droids you are looking for.
@e-kayrakli may have additional comments.
My repo is not as reliable as the ones that are in the Chapel repo -- https://github.com/chapel-lang/chapel/tree/master/test/studies/prk . Even the versions that got merged to Chapel master were slightly different than the ones I have there (and they have seen a couple of revisions since). And I still make changes to my repo willy-nilly. Frankly, I forgot I left it public :)
@e-kayrakli @bradcray It would be great if someone could put together a PR of p2p, stencil, and transpose, at least. I will likely do an in-depth study of the "big 3" PRKs for shared-memory in the near future, and I'd like to be able to include Chapel here.
Every time I look at the various Chapel PRK implementations, I can never figure out which ones are the latest and best.
I think p2p would be the one that requires the most work among those. I'd say the other two are in much better shape, at least for shared-memory studies.
FWIW, a to-do list for PRKs in Chapel: https://github.com/chapel-lang/chapel/issues/6162
You might look at my recent p2p ports, e.g., in the Cxx11 folder. If Chapel supports task dependencies similar to OpenMP, then it's really easy to write a shared-memory implementation that performs well. There's also a data-parallel version that loops over anti-diagonals. Neither of these is suitable for distributed memory, though.
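For anyone unfamiliar with the anti-diagonal idea, here is a minimal sketch in Python. It is not Chapel and not the code from the Cxx11 folder. It assumes the usual p2p recurrence `grid[i][j] = grid[i-1][j] + grid[i][j-1] - grid[i-1][j-1]`, with the first row and column initialized to their indices; the function names are made up for illustration. Every cell on an anti-diagonal `d = i + j` depends only on cells from diagonals `d-1` and `d-2`, so the inner loop over one diagonal is the part a data-parallel version could run concurrently.

```python
def init_grid(m, n):
    """Build an m x n grid with the first row and column set to their indices."""
    grid = [[0.0] * n for _ in range(m)]
    for i in range(m):
        grid[i][0] = float(i)
    for j in range(n):
        grid[0][j] = float(j)
    return grid


def p2p_sequential(m, n):
    """Reference sweep in plain row-major order."""
    grid = init_grid(m, n)
    for i in range(1, m):
        for j in range(1, n):
            grid[i][j] = grid[i - 1][j] + grid[i][j - 1] - grid[i - 1][j - 1]
    return grid


def p2p_antidiagonal(m, n):
    """Same sweep, reordered so the outer loop walks anti-diagonals d = i + j."""
    grid = init_grid(m, n)
    # Interior cells have i >= 1 and j >= 1, so d runs from 2 to m + n - 2.
    for d in range(2, m + n - 1):
        i_lo = max(1, d - (n - 1))   # keeps j = d - i <= n - 1
        i_hi = min(m - 1, d - 1)     # keeps j = d - i >= 1
        # Iterations of this inner loop are independent of each other.
        for i in range(i_lo, i_hi + 1):
            j = d - i
            grid[i][j] = grid[i - 1][j] + grid[i][j - 1] - grid[i - 1][j - 1]
    return grid
```

Both orderings should produce the same grid; with this initialization a single sweep leaves `grid[i][j] == i + j`, which makes a convenient correctness check.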
@bradcray Please let me know how we can help here.