chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org

Accessing non-adjacent memory in a parallel statement #11935

Open damianmoz opened 5 years ago

damianmoz commented 5 years ago

To access elements in a column of any real matrix (except a tiny one), I would likely do

var xcolk = x[i..j, k];
....
// say
xcolk += v[i..j] * t;
....
x[i..j, k] = xcolk;

This has the advantage that I am working with a copy of the column stored contiguously in memory. It has the copy overhead, and twice I am accessing memory all over the place (and probably doing naughty things outside the cache), but only those two times.

Should I instead use a reference (ref)

ref xcolk = x[i..j, k];

which avoids the copy out and back in, but potentially means I am going all over memory whenever I do anything with xcolk? Also, that column may not be nicely aligned on a 128-bit boundary (for AVX-128 vector instructions, or similar alignment concerns when vectorizing).

mppf commented 5 years ago

@benharsh - you might know the answer to this question

benharsh commented 5 years ago

I suspect it would be best to create xcolk as a ref because it avoids array allocation and bulk copies of the temporary array and array slice. If through measurement you find that the ref pattern is slower, I would be interested in seeing the code to learn why that is, and whether we can further optimize our arrays to avoid problems in this case.
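One way to do the measurement suggested above is a small timing harness that runs both patterns on identical data. The following is only a sketch under stated assumptions: the matrix size `n`, the column `k`, the fill values, and the names `xCopy`, `xRef`, `copyTime`, and `refTime` are all hypothetical, and it assumes a recent Chapel release in which the `Time` module provides `stopwatch` (older releases named this type `Timer`).

```chapel
use Time;

config const n = 1000;   // hypothetical matrix dimension
config const k = 3;      // hypothetical column to update
const t = 0.5;           // scale factor, playing the role of t in the question
const i = 1, j = n;      // row range of the column slice

var x: [1..n, 1..n] real;
var v: [1..n] real;

// Fill with deterministic values so both variants see the same input.
forall (r, c) in x.domain do x[r, c] = r + 0.001 * c;
forall r in v.domain do v[r] = r;

// Separate matrices so the two results can be compared afterwards.
var xCopy = x;
var xRef = x;

var sw: stopwatch;

// Variant 1: copy the column out, update the copy, copy it back.
sw.start();
{
  var xcolk = xCopy[i..j, k];
  xcolk += v[i..j] * t;
  xCopy[i..j, k] = xcolk;
}
sw.stop();
const copyTime = sw.elapsed();

// Variant 2: update the column in place through a ref to the slice.
sw.clear();
sw.start();
{
  ref xcolk = xRef[i..j, k];
  xcolk += v[i..j] * t;
}
sw.stop();
const refTime = sw.elapsed();

// Both variants should produce the same matrix.
const same = && reduce (xCopy == xRef);

writeln("copy variant: ", copyTime, " s");
writeln("ref variant:  ", refTime, " s");
writeln("results match: ", same);
```

A single update per variant is probably too short to time reliably. Repeating each block many times, compiling with `--fast`, and varying `n` so the column stride exceeds the cache size would give a more meaningful comparison.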

bradcray commented 5 years ago

A couple of other thoughts on this issue:

mppf commented 5 years ago

Of course, all of the above was factoring out the computation, and part of your concern is whether or not the column-wise striding is going to kill your performance in the cache or not.

It's also possible to make prefetch calls in Chapel when the CPU does not recognize the data access pattern. The Prefetch package module should work for this.