CUBoulder-HPCPerfAnalysis / memory

Experiments with memory performance
MIT License

about stencil.c #12

Open fdkong opened 9 years ago

fdkong commented 9 years ago

Hi Jed,

I think there are two ways to improve the performance of the code: (1) change Gauss-Seidel (G-S) to Jacobi, and (2) go on to change Jacobi to a block version.
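To make suggestion (1) concrete, here is a minimal sketch of the two sweep styles. It does not reproduce stencil.c: the 1D three-point stencil, the function names, and the arguments `u`, `unew`, `f`, and `n` are all assumptions chosen for illustration. The point is only that the in-place Gauss-Seidel loop carries a dependence from one iteration to the next, while the Jacobi loop does not, which is what might let the compiler vectorize it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 1D three-point stencil, not the kernel from stencil.c. */

/* Gauss-Seidel sweep: u is updated in place, so u[i - 1] already holds the
   new value when u[i] is computed. That loop-carried dependence forces the
   iterations to run in order. */
void sweep_gauss_seidel(double *u, const double *f, size_t n)
{
    assert(n >= 2);
    for (size_t i = 1; i + 1 < n; i++)
        u[i] = 0.5 * (u[i - 1] + u[i + 1] + f[i]);
}

/* Jacobi sweep: reads only the old array u and writes to a separate array
   unew, so the iterations are independent of each other. The restrict
   qualifiers promise the compiler that the arrays do not overlap. */
void sweep_jacobi(double *restrict unew, const double *restrict u,
                  const double *restrict f, size_t n)
{
    assert(n >= 2);
    unew[0] = u[0];         /* boundary values are copied unchanged */
    unew[n - 1] = u[n - 1];
    for (size_t i = 1; i + 1 < n; i++)
        unew[i] = 0.5 * (u[i - 1] + u[i + 1] + f[i]);
}
```

Note that Jacobi needs a second array and usually more sweeps to converge than Gauss-Seidel, so a faster sweep is not automatically a faster solve.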

Fande

jedbrown commented 9 years ago
  1. That is likely true (if it optimizes) -- demonstrate it.
  2. Would a block version execute faster or merely do more flops?

What is the fastest you can make this execute? You might find the output of -ftree-vectorize -fopt-info-vec-missed (with recent gcc) interesting. Does vectorization matter in this case? Can you trust what the compiler is saying?
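One way to approach these questions is to time the kernel directly rather than count flops. The sketch below is an assumption-laden illustration, not code from the repository: `jacobi_sweep` is a hypothetical 1D three-point kernel, and `run_jacobi` is a hypothetical harness that runs it repeatedly over two ping-pong buffers and reports elapsed CPU time via the standard `clock()` function.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical 1D Jacobi kernel (not the one in stencil.c): each interior
   point becomes the average of its two old neighbours. */
static void jacobi_sweep(double *restrict unew, const double *restrict u,
                         size_t n)
{
    unew[0] = u[0];
    unew[n - 1] = u[n - 1];
    for (size_t i = 1; i + 1 < n; i++)
        unew[i] = 0.5 * (u[i - 1] + u[i + 1]);
}

/* Runs `sweeps` Jacobi sweeps, alternating between the two distinct buffers
   a and b. Stores the elapsed CPU time in *seconds and returns a pointer to
   whichever buffer holds the final result. */
double *run_jacobi(double *a, double *b, size_t n, int sweeps, double *seconds)
{
    assert(n >= 2 && sweeps >= 0 && a != b);
    clock_t t0 = clock();
    for (int s = 0; s < sweeps; s++) {
        jacobi_sweep(b, a, n);
        double *tmp = a;    /* swap roles: newest data is now in a */
        a = b;
        b = tmp;
    }
    *seconds = (double)(clock() - t0) / CLOCKS_PER_SEC;
    return a;
}
```

Building this once with and once without the flags mentioned above (for example `gcc -O2 -ftree-vectorize -fopt-info-vec-missed`) and comparing the measured times on a large `n` gives a rough check on whether the vectorization report matches what actually happens. For arrays that do not fit in cache, the time may be dominated by memory traffic, in which case vectorization would change little.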