It's been a while since we did any serious performance profiling, and we'd like to know the performance profile of existing master before doing a performance comparison against the new shared-memory parallel versions.
Additionally, I watched this stunning talk by Emery Berger on performance profiling (Performance Matters) where he talks about a few of his profiling tools that take the statistical approach we need to be using in our performance analysis.
Discussion:
What profilers?
Suggest all of:
gprof
We (or at least I) have been using this from the beginning; it leverages our existing experience with ParFlow performance.
A profiler that eliminates the effects of address-space layout (which can apparently have a dramatic impact on performance, and can be caused by benign artifacts such as long or short usernames), such as Berger's Stabilizer from the talk above.
What test cases?
Suggest all of:
ClayL
RU-Conus
TFG-Conus
Big Sinusoidal
Deliverables:
Profiling results from parflow/parflow/master
Profiling results from hydroframe/ParFlow_PerfTeam/pf_cuda
With neither CUDA nor OpenMP enabled
With CUDA enabled
With OpenMP enabled
(If OpenMP and CUDA can be used together) With both CUDA and OpenMP enabled.
Done When:
Profiling results for the implementations and test cases are uploaded here.