This repository will be used for experiments with memory performance.
Modern CPU architectures have several levels of cache, each with different semantics.
Create a GitHub account, clone this repository, and configure your Git environment:
git clone https://yourname@github.com/CUBoulder-HPCPerfAnalysis/memory.git
git config --global user.name 'Your Name'
git config --global user.email your.email@colorado.edu
Feel free to use SSH instead of HTTPS. If you use bash, I recommend downloading and sourcing git-prompt and git-completion.bash.
make CC=gcc CFLAGS='-O3 -march=native' stream
./stream
Implement a "dot product" test (x ⋅ y = sum_i x_i y_i) and commit your changes.
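As a reference for checking the new test, here is a minimal Python sketch of the dot product and of how a bandwidth figure could be derived from it. The real test belongs in the benchmark source itself. The function names are hypothetical, and the byte count assumes 8-byte doubles with two arrays read and nothing written back.

```python
import time

def dot(x, y):
    """Return sum_i x[i] * y[i] for two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    total = 0.0
    for xi, yi in zip(x, y):
        total += xi * yi
    return total

def dot_bytes_moved(n, word_bytes=8):
    """Bytes read by one dot product of length n (assumption: two input
    arrays of 8-byte words are streamed once, no output array is written)."""
    return 2 * word_bytes * n

def best_rate_mb_per_s(n, times):
    """Best rate in MB/s (10^6 bytes/s), taken from the fastest trial."""
    return 1e-6 * dot_bytes_moved(n) / min(times)

def time_dot(n, trials=3):
    """Time several dot products of length n and return the elapsed times."""
    x = [1.0] * n
    y = [2.0] * n
    times = []
    for _ in range(trials):
        start = time.perf_counter()
        dot(x, y)
        times.append(time.perf_counter() - start)
    return times
```

Calling `best_rate_mb_per_s(n, time_dot(n))` gives a rate for the Python loop, which will be far below what the compiled benchmark reaches; the sketch is only meant to pin down the arithmetic.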
Create a file results/yourname.csv with the format
# Username, Machinename, CPU name, CPU GHz, CPU Cores, CPU Cores used, Private cache/core (MB), Shared cache (MB), Array Length (MB), Peak MB/s, Copy MB/s, Scale MB/s, Add MB/s, Triad MB/s, Dot MB/s
jed, batura, i7-4500U, 1.8, 2, 1, 0.256, 4.096, 76.3, 25600, 19555, 12784, 13483, 13677, ?
...
Report the "best rate".
On Linux systems, look in /proc/cpuinfo.
For theoretical peak bandwidth, try http://ark.intel.com.
Leave a ? for missing data (but try not to have missing data).
Commit this new file:
git add results/yourname.csv
git commit
Use the commit message to explain your workflow creating the results file. We're going to add more tests in the future, so automation would be good here (collaboration encouraged).
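One possible starting point for that automation is sketched below. It assumes the benchmark prints one line per kernel of the form `Copy:  19555.0  ...` with the best rate as the first number (as the reference STREAM does; check your own output), and that /proc/cpuinfo has a `model name` field, which holds on x86 Linux but varies by architecture. All function names are hypothetical.

```python
import re

# Column order taken from the header line of the results format.
HEADER = ("# Username, Machinename, CPU name, CPU GHz, CPU Cores, "
          "CPU Cores used, Private cache/core (MB), Shared cache (MB), "
          "Array Length (MB), Peak MB/s, Copy MB/s, Scale MB/s, Add MB/s, "
          "Triad MB/s, Dot MB/s")
COLUMNS = [c.strip() for c in HEADER.lstrip("# ").split(",")]

def parse_best_rates(stream_output):
    """Collect the first number after each kernel name.
    Assumption: lines look like 'Copy:   19555.0   0.0082 ...'."""
    rates = {}
    pattern = re.compile(r"\s*(Copy|Scale|Add|Triad|Dot):\s+([0-9.]+)")
    for line in stream_output.splitlines():
        m = pattern.match(line)
        if m:
            rates[m.group(1) + " MB/s"] = float(m.group(2))
    return rates

def parse_cpu_name(cpuinfo_text):
    """Return the first 'model name' value, or '?' if the field is absent."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "model name":
            return value.strip()
    return "?"

def format_row(fields):
    """Join the known fields in column order, leaving '?' for missing data."""
    return ", ".join(str(fields.get(col, "?")) for col in COLUMNS)
```

A script could then read `/proc/cpuinfo`, run `./stream`, merge the two dictionaries with the hand-entered fields, and write `HEADER` plus `format_row(...)` to results/yourname.csv.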
Fork this repository on GitHub, push changes to your fork, and submit a pull request to the main repository.
There may be ambiguities in the specification. If you spot any, open an issue. Also open issues for ways to improve the workflow and provenance for running experiments and reporting data.
It is useful to have a modern statistical environment for analyzing and plotting data.
The R Project is a widely used open source statistical package that compares well with commercial packages and has a great user repository (new statistical methods tend to show up here first).
Unfortunately, the R language has some shortcomings and is not general purpose.
Pandas is an up-and-coming Python package that provides a "data frame", a suite of common statistical tools, and plotting similar to R.
I recommend Pandas for this class, but welcome you to use any package you feel comfortable with.
Experiment with plotting interesting relationships using the stream-analyze.py script.
The Pandas visualization documentation may be useful, as is the IPython interpreter.
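If you want to start from scratch instead, the following is a minimal sketch of loading a results file into a Pandas data frame. It is not taken from stream-analyze.py. The function names and the derived "Fraction of peak" column are hypothetical, and it assumes the file follows the results format exactly, with `?` for missing data.

```python
import io
import pandas as pd

HEADER = ("# Username, Machinename, CPU name, CPU GHz, CPU Cores, "
          "CPU Cores used, Private cache/core (MB), Shared cache (MB), "
          "Array Length (MB), Peak MB/s, Copy MB/s, Scale MB/s, Add MB/s, "
          "Triad MB/s, Dot MB/s")
COLUMNS = [c.strip() for c in HEADER.lstrip("# ").split(",")]

def load_results(source):
    """Read one results file (path or file-like object) into a DataFrame."""
    return pd.read_csv(
        source,
        names=COLUMNS,
        comment="#",            # skip the '# Username, ...' header line
        skipinitialspace=True,  # fields are separated by ', '
        na_values="?",          # '?' marks missing data
    )

def add_fraction_of_peak(df, kernel="Triad MB/s"):
    """Return a copy with the measured rate divided by the theoretical peak."""
    out = df.copy()
    out["Fraction of peak"] = out[kernel] / out["Peak MB/s"]
    return out
```

With matplotlib installed, `load_results("results/yourname.csv").plot.scatter(x="Peak MB/s", y="Triad MB/s")` would give a first plot; several files can be combined with `pd.concat`.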
Prefetchers like to follow contiguous memory streams.
What happens to performance if we interleave threads?
The block-cyclic mapping of the range 0,1,...,N-1 defined by j(i) = (i*b)%N + (i*b)//N may be useful.
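To see what this mapping does, here is a small Python sketch that evaluates j(i) over the whole range. The function names are hypothetical, and the check that b divides N is an added assumption: without it the formula can map two indices to the same value (for example N=7, b=2).

```python
def block_cyclic(i, b, n):
    """Evaluate j(i) = (i*b) % n + (i*b) // n for one index i in 0..n-1."""
    return (i * b) % n + (i * b) // n

def block_cyclic_order(b, n):
    """Return [j(0), j(1), ..., j(n-1)].
    Assumption: b divides n, so that the result is a permutation of 0..n-1."""
    if n % b != 0:
        raise ValueError("this sketch assumes b divides n")
    return [block_cyclic(i, b, n) for i in range(n)]

print(block_cyclic_order(2, 8))  # [0, 2, 4, 6, 1, 3, 5, 7]
```

With N=8 and b=2, consecutive values of i step through the range with stride 2 and then wrap around to fill in the skipped entries.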
What happens if many threads try to write to the same cache line?
Can you measure the effect of false sharing, sometimes called "cache line contention"?
Design an experiment to test cache behavior with multiple threads, run it to produce data, and make a plot using Pandas, R, or another plotting system. Commit the source code changes, your data, the plotting script, and any figures you produce. Describe what your experiment is testing and your interpretation of the data and figures in your commit message and submit as a pull request. Plan to present these results (~5 minutes each) next class period (Wednesday, 2015-01-21).
Gprof (comes with binutils, gcc support): compile with -pg, run the application (which now writes gmon.out), and use gprof executable gmon.out
Valgrind (start with Callgrind): valgrind --tool=callgrind ./program -args (add --dump-instr=yes for instruction-level annotation)
Linux perf: perf record ./program -args, then perf report
Other instrumentation systems