Open kennethchiu opened 6 months ago
@kennethchiu Here it is:
$ g++ --version
g++ (GCC) 14.0.0 20231231 (experimental)
$ g++ -O2 -march=native membw.cpp -o membw && ./membw
1 thread(s):
8.94312e+09
To prevent optimizing out all ops: 2268145617
2 thread(s):
1.76821e+10
To prevent optimizing out all ops: 2268145617
3 thread(s):
2.21497e+10
To prevent optimizing out all ops: 2268145617
4 thread(s):
2.61705e+10
To prevent optimizing out all ops: 2268145617
5 thread(s):
3.22876e+10
To prevent optimizing out all ops: 2268145617
6 thread(s):
3.49681e+10
To prevent optimizing out all ops: 2268145617
7 thread(s):
3.38957e+10
To prevent optimizing out all ops: 2268145617
8 thread(s):
3.67197e+10
To prevent optimizing out all ops: 2268145617
9 thread(s):
3.53458e+10
To prevent optimizing out all ops: 2268145617
So up to ~37 GB/s with 8 threads.
My machine is a Lenovo Yoga Slim 7 sporting an AMD Ryzen 7 4800U with 16 GB of RAM (DDR4, 1600 MHz).
Now I don't know much about memory bandwidth, so would love to hear the significance of this.
Ah, just that this provides a baseline. I'm not an expert on squeezing out every ounce of memory bandwidth, but this at least suggests that 12/37 ≈ 0.32 seconds will be a hard lower limit, because you cannot read the data out of memory faster than 37 GB/s, and the data file is about 12 GB.
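To make that arithmetic explicit, here is a minimal sketch of the bound. The function name min_read_seconds is made up for illustration and is not part of membw.cpp. It assumes the whole file is streamed from RAM exactly once at the measured peak rate.

```cpp
#include <cassert>

// Hypothetical helper (not from membw.cpp): lower bound, in seconds, on the
// time needed to stream `bytes` from RAM once at a sustained bandwidth of
// `bytes_per_sec`. Real runs will be slower, since this ignores any compute.
inline double min_read_seconds(double bytes, double bytes_per_sec) {
    assert(bytes_per_sec > 0.0);  // a bandwidth of zero or less is meaningless
    return bytes / bytes_per_sec;
}
```

Calling min_read_seconds(12e9, 3.67197e10), i.e. a 12 GB file at the 8-thread figure above, gives roughly 0.33 s, in line with the 12/37 estimate.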
That makes sense. Thank you @kennethchiu!
Curious about the memory bandwidth of your machine. If you can, I'd be interested in the results of the C++ program below. Compile with full optimization of course.