gonzalobg closed this 1 month ago.
EDIT: I will do this in a subsequent PR using `intptr_t`.
I noticed while testing on GPUs with more than ~103 GB of memory that the results were "wrong".
The problem is that with `int array_size`, the largest array is 2^31 * 8 B * 1e-9 ≈ 17.2 GB (≈ 51.5 GB with 3 arrays). With `unsigned int` it is 2^32 * 8 B * 1e-9 ≈ 34.4 GB per array (≈ 103 GB with 3 arrays). When trying to use larger arrays, validation would fail due to overflow.
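The limits above follow from the largest element count each index type can hold, multiplied by 8 bytes per `double`. A small sketch that reproduces the arithmetic, assuming 8-byte `double`, decimal gigabytes (1 GB = 1e9 bytes) and three arrays per run; the helper names are illustrative only:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Largest number of bytes one array of doubles can occupy when its element
// count must fit in the index type T (assumes sizeof(double) == 8).
// Restricted to 32-bit-or-smaller types so the product fits in uint64_t.
template <typename T>
constexpr std::uint64_t max_array_bytes() {
  static_assert(sizeof(T) <= 4, "64-bit counts would overflow this product");
  return static_cast<std::uint64_t>(std::numeric_limits<T>::max()) * sizeof(double);
}

// The same limit in decimal gigabytes.
template <typename T>
constexpr double max_array_gb() {
  return static_cast<double>(max_array_bytes<T>()) * 1e-9;
}

// Total footprint when three arrays (a, b, c) of that size are allocated.
template <typename T>
constexpr double max_total_gb() {
  return 3.0 * max_array_gb<T>();
}
```

For `int` this gives about 17.2 GB per array (51.5 GB total) and for `unsigned int` about 34.4 GB per array (103.1 GB total). With a 64-bit `size_t` count the per-array limit is far beyond the memory of any current GPU.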
Fixing this required using `size_t` in a couple of places. I've verified this for `cuda`, `thrust`, `std-par`, `openmp`, etc. I've also turned solution checking on by default and added a `--silence-errors` flag that one can pass to continue the benchmark even if validation fails.
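A sketch of the kind of change involved, assuming a host-side triad and a simple checker; `triad`, `triad_bytes`, `check_solution` and `validate_or_exit` are hypothetical names, not BabelStream's actual functions. The element count and loop index are `std::size_t`, so products such as `3 * n` are never evaluated in 32-bit `int`, and a failed check exits unless the caller asks for `--silence-errors`-style behavior:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Hypothetical triad kernel: a[i] = b[i] + scalar * c[i].
// A size_t index keeps working past 2^31 - 1 elements.
void triad(std::vector<double>& a, const std::vector<double>& b,
           const std::vector<double>& c, double scalar) {
  for (std::size_t i = 0; i < a.size(); ++i) a[i] = b[i] + scalar * c[i];
}

// Bytes moved by one triad pass (read b and c, write a), computed in size_t.
// With `int n`, an expression like 3 * n is evaluated in int and overflows
// once n exceeds about 716 million elements.
std::size_t triad_bytes(std::size_t n) { return 3 * sizeof(double) * n; }

// Hypothetical checker: true if every element matches `expected` within a
// relative tolerance (an expected value of 0 therefore requires exact zeros).
bool check_solution(const std::vector<double>& a, double expected,
                    double rel_tol = 1e-8) {
  for (double x : a)
    if (std::fabs(x - expected) > rel_tol * std::fabs(expected)) return false;
  return true;
}

// Validation is on by default; silence_errors (modeled on --silence-errors)
// reports the failure but lets the benchmark continue instead of exiting.
void validate_or_exit(const std::vector<double>& a, double expected,
                      bool silence_errors) {
  if (check_solution(a, expected)) return;
  std::fprintf(stderr, "Validation failed\n");
  if (!silence_errors) std::exit(EXIT_FAILURE);
}
```

Exiting on failure by default makes a silently wrong run (like the overflow case above) hard to miss, while the opt-out keeps long benchmark sweeps running.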
@tomdeakin finished the changes :)
Ordering LGTM now, thanks. Just some weird mixed tab/space issues cause misaligned indentation, so we need to run a tabs->spaces conversion to fix that.
@tomdeakin fixed.
Thanks @gonzalobg !
This PR gives a pass to `main.cpp` to make it much easier to add new benchmarks in the future. Main changes are:
- The `--only` option enables specifying different groups or individual benchmarks, e.g., `--only Classic` runs the classic 5 benchmarks, `--only All` runs all benchmarks (e.g. including nstream), and `--only Triad` runs only triad, etc.
- The `--triad-only`/`--nstream-only` options were removed (replaced by `--only Triad` and `--only Nstream`).
- `--only Triad` now runs `triad` in exactly the same way as all other benchmarks, i.e., timing each individual iteration. Before, the outer loop was being timed.
- `--gigabytes` and similar options print the bandwidths in GB/s (or GiB/s, MB/s, MiB/s).

Bug fixes:
- The `Init` timings were incorrect: the timers from `Read` were used for both `Init` and `Read`.

Future:
- Adding `Scan` is pretty straightforward: bump the benchmark count by 1, and add it to all the places in the code that fail (runner, checker, and that's it).