Improvements to main.cpp

gonzalobg commented 3 months ago

This PR makes a pass over main.cpp to make it much easier to add new benchmarks in the future.

Main changes are:

--only option selects groups of benchmarks or individual benchmarks, e.g., --only Classic runs the classic 5 benchmarks, --only All runs all benchmarks (including nstream), and --only Triad runs only triad.
--triad-only/--nstream-only options removed (replaced by --only Triad and --only Nstream).
--only Triad now runs triad in the exact same way as all other benchmarks, i.e., timing each individual iteration. Before, the outer loop was being timed.
--gigabytes and similar options print the bandwidths in GB/s (or GiB/s, MB/s, MiB/s).
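
For anyone skimming the thread, the --only selection described above could be sketched roughly as follows. This is an illustration, not the code in this PR: Benchmark, BenchmarkInfo, kBenchmarks, and select_benchmarks are hypothetical names, and the assumption that the classic five are Copy, Mul, Add, Triad, and Dot is mine.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical identifiers for the kernels discussed in this PR.
enum class Benchmark { Copy, Mul, Add, Triad, Dot, Nstream };

// One table row per benchmark: its --only name, its id, and whether it
// belongs to the "Classic" group (assumed membership, see lead-in).
struct BenchmarkInfo {
  const char* name;
  Benchmark id;
  bool classic;
};

static const BenchmarkInfo kBenchmarks[] = {
  {"Copy",    Benchmark::Copy,    true},
  {"Mul",     Benchmark::Mul,     true},
  {"Add",     Benchmark::Add,     true},
  {"Triad",   Benchmark::Triad,   true},
  {"Dot",     Benchmark::Dot,     true},
  {"Nstream", Benchmark::Nstream, false},
};

// Resolve the value passed to --only into the list of benchmarks to run.
// "All" and "Classic" are group names; anything else is matched against
// an individual benchmark name. An unknown value yields an empty list.
std::vector<Benchmark> select_benchmarks(const std::string& only) {
  std::vector<Benchmark> selected;
  for (const BenchmarkInfo& b : kBenchmarks) {
    const bool in_group = (only == "All") || (only == "Classic" && b.classic);
    if (in_group || only == b.name) selected.push_back(b.id);
  }
  return selected;
}
```

With this shape, select_benchmarks("Classic") yields five entries, "All" adds Nstream, and an unrecognised name yields an empty list that the caller can report as an error.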

Bug fixes:

Metrics for Init were incorrect; the timers from Read were incorrectly used for both Init and Read.

Future:

After these changes, adding a new benchmark like Scan is pretty straightforward: bump the benchmark count by 1, and add it to all places in the code that fail (runner, checker, and that's it).
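
One way to get the "bump the count and fix whatever fails" workflow is to tie every per-benchmark table to a single count with a static_assert. The sketch below only illustrates that idea; BenchId, kNumBenchmarks, kLabels, and label are hypothetical names and may not match what main.cpp does.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical identifiers; Count must stay last so that it always equals
// the number of benchmarks. Adding e.g. Scan before Count bumps the count.
enum class BenchId { Copy, Mul, Add, Triad, Dot, Nstream, Count };

constexpr std::size_t kNumBenchmarks = static_cast<std::size_t>(BenchId::Count);

// One label per benchmark, in enum order.
static const char* const kLabels[] = {"Copy", "Mul", "Add", "Triad", "Dot", "Nstream"};

// Compilation fails here if a benchmark was added to the enum but not to
// the table, which points at the place that still needs updating.
static_assert(sizeof(kLabels) / sizeof(kLabels[0]) == kNumBenchmarks,
              "kLabels must have exactly one entry per benchmark");

// Look up the label for a benchmark id.
const char* label(BenchId id) {
  const std::size_t i = static_cast<std::size_t>(id);
  assert(i < kNumBenchmarks);  // BenchId::Count itself is not a benchmark
  return kLabels[i];
}
```

The same static_assert pattern can guard any other per-benchmark table (runner dispatch, expected values for the checker), so a forgotten entry shows up at compile time.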

gonzalobg commented 3 months ago

EDIT: will do this in a subsequent PR using intptr_t

I've noticed while testing on GPUs with more than 104 GB of memory that the results were "wrong". The problem is that with int array_size, the largest array is 2^31 * 8 * 1e-9 = ~17.2 GB (51.6 GB with 3 arrays). With unsigned int it was ~104 GB. When trying to use larger arrays, validation would fail due to overflow.
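
The arithmetic behind these limits can be checked with a few lines of C++. This is a sketch under stated assumptions: int and unsigned int are 32 bits wide, double is 8 bytes, and the helper names (footprint_bytes, fits_in_int32, kMaxInt, kMaxUInt) are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

static_assert(sizeof(double) == 8, "sketch assumes 8-byte double");

// Total bytes for `num_arrays` arrays of `elements` doubles, computed in
// 64-bit arithmetic so the result itself cannot wrap for these inputs.
constexpr std::uint64_t footprint_bytes(std::uint64_t elements, std::uint64_t num_arrays) {
  return elements * sizeof(double) * num_arrays;
}

// Largest element counts representable by a 32-bit int / unsigned int.
constexpr std::uint64_t kMaxInt =
    static_cast<std::uint64_t>(std::numeric_limits<std::int32_t>::max());  // 2^31 - 1
constexpr std::uint64_t kMaxUInt = std::numeric_limits<std::uint32_t>::max();  // 2^32 - 1

// True if an element count can still be stored in a 32-bit int array_size.
constexpr bool fits_in_int32(std::uint64_t elements) {
  return elements <= kMaxInt;
}

// One array at the int limit is just under 2^34 bytes, i.e. about 17.2 GB.
static_assert(footprint_bytes(kMaxInt, 1) == 17179869176ULL, "int limit, one array");
```

Evaluated exactly, three arrays come to 51,539,607,528 bytes at the int limit and 103,079,215,080 bytes at the unsigned int limit, in line with the rounded figures quoted above. Any element count beyond those limits needs a wider type such as size_t.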

Fixing this required using size_t in a couple of places. I've verified this for cuda, thrust, std-par, openmp, etc. Solution checking is now on by default, and I've added a --silence-errors flag that can be passed to continue the benchmark even if validation fails.
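
As a rough illustration of the two points above, a checker can index with size_t so that arrays with more than 2^31 elements do not wrap the loop counter, and the flag can decide whether a failed check stops the run. The function names and the mean-absolute-error metric below are assumptions made for the sketch, not the PR's actual checker.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Mean absolute error of `data` against a single expected value.
// The index is std::size_t, so it covers any size the vector can hold.
double mean_abs_error(const std::vector<double>& data, double expected) {
  double sum = 0.0;
  for (std::size_t i = 0; i < data.size(); ++i) {
    sum += std::fabs(data[i] - expected);
  }
  return data.empty() ? 0.0 : sum / static_cast<double>(data.size());
}

// Hypothetical policy for the flag: keep going if validation passed, or if
// the user asked to silence errors.
bool should_continue(bool validation_ok, bool silence_errors) {
  return validation_ok || silence_errors;
}
```

A caller would compare mean_abs_error against a tolerance to obtain validation_ok, then stop the benchmark when should_continue returns false.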

gonzalobg commented 1 month ago

@tomdeakin finished the changes :)

gonzalobg commented 1 month ago

Ordering LGTM now - thanks. Just some weird mixed tab/space issues cause misaligned indenting, so we need to run a tabs->spaces conversion to fix that.

@tomdeakin fixed.

tomdeakin commented 1 month ago

Thanks @gonzalobg !

UoB-HPC / BabelStream

Improvements to main.cpp #186