bangerth / spec-cpuv8-sampleflow

A benchmark for the SPEC CPUv8 test suite based on the deal.II and SampleFlow libraries

About 747.sampleflow

747.sampleflow is a benchmark for the SPEC CPUv8 test suite that is based on the deal.II and SampleFlow libraries; see https://www.dealii.org and https://github.com/bangerth/sampleFlow/. deal.II has previously served as the basis for 447.dealii in SPEC CPU2006 and 510.parest in SPEC CPU2017.

The current benchmark is an implementation of a test case that uses a Monte Carlo sampling method to gain information about a probability distribution p(x). The definition of the function p(x) involves the solution of a partial differential equation using the finite element method. The overall problem this benchmark solves is concisely defined in this preprint (which has been accepted for publication in SIAM Review); it is intended as a problem that is simple enough to solve yet complicated enough that one needs sophisticated algorithms to obtain answers with sufficient accuracy. For example, the "ground truth" answers provided in the preprint above were obtained using some 30 CPU years of computations. More sophisticated algorithms than those used in the preprint should be able to obtain the same accuracy with far less effort, but in general the error one obtains is inversely proportional to (the square root of) the computational effort, and the very large effort spent on the results in the paper reflects a desire to have published results with as much accuracy as possible.

This benchmark implements a sampling algorithm for p(x) that is based on a variation of the Differential Evolution algorithm, a parallelized version of the Metropolis-Hastings algorithm. In it, a number N of chains run in parallel, occasionally exchanging information. In the version used for this benchmark, N is a parameter that is set in the input file. The resulting algorithm is then of the fork-join type: at the beginning of each iteration, N work items are created to generate a new sample on each of these chains, and these items can be worked on independently; once they are done, another parallel phase starts in which the samples are post-processed. The fork-join approach of this benchmark is implemented using a simple thread pool that maps tasks onto available worker threads. It creates as many worker threads as std::thread::hardware_concurrency() states the machine can provide.
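The pool itself is part of the benchmark's sources; as a rough, self-contained illustration of the fork-join pattern just described (not the benchmark's actual code; the class and function names below are made up for this example, and the per-chain work is reduced to placeholder comments), a simple thread pool of this kind might look as follows:

```cpp
// Minimal fork-join sketch, *not* the benchmark's own code: a simple thread
// pool fed one task per chain, followed by a "join" that waits for all tasks.
#include <algorithm>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class ThreadPool
{
public:
  explicit ThreadPool(const unsigned int n_threads)
  {
    for (unsigned int i = 0; i < n_threads; ++i)
      workers.emplace_back([this]() { worker_loop(); });
  }

  ~ThreadPool()
  {
    {
      std::lock_guard<std::mutex> lock(mutex);
      stop = true;
    }
    tasks_available.notify_all();
    for (std::thread &worker : workers)
      worker.join();
  }

  // "Fork": put one work item into the queue; an idle worker picks it up.
  void enqueue(std::function<void()> task)
  {
    std::lock_guard<std::mutex> lock(mutex);
    tasks.push(std::move(task));
    ++n_unfinished_tasks;
    tasks_available.notify_one();
  }

  // "Join": block until every previously enqueued task has finished.
  void wait_for_all()
  {
    std::unique_lock<std::mutex> lock(mutex);
    all_done.wait(lock, [this]() { return n_unfinished_tasks == 0; });
  }

private:
  void worker_loop()
  {
    while (true)
      {
        std::function<void()> task;
        {
          std::unique_lock<std::mutex> lock(mutex);
          tasks_available.wait(lock, [this]() { return stop || !tasks.empty(); });
          if (stop && tasks.empty())
            return;
          task = std::move(tasks.front());
          tasks.pop();
        }

        task(); // execute the work item without holding the lock

        std::lock_guard<std::mutex> lock(mutex);
        --n_unfinished_tasks;
        all_done.notify_all();
      }
  }

  std::vector<std::thread> workers;
  std::queue<std::function<void()>> tasks;
  std::mutex mutex;
  std::condition_variable tasks_available;
  std::condition_variable all_done;
  bool stop = false;
  unsigned int n_unfinished_tasks = 0;
};

int main()
{
  const unsigned int n_chains = 8; // in the benchmark, set via the input file
  ThreadPool pool(std::max(1u, std::thread::hardware_concurrency()));

  for (unsigned int iteration = 0; iteration < 100; ++iteration)
    {
      // Fork: one independent work item per chain.
      for (unsigned int chain = 0; chain < n_chains; ++chain)
        pool.enqueue([]() { /* generate one new sample on this chain */ });
      pool.wait_for_all(); // join

      // Second parallel phase: post-process the new samples.
      for (unsigned int chain = 0; chain < n_chains; ++chain)
        pool.enqueue([]() { /* post-process the sample */ });
      pool.wait_for_all(); // join
    }
}
```

The structural point to note is that each iteration forks N independent tasks, joins by waiting for all of them, and then repeats the same fork-join cycle for the post-processing phase.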

Author

Wolfgang Bangerth (bangerth@gmail.com), Colorado State University.

The benchmark contains the deal.II library in a simplified version from which all external dependencies have been stripped. Many people have contributed to deal.II over the past 25 years; see [this page](https://dealii.org/authors.html).

The benchmark also contains the SampleFlow library; the list of authors for SampleFlow can be found in the SampleFlow repository linked above.

Finally, deal.II contains a stripped-down version of BOOST.

License

deal.II is licensed under the GNU LGPL 2.1 or later.

SampleFlow is licensed under the GNU LGPL 2.1.

BOOST is licensed under the BOOST Software License.

Workload definitions

The benchmark comes with the requisite three workloads. To execute these, run the benchmark executable with the name of the respective input file as the sole command line argument. The input files are as follows:

Limiting parallel execution

By default, the fork-join model mentioned at the top of this page is implemented using a thread pool that, upon start-up, creates as many threads as the system reports it can usefully support via std::thread::hardware_concurrency(). The program then executes available work tasks on these threads whenever a thread becomes idle.

However, the number of threads can be limited by setting the environment variable OMP_NUM_THREADS; in that case, the program uses the minimum of that value and the number reported by std::thread::hardware_concurrency().

In order to support SPECrate benchmarks, if OMP_NUM_THREADS is set to either zero or one, the program does not start any worker threads at all; instead of enqueuing tasks and executing them on an available thread, each task is immediately executed on the calling thread. In other words, the program then executes everything sequentially.
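As a sketch of how this thread-count selection might be implemented (illustrative only; the function name and details below are not the benchmark's actual code):

```cpp
// Illustrative sketch: decide how many worker threads to start. A return
// value of zero means "do not start a thread pool; run every task on the
// calling thread", as described above for the SPECrate case.
#include <algorithm>
#include <cstdlib>
#include <string>
#include <thread>

unsigned int n_worker_threads()
{
  unsigned int n = std::thread::hardware_concurrency();

  if (const char *env = std::getenv("OMP_NUM_THREADS"))
    {
      const unsigned int requested
        = static_cast<unsigned int>(std::stoul(env));
      if (requested <= 1)
        return 0;                  // no worker threads: execute tasks inline
      n = std::min(n, requested);  // otherwise, the smaller of the two values
    }

  return n;
}
```

A return value of zero then tells the caller not to create a thread pool at all and to execute each task immediately on the calling thread, which is exactly the sequential behavior described above.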