GenArchBench is a genomics benchmark suite targeting the Arm architecture. It comprises 13 multithreaded CPU kernels from the most widely used genomics tools, covering the most important genome sequencing steps. GenArchBench includes 10 kernels from GenomicsBench and three additional kernels: the Bit-Parallel Myers algorithm, the Wavefront Alignment algorithm, and a SIMD-accelerated version of minimap2's chaining implementation (FAST-CHAIN). The kernels have been optimized for Arm and tuned on two Arm processors: the A64FX and Graviton3.
GenArchBench includes two inputs for each kernel: a small one with a target execution time of less than a minute and a large one with a target execution time of a couple of minutes. Additionally, we include the expected output for each benchmark and input. To download the dataset (~90 GB):
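mkdir genarch-temp
cd genarch-temp
# Download and unpack the archive that contains the split input files
wget https://b2drop.bsc.es/index.php/s/Nyg7TXDRpkL5zTn/download
unzip download
rm -r download
# Merge the split parts into a single archive and extract it
cat inputs*/genarch-inputs.tar.gz* > genarch-inputs-merged.bz
rm -r inputs*
tar -xvjf genarch-inputs-merged.bz
# Move the extracted inputs to the starting directory and clean up
mv genarch-inputs ../
cd ..
rm -rf genarch-temp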
Each benchmark under the benchmarks folder includes a Makefile for compilation and a README that explains how to execute it, along with some additional information. Additionally, the scripts folder of each benchmark contains a wrapper to compile the benchmark with different compilers (scripts/compile.sh) and two scripts that automatically run it with its two inputs using different thread counts and check its output (scripts/regression_small.sh and scripts/regression_large.sh). To use the scripts of each benchmark, you must first set the environment variables in setup.sh and then source it:
source benchmarks/setup.sh
To compile a benchmark, use its Makefile or the wrapper inside its scripts folder (scripts/compile.sh). The wrapper can compile each kernel with different compilers (GCC by default) and shows the environment variables that can be set to profile the kernel (see the profiling instructions below).
# Replace X with the name of the benchmark's folder
cd benchmarks/X
make
# OR
bash scripts/compile.sh
To run a benchmark, you can follow the instructions in the README inside its folder or use our automatic regression tests.
The regression tests inside the scripts folder of each benchmark, scripts/regression_small.sh and scripts/regression_large.sh, run the benchmark using the small and large input, respectively. Note that you need to set the required environment variables in setup.sh to use these scripts.
By default, the regression tests run the benchmark three times, each with a different number of threads: 1, 2, and 4. You can change the thread counts by modifying the script. Before finishing, the tests compare the obtained output with the expected output to check the correctness of the execution (the output of DGB is not checked by default; see its regression tests). At the end of the execution, the tests report the execution time for each thread count and the status of the correctness check. The reported execution time corresponds only to the region of interest of the kernel, not to the total execution time.
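As a rough illustration of the loop described above, the following minimal Bash sketch runs a command once per thread count and compares its output with the expected file. It is not the code of the actual regression scripts: the function name run_regression, the THREAD_COUNTS variable, and the use of OMP_NUM_THREADS to pass the thread count are assumptions made for illustration, and timing is left out.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a regression loop; NOT the code of regression_small.sh
# or regression_large.sh. Assumptions (ours): the kernel reads its thread count
# from OMP_NUM_THREADS, writes its result to a file, and a byte-for-byte
# comparison with the expected output is a valid correctness check.
THREAD_COUNTS="1 2 4"   # default thread counts described in this README

# Usage: run_regression EXPECTED_FILE OUTPUT_FILE COMMAND [ARGS...]
run_regression() {
  local expected="$1" output="$2" status=0 t
  shift 2
  for t in $THREAD_COUNTS; do
    rm -f "$output"                     # never compare a stale output file
    if ! OMP_NUM_THREADS="$t" "$@"; then
      echo "threads=$t: execution failed"
      status=1
    elif cmp -s "$output" "$expected"; then
      echo "threads=$t: output OK"
    else
      echo "threads=$t: output differs from expected"
      status=1
    fi
  done
  return "$status"                      # non-zero if any run failed
}
```

For example, `run_regression expected.txt out.txt ./BENCHMARK ARGS` (all hypothetical names) would print one status line per thread count.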
The regression tests use run_wrapper.sh to automatically detect the system's job scheduler and run the benchmarks through it. For now, it only supports SLURM and PJM (the job scheduler used on A64FX systems), but adding support for other schedulers should be easy. If run_wrapper.sh does not detect any job scheduler, it runs the benchmarks directly.
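The scheduler detection performed by run_wrapper.sh could look roughly like the following sketch. It is a hypothetical illustration, not the real run_wrapper.sh: it assumes that a scheduler is available when its submission command (sbatch for SLURM, pjsub for PJM) is on the PATH, and the function names are invented.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, NOT the real run_wrapper.sh.
# Assumption: a scheduler is available if its submission command is on PATH
# (sbatch for SLURM, pjsub for PJM).
detect_scheduler() {
  if command -v sbatch >/dev/null 2>&1; then
    echo "slurm"
  elif command -v pjsub >/dev/null 2>&1; then
    echo "pjm"
  else
    echo "none"
  fi
}

# Run a command through the detected scheduler, or directly if there is none.
run_with_scheduler() {
  case "$(detect_scheduler)" in
    slurm) srun "$@" ;;                               # run as a SLURM job step
    pjm)   echo "PJM detected: submit a job script with pjsub" >&2
           return 1 ;;                                # job-script generation omitted
    *)     "$@" ;;                                    # no scheduler: run directly
  esac
}
```

PJM jobs are submitted as job scripts, so the sketch only reports that case; adding another scheduler amounts to one more branch in detect_scheduler and in the case statement.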
Example of a full workflow:
source benchmarks/setup.sh
cd benchmarks/X
bash scripts/compile.sh
bash scripts/regression_small.sh
bash scripts/regression_large.sh
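To repeat this workflow for every benchmark, a loop such as the following hypothetical Bash sketch could be used. It assumes the folder layout described in this README, that setup.sh has already been edited and sourced, and that the scripts return a non-zero exit status on failure (the real regression scripts may only print the result of the correctness check). The function name run_all_small is ours.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: compile every benchmark and run its small regression
# test. Assumes benchmarks/<NAME>/scripts/{compile,regression_small}.sh exist,
# setup.sh has already been sourced, and the scripts exit non-zero on failure.
run_all_small() {
  local root="${1:-benchmarks}" failed="" dir name
  for dir in "$root"/*/; do
    [ -f "${dir}scripts/compile.sh" ] || continue   # skip non-benchmark folders
    name=$(basename "$dir")
    # Subshell so that the cd does not leak into the caller's shell
    if (cd "$dir" && bash scripts/compile.sh && bash scripts/regression_small.sh); then
      echo "$name: passed"
    else
      echo "$name: FAILED"
      failed="$failed $name"
    fi
  done
  [ -z "$failed" ]   # exit status is non-zero if any benchmark failed
}
```

Calling `run_all_small` from the repository root would iterate over benchmarks/*/; passing another directory as the first argument changes the root.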
We have annotated the code of all benchmarks to isolate their region of interest, so that the supported profilers and tools only take that region into account.
We support the following profilers and tools: Intel VTune, Perf, our modified version of DynamoRIO, the Fujitsu Advanced Performance Profiler, the Fujitsu PWR library, and the RAPL-Stopwatch library.
To profile a benchmark using Intel VTune, follow these steps.
First, set the VTUNE_HOME environment variable to the VTune directory that contains the include folder:
export VTUNE_HOME="THE_PATH_TO_THE_VTUNE_DIRECTORY_THAT_CONTAINS_THE_INCLUDE_FOLDER"
Then, compile the benchmark with VTUNE_ANALYSIS=1:
cd benchmarks/X
make VTUNE_ANALYSIS=1
This will link the benchmark against the VTune library and activate the code annotations that isolate the region of interest.
Finally, add the VTune command before the benchmark's command in the commands variable of the regression tests (regression_small.sh and regression_large.sh). IMPORTANT: You need to use the -start-paused flag of VTune so that only the region of interest of the benchmark is taken into account.
To profile a benchmark using Perf, follow these steps.
First, compile the benchmark with PERF_ANALYSIS=1:
cd benchmarks/X
make PERF_ANALYSIS=1
This will activate the code annotations that isolate the region of interest.
Then, create a FIFO named perf_ctl.fifo to communicate with the benchmark:
perf_fifo="perf_ctl.fifo"
test -p "${perf_fifo}" && unlink "${perf_fifo}"
mkfifo "${perf_fifo}"
Finally, add your Perf command (e.g., perf stat) before the benchmark's command in the commands variable of the regression tests (regression_small.sh and regression_large.sh). IMPORTANT: You need to use the -D -1 flag of Perf so that only the region of interest of the benchmark is taken into account.
To use our modified version of DynamoRIO to compute the instruction mix of an application, follow these steps.
Note: DynamoRIO does not yet support Arm SVE instructions.
First, set the DYNAMORIO_INSTALL_PATH environment variable to DynamoRIO's build directory:
export DYNAMORIO_INSTALL_PATH="THE_PATH_TO_DYNAMORIOS_BUILD_DIRECTORY"
Then, compile the benchmark with DYNAMORIO_ANALYSIS=1:
cd benchmarks/X
make DYNAMORIO_ANALYSIS=1
This will activate the code annotations that isolate the region of interest.
Finally, add the DynamoRIO command before the benchmark's command in the commands variable of the regression tests (regression_small.sh and regression_large.sh), so that it looks as follows, where BENCHMARK stands for the benchmark's original command:
"${DYNAMORIO_INSTALL_PATH}/bin64/drrun" -c "${DYNAMORIO_INSTALL_PATH}/api/bin/libopcodes.so" -- BENCHMARK
To profile a benchmark using the Fujitsu Advanced Performance Profiler (FAPP), follow these steps.
First, compile the benchmark with FAPP_ANALYSIS=1:
cd benchmarks/X
make FAPP_ANALYSIS=1
This will activate the code annotations that isolate the region of interest. IMPORTANT: On the A64FX, this step only works when using the Fujitsu compiler (FCC).
Then, add the FAPP command before the benchmark's command in the commands variable of the regression tests (regression_small.sh and regression_large.sh).
To measure the energy consumption of a benchmark using the Fujitsu PWR library, follow these steps.
First, set the PWR_INCLUDES and PWR_LDFLAGS environment variables:
export PWR_INCLUDES="-IPATH_TO_PWRS_INCLUDE_DIRECTORY"
export PWR_LDFLAGS="-LPATH_TO_PWRS_lib64_DIRECTORY -lpwr"
Then, compile the benchmark with PWR=1:
cd benchmarks/X
make PWR=1
This will link the benchmark against the PWR library and activate the code annotations that isolate the region of interest.
To measure the energy consumption of a benchmark using the RAPL-Stopwatch library, follow these steps.
First, set the RAPL_STOPWATCH_INCLUDES and RAPL_STOPWATCH_LDFLAGS environment variables:
export RAPL_STOPWATCH_INCLUDES="-IPATH_TO_RAPL_STOPWATCHS_INCLUDE_DIRECTORY/rapl_stopwatch"
export RAPL_STOPWATCH_LDFLAGS="-LPATH_TO_RAPL_STOPWATCHS_lib64_DIRECTORY -lrapl_stopwatch"
Then, compile the benchmark with RAPL_STOPWATCH=1:
cd benchmarks/X
make RAPL_STOPWATCH=1
This will link the benchmark against the RAPL-Stopwatch library and activate the code annotations that isolate the region of interest.
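For VTune, Perf, and DynamoRIO, the last profiling step amounts to prepending a profiler invocation to the benchmark's command. The hypothetical helper below prints such a prefix; it is not part of GenArchBench. The VTune analysis type (hotspots) and Perf's --control fifo:perf_ctl.fifo option are our assumptions: that option tells Perf to accept enable/disable commands through the FIFO created in the Perf steps, which is how we assume the annotated benchmark signals its region of interest to a Perf session started with -D -1.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of GenArchBench: print the profiler invocation
# to place before a benchmark's command in the regression scripts.
# Assumptions: VTune's "hotspots" analysis, and Perf reading enable/disable
# commands from perf_ctl.fifo via --control.
profiler_prefix() {
  case "$1" in
    vtune)     echo "vtune -collect hotspots -start-paused --" ;;
    perf)      echo "perf stat -D -1 --control fifo:perf_ctl.fifo --" ;;
    dynamorio) echo "${DYNAMORIO_INSTALL_PATH}/bin64/drrun -c ${DYNAMORIO_INSTALL_PATH}/api/bin/libopcodes.so --" ;;
    *)         echo "unknown profiler: $1" >&2
               return 1 ;;
  esac
}
```

Assuming commands holds a plain command string, a line such as `commands="$(profiler_prefix vtune) ./BENCHMARK ARGS"` (hypothetical names) would prepend the VTune invocation.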
Each benchmark is licensed individually, according to the license of the tool it was extracted from.
Lorién López-Villellas, Rubén Langarita-Benítez, Asaf Badouh, Víctor Soria-Pardos, Quim Aguado-Puig, Guillem López-Paradís, Max Doblas, Javier Setoain, Chulho Kim, Makoto Ono, Adrià Armejach, Santiago Marco-Sola, Jesús Alastruey-Benedé, Pablo Ibáñez, and Miquel Moretó. GenArchBench: A genomics benchmark suite for arm HPC processors, Future Generation Computer Systems, 2024.
@article{lopezvillellas2024,
title = {GenArchBench: A genomics benchmark suite for arm HPC processors},
journal = {Future Generation Computer Systems},
year = {2024},
issn = {0167-739X},
doi = {https://doi.org/10.1016/j.future.2024.03.050},
url = {https://www.sciencedirect.com/science/article/pii/S0167739X24001250},
author = {Lorién López-Villellas and Rubén Langarita-Benítez and Asaf Badouh and Víctor Soria-Pardos and Quim Aguado-Puig and Guillem López-Paradís and Max Doblas and Javier Setoain and Chulho Kim and Makoto Ono and Adrià Armejach and Santiago Marco-Sola and Jesús Alastruey-Benedé and Pablo Ibáñez and Miquel Moretó},
}