
= PARSEC Benchmark
:idprefix:
:idseparator: -
:sectanchors:
:sectlinks:
:sectnumlevels: 6
:sectnums:
:toc: macro
:toclevels: 6
:toc-title:

PARSEC http://parsec.cs.princeton.edu/ 3.0-beta-20150206 ported to Ubuntu 22.04 and <<splash2,SPLASH2>> ported to Buildroot 2017.08 cross compilation (ARM, MIPS, etc.). This repo intends to support all build types and benchmarks.

As of November 2023, the official Princeton website was down. The https://web.archive.org/web/20230922200507/https://parsec.cs.princeton.edu/[last working archive was from September 22, 2023]. It is unclear if they retired it or if it is a bug. If they retired it, OMG. For this reason, we re-uploaded their original test data to: https://github.com/cirosantilli/parsec-benchmark/releases/tag/3.0 The scripts in this repository automatically download and use those blobs.

toc::[]

== Getting started Ubuntu 17.10 native host

....
./configure
....

Before doing anything else, you must get the parsecmgmt command with:

....
. env.sh
....

Build all:

....
parsecmgmt -a build -p all
....

-a means "action", i.e. which action to take for the selected packages.

Build just the ferret benchmark:

....
parsecmgmt -a build -p ferret
....

Build all <<splash2,SPLASH2>> benchmarks:

....
parsecmgmt -a build -p splash2x
....

Build just one SPLASH2 benchmark:

....
parsecmgmt -a build -p splash2x.barnes
....

List all benchmarks:

....
parsecmgmt -a info
....

Run one SPLASH2 benchmark with the test <<input-size,input size>>:

....
parsecmgmt -a run -p splash2x.barnes -i test
....

Run a non-SPLASH2 benchmark:

....
parsecmgmt -a run -p netdedup
....

For some reason, the splash2 version (without the X) does not have any input data besides -i test, making it basically useless, so just use the X version instead. TODO why? Can we just remove it then? When running splash2, it says:

....
NOTE: SPLASH-2 only supports "test" input sets.
....

so likely not a bug.

Run all packages with the default test input size:

....
parsecmgmt -a run -p all
....

Not every benchmark has every input size, e.g. splash2.barnes only has the test input inside of core and input-sim.

TODO: does the following run all sizes, or just one default size?

....
parsecmgmt -a run -p splash2x
....

=== Input size

Run one SPLASH2 benchmark with a given <<input-size,input size>>, listed here by increasing size:

....
parsecmgmt -a run -p splash2x.barnes -i test
parsecmgmt -a run -p splash2x.barnes -i simdev
parsecmgmt -a run -p splash2x.barnes -i simsmall
parsecmgmt -a run -p splash2x.barnes -i simmedium
parsecmgmt -a run -p splash2x.barnes -i simlarge
parsecmgmt -a run -p splash2x.barnes -i native
....

The original link:README[] explains how input sizes were originally calibrated:


____
All inputs except 'test' and 'simdev' can be used for performance analysis. As a rough guideline, on a Pentium 4 processor with 3.0 GHz you can expect approximately the following execution times:
____

=== __parsec_roi_begin

One of the most valuable things parsec offers is that it instruments the region of interest of all benchmarks with:

....
__parsec_roi_begin
....

That can then be overridden for different targets to check time, cache state, etc. on the ROI.
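For example, here is a minimal sketch, not the actual implementation (which lives under pkgs/libs/hooks), of what a native-timing flavor of these two hooks could look like, assuming they are plain C functions that get swapped in at link time:

....
/* Hypothetical hooks flavor: wall-clock the ROI with a monotonic timer.
 * A simulator flavor would instead emit magic instructions here. */
#include <stdio.h>
#include <time.h>

static struct timespec roi_start;

void __parsec_roi_begin(void) {
    clock_gettime(CLOCK_MONOTONIC, &roi_start);
}

void __parsec_roi_end(void) {
    struct timespec roi_end;
    clock_gettime(CLOCK_MONOTONIC, &roi_end);
    double secs = (roi_end.tv_sec - roi_start.tv_sec)
                + (roi_end.tv_nsec - roi_start.tv_nsec) / 1e9;
    fprintf(stderr, "[HOOKS] ROI time: %f s\n", secs);
}
....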

=== parsecmgmt ferret rebuild after source change broken

Some rebuilds after source changes with -a build are a bit broken. E.g. a direct:

....
parsecmgmt -a build -p ferret
....

doesn't do anything if you have modified sources. Also, trying to clean first still didn't work:

....
parsecmgmt -a clean -p ferret
parsecmgmt -a build -p ferret
....

What worked was a more brutal removal of inst and obj:

....
rm -rf pkgs/apps/ferret/inst pkgs/apps/ferret/obj
....

see also: <>.

You could also do:

....
git clean -xdf pkgs/apps/ferret
./get-inputs
....

but then you would need to re-run ./get-inputs because git clean -xdf removes the unpacked inputs that were placed under pkgs/apps/ferret/inputs/.

=== Package directory structure

Most, if not all, packages appear to be organized in the same structure; take pkgs/apps/ferret for example. Its run configuration under parsec/ contains:

....
#!/bin/bash

run_exec="bin/ferret"
run_args="corel lsh queries 50 20 ${NTHREADS} output.txt"
....
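Pieced together from the paths used throughout this README, the rough per-package layout (details vary per package) is:

....
pkgs/apps/ferret/
    parsec/   # parsecmgmt configuration: *.bldconf, *.runconf
    src/      # benchmark source code
    inputs/   # input tarballs, e.g. input_test.tar
    inst/     # installed binaries per target, e.g. amd64-linux.gcc/bin/ferret
    obj/      # per-target build objects
....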

=== Host x264

Fails with:

....
[PARSEC] Running 'time /home/ciro/bak/git/linux-kernel-module-cheat/parsec-benchmark/parsec-benchmark/pkgs/apps/x264/inst/amd64-linux.gcc/bin/x264 --quiet --qp 20 --partitions b8x8,i4x4 --ref 5 --direct auto --b-pyramid --weightb --mixed-refs --no-fast-pskip --me umh --subme 7 --analyse b8x8,i4x4 --threads 1 -o eledream.264 eledream_32x18_1.y4m':
[PARSEC] [---------- Beginning of output ----------]
PARSEC Benchmark Suite Version 3.0-beta-20150206
yuv4mpeg: 32x18@25/1fps, 0:0
*** Error in `/home/ciro/bak/git/linux-kernel-module-cheat/parsec-benchmark/parsec-benchmark/pkgs/apps/x264/inst/amd64-linux.gcc/bin/x264': double free or corruption (!prev): 0x0000000001a88e50 ***
/home/ciro/bak/git/linux-kernel-module-cheat/parsec-benchmark/parsec-benchmark/bin/parsecmgmt: line 1222: 20944 Aborted (core dumped) /home/ciro/bak/git/linux-kernel-module-cheat/parsec-benchmark/parsec-benchmark/pkgs/apps/x264/inst/amd64-linux.gcc/bin/x264 --quiet --qp 20 --partitions b8x8,i4x4 --ref 5 --direct auto --b-pyramid --weightb --mixed-refs --no-fast-pskip --me umh --subme 7 --analyse b8x8,i4x4 --threads 1 -o eledream.264 eledream_32x18_1.y4m
....

Mentioned on the following unresolved Parsec threads:

The problem does not happen with Ubuntu 17.10's x264 0.148.2795 after removing --b-pyramid, which is not a valid argument anymore it seems. So the easiest fix for this problem is to just take the latest x264 (as a submodule, please!) and apply the PARSEC ROI patches to it (git grep parsec under x264/src).

=== Host splash2x.fmm

Segfaults.

== Getting started Buildroot cross compilation

See the instructions at: https://github.com/cirosantilli/linux-kernel-module-cheat#parsec-benchmark The Buildroot package is in that repo at: https://github.com/cirosantilli/linux-kernel-module-cheat/tree/2c12b21b304178a81c9912817b782ead0286d282/parsec-benchmark

If you have already built for the host previously, you must first in this repo:

Only SPLASH2 has been ported so far, not the other benchmarks.

PARSEC's build system was designed for multiple archs, as can be seen in bin/parsecmgmt, but not for cross compilation. Some of the changes we've had to make:

The following variables are required for cross compilation, with example values:

....
export GNU_HOST_NAME='x86_64-pc-linux-gnu'
export HOSTCC='/home/ciro/bak/git/linux-kernel-module-cheat/buildroot/output.arm~/host/bin/ccache /usr/bin/gcc'
export M4='/home/ciro/bak/git/linux-kernel-module-cheat/buildroot/output.arm~/host/usr/bin/m4'
export MAKE='/usr/bin/make -j6'
export OSTYPE=linux
export TARGET_CROSS='/home/ciro/bak/git/linux-kernel-module-cheat/buildroot/output.arm~/host/bin/arm-buildroot-linux-uclibcgnueabi-'
export HOSTTYPE='"arm"'
....

Then just do a normal build.

=== Non SPLASH

We have made a brief attempt to get the other benchmarks working. We have already adapted and merged parts of the patches static-patch.diff and xcompile-patch.diff present at: https://github.com/arm-university/arm-gem5-rsk/tree/aa3b51b175a0f3b6e75c9c856092ae0c8f2a7cdc/parsec_patches

But it was not enough for successful integration as documented below.

The main point to note is that the non-SPLASH benchmarks all use Automake.

==== Non SPLASH arm

Some of the benchmarks fail to build with:

....
atomic/atomic.h:38:4: error: #error Architecture not supported by atomic.h
....

The ARM gem5 RSK patches do seem to fix that for aarch64, but not for arm; we should port them to arm too.

Some benchmarks don't rely on that however, and they do work, e.g. bodytrack.

==== Non SPLASH aarch64

Some builds work, but not all.

parsec.raytrace depends on cmake, which fails with:

....
CMake 2.6-1, Copyright (c) 2007 Kitware, Inc., Insight Consortium
Error when bootstrapping CMake:
Cannot find appropriate C compiler on this system.
Please specify one using environment variable CC.
See cmake_bootstrap.log for compilers attempted.
....

which is weird since I am exporting CC.

It is the only package that depends on cmake and mesa as can be found with:

....
git grep 'deps.*cmake'
....

For cmake we could use the host / Buildroot-built one, but Mesa, really? For a CPU benchmark? I'm tempted to just get rid of this benchmark.

Furthermore, http://gem5.org/PARSEC_benchmarks says that raytrace relies on SSE intrinsics, so maybe it is not trivially portable anyways.

If we disable raytrace, cmake and mesa by editing config/packages/parsec.raytrace.pkgconf, parsec.cmake.pkgconf and parsec.mesa.pkgconf to contain:

....
pkg_aliases=""
....

the next failure is dedup, which depends on ssl, which fails with:

....
Operating system: x86_64-whatever-linux2
Configuring for linux-x86_64
Usage: Configure.pl [no-<cipher> ...] [enable-<cipher> ...] [experimental-<cipher> ...] [-Dxxx] [-lxxx] [-Lxxx] [-fxxx] [-Kxxx] [no-hw-xxx|no-hw] [[no-]threads] [[no-]shared] [[no-]zlib|zlib-dynamic] [enable-montasm] [no-asm] [no-dso] [no-krb5] [386] [--prefix=DIR] [--openssldir=OPENSSLDIR] [--with-xxx[=vvv]] [--test-sanity] os/compiler[:flags]
....

dedup and netdedup are the only packages that use ssl. ssl is actually OpenSSL, which Buildroot has.

The next failure is vips due to glib:

....
checking for growing stack pointer... configure: error: in `/path/to/linux-kernel-module-cheat/out/aarch64/buildroot/build/parsec-benchmark-custom/pkgs/libs/glib/obj/aarch64-linux.gcc':
configure: error: cannot run test program while cross compiling
....

which is weird; I thought those Automake problems were avoided by --build and --host, which we added in a previous patch.

glib and libxml are only used by vips. Buildroot has only parts of glib it seems, e.g. glibmm, but it does have libxml2.

The next failure is uptcpip on which all netapps depend:

....
ar rcs libuptcp.a ../freebsd.kern/*.o ../freebsd.netinet/*.o *.o ../host.support/uptcp_statis.o ../host.support/host_serv.o ../host.support/if_host.o
ar: ../host.support/uptcp_statis.o: No such file or directory
....

I hacked a pwd into the configure, and the CWD is pkgs/apps/x264/obj/aarch64-linux.gcc, so sure enough, there is no ./config.sub there...

And the errors are over! :-)

== test.sh unit tests

While it is possible to run all tests on host with parsecmgmt, this has the following disadvantages:

For those reasons, we have created the link:test.sh[] script, which runs the raw executables directly, and stops on failures.

That script can be run either on host, or on guest, but you must make sure that all test inputs have been previously unpacked with:

....
parsecmgmt -a run -p all
....

The test input size specifically is required since the input names for some benchmarks differ depending on the input size.

== Overview of the benchmarks

https://parsec.cs.princeton.edu/overview.htm gives an overview of some of them, but it is too short to be understood. TODO: go over all of them with sample input/output analysis! One day.

=== apps vs kernels

=== netapps

Documented at: https://parsec.cs.princeton.edu/parsec3-doc.htm#network


____
PARSEC 3.0 provides three server/client mode network benchmarks which leverage a user-level TCP/IP stack library for communication.
____


Everything under netapps is a networked version of something under apps, e.g. netdedup is the networked version of dedup.

=== SPLASH2

SPLASH2 was apparently a separate benchmark suite that got merged into PARSEC.

This is suggested e.g. by https://parsec.cs.princeton.edu/overview.htm, which compares SPLASH2 as a separate benchmark suite to PARSEC, linking to the now dead http://www-flash.stanford.edu/apps/SPLASH/

This is also presumably why splash went in under ext.

https://parsec.cs.princeton.edu/parsec3-doc.htm#splash2 documents it as


____
SPLASH-2 benchmark suite includes applications and kernels mostly in the area of high performance computing (HPC). It has been widely used to evaluate multiprocessors and their designs for the past 15 years.
____


=== ferret

https://parsec.cs.princeton.edu/overview.htm describes it as:


____
Content similarity search server
____


This presentation by the original authors appears to describe the software: https://www.cs.princeton.edu/cass/papers/Ferret_slides.pdf And here's the paper: https://www.cs.princeton.edu/cass/papers/Ferret.pdf So we understand that it is some research software from Princeton.

Unzipping the <<package-directory-structure,inputs>> there are a bunch of images, so we understand that it must be some kind of image similarity, i.e. a computer vision task.

Given the incredible advances in computer vision in the 2010s, these algorithms have likely become completely obsolete compared to deep learning techniques.

Running with:

....
parsecmgmt -a run -p ferret -i simsmall
....

we see the program output as:

....
(7,1) (16,2) (16,3) (16,4) (16,5) (16,6) (16,7) (16,8) (16,9) (16,10) (16,11) (16,12) (16,13) (16,14) (16,15) (16,16)
....

TODO: understand. One would guess that it shows which image looks the most like each other image? But then that would mean that the algorithm sucks, since almost everything looks like 16. And (16,16) is an image matching itself, which would have to be excluded.

If we unpack the input directory, we can see that there are 16 images, some of them grouped by type:

....
acorn.jpg
air-fighter.jpg
airplane-2.jpg
airplane-takeoff-3.jpg
alcatraz-island-prison.jpg
american-flag-3.jpg
apartment.jpg
apollo-2.jpg
apollo-earth.jpg
apple-11.jpg
apple-14.jpg
apple-16.jpg
apple-7.jpg
aquarium-fish-25.jpg
arches-9.jpg
arches.jpg
....

so presumably the authors would expect the airplanes and apples to be more similar to one another.

=== freqmine

....
[PARSEC] parsec.freqmine [1] (data mining)
[PARSEC]     Mine a transaction database for frequent itemsets
[PARSEC]     Package Group: apps
[PARSEC]     Contributor: Intel Corp.
[PARSEC]     Aliases: all parsec apps openmp
....

link:pkgs/apps/freqmine/src/README[] reads:


____
Frequent Itemsets Mining (FIM) is the basis of Association Rule Mining (ARM). Association Rule Mining is the process of analyzing a set of transactions to extract association rules. ARM is a very common used and well-studied data mining problem. The mining is applicable to any sequential and time series data via discretization. Example domains are protein sequences, market data, web logs, text, music, stock market, etc.

To mine ARMs is converted to mine the frequent itemsets Lk, which contains the frequent itemsets of length k. Many FIMI (FIM Implementation) algorithms have been proposed in the literature, including FP-growth and Apriori based approaches. Researches showed that the FP-growth can get much faster than some old algorithms like the Apriori based approaches except in some cases the FP-tree can be too large to be stored in memory when the database size is so large or the database is too sparse.
____


Googling "Frequent Itemsets Mining" leads e.g. to


____
Based on the items of your shopping basket, suggest other items people often buy together.
____


E.g. https://www.geeksforgeeks.org/frequent-item-set-in-data-set-association-rule-mining/ mentions:


____
For example, if a dataset contains 100 transactions and the item set {milk, bread} appears in 20 of those transactions, the support count for {milk, bread} is 20.
____


Running:

....
parsecmgmt -a run -p freqmine -i test
....

produces output:

....
transaction number is 3
32 192 736 2100 4676 8246 11568 12916 11450 8009 4368 1820 560 120 16 1
the data preparation cost 0.003300 seconds, the FPgrowth cost 0.002152 seconds
....

A manual run can be done with:

....
cd pkgs/apps/freqmine
./inst/amd64-linux.gcc/bin/freqmine inputs/T10I4D100K_3.dat 1
....

where the parameters are the input transaction database file and the minimum support, both described below.

link:pkgs/apps/freqmine/parsec/test.runconf[] contains:

....
run_args="T10I4D100K_3.dat 1"
....

pkgs/apps/freqmine/inputs/input_test.tar contains T10I4D100K_3.dat, which is the following plaintext file:

....
25 52 164 240 274 328 368 448 538 561 630 687 730 775 825 834
39 120 124 205 401 581 704 814 825 834
35 249 674 712 733 759 854 950
....

So we see that it contains 3 transactions, that the _3 in the filename is the number of transactions, and that this number also gets output by the program:

....
transaction number is 3
....

The README describes the input and output somewhat incomprehensibly as:


____
For the input, a date-set file containing the test transactions is provided.

There is another parameter that indicates "minimum-support". When it is a integer, it means the minimum counts; when it is a floating point number between 0 and 1, it means the percentage to the total transaction number.

The program output all (different length) frequent itemsets with fixed minimum support.
____


Let's hack the "test" input to something actually minimal:

....
1 2 3
1 2 4
2 3
....

Now the output for parameter 1 is:

....
4
5
2
....

and for parameter 2 is:

....
3
2
....

I think what it means is: take input parameter 1; 1 is the minimum support we are counting. The output:

....
4
5
2
....

actually means:


____
How many sets are there with a given size and support at least 1:
____


....
set_size -> number_of_sets
1 -> 4
2 -> 5
3 -> 2
....

For example, for set_size 1 there are 4 possible sets (4 pick 1, as we have 4 distinct numbers): {1} (support 2), {2} (support 3), {3} (support 2) and {4} (support 1), so we have 4 sets with support at least one, and the output for that line is 4.

For set_size 2, there are 6 possible sets (4 pick 2): {1, 2} (support 2), {1, 3} (1), {1, 4} (1), {2, 3} (2), {2, 4} (1) and {3, 4} (0). Therefore, we had 5 sets with support at least 1: {1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4}, so the output for the line is 5.

For set_size 3, there are 4 possible sets (4 pick 3): {1, 2, 3} (support 1), {1, 2, 4} (1), {1, 3, 4} (0) and {2, 3, 4} (0). Therefore, we had 2 sets with support at least 1: {1, 2, 3} and {1, 2, 4}, so the output for the line is 2.

If we take input parameter 2 instead, we can reuse the above full calculations to retrieve the values: 3 sets of size 1 have support at least 2 ({1}, {2}, {3}), and 2 sets of size 2 do ({1, 2}, {2, 3}), matching the 3 and 2 in the output.

Presumably therefore, there is some way to calculate these outputs without having to do the full explicit set enumeration, so you can get counts for larger support sizes but not necessarily be able to get those for the smaller ones.
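To sanity check the counts above by brute force (a naive enumeration, not freqmine's actual FP-growth algorithm), here is a small sketch that encodes the hacked 3-transaction input as bitmasks:

....
/* Count, for each itemset size, how many itemsets reach min_support
 * in the hacked 3-transaction input. Bit i-1 of a mask = item i. */
#include <stdio.h>

int main(void) {
    int txns[] = { 0x7 /* {1,2,3} */, 0xB /* {1,2,4} */, 0x6 /* {2,3} */ };
    int ntxns = 3, nitems = 4, min_support = 1;
    int counts[5] = {0}; /* counts[k]: itemsets of size k with enough support */
    for (int set = 1; set < (1 << nitems); set++) {
        int support = 0, size = __builtin_popcount(set);
        for (int t = 0; t < ntxns; t++)
            if ((txns[t] & set) == set) /* transaction contains the itemset */
                support++;
        if (support >= min_support)
            counts[size]++;
    }
    for (int k = 1; k <= nitems; k++)
        if (counts[k])
            printf("%d\n", counts[k]);
    return 0;
}
....

With min_support = 1 this prints 4, 5, 2, and with min_support = 2 it prints 3, 2, matching freqmine's outputs above.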

=== raytrace

Well, if this doesn't do raytracing, I would be very surprised!

pkgs/apps/raytrace/inputs/input_test.tar contains octahedron.obj:

....
#
# Object octahedron.obj
#
# Vertices: 6
# Faces: 8
#
#
# Octahedron
# Synthetic model for PARSEC benchmark suite
# Created by Christian Bienia
#
v 1.0 0.0 0.0
v 0.0 1.0 0.0
v 0.0 0.0 1.0
v -1.0 0.0 0.0
v 0.0 -1.0 0.0
v 0.0 0.0 -1.0
# 6 vertices, 0 vertices normals

f 1 2 3
f 4 2 3
f 1 5 3
f 1 2 6
f 4 5 6
f 1 5 6
f 4 2 6
f 4 5 3
# 8 faces, 0 coords texture

# End of File
....

so clearly a representation of a 3D object; https://en.wikipedia.org/wiki/Wavefront_.obj_file describes the format.

And input_simdev.tar contains a much larger bunny.obj, which is a classic 3D model used by computer graphics researchers: https://en.wikipedia.org/wiki/Stanford_bunny

src/README documents that the output would be a video, and that it is turned off. Boring!!!


____
The input for raytrace is a data file describing a scene that is composed of a single, complex object. The program automatically rotates the camera around the object to simulate movement. The output is a video stream that is displayed in a video. For the benchmark version output has been disabled.
____


=== pkgs/libs

These appear to be all external libraries, and don't have tests specifically linked to them.

They are then used from other tests, e.g. pkgs/libs/mesa is used from pkgs/apps/raytrace:

....
pkgs/apps/raytrace/parsec/gcc-pthreads.bldconf:18:build_deps="cmake mesa"
....

We also note that one lib can depend on another lib, e.g. glib depends on zlib:

....
pkgs/libs/glib/parsec/gcc.bldconf:17:build_deps="zlib"
....

so they were essentially building their own distro. They should have used Buildroot, poor newbs!

Two deps in particular are special things used widely across many benchmarks:

....
git grep 'build_deps="[^"]'
....

==== hooks

Hooks are instrumentation hooks that get performance metrics out. They have several flavors for different environments, e.g. native vs magic simulator instructions.

The addition of hook points on several meaningful workloads is basically one of PARSEC's most important features.
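From the caller side, each benchmark's main wraps its phases roughly as sketched below. The exact signatures and enum values here are assumptions from memory; check the headers under pkgs/libs/hooks for the real API:

....
#include "hooks.h" /* from pkgs/libs/hooks; active only when hooks are enabled */

int main(int argc, char **argv) {
    __parsec_bench_begin(__parsec_ferret); /* suite entry; enum value assumed */

    /* ... sequential setup: parse args, load inputs, spawn threads ... */

    __parsec_roi_begin();
    /* parallel phase: the only part that should be measured */
    __parsec_roi_end();

    /* ... sequential teardown: join threads, write outputs ... */

    __parsec_bench_end();
    return 0;
}
....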

==== tbblib

Points to: https://github.com/oneapi-src/oneTBB

Presumably it has something to do with being able to use different forms of parallelism transparently?

See also: https://github.com/massivethreads/tp-parsec

== About this repository

This repo was started from version 3.0-beta-20150206:

....
$ md5sum parsec-3.0.tar.gz
328a6b83dacd29f61be2f25dc0b5a053  parsec-3.0.tar.gz
....

We later learnt about parsec-3.0-core.tar.gz, which is in theory cleaner than the full tar, but even that still contains some tars, so it won't make much of a difference.

Why this fork: how can a project exist without Git these days? I need a way to track my patches sanely. And the thing didn't build on the latest Ubuntu, of course :-)

We try to keep this as close to mainline functionality as possible to be able to compare results, except that it should build and run.

We can't track all the huge input blobs on GitHub or we would blow past the 1 GB max repository size, so let's try to track everything that is not humongous, and then let users download the missing blobs from Princeton directly.

Let's also remove the random output files that the researchers forgot inside the messy tarball as we find them.

All that matters is that this should compile fine: runtime will then fail due to missing input data.

I feel like libs contains ancient versions of a bunch of well-known third-party libraries, so we are just re-porting them to the newest Ubuntu, which has already been done upstream... and many of the problems are documentation-generation related... at some point I want to just use Debian packages or git submodules or Buildroot packages.

TODO: after a build, some ./configure and config.h.in files are modified, but removing them makes the build fail.

PARSEC is just at another level of software engineering quality.

== Bibliography

Princeton stopped actively supporting PARSEC; they don't usually reply on the link:https://lists.cs.princeton.edu/pipermail/parsec-users/[mailing list]. So a few forks / patches / issue trackers have popped up in addition to ours: