eellak / build-recorder

GNU Lesser General Public License v2.1

Standardized benchmarking #186

Closed. zvr closed this issue 1 year ago.

zvr commented 1 year ago

There should be a standard benchmark that can be run, so that further improvements can be tracked.

Probably implemented via extra goals in the test Makefile.

fvalasiad commented 1 year ago

@zvr What about offering a list of performance tests: a script that clones and builds various projects twice, once with build-recorder and once without it, and compares the results?

This way we could test a number of configurations, build systems, and languages.

zvr commented 1 year ago

Yes, that's the idea. We can start with a single case.

I assume hello is as simple as it gets.

Of course, not a script; Makefile rules are perfectly fine for this.

fvalasiad commented 1 year ago

@zvr do you expect something like:

hello-test.out: hello-test
	cd $< && ./configure && time $(BUILD_RECORDER) make

hello-test:
	git clone <url> hello-test
in the Makefile?

zvr commented 1 year ago

I've merged #193 for this, as a first pass.

fvalasiad commented 1 year ago

@zvr What's the goal of this issue? Simply to add more benchmarks? I've been using xbps all this time to measure performance, as well as wine as a "big" project.

zvr commented 1 year ago

We don't necessarily need more benchmarks -- these could be added later pretty easily, by adding more download URLs.

But we do have to be able to run the benchmark and report the result. Right now the commands in the Makefile simply run a build twice: without and with recording. We should be collecting the data and reporting them (saving them in a results file).
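
A minimal sketch of what the collection step could look like, assuming GNU time and two made-up file names (baseline.time, recorded.time):

# A sketch only, not the project's actual Makefile recipe;
# the output file names are assumptions.
cd hello-test && ./configure
/usr/bin/time -v -o ../baseline.time make                  # build without recording
make clean
/usr/bin/time -v -o ../recorded.time build-recorder make   # build with recording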

fvalasiad commented 1 year ago

@zvr

Well, I guess something as simple as recording both outputs and printing them to a file is not what you'd like.

Then, what if we instead printed a percentage in each field, denoting the increase or decrease compared with the baseline run? Like:

real 112.5% user 101.23% sys 100%
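
A minimal sketch of how those percentages could be computed, assuming both runs wrote matching "label value" lines (via time -f) into the made-up files baseline.time and recorded.time:

# Join the two outputs line by line and print each recorded value
# as a percentage of its baseline counterpart.
paste baseline.time recorded.time |
awk '$1 == $3 && $2 > 0 { printf "%s %.2f%%\n", $1, 100 * $4 / $2 }'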

zvr commented 1 year ago

Even recording it is fine; computing increased resources would also be great. Keep in mind that time -v reports around 25 values.

fvalasiad commented 1 year ago

@zvr Yeah, I know; I just listed the standard three as an example.

Three fields are enough, no need for all 25.

So, should I proceed with the standard time -v format but with the percentages as previously discussed? Or should I do that in addition to recording the standard outputs of both runs?

zvr commented 1 year ago

We should be generating a report (file):

Baseline run:
<time output>
Build-recorder run:
<time output>
Comparison:
<a table or something>

I'd assume a simple awk script can process the output of the two timed runs.
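
Assembling that report could be a few shell lines in the recipe (a sketch; benchmark-report.txt and compare.awk are hypothetical names):

# Concatenate the two raw outputs and a computed comparison into one file.
{
    echo "Baseline run:";       cat baseline.time
    echo "Build-recorder run:"; cat recorded.time
    echo "Comparison:"
    paste baseline.time recorded.time | awk -f compare.awk
} > benchmark-report.txt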

fvalasiad commented 1 year ago

@zvr Yet another issue I want to tackle: we are using only a single thread with make.

The performance penalty build-recorder imposes on the build process varies with the number of threads being used.

I'd also argue that we mostly care about the multi-threaded scenario, since I doubt people run big builds (on which build-recorder might impose a considerable penalty) with just a thread or two.

Ultimately, I'd argue we need a number of different benchmarks exploring all these configurations.

While I was compiling XBPS, for example, which consists of a separate tool for each "operation" (xbps-remove, xbps-install, xbps-query, etc.), the building of object files was fast, but make moving in and out of directories was disappointingly slow.

My point is, if we face such pitfalls with make, I can only imagine what would happen with other, more complicated build systems.

So yes, more benchmarks are definitely needed.

My question is, how do we deal with the thread count? Do we just pick a number like 4 at random, or do something more sophisticated?
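
One option (a sketch; JOBS is a made-up variable) is to not hard-code the number at all and default to the host's CPU count:

# nproc (GNU coreutils) reports the number of available processors.
JOBS=${JOBS:-$(nproc)}
time build-recorder make -j "$JOBS"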

zvr commented 1 year ago

Sure, after release 1.0 we should add more benchmarks, parallel ones, etc. Let's have a different issue about those.

fvalasiad commented 1 year ago

@zvr Which of these do we care for?

I will use GNU time(1)'s -f option to specify the output in a machine-readable format, so that writing an awk script to do the work won't be a pain (see the sketch after the list below).

Time

   %E     Elapsed real time (in [hours:]minutes:seconds).

   %e     (Not in tcsh(1).)  Elapsed real time (in seconds).

   %S     Total number of CPU-seconds that the process spent in kernel
          mode.

   %U     Total number of CPU-seconds that the process spent in user mode.

   %P     Percentage of the CPU that this job got, computed as (%U + %S) /
          %E.

Memory

   %M     Maximum resident set size of the process during its lifetime, in
          Kbytes.

   %t     (Not in tcsh(1).)  Average resident set size of the process, in
          Kbytes.

   %K     Average total (data+stack+text) memory use of the process, in
          Kbytes.

   %D     Average size of the process's unshared data area, in Kbytes.

   %p     (Not in tcsh(1).)  Average size of the process's unshared stack
          space, in Kbytes.

   %X     Average size of the process's shared text space, in Kbytes.

   %Z     (Not in tcsh(1).)  System's page size, in bytes.  This is a per-
          system constant, but varies between systems.

   %F     Number of major page faults that occurred while the process was
          running.  These are faults where the page has to be read in from
          disk.

   %R     Number of minor, or recoverable, page faults.  These are faults
          for pages that are not valid but which have not yet been claimed
          by other virtual pages.  Thus the data in the page is still
          valid but the system tables must be updated.

   %W     Number of times the process was swapped out of main memory.

   %c     Number of times the process was context-switched involuntarily
          (because the time slice expired).

   %w     Number of waits: times that the program was context-switched
          voluntarily, for instance while waiting for an I/O operation to
          complete.

I/O

   %I     Number of filesystem inputs by the process.

   %O     Number of filesystem outputs by the process.

   %r     Number of socket messages received by the process.

   %s     Number of socket messages sent by the process.

   %k     Number of signals delivered to the process.

   %C     (Not in tcsh(1).)  Name and command-line arguments of the
          command being timed.

   %x     (Not in tcsh(1).)  Exit status of the command.
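
For example, a -f string covering the basic fields could look like this (a sketch; the labels before each colon are our own choice, picked so awk can split on ':'):

# One "label:value" pair per line; GNU time expands \n in the format string.
/usr/bin/time -f 'real:%e\nuser:%U\nsys:%S\nmaxrss:%M\nmajflt:%F\nminflt:%R' \
    -o recorded.time build-recorder make
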
zvr commented 1 year ago

Start with all of them, and we see which ones do not make sense.

I think starting with time -v would be easier, since you won't have to duplicate labels and check whether specific ones are supported.

fvalasiad commented 1 year ago

I find it hard to define FS to parse time -v's output with awk.

I find it easier to use -f. It doesn't stop me from reusing the labels.

zvr commented 1 year ago

You can use colon (:) as field separator. Something like:

BEGIN { FS = ":" }
NF == 2 { key = $1; val = $2; v[key] = val }                # plain "Label: value" lines
NF == 6 { key = $1; val = 0 + $5 * 60 + $6; v[key] = val }  # m:ss elapsed-time line, whose label contains extra colons
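
Running it could look like this (a sketch; parse.awk is a hypothetical file holding the rules above, and time.out is a made-up name):

# Use -o so the timing lines don't get mixed with make's own output;
# /usr/bin/time is spelled out because the shell builtin lacks -v.
/usr/bin/time -v -o time.out make
awk -f parse.awk time.out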