@zvr What about offering a list of performance tests by providing a script that clones & builds various projects, twice, once with build-recorder
and once without it, comparing the results?
This way we could be testing a number of configurations, build systems and languages.
Yes, that's the idea. We can start with a single case.
I assume hello is as simple as it gets.
Of course, it doesn't have to be a script; Makefile rules are perfectly fine for this.
@zvr do you expect something like:
hello-test.out: hello-test
	cd $< && ./configure && time $(BUILD_RECORDER) make

hello-test:
	git clone <url> hello-test
in the Makefile?
I've merged #193 for this, as a first pass.
@zvr What's the goal of this issue? Simply add more benchmarks? I've been using xbps all this time to measure performance, as well as wine for a "big" project.
We don't necessarily need more benchmarks -- these could be added later pretty easily, by adding more download URLs.
But we do have to be able to run the benchmark and report the result. Right now the commands in the Makefile simply run a build twice: without and with recording. We should be collecting the data and reporting them (saving them in a results file).
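For illustration, the rules could save each run's measurements to a results file instead of discarding them. This is only a sketch, reusing the hello-test clone rule from above; the target names, the use of GNU time at /usr/bin/time, and the -v output format are my assumptions, not what #193 actually does:

# Hypothetical rules: keep each run's time(1) output in a file.
# A cleanup step (e.g. git clean) between the two runs is omitted
# here, but matters: the second build would otherwise be a no-op.
baseline.time: hello-test
	cd $< && ./configure && /usr/bin/time -v -o ../$@ make

recorder.time: hello-test
	cd $< && ./configure && /usr/bin/time -v -o ../$@ $(BUILD_RECORDER) make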
@zvr
Well, I guess something as simple as recording both outputs and printing them to a file is not what you'd like.
Then, what if we instead printed a percentage in each field, denoting the increase or decrease compared with the original? Like:
real 112.5% user 101.23% sys 100%
Even recording it is fine; computing increased resources would also be great.
Keep in mind that time -v reports around 25 values.
@zvr Yeah, I know; I just reported the standard 3 as an example.
3 examples are enough, no need for 25.
So, should I proceed with the standard time -v format but with the percentages as previously discussed? Or should I do that in addition to recording the standard outputs of both runs?
We should be generating a report (file):
Baseline run:
<time output>
Build-recorder run:
<time output>
Comparison:
<a table or something>
I'd assume a simple awk script to process the output of the two time'd runs.
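One way a Makefile rule could assemble that report (a sketch only; compare.awk is a hypothetical script along the lines discussed below, and the file names match the ones assumed earlier):

report.txt: baseline.time recorder.time
	{ echo "Baseline run:"; cat baseline.time; \
	  echo; echo "Build-recorder run:"; cat recorder.time; \
	  echo; echo "Comparison:"; \
	  awk -f compare.awk baseline.time recorder.time; } > $@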
@zvr Yet another issue I want to tackle: we are using only a single thread with make.
The performance penalty build-recorder imposes on the build process varies with the number of threads in use.
I'd also argue that we mostly care about the scenario of multiple threads being used, since I doubt people run big builds (on which build-recorder might introduce a considerable penalty) with just a thread or two.
Ultimately I'd argue we need a number of different benchmarks exploring all these configurations.
While I was compiling XBPS, for example, which consists of a number of tools, one for each "operation" (xbps-remove, xbps-install, xbps-query, etc.), the building of object files was fast, but the process of make checking in and out of directories was disappointingly slow.
My point is, if we face such pitfalls with make, I can only imagine what would happen with other more complicated systems.
So yes, more benchmarks are definitely needed.
My question is, how do we deal with the threads part? Do we just pick a number like 4 at random, or do something more sophisticated?
Sure, after release 1.0 we should add more benchmarks, parallel ones, etc. Let's have a different issue about those.
@zvr Which of these do we care about? I will use GNU time(1)'s -f option to specify the output in a machine-readable way, so that writing an awk script to do the work won't be a pain.
Time
%E Elapsed real time (in [hours:]minutes:seconds).
%e (Not in tcsh(1).) Elapsed real time (in seconds).
%S Total number of CPU-seconds that the process spent in kernel mode.
%U Total number of CPU-seconds that the process spent in user mode.
%P Percentage of the CPU that this job got, computed as (%U + %S) / %E.
Memory
%M Maximum resident set size of the process during its lifetime, in Kbytes.
%t (Not in tcsh(1).) Average resident set size of the process, in Kbytes.
%K Average total (data+stack+text) memory use of the process, in Kbytes.
%D Average size of the process's unshared data area, in Kbytes.
%p (Not in tcsh(1).) Average size of the process's unshared stack space, in Kbytes.
%X Average size of the process's shared text space, in Kbytes.
%Z (Not in tcsh(1).) System's page size, in bytes. This is a per-system constant, but varies between systems.
%F Number of major page faults that occurred while the process was running. These are faults where the page has to be read in from disk.
%R Number of minor, or recoverable, page faults. These are faults for pages that are not valid but which have not yet been claimed by other virtual pages. Thus the data in the page is still valid but the system tables must be updated.
%W Number of times the process was swapped out of main memory.
%c Number of times the process was context-switched involuntarily (because the time slice expired).
%w Number of waits: times that the program was context-switched voluntarily, for instance while waiting for an I/O operation to complete.
I/O
%I Number of filesystem inputs by the process.
%O Number of filesystem outputs by the process.
%r Number of socket messages received by the process.
%s Number of socket messages sent by the process.
%k Number of signals delivered to the process.
%C (Not in tcsh(1).) Name and command-line arguments of the command being timed.
%x (Not in tcsh(1).) Exit status of the command.
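For illustration, a -f invocation using a few of these specifiers might look like this (the exact format string is just an example):

# GNU time(1) only; the shell built-in time has neither -f nor -o
/usr/bin/time -f 'real:%e\nuser:%U\nsys:%S\nmaxrss:%M' -o baseline.time make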
Start with all of them, and we'll see which ones do not make sense.
I think starting with time -v would be easier, since you won't have to duplicate labels and check whether specific ones are supported.
I find it hard to define FS to parse time -v's output with awk. I find it easier to use -f; it doesn't stop me from re-using the labels.
You can use colon (:) as field separator. Something like:
BEGIN { FS = ":" }
# "label: value" lines split into exactly two fields
NF == 2 { key = $1; val = $2; v[key] = val; }
# the elapsed-time line contains extra colons; $5 and $6 hold m:ss
NF == 6 { key = $1; val = 0 + $5 * 60 + $6; v[key] = val; }
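For what it's worth, one hypothetical way to finish that sketch is to read the baseline file first, then the recorder file, and print per-key percentages in an END block:

BEGIN { FS = ":" }
NF == 2 { store($1, $2 + 0) }
# m:ss elapsed line; an h:mm:ss line would have NF == 7 (not handled here)
NF == 6 { store($1, $5 * 60 + $6) }
# the first file on the command line is taken to be the baseline run
function store(key, val) {
	if (FILENAME == ARGV[1]) base[key] = val
	else rec[key] = val
}
END {
	for (k in base)
		if (base[k] > 0)
			printf "%-45s %7.1f%%\n", k, 100 * rec[k] / base[k]
}

invoked as awk -f compare.awk baseline.time recorder.time.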
There should be a standard benchmark that can be run, so that further improvements can be tracked.
Probably implemented via extra goals in the test Makefile.
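For instance, the extra goal could be as simple as this (goal name hypothetical, building on the sketches above):

.PHONY: benchmark
benchmark: report.txt
	cat $<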