TuxML / size-analysis

Analysis of 125+ Linux configurations (this time for predicting/understanding kernel sizes)

compilation time and size #9

Open FAMILIAR-project opened 5 years ago

FAMILIAR-project commented 5 years ago

A hypothesis is that compilation time is related to the size of the Linux kernel: the more time you spend compiling code, the larger the resulting kernel. Both compilation time and kernel size strongly depend on the configuration used, since options may have different effects. We aim to verify this hypothesis and characterize the exact relationship between compilation time and size.

To do so, we first need to measure compilation time accurately. Right now, the compilation times in our dataset are not trustworthy, since we have used very different machines with different CPUs, core counts, etc., and also different workloads. We may also have doubts about TuxML's measurement procedure: does TuxML start measuring at the right time? When there is a compilation error, there is a try-and-fix process based on apt that can bias results: is it counted? A cross-cutting challenge is that we have to measure many configurations, and a single machine does not scale. The plan is thus to use a cluster of machines, but we have to verify that the machines have the same characteristics and are not used by other people.
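As a minimal sketch of what "measuring at the right time" could mean, the wall-clock measurement could be restricted to the make invocation itself, so that the apt-based try-and-fix steps and any package updates happen outside the timed region. The paths, the vmlinux target, and the use of GNU time below are assumptions for illustration, not the current TuxML implementation.

#!/bin/sh
# Sketch: time only the actual kernel build, excluding package fixing and setup.
# Assumes the kernel tree and the .config under test are already in place
# (paths are illustrative).
KERNEL_DIR=${KERNEL_DIR:-linux-4.15}
cd "$KERNEL_DIR" || exit 1

# Any apt-based fixing of missing packages must happen before this point
# so that it does not pollute the measurement.
/usr/bin/time -f "%e" -o build_time.txt make -j"$(nproc)" vmlinux

echo "build time (s): $(cat build_time.txt)"
ls -l vmlinux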

The plan is as follows:

I've used the following script for my past experiments on IGRIDA... the goal was to run on the same kind of machine in order to obtain homogeneous time measurements

#!/bin/sh
# OAR resource request: 8 cores, 12 hours of walltime
#OAR -l core=8,walltime=12:00:00
# Restrict the job to the 'lambda' cluster so all measurements run on identical machines
#OAR -p cluster = 'lambda'
# Files receiving the job's standard output and standard error
#OAR -O /temp_dd/igrida-fs1/macher/SCRATCH/fake_job.%jobid%.output
#OAR -E /temp_dd/igrida-fs1/macher/SCRATCH/fake_job.%jobid%.error
# Trace commands as they are executed
set -xv

echo
echo "Starting x264 config measurements"
echo "==================================="

./launchAll

echo
echo "DONE x264!"
echo "---------------------"
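For reference, such a script is submitted to OAR in a way that lets the scheduler pick up the embedded #OAR directives; the script file name below is hypothetical.

# Submit the job; with -S, oarsub scans the script for the #OAR directives above.
oarsub -S ./measure_configs.sh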
FAMILIAR-project commented 5 years ago

A short thought: why not use something like docker run tuxml/tuxml-14 cd linux-4.15; make; ls -l vmlinux (assuming a copy of the .config has been placed in linux-4.14 beforehand)?

My point: we bypass the usual build process of TuxML, with the benefit of measuring the real, actual compilation time (we do not get the TuxML features such as retrieval of hardware resources, update of packages, etc., that can affect the time). Of course, this technique has disadvantages:

But this technique (a direct call to the Docker image) can be used as a baseline or as a test case: given a configuration, if the compilation time measured with the built-in TuxML facilities (significantly) differs from the compilation time of the native build, it may indicate a measurement bias.

Since we are making dedicated measurements anyway, we can also think about instrumenting our full experiment with this technique. Please note that my command above is a rough over-approximation; we need to refine the implementation ;)
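As a starting point for that refinement, here is a minimal sketch of the direct-Docker baseline, timed from the host so that nothing extra needs to be installed in the image. The image name, kernel directory, and the assumption that the .config has already been copied into the tree are carried over from the rough command above.

#!/bin/sh
# Sketch of the "native build" baseline: bypass TuxML's facilities and time
# the compilation directly. Image and directory names are illustrative, and
# the .config under test is assumed to already be present in the kernel tree
# inside the image, as in the rough command above.
IMAGE=tuxml/tuxml-14
KDIR=linux-4.15

START=$(date +%s)
docker run --rm "$IMAGE" sh -c "cd $KDIR && make -j\$(nproc) vmlinux && ls -l vmlinux"
END=$(date +%s)
# The measured time includes container start-up, which is negligible compared
# to a kernel build; the result can then be compared with the TuxML-reported time.
echo "native build time: $((END - START)) s"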