Open FAMILIAR-project opened 5 years ago
Which kernel do we analyse?
4.13.3
Just pushed a Python script that analyses help messages from kconfig files of a given kernel. https://github.com/TuxML/ProjetIrma/blob/dev/miscellaneous/special-config/Kconfiglib/analyse_kconfig_help_msg.py
It produces a CSV file. Have to analyse this CSV file now.
How to catch help messages that talk about 'size'?
I drafted a first command for that:
cat helpMsgs.csv | grep -Po ".*(small|big|size|huge|tiny|reduce|increase|large)+.*" > filterMsg.csv
Have to manually check the results.
Do you have other ideas?
MTD_CFI_ADV_OPTIONS
If you need to specify a specific endianness for access to flash chips, or if you wish to reduce the size of the kernel by including support for only specific arrangements of flash chips, say 'Y'. This option does not directly affect the code, but will enable other configuration options which allow you to do so. If unsure, say 'N'.
MTD_CFI_GEOMETRY
This option does not affect the code directly, but will enable some other configuration options which would allow you to reduce the size of the kernel by including support for only certain arrangements of CFI chips. If unsure, say 'N' and all options which are supported by the current code will be enabled.
@arnobl that's great! I like very much your work and pattern...
technically we could integrate the pattern directly within the Python code, but it's a detail and discussable
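For illustration, integrating the pattern into the Python side could look like the sketch below. It assumes helpMsgs.csv has two columns (option name, help text), which the thread does not confirm, and it matches case-insensitively, unlike the grep command above. The function names are hypothetical.

```python
import csv
import io
import re

# Keywords taken from the grep command in this thread.
SIZE_KEYWORDS = ["small", "big", "size", "huge", "tiny", "reduce", "increase", "large"]
SIZE_PATTERN = re.compile("|".join(SIZE_KEYWORDS), re.IGNORECASE)


def read_help_csv(handle):
    """Read (option, help_text) rows; the two-column layout is an assumption."""
    return [(row[0], row[1]) for row in csv.reader(handle) if len(row) >= 2]


def filter_size_options(rows):
    """Keep (option, help_text) pairs whose help text mentions a size keyword."""
    return [(name, text) for name, text in rows if SIZE_PATTERN.search(text)]


# Tiny in-memory sample standing in for helpMsgs.csv.
sample = io.StringIO(
    'MTD_CFI_GEOMETRY,"allow you to reduce the size of the kernel"\n'
    'ELF_CORE,"Enable support for generating core dumps."\n'
)
matches = filter_size_options(read_help_csv(sample))
print([name for name, _ in matches])
```

Since 'large' is matched as a substring, 'larger' and 'enlarge' are caught too.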
now the question is how to validate your pattern... One method is to check whether popular options we know like https://cateee.net/lkddb/web-lkddb/DEBUG_INFO.html are in your list.
I guess so, but I would add 'larger' (or 'large' subsumes 'larger'?), maybe also 'gcc'!?
another "test": https://cateee.net/lkddb/web-lkddb/SLOB.html here the terms 'allocator' and 'space' seem key. I think you don't get it with your pattern
With your current pattern I am expecting:
maybe you can try a more aggressive pattern (gcc, larger, etc.) to see whether the number of relevant options increases. In fact it could be nice to know the frequencies of each individual pattern...
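A rough way to get per-keyword frequencies, as a sketch only: count how many help texts contain each keyword at least once. The keyword list merges the grep pattern with the extra terms suggested in this thread; the function name is hypothetical.

```python
from collections import Counter

KEYWORDS = ["small", "big", "size", "huge", "tiny", "reduce", "increase",
            "large", "gcc", "allocator", "space"]


def keyword_frequencies(help_texts, keywords=KEYWORDS):
    """Count, per keyword, how many help texts contain it at least once."""
    counts = Counter({kw: 0 for kw in keywords})
    for text in help_texts:
        lowered = text.lower()
        for kw in keywords:
            if kw in lowered:
                counts[kw] += 1
    return counts


texts = [
    "This will reduce the size of the kernel.",
    "Use a larger buffer.",
    "Simple allocator for small systems.",
]
freq = keyword_frequencies(texts)
for kw, n in freq.most_common():
    print(kw, n)
```

Keywords with a high count but few true positives after the manual pass would be good candidates to drop.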
A good "test" is to check whether you can handle all options referenced here: https://github.com/TuxML/ProjetIrma/blob/dev/compilation/tuxml.config
Another list is here: https://elinux.org/Kernel_Size_Tuning_Guide#How_to_configure_the_kernel but it's deprecated (worth trying)
I guess there may be "corner" cases: options you cannot detect with your pattern, and options that match but are not talking about sizes, but I think these are very rare cases ;)
btw how many options do you identify? can you move your code to this repo?
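The "test" against a reference list could be sketched as a simple recall check. This assumes the reference file uses the usual .config syntax (CONFIG_FOO=y or "# CONFIG_FOO is not set"); the function names are hypothetical and the sample data is made up for the example.

```python
import re

# Matches both 'CONFIG_FOO=y' and '# CONFIG_FOO is not set'.
CONFIG_LINE = re.compile(r"^(?:#\s*)?CONFIG_(\w+)")


def parse_config_options(config_text):
    """Return the set of option names mentioned in .config-style text."""
    names = set()
    for line in config_text.splitlines():
        match = CONFIG_LINE.match(line.strip())
        if match:
            names.add(match.group(1))
    return names


def recall_report(detected, reference):
    """Return (recall, sorted list of reference options the pattern missed)."""
    missed = reference - detected
    recall = len(reference & detected) / len(reference) if reference else 0.0
    return recall, sorted(missed)


reference = parse_config_options(
    "CONFIG_DEBUG_INFO=y\n# CONFIG_SLOB is not set\nCONFIG_KASAN=y\n"
)
detected = {"DEBUG_INFO", "MTD_CFI_GEOMETRY"}  # illustrative pattern output
recall, missed = recall_report(detected, reference)
print(recall, missed)
```

The missed list is the interesting part: each entry is either a keyword to add or a genuine corner case.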
or large subsumes larger
Yes it does.
gcc, allocator and space
I will include them.
I do a manual check by reading the texts as there are many false positives. I have 611 results (before the manual pass). 1025 results with your suggested keywords.
can you move your code to this repo?
Yes.
Funny fact : cat helpMsgs.csv | grep "reduce the size"
I get 13 true-positive results:
MTD_CFI_ADV_OPTIONS
MTD_CFI_GEOMETRY
ACENIC_OMIT_TIGON_I
POSIX_TIMERS
PROC_PAGE_MONITOR
PS3_REPOSITORY_WRITE
CPU_HAS_MSA
MIPS_O32_FP64_SUPPORT
ARCH_CATS
ARCH_PERSONAL_SERVER
ARCH_EBSA285_ADDIN
ARCH_EBSA285_HOST
ARCH_NETWINDER
Another funny fact : cat helpMsgs.csv | grep "enlarge the kernel"
I get 3 true-positive results:
BLK_DEV_RAM_DAX
ATA_VERBOSE_ERROR
PROC_SYSCTL
Another one: cat helpMsgs.csv | grep "reduce the kernel"
I get 2 true-positive results:
SSB_SILENT
DEBUG_ZBOOT
cat helpMsgs.csv| grep "smaller kernel"
I get three true-positive results:
KERNEL_XZ
CC_OPTIMIZE_FOR_SIZE
OPTIMIZE_INLINING
I pushed the true-positive options related to size (manually checked) here: https://github.com/TuxML/size-analysis/blob/master/optionsRelatedToSize.txt Maybe someone can double check the options I selected in the ods files.
The results of the manual analysis are in the folder dataOptionsText.
Orange lines refer to options that may affect the kernel size, but I am not sure (so they are not included in the final results).
Some options used in https://github.com/TuxML/ProjetIrma/blob/dev/compilation/tuxml.config are not identified as they do not talk about size (or not in 4.13): https://cateee.net/lkddb/web-lkddb/RANDOMIZE_BASE.html https://cateee.net/lkddb/web-lkddb/X86_NEED_RELOCS.html https://cateee.net/lkddb/web-lkddb/UBSAN_ALIGNMENT.html https://cateee.net/lkddb/web-lkddb/USB_SERIAL_OPTICON.html https://cateee.net/lkddb/web-lkddb/KASAN.html https://cateee.net/lkddb/web-lkddb/KCOV_INSTRUMENT_ALL.html https://cateee.net/lkddb/web-lkddb/MAXSMP.html https://cateee.net/lkddb/web-lkddb/FW_LOADER_USER_HELPER.html https://cateee.net/lkddb/web-lkddb/STRICT_MODULE_RWX.html https://cateee.net/lkddb/web-lkddb/LOCK_STAT.html https://cateee.net/lkddb/web-lkddb/PCI.html
Do we include de facto all the options that match the patterns: .*DEBUG.*, .*LOGGING.*, .*FIRMWARE.*, .*Support for.*, .*DRIVER.*?
Disabling these options would most likely reduce the kernel size (though not necessarily).
A threat to validity: to what extent is the help text relevant and correct? The text may omit that the option has an impact on the kernel size (recall). The text may suggest a tiny change in the kernel size (eg, "enlarge 2KB" = relevant?). The text may be ambiguous (eg "smaller driver": does not reduce the size of the kernel, but adds a driver that is smaller than another one).
Sometimes hard to figure out whether the size refers to the kernel size or size used at run time.
Thank you very much for this amazing work. Global remark:
I will have a deeper look at your classification, but quickly looking at: https://cateee.net/lkddb/web-lkddb/RANDOMIZE_BASE.html for me it's about size no? There are numbers, it's all about memory, etc.
same for https://cateee.net/lkddb/web-lkddb/UBSAN_ALIGNMENT.html => memory accesses https://cateee.net/lkddb/web-lkddb/KASAN.html => debugger information
https://cateee.net/lkddb/web-lkddb/RANDOMIZE_BASE.html for me it's about size no? There are numbers, it's all about memory, etc.
For me it is related to memory/size at run time. Not sure it has an impact on the kernel image. It is related to my remark on size/memory: in many cases I had doubts (kernel size or run time size).
same for https://cateee.net/lkddb/web-lkddb/UBSAN_ALIGNMENT.html => memory accesses https://cateee.net/lkddb/web-lkddb/KASAN.html => debugger information
Same thing here: no information related to the size of the kernel but a reference to memory management and debugging at run time.
If you want me to include such options, no problem. I focused on those that have an impact on the kernel size.
Another example
LOG_BUF_SHIFT: Select the minimal kernel log buffer size as a power of 2. The final size is affected by LOG_CPU_MAX_BUF_SHIFT config parameter, see below. Any higher size also might be forced by log_buf_len boot parameter. Examples: 17 => 128 KB 16 => 64 KB 15 => 32 KB 14 => 16 KB 13 => 8 KB 12 => 4 KB
Do you want to consider this option?
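For reference, the examples in that help text are just powers of two: the buffer size is 2^shift bytes. A minimal check (the helper name is hypothetical):

```python
def log_buf_size_kb(shift):
    """Buffer size in KB for a LOG_BUF_SHIFT value (size = 2**shift bytes)."""
    return (1 << shift) // 1024


for shift in (17, 16, 15, 14, 13, 12):
    print(shift, "=>", log_buf_size_kb(shift), "KB")
```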
I see your point, it's not as obvious... I found an old thread (2007! http://lkml.iu.edu/hypermail/linux/kernel/0704.3/2272.html) that discusses the option (and normally it should not directly increase the kernel size):
Several people have observed that perhaps LOG_BUF_SHIFT should be in a more
obvious place than under DEBUG_KERNEL. Under some circumstances (such as the
PARISC architecture), DEBUG_KERNEL can increase kernel size, which is an
undesirable trade off for something as trivial as increasing the kernel log
buffer size.
Instead, move LOG_BUF_SHIFT into "General Setup", so that people are more
likely to be able to change it such a circumstance that the default buffer
size is insufficient.
We might have a look at how options are realized in the code to have a better idea. Or ask domain experts. (btw interesting, recent performance regression about LOG_BUF_SHIFT: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1824864)
At the moment, I propose to tag "options" for which we have doubts. I think it's in line with your use of the "orange" color: I like your idea. Orange options are typical options for which the learning process will (hopefully) state whether or not they have an influence on the kernel size ;)
ok. Will do another pass to mark orange the options that refer to run time.
I have the feeling that most of the options add features to the kernel so that they increase its size. Some of these options clearly say that they increase the kernel, the others not.
Maybe it is worth playing with the options matching .*DEBUG.*, .*FIRMWARE.*, .*Support for.*, .*DRIVER.*, as they should increase the size of the kernel even though the documentation gives no information on that.
The script I used identified:
Interesting point @acherm, the option 64BIT talks about kernel size depending on the arch selected: https://cateee.net/lkddb/web-lkddb/64BIT.html
Doc for x86: Say yes to build a 64-bit kernel - formerly known as x86_64 Say no to build a 32-bit kernel - formerly known as i386
Doc for parisc: Enable this if you want to support 64bit kernel on PA-RISC platform. At the moment, only people willing to use more than 2GB of RAM, or having a 64bit-only capable PA-RISC machine should say Y here. Since there is no 64bit userland on PA-RISC, there is no point to enable this option otherwise. The 64bit kernel is significantly bigger and slower than the 32bit one.
https://cateee.net/lkddb/web-lkddb/ELF_CORE.html
Enable support for generating core dumps. Disabling saves about 4k.
not in your list, but quite tricky to identify: a good example of a human "quantification": about 4k ;)
Another perspective is to look at tiny.config https://github.com/torvalds/linux/blob/v4.13/kernel/configs/tiny.config and also actually here: https://elixir.bootlin.com/linux/v4.13.3/source/arch/x86/configs/tiny.config
there are 5 options that are pre-set, but not much.
'make tinyconfig' is actually a call to 'make allnoconfig' after the pre-set of 5 options documented in the files above: https://github.com/torvalds/linux/blob/master/scripts/kconfig/Makefile#L122-L123
That is, tinyconfig consists in (1) minimizing the number of options set to 'y' values; (2) 5 options are pre-set.
My point: what really makes the difference is the setting of options to 'n' or 'm'. But it's not really practical -- it's an extreme configuration.
ok thx. Will add the word 'save' to the list of tokens to analyse and try to create a pattern matching for sizes (4k, 12kb, etc.)
Tiny config seems to activate options only. Why not deactivate others? Are all options not activated by default?
@FAMILIAR-project I updated the set of options by including new regex to spot sizes (eg 3%, 12 Mb) and 'save'. Results updated on the git.
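A sketch of what such regexes could look like; these are not the ones actually pushed to the repo, just an illustration under the assumption that sizes are written as a number followed by '%' or a k/M/G unit. The names are hypothetical.

```python
import re

# A number followed by '%' or a k/M/G unit (4k, 12kb, 12 Mb, 128 KB, 3%).
# The lookahead avoids matching the start of an ordinary word ("32 kernel").
SIZE_QUANTITY = re.compile(
    r"\b\d+(?:\.\d+)?\s*(?:%|[kmg]i?b?)(?![a-z])", re.IGNORECASE
)
SAVE_WORD = re.compile(r"\bsav(?:e|es|ed|ing)\b", re.IGNORECASE)


def size_hints(help_text):
    """Return the size quantities found and whether 'save' (or a variant) appears."""
    quantities = [m.group(0) for m in SIZE_QUANTITY.finditer(help_text)]
    return {
        "quantities": quantities,
        "mentions_save": bool(SAVE_WORD.search(help_text)),
    }


print(size_hints("Enable support for generating core dumps. Disabling saves about 4k."))
print(size_hints("Say yes to build a 64-bit kernel"))
```

Note that a quantity such as "2GB of RAM" would also match, so this still needs the manual pass to separate image size from run-time memory.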
Tiny config both refers to:
Example: https://cateee.net/lkddb/web-lkddb/NOHIGHMEM.html
there are 5 options that are pre-set, but not much.
We identified these five options! Maybe the kernels we will build using the list I identified will produce smaller binaries than the tiny config one.
@arnobl let's see! But my intuition is that the number of 'n'/'y' values is a strong feature for predicting the size. Beating tiny is a possible and nice challenge
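To make that feature concrete, here is a minimal sketch that counts 'y', 'm' and unset options in a .config file. It relies on the standard .config syntax, where a disabled option appears as "# CONFIG_FOO is not set"; the function name is hypothetical.

```python
import re
from collections import Counter

SET_LINE = re.compile(r"^CONFIG_(\w+)=(.*)$")
UNSET_LINE = re.compile(r"^# CONFIG_(\w+) is not set$")


def count_option_values(config_text):
    """Count options set to 'y', 'm', unset ('n'), or any other value."""
    counts = Counter()
    for line in config_text.splitlines():
        line = line.strip()
        set_match = SET_LINE.match(line)
        if set_match:
            value = set_match.group(2)
            counts[value if value in ("y", "m") else "other"] += 1
        elif UNSET_LINE.match(line):
            counts["n"] += 1
    return counts


sample = (
    "CONFIG_64BIT=y\n"
    "CONFIG_USB=m\n"
    "# CONFIG_KASAN is not set\n"
    "CONFIG_LOG_BUF_SHIFT=17\n"
)
counts = count_option_values(sample)
print(dict(counts))
```

These counts could then be compared between a tinyconfig build and a build driven by the size-related option list.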
Hi,
goal: identify all options that are related/talking about size (in the Kconfig documentation aka help text)
An example: https://cateee.net/lkddb/web-lkddb/KASAN_OUTLINE.html
Another: https://cateee.net/lkddb/web-lkddb/GCOV_PROFILE_ALL.html https://cateee.net/lkddb/web-lkddb/DEBUG_INFO.html
how to? parsing Kconfig files is not straightforward, but we have good experiences with this Python library: https://github.com/ulfalizer/Kconfiglib/
there is basic utility here: https://github.com/TuxML/ProjetIrma/blob/dev/miscellaneous/special-config/Kconfiglib/get_options_prompt.py https://github.com/TuxML/ProjetIrma/tree/dev/miscellaneous/special-config/Kconfiglib
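As a starting point, dumping help texts to CSV could look like the sketch below. It is written against stand-in objects so that it runs without Kconfiglib. The attribute names (name, nodes, help) are assumed to mirror Kconfiglib's Symbol and MenuNode objects, and the existing utility linked above may work differently.

```python
import csv
import io
from types import SimpleNamespace


def help_rows(symbols):
    """Yield (option name, help text) for every symbol node that carries help text."""
    for sym in symbols:
        for node in sym.nodes:
            if node.help:
                # Collapse whitespace so each help text fits on one CSV line.
                yield sym.name, " ".join(node.help.split())


def write_help_csv(symbols, handle):
    """Write one 'option,help text' row per help message."""
    writer = csv.writer(handle)
    for name, text in help_rows(symbols):
        writer.writerow([name, text])


# Stand-in objects mimicking the attributes used above (assumption: with
# Kconfiglib installed, real symbols would come from a loaded Kconfig tree).
fake_symbols = [
    SimpleNamespace(
        name="ELF_CORE",
        nodes=[SimpleNamespace(help="Enable support for\ngenerating core dumps.")],
    ),
    SimpleNamespace(name="NO_HELP", nodes=[SimpleNamespace(help=None)]),
]
out = io.StringIO()
write_help_csv(fake_symbols, out)
print(out.getvalue())
```

An option defined in several places (eg per architecture) yields one row per definition, which matters for options whose help text differs by arch.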
it could help to: