TuxML / size-analysis

Analysis of 125+ Linux configurations (this time for predicting/understanding kernel sizes)

Kernel options about sizes (Kconfig) #1

Open FAMILIAR-project opened 5 years ago

FAMILIAR-project commented 5 years ago

Hi,

goal: identify all options that are related/talking about size (in the Kconfig documentation aka help text)

An example: https://cateee.net/lkddb/web-lkddb/KASAN_OUTLINE.html

Before every memory access compiler insert function call __asan_load/__asan_store. These functions performs check of shadow memory. This is slower than inline instrumentation, however it doesn't bloat size of kernel's .text section so much as inline does.

Another: https://cateee.net/lkddb/web-lkddb/GCOV_PROFILE_ALL.html https://cateee.net/lkddb/web-lkddb/DEBUG_INFO.html

how to? parsing Kconfig files is not straightforward, but we have had good experience with this Python library: https://github.com/ulfalizer/Kconfiglib/
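A minimal sketch of the extraction step, assuming Kconfiglib's `Kconfig` object with its `unique_defined_syms` list, and `nodes` / `help` attributes on symbols and menu nodes. The function name `collect_help_texts` and the stand-in objects are hypothetical, so the snippet runs without a kernel tree:

```python
from types import SimpleNamespace

def collect_help_texts(kconf):
    """Map each option name to its help text(s).

    An option can be defined in several places, so all non-empty
    help texts of its menu nodes are joined with a space.
    """
    helps = {}
    for sym in kconf.unique_defined_syms:
        texts = [node.help.strip() for node in sym.nodes if node.help]
        if texts:
            helps[sym.name] = " ".join(texts)
    return helps

# With Kconfiglib installed, and run from the root of a kernel tree,
# the call would look roughly like this (assumption: the kernel's Kconfig
# files also expect environment variables such as ARCH and SRCARCH):
#   import kconfiglib
#   helps = collect_help_texts(kconfiglib.Kconfig("Kconfig"))

# Stand-in object mimicking the attributes used above (hypothetical data).
fake_kconf = SimpleNamespace(unique_defined_syms=[
    SimpleNamespace(name="ELF_CORE", nodes=[SimpleNamespace(
        help="Enable support for generating core dumps. Disabling saves about 4k.")]),
    SimpleNamespace(name="NO_HELP", nodes=[SimpleNamespace(help=None)]),
])
demo_helps = collect_help_texts(fake_kconf)
print(demo_helps)
```

Options without any help text are skipped, since there is nothing to search in them.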

there is a basic utility here: https://github.com/TuxML/ProjetIrma/blob/dev/miscellaneous/special-config/Kconfiglib/get_options_prompt.py https://github.com/TuxML/ProjetIrma/tree/dev/miscellaneous/special-config/Kconfiglib

it could help to:

arnobl commented 5 years ago

Which kernel do we analyse?

FAMILIAR-project commented 5 years ago

4.13.3

arnobl commented 5 years ago

Just pushed a Python script that analyses help messages from kconfig files of a given kernel. https://github.com/TuxML/ProjetIrma/blob/dev/miscellaneous/special-config/Kconfiglib/analyse_kconfig_help_msg.py

It produces a CSV file. Have to analyse this CSV file now.

arnobl commented 5 years ago

How to catch help messages that talk about 'size'? I drafted a first command for that: cat helpMsgs.csv | grep -Po ".*(small|big|size|huge|tiny|reduce|increase|large)+.*" > filterMsg.csv I have to manually check the results. Do you have other ideas?
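The same filter could also live directly in Python. This is only a sketch: it assumes the help messages are already available as a dictionary from option name to help text, and `filter_size_related` is a hypothetical name, not a function of the existing script.

```python
import re

# Same keywords as the grep command; matching is case-sensitive,
# like grep without the -i flag.
SIZE_KEYWORDS = ["small", "big", "size", "huge", "tiny", "reduce", "increase", "large"]
SIZE_PATTERN = re.compile("|".join(SIZE_KEYWORDS))

def filter_size_related(helps):
    """Keep only the options whose help text contains at least one size keyword."""
    return {name: text for name, text in helps.items() if SIZE_PATTERN.search(text)}

# Two help text fragments quoted in this thread, used as sample input.
sample_helps = {
    "MTD_CFI_ADV_OPTIONS": "if you wish to reduce the size of the kernel by including "
                           "support for only specific arrangements of flash chips, say 'Y'.",
    "64BIT": "Say yes to build a 64-bit kernel - formerly known as x86_64",
}
size_related = filter_size_related(sample_helps)
print(sorted(size_related))
```

Because the keywords are matched as substrings, 'large' also catches 'larger' and 'enlarge', which is one source of false positives to check by hand.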

arnobl commented 5 years ago

MTD_CFI_ADV_OPTIONS If you need to specify a specific endianness for access to flash chips, or if you wish to reduce the size of the kernel by including support for only specific arrangements of flash chips, say 'Y'. This option does not directly affect the code, but will enable other configuration options which allow you to do so. If unsure, say 'N'.

arnobl commented 5 years ago

MTD_CFI_GEOMETRY;This option does not affect the code directly, but will enable some other configuration options which would allow you to reduce the size of the kernel by including support for only certain arrangements of CFI chips. If unsure, say 'N' and all options which are supported by the current code will be enabled.

FAMILIAR-project commented 5 years ago

@arnobl that's great! I like your work and pattern very much...

technically we could integrate the pattern directly within the Python code, but it's a detail and open to discussion

now the question is how to validate your pattern... One method is to check whether popular options we know, like https://cateee.net/lkddb/web-lkddb/DEBUG_INFO.html, are in your list. I guess so, but I would add larger (or does large subsume larger?) and maybe also gcc!?

another "test": https://cateee.net/lkddb/web-lkddb/SLOB.html here the terms allocator and space seem key; I think you don't catch it with your pattern

With your current pattern I am expecting:

maybe you can try a more aggressive pattern (gcc, larger, etc.) to see whether the number of relevant options increases. In fact it would be nice to know the frequency of each individual keyword...
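Counting how many help texts each keyword hits could be sketched as follows. It assumes a dictionary from option name to help text; `keyword_frequencies` is a hypothetical helper, and the sample sentences are fragments quoted elsewhere in this thread.

```python
from collections import Counter

def keyword_frequencies(helps, keywords):
    """Count, for each keyword, the number of help texts that contain it.

    Matching is a case-insensitive substring test, so 'big' also hits 'bigger'.
    """
    counts = Counter()
    for text in helps.values():
        lowered = text.lower()
        for keyword in keywords:
            if keyword in lowered:
                counts[keyword] += 1
    return counts

sample_helps = {
    "KASAN_OUTLINE": "however it doesn't bloat size of kernel's .text section "
                     "so much as inline does.",
    "64BIT": "The 64bit kernel is significantly bigger and slower than the 32bit one.",
    "ELF_CORE": "Enable support for generating core dumps. Disabling saves about 4k.",
}
freqs = keyword_frequencies(sample_helps, ["size", "big", "large", "gcc", "allocator", "space"])
for keyword, count in freqs.most_common():
    print(keyword, count)
```

A keyword with many hits but few true positives after the manual pass is a candidate for removal; a keyword with zero hits costs nothing to keep.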

A good "test" is to check whether you can handle all options referenced here: https://github.com/TuxML/ProjetIrma/blob/dev/compilation/tuxml.config
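Such a check against a reference list amounts to a recall measurement. A minimal sketch, where `check_coverage` is a hypothetical helper and both lists are small illustrative stand-ins, not the real contents of tuxml.config or of the detected set:

```python
def check_coverage(detected, reference):
    """Compare detected options with a reference list.

    Returns the reference options that were found, those that were
    missed, and the recall (found / reference size).
    """
    detected, reference = set(detected), set(reference)
    found = sorted(reference & detected)
    missed = sorted(reference - detected)
    recall = len(found) / len(reference) if reference else 0.0
    return found, missed, recall

# Illustrative stand-ins only (assumption), using option names from this thread.
detected_options = ["KERNEL_XZ", "CC_OPTIMIZE_FOR_SIZE", "OPTIMIZE_INLINING"]
reference_options = ["CC_OPTIMIZE_FOR_SIZE", "RANDOMIZE_BASE", "KASAN"]

found, missed, recall = check_coverage(detected_options, reference_options)
print("found:", found)
print("missed:", missed)
print("recall: %.2f" % recall)
```

The missed list is the interesting output: each entry is either an option whose help text does not mention size, or a gap in the keyword pattern.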

Another list is here: https://elinux.org/Kernel_Size_Tuning_Guide#How_to_configure_the_kernel but it's deprecated (worth trying)

I guess there may be "corner" cases: options you cannot detect with your pattern, and options that match but are not talking about sizes, but I think these are very rare cases ;)

btw how many options do you identify? can you move your code to this repo?

arnobl commented 5 years ago

or large subsumes larger

Yes it does.

gcc, allocator and space

I will include them.

I do a manual check by reading the texts as there are many false positives. I have 611 results (before the manual pass), and 1025 results with your suggested keywords.

can you move your code to this repo?

Yes.

arnobl commented 5 years ago

Fun fact: with cat helpMsgs.csv | grep "reduce the size" I get 13 true-positive results: MTD_CFI_ADV_OPTIONS MTD_CFI_GEOMETRY ACENIC_OMIT_TIGON_I POSIX_TIMERS PROC_PAGE_MONITOR PS3_REPOSITORY_WRITE CPU_HAS_MSA MIPS_O32_FP64_SUPPORT ARCH_CATS ARCH_PERSONAL_SERVER ARCH_EBSA285_ADDIN ARCH_EBSA285_HOST ARCH_NETWINDER

arnobl commented 5 years ago

Another fun fact: with cat helpMsgs.csv | grep "enlarge the kernel" I get 3 true-positive results: BLK_DEV_RAM_DAX ATA_VERBOSE_ERROR PROC_SYSCTL

arnobl commented 5 years ago

Another one: cat helpMsgs.csv | grep "reduce the kernel" I get 2 true-positive results: SSB_SILENT DEBUG_ZBOOT

arnobl commented 5 years ago

cat helpMsgs.csv| grep "smaller kernel" I get three true-positive results: KERNEL_XZ CC_OPTIMIZE_FOR_SIZE OPTIMIZE_INLINING

arnobl commented 5 years ago

I pushed the true-positive options related to size (manually checked) here: https://github.com/TuxML/size-analysis/blob/master/optionsRelatedToSize.txt Maybe someone can double check the options I selected in the ods files.

The results of the manual analysis are in the folder dataOptionsText. Orange lines refer to options that may affect the kernel size, but I am not sure (so they are not included in the final results).

Some options used in https://github.com/TuxML/ProjetIrma/blob/dev/compilation/tuxml.config are not identified as they do not talk about size (or not in 4.13): https://cateee.net/lkddb/web-lkddb/RANDOMIZE_BASE.html https://cateee.net/lkddb/web-lkddb/X86_NEED_RELOCS.html https://cateee.net/lkddb/web-lkddb/UBSAN_ALIGNMENT.html https://cateee.net/lkddb/web-lkddb/USB_SERIAL_OPTICON.html https://cateee.net/lkddb/web-lkddb/KASAN.html https://cateee.net/lkddb/web-lkddb/KCOV_INSTRUMENT_ALL.html https://cateee.net/lkddb/web-lkddb/MAXSMP.html https://cateee.net/lkddb/web-lkddb/FW_LOADER_USER_HELPER.html https://cateee.net/lkddb/web-lkddb/STRICT_MODULE_RWX.html https://cateee.net/lkddb/web-lkddb/LOCK_STAT.html https://cateee.net/lkddb/web-lkddb/PCI.html

Do we include de facto all the options that match the patterns: .*DEBUG.*, .*LOGGING.*, .*FIRMWARE.*, .*Support for.*, .*DRIVER.* ? Disabling these options would most likely reduce the kernel size (but not necessarily).

arnobl commented 5 years ago

A threat to validity: to what extent is the help text relevant? Is it correct? The text may omit that the option has an impact on the kernel size (recall). The text may suggest a tiny change in the kernel size (eg, "enlarge 2KB" = relevant?). The text may be ambiguous (eg "smaller driver": does not reduce the size of the kernel, but adds a driver that is smaller than another one).

It is sometimes hard to figure out whether the size refers to the kernel size or to the size used at run time.

acherm commented 5 years ago

Thank you very much for this amazing work. Global remark:

I will have a deeper look at your classification, but quickly looking at: https://cateee.net/lkddb/web-lkddb/RANDOMIZE_BASE.html for me it's about size no? There are numbers, it's all about memory, etc.

same for https://cateee.net/lkddb/web-lkddb/UBSAN_ALIGNMENT.html => memory accesses https://cateee.net/lkddb/web-lkddb/KASAN.html => debugger information

arnobl commented 5 years ago

https://cateee.net/lkddb/web-lkddb/RANDOMIZE_BASE.html for me it's about size no? There are numbers, it's all about memory, etc.

For me it is related to memory/size at run time. Not sure it has an impact on the kernel image. It is related to my remark on size/memory: in many cases I had doubts (kernel size or run time size).

same for https://cateee.net/lkddb/web-lkddb/UBSAN_ALIGNMENT.html => memory accesses https://cateee.net/lkddb/web-lkddb/KASAN.html => debugger information

Same thing here: no information related to the size of the kernel but a reference to memory management and debugging at run time.

If you want me to include such options, no problem. I focused on those that have an impact on the kernel size.

arnobl commented 5 years ago

Another example LOG_BUF_SHIFT: Select the minimal kernel log buffer size as a power of 2. The final size is affected by LOG_CPU_MAX_BUF_SHIFT config parameter, see below. Any higher size also might be forced by log_buf_len boot parameter. Examples: 17 => 128 KB 16 => 64 KB 15 => 32 KB 14 => 16 KB 13 => 8 KB 12 => 4 KB

Do you want to consider this option?

acherm commented 5 years ago

I see your point, it's not as obvious... I found an old thread (2007! http://lkml.iu.edu/hypermail/linux/kernel/0704.3/2272.html) that discusses the option (and normally it should not directly increase the kernel size):

Several people have observed that perhaps LOG_BUF_SHIFT should be in a more 
obvious place than under DEBUG_KERNEL. Under some circumstances (such as the 
PARISC architecture), DEBUG_KERNEL can increase kernel size, which is an 
undesirable trade off for something as trivial as increasing the kernel log 
buffer size.

Instead, move LOG_BUF_SHIFT into "General Setup", so that people are more 
likely to be able to change it such a circumstance that the default buffer 
size is insufficient.

We might have a look at how options are realized in the code to get a better idea. Or ask domain experts. (btw interesting, a recent performance regression about LOG_BUF_SHIFT: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1824864)

At the moment, I propose to tag "options" for which we have doubts. I think it's in line with your use of the "orange" color: I like your idea. Orange options are typical options for which the learning process will (hopefully) state whether or not they have an influence on the kernel size ;)

arnobl commented 5 years ago

ok. Will do another pass to mark orange the options that refer to run time.

arnobl commented 5 years ago

I have the feeling that most of the options add features to the kernel, so they increase its size. Some of these options clearly say that they increase the kernel size; the others do not.

Maybe it is worth playing with the options matching .*DEBUG.*, .*FIRMWARE.*, .*Support for.*, .*DRIVER.*, as they should increase the size of the kernel even though the documentation says nothing about it.

arnobl commented 5 years ago

The script I used identified:

arnobl commented 5 years ago

Interesting point @acherm, the option 64BIT talks about kernel size depending on the arch selected: https://cateee.net/lkddb/web-lkddb/64BIT.html

Doc for x86: Say yes to build a 64-bit kernel - formerly known as x86_64 Say no to build a 32-bit kernel - formerly known as i386

Doc for parisc: Enable this if you want to support 64bit kernel on PA-RISC platform. At the moment, only people willing to use more than 2GB of RAM, or having a 64bit-only capable PA-RISC machine should say Y here. Since there is no 64bit userland on PA-RISC, there is no point to enable this option otherwise. The 64bit kernel is significantly bigger and slower than the 32bit one.

FAMILIAR-project commented 5 years ago

https://cateee.net/lkddb/web-lkddb/ELF_CORE.html Enable support for generating core dumps. Disabling saves about 4k.

not in your list, but quite tricky to identify. It is a good example of a human "quantification": about 4k ;)

FAMILIAR-project commented 5 years ago

Another perspective is to look at tiny.config https://github.com/torvalds/linux/blob/v4.13/kernel/configs/tiny.config and also actually here: https://elixir.bootlin.com/linux/v4.13.3/source/arch/x86/configs/tiny.config

there are 5 options that are pre-set, but not much more. make tinyconfig is actually a call to make allnoconfig combined with the 5 pre-set options documented in the files above: https://github.com/torvalds/linux/blob/master/scripts/kconfig/Makefile#L122-L123 That is, tinyconfig consists of (1) minimizing the number of options set to 'y'; (2) pre-setting 5 options.

My point: what really makes the difference is the setting of options to 'n' or 'm'. But it's not really practical -- it's an extreme configuration.

arnobl commented 5 years ago

ok thx. Will add the word 'save' to the list of tokens to analyse and try to create a pattern that matches sizes (4k, 12kb, etc.)
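A possible shape for such a pattern, as a sketch only: the unit list, the regexes and the function names below are assumptions, not the ones used in the actual script.

```python
import re

# A number, an optional space, then either '%' or a size unit.
# The trailing \b keeps '4 keys' or '3 more' from matching.
SIZE_QUANTITY = re.compile(
    r"\b\d+(?:\.\d+)?\s?(?:%|(?:k|kb|kib|m|mb|mib|g|gb)\b)",
    re.IGNORECASE,
)
# 'save' and its common inflections.
SAVE_WORD = re.compile(r"\bsav(?:e|es|ed|ing)\b", re.IGNORECASE)

def find_size_quantities(text):
    """Return every substring that looks like a size or a percentage."""
    return [match.group(0) for match in SIZE_QUANTITY.finditer(text)]

def mentions_saving(text):
    """True if the text uses 'save', 'saves', 'saved' or 'saving'."""
    return SAVE_WORD.search(text) is not None

elf_core_help = "Enable support for generating core dumps. Disabling saves about 4k."
print(find_size_quantities(elf_core_help), mentions_saving(elf_core_help))
print(find_size_quantities("17 => 128 KB 16 => 64 KB"))
print(find_size_quantities("Say yes to build a 64-bit kernel - formerly known as x86_64"))
```

A quantity match alone does not say which size is meant: "more than 2GB of RAM" in the parisc 64BIT text would match too, so the manual pass is still needed.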

arnobl commented 5 years ago

Tiny config seems to activate options only. Why not deactivate others? Are all options deactivated by default?

arnobl commented 5 years ago

@FAMILIAR-project I updated the set of options by adding new regexes to spot sizes (eg 3%, 12 Mb) and 'save'. Results are updated on the git.

arnobl commented 5 years ago

Tiny config both refers to:

Example: https://cateee.net/lkddb/web-lkddb/NOHIGHMEM.html

there are 5 options that are pre-set, but not much.

We identified these five options! Maybe the kernels we will build using the list I identified will produce smaller binaries than the tiny config one.

FAMILIAR-project commented 5 years ago

@arnobl let's see! But my intuition is that the number of 'n'/'y' values is a strong feature for predicting the size. Beating tiny is possible and a nice challenge
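Extracting that feature from a configuration could be sketched as below. It relies on the usual .config conventions (`CONFIG_FOO=y`, `CONFIG_FOO=m`, and `# CONFIG_FOO is not set` for disabled options); the function name and the sample configuration are hypothetical.

```python
import re
from collections import Counter

SET_LINE = re.compile(r"^CONFIG_(\w+)=(.*)$")
UNSET_LINE = re.compile(r"^# CONFIG_(\w+) is not set$")

def count_option_values(config_text):
    """Count options per value in the text of a .config file.

    'y' and 'm' are counted as such, disabled options as 'n', and any
    other value (numbers, strings) under 'other'.
    """
    counts = Counter()
    for line in config_text.splitlines():
        line = line.strip()
        match = SET_LINE.match(line)
        if match:
            value = match.group(2)
            counts[value if value in ("y", "m") else "other"] += 1
        elif UNSET_LINE.match(line):
            counts["n"] += 1
    return counts

# Made-up sample configuration (assumption), reusing option names from this thread.
sample_config = """\
# CONFIG_DEBUG_INFO is not set
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_KERNEL_XZ=y
CONFIG_LOG_BUF_SHIFT=17
CONFIG_USB_SERIAL_OPTICON=m
"""
value_counts = count_option_values(sample_config)
print(dict(value_counts))
```

The resulting counts can be used as extra columns next to the per-option values when learning a size model.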