Linux Kernel

"People interested in low-level scary stuff should take a look at the uaccess.h files for x86 or alpha, and be ready to spend some time just figuring out what it all does" - Linus Torvalds:

Dictionary general meaning - a central or essential part.

Linux is the kernel: the program in the system that allocates the machine's resources to the other programs that you run. The kernel is an essential part of an operating system, but useless by itself; it can only function in the context of a complete operating system. Linux is normally used in combination with the GNU operating system: the whole system is basically GNU with Linux added, or GNU/Linux. All the so-called “Linux” distributions are really distributions of GNU/Linux.

Many users do not understand the difference between the kernel, which is Linux, and the whole system, which they also call “Linux”. The ambiguous use of the name doesn't help people understand. These users often think that Linus Torvalds developed the whole operating system in 1991, with a bit of help.

Programmers generally know that Linux is a kernel. But since they have generally heard the whole system called “Linux” as well, they often envisage a history that would justify naming the whole system after the kernel. For example, many believe that once Linus Torvalds finished writing Linux, the kernel, its users looked around for other free software to go with it, and found that (for no particular reason) most everything necessary to make a Unix-like system was already available.

In a Unix system, several concurrent processes attend to different tasks. Each process asks for system resources, be it computing power, memory, network connectivity, or some other resource. The kernel is the big chunk of executable code in charge of handling all such requests. Although the distinction between the different kernel tasks isn’t always clearly marked, the kernel’s role can be split (as shown in Figure 1-1) into the following parts:

Process management

The kernel is in charge of creating and destroying processes and handling their connection to the outside world (input and output). Communication among different processes (through signals, pipes, or interprocess communication primitives) is basic to the overall system functionality and is also handled by the kernel. In addition, the scheduler, which controls how processes share the CPU, is part of process management. More generally, the kernel’s process management activity implements the abstraction of several processes on top of a single CPU or a few of them.

Memory management

The computer’s memory is a major resource, and the policy used to deal with it is a critical one for system performance. The kernel builds up a virtual addressing space for any and all processes on top of the limited available resources. The different parts of the kernel interact with the memory-management subsystem through a set of function calls, ranging from the simple malloc/free pair to much more complex functionalities.

Filesystems

Unix is heavily based on the filesystem concept; almost everything in Unix can be treated as a file. The kernel builds a structured filesystem on top of unstructured hardware, and the resulting file abstraction is heavily used throughout the whole system. In addition, Linux supports multiple filesystem types, that is, different ways of organizing data on the physical medium. For example, disks may be formatted with the Linux-standard ext3 filesystem, the commonly used FAT filesystem or several others.

Device control

Almost every system operation eventually maps to a physical device. With the exception of the processor, memory, and a very few other entities, any and all device control operations are performed by code that is specific to the device being addressed. That code is called a device driver. The kernel must have embedded in it a device driver for every peripheral present on a system, from the hard drive to the keyboard and the tape drive. This aspect of the kernel’s functions is our primary interest in this book.

Networking

Networking must be managed by the operating system, because most network operations are not specific to a process: incoming packets are asynchronous events. The packets must be collected, identified, and dispatched before a process takes care of them. The system is in charge of delivering data packets across program and network interfaces, and it must control the execution of programs according to their network activity. Additionally, all the routing and address resolution issues are implemented within the kernel.

Linux Architecture

Linux is primarily divided into User Space & Kernel Space. These two components interact through a System Call Interface – which is a predefined and matured interface to Linux Kernel for Userspace applications.

Objectives:

[ ] Understand Linux
- [ ] File system #65
- [ ] Startup Process
[ ] Contribute to Linux

Resources

[ ] https://www.kernel.org
[ ] https://developer.ibm.com/technologies/linux/articles/l-linux-kernel
[ ] https://en.wikipedia.org/wiki/Linux_kernel
[ ] https://en.wikipedia.org/wiki/Linux_startup_process
[ ] https://www.kernel.org/doc/html/latest/process/howto.html#documentation
[ ] https://www.collabora.com/news-and-blog/blog/2020/07/14/introduction-to-linux-kernel-initcalls
[ ] https://kernelnewbies.org/Documentation/Subsystems
[ ] Linux System Programming: Talking Directly to the Kernel and C Library Second Edition
[ ] Understanding the Linux Kernel, Third Edition 3rd Edition by Daniel P. Bovet (Author), Marco Cesati
[ ] Linux Kernel Development 3rd Edition by Robert Love
[ ] Unikernels
[ ] https://google.github.io/security-research/pocs/linux/bleedingtooth/writeup
[ ] https://linuxjourney.com Good resource to learn.
[ ] https://books.google.com.np/books?id=vdZWBQAAQBAJ Linux Dictionary
[ ] https://refspecs.linuxfoundation.org/

References

65

Processes #361

Interrupts:

Interrupts are a crucial part of how computers process data.

Interrupts are an essential part of how modern CPUs work. For example, every time you press a key on the keyboard, the CPU is interrupted so that the PC can read user input from the keyboard. This happens so quickly that you don't notice any change or impairment in user experience.

Moreover, the keyboard is not the only component that can cause interrupts. In general, there are three types of events that can cause the CPU to interrupt: Hardware interrupts, software interrupts, and exceptions. Before getting into the different types of interrupts, I'll define some terms.

https://opensource.com/article/20/10/linux-kernel-interrupts

Kernel

Different Types of Kernels

There are, of course, different ways to build a kernel and architectural considerations when building one from scratch. In general, most kernels fall into one of three types: monolithic, microkernel, and hybrid. Linux is a monolithic kernel while OS X (XNU) and Windows 7 use hybrid kernels. Let’s take a quick tour of the three categories so we can go into more detail later.

Microkernel

A microkernel takes the approach of only managing what it has to: CPU, memory, and IPC. Pretty much everything else in a computer can be seen as an accessory and can be handled in user mode. Microkernels have a advantage of portability because they don’t have to worry if you change your video card or even your operating system so long as the operating system still tries to access the hardware in the same way. Microkernels also have a very small footprint, for both memory and install space, and they tend to be more secure because only specific processes run in user mode which doesn’t have the high permissions as supervisor mode.

Pros Portability Small install footprint Small memory footprint Security Cons Hardware is more abstracted through drivers Hardware may react slower because drivers are in user mode Processes have to wait in a queue to get information Processes can’t get access to other processes without waiting

Monolithic Kernel

Monolithic kernels are the opposite of microkernels because they encompass not only the CPU, memory, and IPC, but they also include things like device drivers, file system management, and system server calls. Monolithic kernels tend to be better at accessing hardware and multitasking because if a program needs to get information from memory or another process running it has a more direct line to access it and doesn’t have to wait in a queue to get things done. This however can cause problems because the more things that run in supervisor mode, the more things that can bring down your system if one doesn’t behave properly.

Pros More direct access to hardware for programs Easier for processes to communicate between eachother If your device is supported, it should work with no additional installations Processes react faster because there isn’t a queue for processor time Cons Large install footprint Large memory footprint Less secure because everything runs in supervisor mode

Hybrid Kernel

Hybrid kernels have the ability to pick and choose what they want to run in user mode and what they want to run in supervisor mode. Often times things like device drivers and filesystem I/O will be run in user mode while IPC and server calls will be kept in the supervisor mode. This give the best of both worlds but often will require more work of the hardware manufacturer because all of the driver responsibility is up to them. It also can have some of the latency problems that is inherent with microkernels. Pros Developer can pick and choose what runs in user mode and what runs in supervisor mode Smaller install footprint than monolithic kernel More flexible than other models Cons Can suffer from same process lag as microkernel Device drivers need to be managed by user (typically)

Where Are the Linux Kernel Files?

The kernel file, in Ubuntu, is stored in your /boot folder and is called vmlinuz-version. The name vmlinuz comes from the unix world where they used to call their kernels simply “unix” back in the 60’s so Linux started calling their kernel “linux” when it was first developed in the 90’s.

When virtual memory was developed for easier multitasking abilities, “vm” was put at the front of the file to show that the kernel supports virtual memory. For a while the Linux kernel was called vmlinux, but the kernel grew too large to fit in the available boot memory so the kernel image was compressed and the ending x was changed to a z to show it was compressed with zlib compression. This same compression isn’t always used, often replaced with LZMA or BZIP2, and some kernels are simply called zImage.

The version numbering will be in the format A.B.C.D where A.B will probably be 2.6, C will be your version, and D indicates your patches or fixes.

In the /boot folder there will also be other very important files called initrd.img-version, system.map-version, and config-version.

The initrd file is used as a small RAM disk that extracts and executes the actual kernel file.

The system.map file is used for memory management before the kernel fully loads, and the config file tells the kernel what options and modules to load into the kernel image when the it is being compiled.

Linux Kernel Architecture

Because the Linux kernel is monolithic, it has the largest footprint and the most complexity over the other types of kernels. This was a design feature which was under quite a bit of debate in the early days of Linux and still carries some of the same design flaws that monolithic kernels are inherent to have.

One thing that the Linux kernel developers did to get around these flaws was to make kernel modules that could be loaded and unloaded at runtime, meaning you can add or remove features of your kernel on the fly. This can go beyond just adding hardware functionality to the kernel, by including modules that run server processes, like low level virtualization, but it can also allow the entire kernel to be replaced without needing to reboot your computer in some instances.

Imagine if you could upgrade to a Windows service pack without ever needing to reboot… What if Windows had every driver available already installed and you just had to turn on the drivers you needed? That is essentially what kernel modules do for Linux. Kernel modules, also known as a loadable kernel module (LKM), are essential to keeping the kernel functioning with all of your hardware without consuming all of your available memory.

A module typically adds functionality to the base kernel for things like devices, file systems, and system calls. LKMs have the file extension .ko and are typically stored in the /lib/modules directory. Because of their modular nature you can easily customize your kernel by setting modules to load, or not load, during startup with the menuconfig command or by editing your /boot/config file, or you can load and unload modules on the fly with the modprobe command.

Third party and closed source modules are available in some distributions, like Ubuntu, and may not be installed by default because the source code for the modules is not available. The developer of the software (i.e. nVidia, ATI, among others) do not provide the source code but rather they build their own modules and compile the needed .ko files for distribution. While these modules are free as in beer, they are not free as in speech and thus are not included by some distributions because the maintainers feel it “taints” the kernel by providing non-free software.

A kernel isn’t magic, but it is completely essential to any computer running properly. The Linux kernel is different than OS X and Windows because it includes drivers at the kernel level and makes many things supported “out of the box”. Hopefully you will know a little bit more about how your software and hardware works together and what files you need to boot your computer.

https://www.howtogeek.com/howto/31632/what-is-the-linux-kernel-and-what-does-it-do

Where the kernel fits within the OS

To put the kernel in context, you can think of a Linux machine as having 3 layers:

The hardware: The physical machine—the bottom or base of the system, made up of memory (RAM) and the processor or central processing unit (CPU), as well as input/output (I/O) devices such as storage, networking, and graphics. The CPU performs computations and reads from, and writes to, memory.
The Linux kernel: The core of the OS. (See? It’s right in the middle.) It’s software residing in memory that tells the CPU what to do.

User processes: These are the running programs that the kernel manages. User processes are what collectively make up user space. User processes are also known as just processes. The kernel also allows these processes and servers to communicate with each other (known as inter-process communication, or IPC).

Code executed by the system runs on CPUs in 1 of 2 modes: kernel mode or user mode. Code running in the kernel mode has unrestricted access to the hardware, while user mode restricts access to the CPU and memory to the SCI. A similar separation exists for memory (kernel space and user space). These 2 small details form the base for some complicated operations like privilege separation for security, building containers, and virtual machines.

This also means that if a process fails in user mode, the damage is limited and can be recovered by the kernel. However, because of its access to memory and the processor, a kernel process crash can crash the entire system. Since there are safeguards in place and permissions required to cross boundaries, user process crashes usually can’t cause too many problems.

https://www.redhat.com/en/topics/linux/what-is-the-linux-kernel

https://www.kernel.org/doc/html/latest/filesystems/ramfs-rootfs-initramfs.html

What is ramfs?

Ramfs is a very simple filesystem that exports Linux’s disk caching mechanisms (the page cache and dentry cache) as a dynamically resizable RAM-based filesystem.

tmpfs:

One downside of ramfs is you can keep writing data into it until you fill up all memory, and the VM can’t free it because the VM thinks that files should get written to backing store (rather than swap space), but ramfs hasn’t got any backing store. Because of this, only root (or a trusted user) should be allowed write access to a ramfs mount. A ramfs derivative called tmpfs was created to add size limits, and the ability to write the data to swap space. Normal users can be allowed write access to tmpfs mounts.

What is rootfs?

Rootfs is a special instance of ramfs (or tmpfs, if that’s enabled), which is always present in 2.6 systems. You can’t unmount rootfs for approximately the same reason you can’t kill the init process; rather than having special code to check for and handle an empty list, it’s smaller and simpler for the kernel to just make sure certain lists can’t become empty. Most systems just mount another filesystem over rootfs and ignore it. The amount of space an empty instance of ramfs takes up is tiny. If CONFIG_TMPFS is enabled, rootfs will use tmpfs instead of ramfs by default. To force ramfs, add “rootfstype=ramfs” to the kernel command line.

What is initramfs?

All 2.6 Linux kernels contain a gzipped “cpio” format archive, which is extracted into rootfs when the kernel boots up. After extracting, the kernel checks to see if rootfs contains a file “init”, and if so it executes it as PID 1. If found, this init process is responsible for bringing the system the rest of the way up, including locating and mounting the real root device (if any). If rootfs does not contain an init program after the embedded cpio archive is extracted into it, the kernel will fall through to the older code to locate and mount a root partition, then exec some variant of /sbin/init out of that.

https://wiki.ubuntu.com/Initramfs

initrd ( Initial ramdisk)

initrd is a scheme for loading a temporary root file system into memory, which may be used as part of the Linux startup process. initrd and initramfs refer to two different methods of achieving this. Both are commonly used to make preparations before the real root file system can be mounted.

Initramfs

Initramfs is used as the first root filesystem that your machine has access to. It is used for mounting the real rootfs which has all your data. The initramfs carries the modules needed for mounting your rootfs.

Now here is how the initramfs executes. The first process to get control is the init process. The init process procedurally invokes other scripts kept in the initrd. These scripts are kept in the scripts dir in your initramfs. The scripts dir is further divided into the following dirs: init-top init-premount boot-top (your crypt scripts execute here - to ask the user password for eg) boot-premount boot boot-bottom init-bottom

Here boot is replaced by local or remote depending on whether your rootfs is local or remote. This is the script that actually mounts your rootfs on initramfs/root/. If everything goes right then the boot-bottom, init-bottom scripts are executed sequentially. init then gets the control back and does the following things: moves the /sys from initramfs to /initramfs/root/sys (in your real rootfs) moves /proc from initramfs to /initramfs/root/proc calls run-init to run the real init in your real rootfs kept in /root. run-init does something like chroot to the real rootfs and then executes the init kept in /sbin/ or /bin or whatever user requested as a boot parameter.

initramfs

http://www.linuxfromscratch.org/blfs/view/svn/postlfs/initramfs.html The only purpose of an initramfs is to mount the root filesystem. The initramfs is a complete set of directories that you would find on a normal root filesystem. It is bundled into a single cpio archive and compressed with one of several compression algorithms.

At boot time, the boot loader loads the kernel and the initramfs image into memory and starts the kernel. The kernel checks for the presence of the initramfs and, if found, mounts it as / and runs /init. The init program is typically a shell script. Note that the boot process takes longer, possibly significantly longer, if an initramfs is used.

For most distributions, kernel modules are the biggest reason to have an initramfs. In a general distribution, there are many unknowns such as file system types and disk layouts. In a way, this is the opposite of Linux File System where the system capabilities and layout are known and a custom kernel is normally built. In this situation, an initramfs is rarely needed.

There are only four primary reasons to have an initramfs in the LFS environment: loading the rootfs from a network, loading it from an LVM logical volume, having an encrypted rootfs where a password is required, or for the convenience of specifying the rootfs as a LABEL or UUID. Anything else usually means that the kernel was not configured properly.

init

In Unix-based computer operating systems, init (short for initialization) is the first process started during booting of the computer system. Init is a daemon process that continues running until the system is shut down. It is the direct or indirect ancestor of all other processes and automatically adopts all orphaned processes. Init is started by the kernel during the booting process; a kernel panic will occur if the kernel is unable to start it. Init is typically assigned process identifier 1.

In Unix systems such as System III and System V, the design of init has diverged from the functionality provided by the init in Research Unix and its BSD derivatives. Up until recently, most Linux distributions employed a traditional init that is somewhat compatible with System V, while some distributions such as Slackware use BSD-style startup scripts, and others such as Gentoo have their own customized versions.

Since then, several additional init implementations have been created, attempting to address design limitations in the traditional versions. These include launchd, the Service Management Facility, systemd, Runit and OpenRC.

https://en.wikipedia.org/wiki/Init

Kernel Module

What is a Kernel Module?

A loadable kernel module (LKM) is a mechanism for adding code to, or removing code from, the Linux kernel at run time. They are ideal for device drivers, enabling the kernel to communicate with the hardware without it having to know how the hardware works. The alternative to LKMs would be to build the code for each and every driver into the Linux kernel.

Without this modular capability, the Linux kernel would be very large, as it would have to support every driver that would ever be needed on every device. You would also have to rebuild the kernel every time you wanted to add new hardware or update a device driver. LKMs are loaded at run time, but they do not execute in user space — they are essentially part of the kernel.

The downside of LKMs is that driver files have to be maintained for each device.

Kernel modules run in kernel space and applications run in user space. Both kernel space and user space have their own unique memory address spaces that do not overlap. This approach ensures that applications running in user space have a consistent view of the hardware, regardless of the hardware platform. The kernel services are then made available to the user space in a controlled way through the use of system calls. The kernel also prevents individual user-space applications from conflicting with each other or from accessing restricted resources through the use of protection levels (e.g., superuser versus regular user permissions).

The run-time life cycle of a typical computer program is reasonably straightforward. A loader allocates memory for the program, then loads the program and any required shared libraries. Instruction execution begins at some entry point (typically the main() point in C/C++ programs), statements are executed, exceptions are thrown, dynamic memory is allocated and deallocated, and the program eventually runs to completion. On program exit, the operating system identifies any memory leaks and frees lost memory to the pool.

A kernel module is not an application. For a start there is no main() function.

Some of the key differences are that kernel modules:

Do not execute sequentially A kernel module registers itself to handle requests using its initialization function, which runs and then terminates. The type of requests that it can handle are defined within the module code. This is quite similar to the event-driven programming model that is commonly utilized in graphical-user interface (GUI) applications.
Do not have automatic cleanup Any resources that are allocated to the module must be manually released when the module is unloaded, or they may be unavailable until a system reboots.
Do not have printf() functions Kernel code cannot access libraries of code that is written for the Linux user space. The kernel module lives and runs in kernel space, which has its own memory address space. The interface between kernel space and user space is clearly defined and controlled. We do however have a printk() function that can output information, which can be viewed from within user space.
Can be interrupted One conceptually difficult aspect of kernel modules is that they can be used by several different programs/processes at the same time. We have to carefully construct our modules so that they have a consistent and valid behavior when they are interrupted. For example - BeagleBone has a single-core processor (for the moment) but we still have to consider the impact of multiple processes accessing the module simultaneously.
Have a higher level of execution privilege Typically, more CPU cycles are allocated to kernel modules than to user-space programs. This sounds like an advantage, however, you have to be very careful that your module does not adversely affect the overall performance of your system.
Do not have floating-point support It is kernel code that uses traps to transition from integer to floating-point mode for your user space applications. However, it is very difficult to perform these traps in kernel space. The alternative is to manually save and restore floating point operations — a task that is best avoided and left to your user-space code.

anitsh / til

Linux Kernel #112