anitsh commented 4 years ago

Linux Security Infrastructure

On a default operating system (OS) installation, applications run largely unmonitored, which makes it difficult to determine what is happening on the system.

Figure: Privilege rings for the x86 microprocessor architecture, available in protected mode. Operating systems determine which processes run in each mode.

'''Enforcing security goes hand in hand with knowing what you are protecting yourself against, or at least what you are protecting. All that, however, must start with a security policy. Once the policy is formulated, the choice is easier to make. Let's consider a few cases, which may or may not reflect your use cases. But before considering them, it may be helpful to narrow down the available options to the following:

1. If you cannot modify the target program, the only option you have is MAC (Mandatory Access Control) at best, since all you need to do is provide a security policy by which a given program will be evaluated. The program and its implementation language do not matter, since the operating system's security layers take care of it. Sandboxes are another option, but they fall in category 3.
2. If you are [re-]writing (or can modify) your program: good luck then, because you actually can write a security-aware program by invoking certain security primitives provided by a framework of your choice [more below]. However, you may be limited in that the framework you choose does not have any bindings in your programming language, as most of them tend to be low-level. Is it C, Java, Go, or JavaScript (yeah, as if!)? You may be on your own in most cases, so, welcome to the club. But there are options. And silly ones sometimes.
3. You don't care and just want to keep things from exploding in your face (i.e. none of the above is worth your time): sandboxes then? Maybe something as dramatic as a VM, or if you are in the mood for the unknown, application containers are your best friends. However, the most security-savvy minds out there do warn that containers do not contain. But you still at least benefit from some resource isolation without having to know the internals of your programs. Unprivileged containers are strongly recommended. But otherwise, VMs are still cool.

And now, the cases:

I want to run a program in a way that is immune to malicious exploitation: since that means any attack, known or unknown, nothing is guaranteed to satisfy your requirement (every security framework should tell you this). But at least, restricting your program to the fewest possible system calls reduces its attack surface, although you can still be attacked even if only 1% of the possible ways remain. However, if your goal is to strip privileges from applications to make them less useful to an attacker, system call filtering and capability systems will be helpful. Seccomp (available on Linux), Capsicum (available on FreeBSD, and soon on Linux), and POSIX capabilities are options here.

I want to restrict my programs to known/expected behavior: easy (for simple programs) - if you can define runtime behavior in terms of accessed files or kernel objects, then MAC frameworks can help you (AppArmor, SELinux, ...). However, you will also have to choose the level of abstraction at which you want to express your policies, so precision and flexibility will be pulling you in different directions (are paths precise enough, are inodes manageable for you, what about memory segments?).

I often find myself needing to run programs with elevated privileges and worry it may be too risky: we've all been there. Dropping capabilities is probably the most reasonable option. It is OS-specific (and so are all the others), but at least implementation-independent if done as part of access control on a given machine (as opposed to invoking the capabilities interface programmatically, which will depend on available bindings in your language).'''

https://security.stackexchange.com/questions/196881/docker-when-to-use-apparmor-vs-seccomp-vs-cap-drop

Linux Security Modules (LSM)

Linux Security Modules (LSM) is a framework that allows the Linux kernel to support a variety of computer security models while avoiding favouritism toward any single security implementation. AppArmor, SELinux, Smack, and TOMOYO Linux are the currently accepted modules in the official kernel.

Mandatory Access Control (MAC)

A type of access control by which the operating system or database constrains the ability of a subject or initiator to access or generally perform some sort of operation on an object or target.[1] In the case of operating systems, a subject is usually a process or thread; objects are constructs such as files, directories, TCP/UDP ports, shared memory segments, IO devices, etc. Subjects and objects each have a set of security attributes. Whenever a subject attempts to access an object, an authorization rule enforced by the operating system kernel examines these security attributes and decides whether the access can take place. Any operation by any subject on any object is tested against the set of authorization rules (aka policy) to determine if the operation is allowed. A database management system, in its access control mechanism, can also apply mandatory access control; in this case, the objects are tables, views, procedures, etc.

With mandatory access control, this security policy is centrally controlled by a security policy administrator; users do not have the ability to override the policy and, for example, grant access to files that would otherwise be restricted. By contrast, discretionary access control (DAC), which also governs the ability of subjects to access objects, allows users the ability to make policy decisions and/or assign security attributes. (The traditional Unix system of users, groups, and read-write-execute permissions is an example of DAC.) MAC-enabled systems allow policy administrators to implement organization-wide security policies. Under MAC (and unlike DAC), users cannot override or modify this policy, either accidentally or intentionally. This allows security administrators to define a central policy that is guaranteed (in principle) to be enforced for all users.
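As a rough illustration of the subject/object/rule model described above, the toy sketch below evaluates accesses against a fixed, default-deny rule table. The labels (loosely styled after SELinux type names), the `POLICY` table, and `is_allowed` are all invented for this example; they are not part of any real LSM interface.

```python
from types import MappingProxyType

# Hypothetical central policy: (subject label, object label) -> allowed operations.
# MappingProxyType makes the table read-only, mirroring the idea that ordinary
# users cannot change a mandatory policy.
POLICY = MappingProxyType({
    ("httpd_t", "web_content_t"): frozenset({"read"}),
    ("httpd_t", "httpd_log_t"): frozenset({"append"}),
})


def is_allowed(subject: str, obj: str, operation: str) -> bool:
    """Default deny: an access passes only if a rule explicitly lists it."""
    return operation in POLICY.get((subject, obj), frozenset())


print(is_allowed("httpd_t", "web_content_t", "read"))   # rule exists
print(is_allowed("httpd_t", "web_content_t", "write"))  # operation not listed
print(is_allowed("httpd_t", "shadow_t", "read"))        # no rule for this pair
```

A real MAC implementation performs this kind of lookup inside the kernel on every access, with far richer attributes than two strings.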

Mandatory Access Controls

A MAC is a framework for defining what a program can and cannot do, on a whitelist basis. A program is represented as a subject. Anything the program wants to act on, such as a file, path, network interface, or port is represented as an object. The rules for accessing the object are called the permission, or flag. Take the AppArmor policy for the ping utility, with added comments:

#include <tunables/global>

/bin/ping {
  # use header files containing more rules
  #include <abstractions/base>
  #include <abstractions/consoles>
  #include <abstractions/nameservice>

  capability net_raw,  # allow having CAP_NET_RAW
  capability setuid,   # allow being setuid
  network inet raw,    # allow creating raw sockets

  /bin/ping mixr,      # allow mmaping, executing, and reading
  /etc/modules.conf r, # allow reading
}

With this policy in place, the ping utility, if compromised, cannot read from your home directory, execute a shell, write new files, etc. This kind of sandboxing is used for securing a server or workstation. Other than AppArmor, some popular MACs include SELinux, TOMOYO, and SMACK. These are typically implemented in the kernel as a Linux Security Module, or LSM. This is a subsystem under Linux that provides modules with hooks for various actions (like changing credentials and accessing objects) so they can enforce a security policy.

Discretionary Access Control (DAC)

A type of access control defined by the Trusted Computer System Evaluation Criteria[1] "as a means of restricting access to objects based on the identity of subjects and/or groups to which they belong. The controls are discretionary in the sense that a subject with a certain access permission is capable of passing that permission (perhaps indirectly) on to any other subject (unless restrained by mandatory access control)".

Discretionary access control is commonly discussed in contrast to mandatory access control (MAC). Occasionally a system as a whole is said to have "discretionary" or "purely discretionary" access control as a way of saying that the system lacks mandatory access control. On the other hand, systems can be said to implement both MAC and DAC simultaneously, where DAC refers to one category of access controls that subjects can transfer among each other, and MAC refers to a second category of access controls that imposes constraints upon the first.
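The "discretionary" part can be seen with ordinary Unix permission bits: the owner of a file may loosen its mode at will, without asking any administrator. The sketch below is a minimal demonstration in a temporary directory; the function name and file name are made up for the example.

```python
import os
import stat
import tempfile


def owner_changes_mode() -> tuple[int, int]:
    """Return the file's permission bits before and after the owner relaxes them."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "notes.txt")
        with open(path, "w") as f:
            f.write("hello\n")
        os.chmod(path, 0o600)  # readable and writable by the owner only
        before = stat.S_IMODE(os.stat(path).st_mode)
        os.chmod(path, 0o644)  # the owner decides to let everyone read it
        after = stat.S_IMODE(os.stat(path).st_mode)
        return before, after


before, after = owner_changes_mode()
print(oct(before), oct(after))
```

Under a MAC system, the second `chmod` could succeed and the file would still be unreadable to subjects the central policy does not authorize.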

Kernel Security Tools

Namespaces: Isolate neighboring processes from each other. They limit what a container can see, and thus help prevent attacks from spreading.
cgroups: Limit the resources used by a container. They restrict what a container can use, and thus prevent infected containers from hogging all resources.
seccomp: A Linux security feature that reduces the kernel attack surface by letting a process enter a "secure" state in which it can make only a small set of system calls. If it attempts any other system call, the process is killed (in filter mode, the filter decides what happens).
AppArmor: Enables access controls on processes. Can be set to enforce policies, or merely report on policy violations.
SELinux: Enforces mandatory access control (MAC), which controls how containers access kernel-managed resources based on policies.

OR

Seccomp reduces the chance that a kernel vulnerability will be successfully exploited.
AppArmor/SELinux prevents an application from accessing files it should not access.
Capability dropping reduces the damage a compromised privileged process can do.

Other

A chroot is a *nix feature that allows setting a new path as the root directory for a given program, forcing it to see everything as relative to that path. This is not usually used for security, since a privileged program can often escape a chroot, and because it does not isolate IPC or networking, allowing even unprivileged processes to do mischief like killing other processes. In a pinch, it can be used to augment other security techniques. It is very useful for preventing an application from doing accidental damage, and for giving legacy software a view of the filesystem that it expects.

Chrooting bash, for example, would involve putting any executables and libraries it needs into the new directory, and running the chroot utility (which itself just calls the syscall of the same name):

host ~ # ldd /bin/bash
        linux-vdso.so.1 (0x0000036b3fb5a000)
        libreadline.so.6 => /lib64/libreadline.so.6 (0x0000036b3f6e5000)
        libncurses.so.6 => /lib64/libncurses.so.6 (0x0000036b3f47e000)
        libc.so.6 => /lib64/libc.so.6 (0x0000036b3f0bc000)
        /lib64/ld-linux-x86-64.so.2 (0x0000036b3f938000)
host ~ # ldd /bin/ls
        linux-vdso.so.1 (0x000003a093481000)
        libc.so.6 => /lib64/libc.so.6 (0x000003a092e9d000)
        /lib64/ld-linux-x86-64.so.2 (0x000003a09325f000)
host ~ # mkdir -p newroot/{lib64,bin}
host ~ # cp -aL /lib64/{libreadline,libncurses,libc}.so.6 newroot/lib64
host ~ # cp -aL /lib64/ld-linux-x86-64.so.2 newroot/lib64
host ~ # cp -a /bin/{bash,ls} newroot/bin
host ~ # pwd
/root
host ~ # chroot newroot /bin/bash
bash-4.3# pwd
/
bash-4.3# ls
bin  lib64
bash-4.3# ls /bin
bash  ls
bash-4.3# id
bash: id: command not found
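The manual step in the transcript above, reading `ldd` output to decide which libraries to copy into the new root, can be automated. The following is only a sketch: `parse_ldd` is a hypothetical helper, and it assumes the `ldd` output format shown above (one library per line, address in parentheses at the end).

```python
import re


def parse_ldd(output: str) -> list[str]:
    """Extract absolute library paths from `ldd` output, skipping the vDSO."""
    paths = []
    for line in output.splitlines():
        # Matches "libc.so.6 => /lib64/libc.so.6 (0x...)" as well as the
        # bare loader line "/lib64/ld-linux-x86-64.so.2 (0x...)".
        match = re.search(r"(?:=>\s*)?(/\S+)\s+\(0x[0-9a-f]+\)$", line.strip())
        if match:
            paths.append(match.group(1))
    return paths


sample = """\
        linux-vdso.so.1 (0x000003a093481000)
        libc.so.6 => /lib64/libc.so.6 (0x000003a092e9d000)
        /lib64/ld-linux-x86-64.so.2 (0x000003a09325f000)
"""
for path in parse_ldd(sample):
    print(path)
```

To feed it live data, the output of `subprocess.run(["ldd", "/bin/ls"], capture_output=True, text=True).stdout` can be passed in, and each returned path copied under the new root.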

Only a process with the CAP_SYS_CHROOT capability is able to enter a chroot. This is necessary to prevent a malicious program from creating its own copy of /etc/passwd in a directory it controls, and chrooting into it with a setuid program like su, tricking the binary into giving them root.

A hypervisor is virtualization software that usually leverages hardware features to isolate all system resources, such as CPU cores, memory, and devices. A virtualized system believes not just that it has root, but that it has ring 0 (kernel mode).
Containers are similar to hypervisors, but rather than using virtualization, they use namespaces. Each container has its resources placed in its own namespaces, which lets every container run an independent userland (its own init system, libraries, and distribution) on top of the shared host kernel. The init process in the container sees itself as PID 1 running as root, but the host sees it as just another non-init PID (and, when user namespaces are in use, a non-root one).

Based on what they do:

Basic sandboxing: seccomp
Sandboxing with policies: seccomp-bpf
Mandatory access control systems: SELinux, AppArmor
System auditing: Auditd
Behavioral monitoring: Falco

Overall, these products can be grouped into ones focused on enforcement vs auditing. Both groups define a policy that describes the allowed or disallowed behavior for a process, in terms of system calls, their arguments, and host resources accessed. Enforcement tools use the policy to change the behavior of a process by preventing system calls from succeeding, or in some cases, killing the process. Seccomp, seccomp-bpf, SELinux, and AppArmor are examples of enforcement tools. Auditing tools use the policy to monitor the behavior of a process and notify when its behavior steps outside the policy. Auditd and Falco are examples of auditing tools. (Falco does allow taking actions on alerts via its command execution notification channel, so it has limited enforcement capabilities, but it is not intended to be used as an enforcement tool).

Sandboxing

At its most basic, sandboxing is a technique to minimize the effect a program will have on the rest of the systems in the case of malice or malfunction. This can be for testing or for enhancing the security of a system. The reason one might want to use a sandbox also varies, and in some cases it is not even related to security, for example in the case of OpenBSD's systrace. The main uses of a sandbox are:

Program testing to detect broken packages, especially during builds.
Malware analysis to understand behavior of malicious software.
Securing untrusted or unsafe applications to minimize damage they can do.

There are many sandboxing techniques, all with differing threat models. Some may just reduce attack surface area by limiting APIs that can be used, while others define access controls using formalized models similar to Bell-LaPadula or Biba.

Resource

Related: #112 #427

anitsh commented 3 years ago

Linux Capabilities

[ ] What is it?
[ ] How Linux Capability Works?

Capabilities break up root privileges into smaller units, so full root access is no longer needed. Most binaries that have the setuid flag set can be changed to use capabilities instead. Capabilities are maintained by the kernel.

Security of Linux systems and applications can be greatly improved by using hardening measures. One of these measures is Linux capabilities, which the kernel has supported for quite some time. Using capabilities, we can harden applications and containers.

Capabilities are a great way to split up root permissions and hand out some permissions to non-privileged users. Unfortunately, many binaries still have the setuid bit set when they could use capabilities instead.

Normally the root user (or any ID with UID of 0) gets special treatment when running processes. The kernel and applications are usually programmed to skip the restriction of some activities when seeing this user ID. In other words, this user is allowed to do (almost) anything.

Linux capabilities provide a subset of the available root privileges to a process. This effectively breaks up root privileges into smaller, distinct units. Each of these units can then be independently granted to processes. This way the full set of privileges is reduced, which decreases the risk of exploitation.

Capabilities can be thought of as broad classes of privileged functionality that can be selectively removed from a process or user. The specific functions that have capability checks vary from kernel version to kernel version, and there is often bickering between kernel developers over whether or not a given function should require capabilities to run. Generally, reducing capabilities from a process improves security by reducing the number of privileged actions it can perform. Note that some capabilities are considered root-equivalent, meaning that, even if you disable all other capabilities, they can, in some conditions, be used to regain full permissions.


# See the highest capability number for your kernel. The number of capabilities supported by recent Linux versions is close to 40.
 cat /proc/sys/kernel/cap_last_cap

# View the current capability state, including the bounding set (for an unrestricted process this lists every capability the active kernel supports).
 capsh --print

# View the capability sets of the current shell ($$ expands to its PID). The command should return five capability sets as hexadecimal bitmasks.
 grep Cap /proc/$$/status

    # CapInh = Inheritable capabilities
    # CapPrm = Permitted capabilities
    # CapEff = Effective capabilities
    # CapBnd = Bounding set
    # CapAmb = Ambient capabilities set

# Decode a hexadecimal bitmask into capability names.
 capsh --decode=0000003fffffffff

# See the capabilities of a running process.
# The getpcaps tool uses the capget() system call to query the capabilities of a particular thread. The call only needs the PID.
 getpcaps PROCESS_ID

# See the capabilities of a set of related processes.
 getpcaps $(pgrep nginx)

# Drop CAP_NET_RAW from the bounding set, then run ping without it.
 capsh --drop=cap_net_raw --print -- -c "/bin/ping -c 1 localhost"
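To make the hexadecimal bitmasks less opaque, here is a small sketch of what `capsh --decode` does conceptually: bit N of the mask corresponds to capability number N. The helper names are invented for this example, and the name table is written out by hand from the kernel's capability numbering (0 through 40 on recent kernels), so treat it as an assumption to check against your own kernel headers.

```python
# Capability names indexed by capability number (bit position in the mask).
CAP_NAMES = [
    "cap_chown", "cap_dac_override", "cap_dac_read_search", "cap_fowner",
    "cap_fsetid", "cap_kill", "cap_setgid", "cap_setuid", "cap_setpcap",
    "cap_linux_immutable", "cap_net_bind_service", "cap_net_broadcast",
    "cap_net_admin", "cap_net_raw", "cap_ipc_lock", "cap_ipc_owner",
    "cap_sys_module", "cap_sys_rawio", "cap_sys_chroot", "cap_sys_ptrace",
    "cap_sys_pacct", "cap_sys_admin", "cap_sys_boot", "cap_sys_nice",
    "cap_sys_resource", "cap_sys_time", "cap_sys_tty_config", "cap_mknod",
    "cap_lease", "cap_audit_write", "cap_audit_control", "cap_setfcap",
    "cap_mac_override", "cap_mac_admin", "cap_syslog", "cap_wake_alarm",
    "cap_block_suspend", "cap_audit_read", "cap_perfmon", "cap_bpf",
    "cap_checkpoint_restore",
]

CAP_SETS = ("CapInh", "CapPrm", "CapEff", "CapBnd", "CapAmb")


def decode_caps(mask_hex: str) -> list[str]:
    """Turn a hex bitmask such as '0000000000002000' into capability names."""
    mask = int(mask_hex, 16)
    return [name for bit, name in enumerate(CAP_NAMES) if (mask >> bit) & 1]


def parse_cap_sets(status_text: str) -> dict[str, list[str]]:
    """Decode the Cap* lines found in the text of a /proc/<pid>/status file."""
    sets = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in CAP_SETS:
            sets[key] = decode_caps(value.strip())
    return sets


print(decode_caps("0000000000002000"))
```

On a Linux host, `parse_cap_sets(open("/proc/self/status").read())` would decode the sets of the running interpreter.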

Resources

anitsh commented 3 years ago

Namespace

Namespaces are a feature of the Linux kernel that partitions kernel resources such that one set of processes sees one set of resources while another set of processes sees a different set of resources. The feature works by having the same namespace for a set of resources and processes, but those namespaces refer to distinct resources. Resources may exist in multiple spaces. Examples of such resources are process IDs, hostnames, user IDs, file names, and some names associated with network access, and interprocess communication.

Namespaces are a fundamental aspect of containers on Linux. The term "namespace" is often used for a type of namespace (e.g. process ID) as well as for a particular space of names. A Linux system starts out with a single namespace of each type, used by all processes. Processes can create additional namespaces and join different namespaces. Namespaces are created with the "unshare" command or syscall, or as new flags in a "clone" syscall.

Namespaces do not restrict access to physical resources such as CPU, memory and disk. That access is metered and restricted by a kernel feature called ‘cgroups’.

Types

Since kernel version 5.6, there are 8 kinds of namespaces. Namespace functionality is the same across all kinds: each process is associated with a namespace and can only see or use the resources associated with that namespace, and descendant namespaces where applicable. This way each process (or process group thereof) can have a unique view on the resources. Which resource is isolated depends on the kind of namespace that has been created for a given process group.

The list below covers seven of them; it omits the time namespace, which was added in Linux 5.6:

cgroup: Isolates the cgroup root directory.
IPC: Isolates System V IPC objects and POSIX message queues between namespaces.
Network: Isolates the network interface controllers (physical or virtual), network stacks, ports, iptables firewall rules, routing tables, etc. Network namespaces can be connected with each other using the "veth" virtual Ethernet device.
Mount: Isolates mount points, similar in function to a chroot. Allows creating a different file system layout, or making certain mount points read-only.
PID: The PID namespace provides isolation for the allocation of process identifiers (PIDs), lists of processes and their details. While the new namespace is isolated from other siblings, processes in its "parent" namespace still see all processes in child namespaces, albeit with different PID numbers.
User: Isolates the user and group IDs between namespaces.
UTS: Isolates the hostname and domain name, so they can be changed inside the namespace without affecting the rest of the system.

OR

Mount - isolate filesystem mount points
UTS - isolate hostname and domainname
IPC - isolate interprocess communication (IPC) resources
PID - isolate the PID number space
Network - isolate network interfaces
User - isolate UID/GID number spaces
Cgroup - isolate cgroup root directory

An example of PID namespaces using the unshare utility:

host ~ # echo $$
25688
host ~ # unshare --fork --pid
host ~ # echo $$
1
host ~ # logout
host ~ # echo $$
25688

While these can be used to augment sandboxing or even be used as an integral part of a sandbox, some of them can reduce security. User namespaces, when unprivileged (the default), expose a much greater attack surface area from the kernel. Many kernel vulnerabilities are exploitable by unprivileged processes when the user namespace is enabled. On some kernels, you can disable unprivileged user namespaces by setting kernel.unprivileged_userns_clone to 0.

Implementation Details

The kernel assigns each process a symbolic link per namespace kind in /proc//ns/. The inode number pointed to by this symlink is the same for each process in this namespace. This uniquely identifies each namespace by the inode number pointed to by one of its symlinks.

Reading the symlink via readlink returns a string containing the namespace kind name and the inode number of the namespace.
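As a sketch of the readlink behaviour just described, the snippet below splits a link target such as `pid:[4026531836]` into its kind and inode number, and lists the namespaces of a process. The function names are made up for this example, and `namespaces_of` assumes a Linux host where the per-process namespace directory is available as `/proc/<pid>/ns` (as documented in namespaces(7)).

```python
import os
import re


def parse_ns_link(target: str) -> tuple[str, int]:
    """Split a namespace symlink target like 'pid:[4026531836]' into (kind, inode)."""
    match = re.fullmatch(r"(\w+):\[(\d+)\]", target)
    if match is None:
        raise ValueError(f"unexpected namespace link target: {target!r}")
    return match.group(1), int(match.group(2))


def namespaces_of(pid: str = "self") -> dict[str, int]:
    """Map each entry in the process's ns directory to its namespace inode (Linux only)."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    for entry in sorted(os.listdir(ns_dir)):
        _, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, entry)))
        result[entry] = inode
    return result


print(parse_ns_link("pid:[4026531836]"))
```

Two processes are in the same namespace of a given kind exactly when `namespaces_of` reports the same inode for that entry.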

Syscalls

Three syscalls can directly manipulate namespaces:

clone, which takes flags that specify which new namespaces the new process should be placed in.

unshare, which allows a process (or thread) to disassociate parts of its execution context that are currently being shared with other processes (or threads).

setns, which enters the namespace specified by a file descriptor.

Destruction

If a namespace is no longer referenced, it will be deleted; how the contained resource is handled depends on the namespace kind. Namespaces can be referenced in three ways:

by a process belonging to the namespace

by an open file descriptor to the namespace's file (/proc//ns/)

by a bind mount of the namespace's file (/proc//ns/)

Usage

Various container software uses Linux namespaces in combination with cgroups to isolate its processes, including Docker[12] and LXC. Other applications, such as Google Chrome, make use of namespaces to isolate their own processes which are at risk from attack on the internet.[13] There is also an unshare wrapper in util-linux. An example of its use is SHELL=/bin/sh unshare --fork --pid chroot "${chrootdir}" "$@"

A process can be created in Linux by the fork(), vfork(), or clone() system calls. To support namespaces, a CLONE_NEW* flag was added for each namespace kind. These flags (or a combination of them) can be used in clone() or unshare() system calls to create a namespace.
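The CLONE_NEW* flags are plain bits that get OR-ed together. The sketch below shows the combination step and, as an optional extra, a thin `unshare(2)` call through libc. The dictionary keys and both function names are invented for this example; the numeric values are copied from the kernel's `sched.h` as I recall them, so verify them against your headers. CLONE_NEWTIME is left out because it cannot be passed to clone().

```python
import ctypes
import os

# Flag values from <linux/sched.h>; keys are informal names chosen for this sketch.
CLONE_FLAGS = {
    "mnt": 0x00020000,     # CLONE_NEWNS
    "cgroup": 0x02000000,  # CLONE_NEWCGROUP
    "uts": 0x04000000,     # CLONE_NEWUTS
    "ipc": 0x08000000,     # CLONE_NEWIPC
    "user": 0x10000000,    # CLONE_NEWUSER
    "pid": 0x20000000,     # CLONE_NEWPID
    "net": 0x40000000,     # CLONE_NEWNET
}


def combine_flags(kinds) -> int:
    """OR together the CLONE_NEW* bits for the requested namespace kinds."""
    flags = 0
    for kind in kinds:
        flags |= CLONE_FLAGS[kind]
    return flags


def unshare(kinds) -> None:
    """Call unshare(2) via libc. Usually needs CAP_SYS_ADMIN or a user namespace."""
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.unshare(combine_flags(kinds)) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


print(hex(combine_flags(["pid", "net"])))
```

Only `combine_flags` is exercised here; calling `unshare(["uts"])` changes the calling process and typically fails with EPERM for unprivileged users.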

Resource

[ ] https://en.wikipedia.org/wiki/Linux_namespaces
[ ] https://medium.com/@teddyking/namespaces-in-go-basics-e3f0fc1ff69a Great article that describes 5 of the 7 types of namespaces.
[ ] https://blog.scottlowe.org/2013/09/04/introducing-linux-network-namespaces

anitsh commented 3 years ago

cgroups (abbreviated from control groups)

Not to be confused with the cgroup namespace, which is a type of namespace that hides the identity of the control group of which a process is a member. A process in such a namespace, checking which control group any process is part of, would see a path that is actually relative to the control group set at creation time, hiding its true control group position and identity.

cgroups is a Linux kernel feature that limits, accounts for, and isolates the resource usage (CPU, memory, disk I/O, network, etc.) of a collection of processes.

A control group is a collection of processes that are bound by the same criteria and associated with a set of parameters or limits. These groups can be hierarchical, meaning that each group inherits limits from its parent group. The kernel provides access to multiple controllers (also called subsystems) through the cgroup interface;[2] for example, the "memory" controller limits memory use, "cpuacct" accounts CPU usage, etc.

Control groups can be used in multiple ways:

By accessing the cgroup virtual file system manually.
By creating and managing groups on the fly using tools like cgcreate, cgexec, and cgclassify (from libcgroup).
Through the "rules engine daemon" that can automatically move processes of certain users, groups, or commands to cgroups as specified in its configuration.
Indirectly through other software that uses cgroups, such as Docker, Firejail, LXC, libvirt, systemd, Open Grid Scheduler/Grid Engine, and Google's developmentally defunct lmctfy.

cgroups provides:

Resource limiting: Groups can be set to not exceed a configured memory limit, which also includes the file system cache.
Prioritization: Some groups may get a larger share of CPU utilization or disk I/O throughput.
Accounting: Measures a group's resource usage, which may be used, for example, for billing purposes.
Control: Freezing groups of processes, their check-pointing and restarting.
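The first option listed above, accessing the cgroup virtual file system manually, comes down to creating a directory and writing to control files. Below is a minimal sketch assuming the cgroup v2 layout (`memory.max` and `cgroup.procs`); the function name is hypothetical, and the directory is a parameter so the same code can be pointed at a scratch directory instead of the real hierarchy.

```python
from pathlib import Path


def limit_memory(cgroup_dir: Path, limit_bytes: int, pid: int) -> None:
    """Cap memory for a cgroup v2 group and move one process into it."""
    # On a real cgroup file system, creating the directory creates the group.
    cgroup_dir.mkdir(parents=True, exist_ok=True)
    # memory.max holds the hard memory limit in bytes.
    (cgroup_dir / "memory.max").write_text(f"{limit_bytes}\n")
    # Writing a PID to cgroup.procs migrates that process into the group.
    (cgroup_dir / "cgroup.procs").write_text(f"{pid}\n")
```

On a real system this would be called with something like `Path("/sys/fs/cgroup/demo")`, which requires root (or a delegated subtree) and the memory controller enabled in the parent group's `cgroup.subtree_control`.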

Kernfs

Kernfs is basically created by splitting off some of the sysfs logic into an independent entity, thus easing for other kernel subsystems the implementation of their own virtual file system with handling for device connect and disconnect, dynamic creation and removal, and other attributes.

Kernel memory control groups (kmemcg)

The kmemcg controller can limit the amount of memory that the kernel can utilize to manage its own internal processes.

anitsh commented 3 years ago

Seccomp

Seccomp (short for secure computing mode) is a computer security facility in the Linux kernel. seccomp allows a process to make a one-way transition into a "secure" state where it cannot make any system calls except exit(), sigreturn(), read() and write() to already-open file descriptors. Should it attempt any other system calls, the kernel will terminate the process with SIGKILL or SIGSYS. In this sense, it does not virtualize the system's resources but isolates the process from them entirely.

seccomp mode is enabled via the prctl system call using the PR_SET_SECCOMP argument, or (since Linux kernel 3.17) via the seccomp(2) system call. seccomp mode used to be enabled by writing to a file, /proc/self/seccomp, but this method was removed in favor of prctl(). In some kernel versions, seccomp disables the RDTSC x86 instruction, which returns the number of elapsed processor cycles since power-on, used for high-precision timing.[6]

seccomp-bpf is an extension to seccomp[7] that allows filtering of system calls using a configurable policy implemented using Berkeley Packet Filter rules. It is used by OpenSSH and vsftpd as well as the Google Chrome/Chromium web browsers on Chrome OS and Linux.[8] (In this regard seccomp-bpf achieves similar functionality, but with more flexibility and higher performance, to the older systrace—which seems to be no longer supported for Linux.)

Some consider seccomp comparable to OpenBSD pledge and FreeBSD capsicum.

There are two types of seccomp: mode 1 (strict) and mode 2 (filter). Mode 1 is extremely restrictive and, once enabled, only allows four syscalls. These syscalls are read(), write(), exit(), and rt_sigreturn(). A process is immediately sent the fatal SIGKILL signal from the kernel if it ever attempts to use a syscall that is not on the whitelist. This mode is the original seccomp mode and does not require generating and sending BPF bytecode to the kernel (filter mode uses classic BPF programs, not eBPF). A special syscall is made, after which mode 1 will be active for the lifetime of the process: seccomp(SECCOMP_SET_MODE_STRICT) or prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT). Once active, it cannot be turned off.
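A hedged sketch of mode 1 in action: a forked child enables strict mode through `prctl` and then makes a syscall that is not on the allowlist, and the parent reports which signal ended it. The function name is invented; the two constants are taken from `<linux/prctl.h>` and `<linux/seccomp.h>`. The code assumes a Linux host, and returns `None` where strict mode cannot be enabled (for instance inside some sandboxes).

```python
import ctypes
import os
import signal

PR_SET_SECCOMP = 22      # from <linux/prctl.h>
SECCOMP_MODE_STRICT = 1  # from <linux/seccomp.h>


def demo_strict_mode():
    """Return the signal number that killed the child, or None if seccomp was unavailable."""
    pid = os.fork()
    if pid == 0:
        # Child: must never return into the caller's code.
        try:
            libc = ctypes.CDLL(None, use_errno=True)
            rc = libc.prctl(
                PR_SET_SECCOMP,
                ctypes.c_ulong(SECCOMP_MODE_STRICT),
                ctypes.c_ulong(0), ctypes.c_ulong(0), ctypes.c_ulong(0),
            )
            if rc != 0:
                os._exit(42)  # strict mode could not be enabled
            os.getpid()       # getpid(2) is not allowed in strict mode
        except Exception:
            pass
        # In strict mode even this exit (exit_group) triggers SIGKILL.
        os._exit(42)
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return os.WTERMSIG(status)
    return None


result = demo_strict_mode()
print("child killed by SIGKILL" if result == signal.SIGKILL else f"result: {result}")
```

The fork keeps the experiment away from the parent, since the transition into strict mode is one-way.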

Resource

anitsh commented 3 years ago

AppArmor

AppArmor ("Application Armor") is a Linux kernel security module that allows the system administrator to restrict programs' capabilities with per-program profiles. Profiles can allow capabilities like network access, raw socket access, and the permission to read, write, or execute files on matching paths. AppArmor supplements the traditional Unix discretionary access control (DAC) model by providing mandatory access control (MAC). It has been partially included in the mainline Linux kernel since version 2.6.36 and its development has been supported by Canonical since 2009.

AppArmor gives you network application security via mandatory access control for programs, protecting against the exploitation of software flaws and compromised systems.

AppArmor consists of several different parts:

binutils/ source for basic utilities written in compiled languages
changehat/ source for using changehat with Apache, PAM and Tomcat
common/ common makefile rules
desktop/ empty
kernel-patches/ compatibility patches for various kernel versions
libraries/ libapparmor source and language bindings
parser/ source for parser/loader and corresponding documentation
profiles/ configuration files, reference profiles and abstractions
tests/ regression and stress testsuites
utils/ high-level utilities for working with AppArmor

Resource

anitsh commented 3 years ago

Security-Enhanced Linux (SELinux)

SELinux can potentially control which activities a system allows each user, process, and daemon, with very precise specifications. It is used to confine daemons such as database engines or web servers that have clearly defined data access and activity rights. This limits potential harm from a confined daemon that becomes compromised.

Security-Enhanced Linux (SELinux) is a Linux kernel security module that provides a mechanism for supporting access control security policies, including mandatory access controls (MAC).

SELinux is a set of kernel modifications and user-space tools that have been added to various Linux distributions. Its architecture strives to separate enforcement of security decisions from the security policy, and streamlines the amount of software involved with security policy enforcement.

SELinux features include:

Clean separation of policy from enforcement
Well-defined policy interfaces
Support for applications querying the policy and enforcing access control (for example, crond running jobs in the correct context)
Independence of specific policies and policy languages
Independence of specific security-label formats and contents
Individual labels and controls for kernel objects and services
Support for policy changes
Separate measures for protecting system integrity (domain-type) and data confidentiality (multilevel security)
Flexible policy
Controls over process initialization and inheritance, and program execution
Controls over file systems, directories, files, and open file descriptors
Controls over sockets, messages, and network interfaces
Controls over the use of "capabilities"
Cached information on access-decisions via the Access Vector Cache (AVC)
Default-deny policy (anything not explicitly specified in the policy is disallowed)

Command-line utilities include: chcon, restorecon, restorecond, runcon, secon, fixfiles, setfiles, load_policy, booleans, getsebool, setsebool, togglesebool, setenforce, semodule, postfix-nochroot, check-selinux-installation, semodule_package, checkmodule, selinux-config-enforcing, selinuxenabled, and selinux-policy-upgrade

Comparison with AppArmor

SELinux represents one of several possible approaches to the problem of restricting the actions that installed software can take. Another popular alternative is called AppArmor and is available on SUSE Linux Enterprise Server (SLES), openSUSE, and Debian-based platforms. AppArmor was developed as a component to the now-defunct Immunix Linux platform. Because AppArmor and SELinux differ radically from one another, they form distinct alternatives for software control. Whereas SELinux re-invents certain concepts to provide access to a more expressive set of policy choices, AppArmor was designed to be simple by extending the same administrative semantics used for DAC up to the mandatory access control level.

There are several key differences:

One important difference is that AppArmor identifies file system objects by path name instead of inode. This means that, for example, a file that is inaccessible may become accessible under AppArmor when a hard link is created to it, while SELinux would deny access through the newly created hard link. As a result, AppArmor can be said not to be a type enforcement system, as files are not assigned a type; instead, they are merely referenced in a configuration file.
SELinux and AppArmor also differ significantly in how they are administered and how they integrate into the system.
Since it endeavors to recreate traditional DAC controls with MAC-level enforcement, AppArmor's set of operations is also considerably smaller than those available under most SELinux implementations. For example, AppArmor's set of operations consists of: read, write, append, execute, lock, and link.[34] Most SELinux implementations support orders of magnitude more operations than that. For example, SELinux will usually support those same permissions, but also includes controls for mknod, binding to network sockets, implicit use of POSIX capabilities, loading and unloading kernel modules, various means of accessing shared memory, etc.
There are no controls in AppArmor for categorically bounding POSIX capabilities. Since the current implementation of capabilities contains no notion of a subject for the operation (only the actor and the operation) it is usually the job of the MAC layer to prevent privileged operations on files outside the actor's enforced realm of control (i.e. "Sandbox"). AppArmor can prevent its own policy from being altered, and prevent file systems from being mounted/unmounted, but does nothing to prevent users from stepping outside their approved realms of control.
For example, it may be deemed beneficial for help desk employees to change ownership or permissions on certain files even if they don't own them (for example, on a departmental file share). You obviously don't want to give the user(s) root on the box, so you give them CAP_FOWNER or CAP_DAC_OVERRIDE. Under SELinux you (or your platform vendor) can configure SELinux to deny all capabilities to otherwise unconfined users, then create a confined domain that the employee can transition into after logging in, one that can exercise those capabilities, but only on files of the appropriate type.
There is no notion of multilevel security with AppArmor, thus there is no hard Bell-LaPadula (BLP) or Biba enforcement available.
AppArmor configuration is done solely with regular flat files. SELinux (by default in most implementations) uses a combination of flat files (used by administrators and developers to write human-readable policy before it's compiled) and extended attributes.
SELinux supports the concept of a "remote policy server" (configurable via /etc/selinux/semanage.conf) as an alternative source for policy configuration. Central management of AppArmor is usually complicated considerably since administrators must decide between configuration deployment tools being run as root (to allow policy updates) or configured manually on each server.
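
The path-versus-inode difference from the list above is easy to see from user space. The sketch below only shows that a hard link gives one file two path names; it does not talk to AppArmor or SELinux. The file names and the function `hard_link_identity` are made up for illustration.

```python
import os
import tempfile


def hard_link_identity(directory):
    """Create a file plus a hard link and compare their inode and path identity."""
    original = os.path.join(directory, "secret.txt")
    alias = os.path.join(directory, "alias.txt")
    with open(original, "w") as handle:
        handle.write("confidential\n")
    os.link(original, alias)  # second directory entry for the same inode
    same_inode = os.stat(original).st_ino == os.stat(alias).st_ino
    same_path = original == alias
    return same_inode, same_path


with tempfile.TemporaryDirectory() as workdir:
    same_inode, same_path = hard_link_identity(workdir)
    print("same inode:", same_inode, "same path:", same_path)
```

A rule keyed on the path of secret.txt says nothing about alias.txt, while a label attached to the inode applies under both names.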

Resource

anitsh commented 3 years ago

Smack (Simplified Mandatory Access Control Kernel)

Smack (Simplified Mandatory Access Control Kernel) is a Linux kernel security module that protects data and process interaction from malicious manipulation using a set of custom mandatory access control (MAC) rules, with simplicity as its main design goal.

TOMOYO Linux

TOMOYO Linux is a Mandatory Access Control (MAC) implementation for Linux that can be used to increase the security of a system, while also being useful purely as a system analysis tool. It focuses on the behaviour of a system. Every process is created to achieve a purpose, and like an immigration officer, TOMOYO Linux allows each process to declare behaviours and resources needed to achieve their purpose. When protection is enabled, TOMOYO Linux acts like an operation watchdog, restricting each process to only the behaviours and resources allowed by the administrator.

The main features of TOMOYO Linux include:

System analysis
Increased security through Mandatory Access Control
Tools to aid in policy generation
Simple syntax
Easy to use
Very few dependencies
Requires no modification of existing binaries

Sysdig

Csysdig is Sysdig's new curses UI. Think of it as strace + htop + Lua, but with history, output customization, drill-down capability, and incredible container support.
https://www.youtube.com/watch?v=UJ4wVrbP-Q8

Falco, the open-source cloud-native runtime security project, is the de facto Kubernetes threat detection engine. Falco detects unexpected application behavior and alerts on threats at runtime. Falco requires a driver to listen to the Linux Kernel. This driver can either be:

Extended Berkeley Packet Filter (eBPF) probe - a secure mechanism to run user code in the kernel
Open-source kernel module

This unique instrumentation allows Falco to have deep visibility into all syscall activity (e.g. security events, commands, connections). Falco natively integrates with Kubernetes API audit logs to alert on suspicious orchestrator activity. For cloud environments, Falco also ingests cloud audit logs to provide threat detection and alerting. By adding Kubernetes and cloud application context, teams can understand exactly who did what.
https://archive.fosdem.org/2017/schedule/event/container_spawned_shell
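
The kind of check a runtime detection engine performs on a syscall stream can be sketched as a toy filter. This is not Falco's rule language or API: the event dictionaries, their keys (`syscall`, `container_id`, `process_name`), and both functions are hypothetical and only show the "shell spawned in a container" idea from the linked talk.

```python
# Process names treated as interactive shells in this toy example.
SHELL_NAMES = {"sh", "bash", "zsh"}


def shell_spawned_in_container(event):
    """Match an execve of a shell that carries a container id."""
    return (
        event.get("syscall") == "execve"
        and event.get("container_id") is not None
        and event.get("process_name") in SHELL_NAMES
    )


def alerts(events):
    """Return one alert message per matching event, in input order."""
    return [
        f"shell {event['process_name']} spawned in container {event['container_id']}"
        for event in events
        if shell_spawned_in_container(event)
    ]
```

Events from the host (no container id) and non-shell processes pass through without an alert; only the combination of all three conditions fires.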

anitsh commented 3 years ago

pivot_root allows you to set a new root filesystem for the calling process. That is, it lets you change what / is. It does this by mounting the current root filesystem somewhere else while simultaneously mounting some new root filesystem on /. Once the previous root has been moved, it is then possible to umount it. Thus we have a mechanism for 'clearing' the host's mounts from inside a new Mount namespace: we simply pivot away and then umount them!
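
The effect on the mount table can be followed with a toy model. This does not call the real syscall (which needs privileges and a mount namespace); it only rewrites a dictionary of mount points. The paths /newroot and /newroot/.old, the filesystem names, and the functions `pivot_root` and `umount_tree` here are all hypothetical stand-ins.

```python
import posixpath


def _is_under(path, prefix):
    """True when path equals prefix or lies below it."""
    return path == prefix or path.startswith(prefix.rstrip("/") + "/")


def _rebase(path, old_prefix, new_prefix):
    """Re-express a path that sits under old_prefix as a path under new_prefix."""
    relative = posixpath.relpath(path, old_prefix)
    return posixpath.normpath(posixpath.join(new_prefix, relative))


def pivot_root(mounts, new_root, put_old):
    """Toy pivot: new_root becomes /, everything else moves below put_old."""
    if new_root not in mounts:
        raise ValueError("new_root must be a mount point")
    if not _is_under(put_old, new_root):
        raise ValueError("put_old must be at or below new_root")
    old_root_at = _rebase(put_old, new_root, "/")  # where the old root shows up
    pivoted = {}
    for point, filesystem in mounts.items():
        if _is_under(point, new_root):
            pivoted[_rebase(point, new_root, "/")] = filesystem
        else:
            pivoted[_rebase(point, "/", old_root_at)] = filesystem
    return pivoted


def umount_tree(mounts, point):
    """Toy recursive unmount: drop the mount at point and everything below it."""
    return {p: fs for p, fs in mounts.items() if not _is_under(p, point)}
```

Starting from a table with the host root, /proc, and a container filesystem on /newroot, pivoting to /newroot leaves the host mounts visible only under /.old, and unmounting that subtree leaves just the container's own mounts.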

anitsh commented 3 years ago

eBPF #256

eBPF can run sandboxed programs in the Linux kernel without changing kernel source code or loading kernel modules.

anitsh / til