[Meta] Linux Rootkit Analysis & Rule dev

Aegrah commented 1 year ago

Summary

Although we currently have many visibility gaps on Linux, I would like to analyze several root- and user-mode rootkits and write detections for their installation, persistence, defense evasion, and network connection activities.

### Tasks
- [x] Identify 3 to 5 Linux rootkits that leverage different persistence and defense evasion methods (done)
- [x] Set up a demo environment to run these rootkits (done)
- [x] Analyze these rootkits (1 week)
- [x] Write detection rules (1 week)

PRs created so far

Kernel module load related

- [New BBR] Tainted Kernel Module Load
- [New Rule] Out-Of-Tree Kernel Module Load
- [New BBR] Kernel Driver Load

Misc.

- [New BBR] Reverse Connection through Port Knocking
- [New Rule] Attempt to Clear Kernel Ring Buffer
- [New BBR] Pot. Persistence Through Systemd-udevd
- [New Rule] UID Elevation from Unknown Executable
- [New BBR] Segfault Detected

Aegrah commented 1 year ago

bds_lkm

Upon detonation, the following alerts were automatically triggered:

For its main functionality, which is delivered via a kernel module:

Kernel Module Load Via insmod

For its persistence via a systemd service:

Suspicious File Creation in /etc for Persistence
New systemd Service Created by Previously Unknown Executable

This rootkit can delete the kernel logs through dmesg. The following logic was created to address this gap (https://github.com/elastic/detection-rules/pull/3217):

process where host.os.type == "linux" and event.action == "exec" and event.type == "start" and process.name == "dmesg" and process.args : "-c"

The rootkit escalates privileges (like many other rootkits do) by sending a signal via kill. I created the following logic to potentially capture this (whether it fires depends on how well the rootkit author has dealt with logging):

sequence by host.id with maxspan=1s
  [process where host.os.type == "linux" and event.action == "exec" and event.type == "start" and process.name == "kill"]
  [process where host.os.type == "linux" and event.action == "uid_change" and event.type == "change" and user.id == 0]
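
To make the sequence semantics concrete, here is a minimal offline sketch of the same logic in Python. It is not how the rule engine evaluates EQL. It assumes events are plain dicts with flattened ECS-style keys and that `@timestamp` is already a `datetime`; the function names are hypothetical.

```python
from datetime import datetime, timedelta

MAXSPAN = timedelta(seconds=1)  # mirrors "maxspan=1s" in the sequence


def is_kill_exec(ev):
    """First stage: execution of the kill binary on a Linux host."""
    return (ev.get("host.os.type") == "linux"
            and ev.get("event.action") == "exec"
            and ev.get("event.type") == "start"
            and ev.get("process.name") == "kill")


def is_root_uid_change(ev):
    """Second stage: a uid_change event that ends up at UID 0."""
    return (ev.get("host.os.type") == "linux"
            and ev.get("event.action") == "uid_change"
            and ev.get("event.type") == "change"
            and str(ev.get("user.id")) == "0")  # accept 0 or "0"


def find_sequences(events):
    """Return (kill, uid_change) pairs on the same host within MAXSPAN."""
    last_kill = {}  # host.id -> most recent kill exec event
    matches = []
    for ev in sorted(events, key=lambda e: e["@timestamp"]):
        host = ev.get("host.id")
        if is_kill_exec(ev):
            last_kill[host] = ev
        elif is_root_uid_change(ev) and host in last_kill:
            first = last_kill.pop(host)
            if ev["@timestamp"] - first["@timestamp"] <= MAXSPAN:
                matches.append((first, ev))
    return matches
```

Because the join key is only `host.id`, any unrelated `kill` followed by a legitimate UID change within one second on the same host would also match, which is worth keeping in mind when tuning.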

The rootkit can set up a reverse network connection or bind shell via port knocking. To detect this, the following logic was written (https://github.com/elastic/detection-rules/pull/3219):

sequence by host.id with maxspan=10s
  [network where host.os.type == "linux" and event.action in ("connection_accepted", "connection_attempted") and 
   event.type == "start" and process.name : "*" and not (
     cidrmatch(destination.ip, "127.0.0.0/8", "169.254.0.0/16", "224.0.0.0/4", "::1") or destination.port in (
       20, 21, 22, 23, 25, 53, 67, 68, 69, 80, 110, 123, 137, 138, 139, 143, 161, 162, 179, 443, 445, 465, 514, 515,
       587, 636, 989, 990, 993, 995, 1025, 1026, 1080, 1194, 1433, 1434, 1521, 1701, 1723, 1812, 1813, 2082, 2083, 2086,
       2087, 2095, 2096, 2121, 2483, 2484, 3306, 3389, 3478, 3497, 3544, 3689, 3784, 3785, 389, 3998, 5060, 5061, 5190,
       5222, 5223, 5228, 5432, 5500, 554, 5631, 5632, 5800, 5801, 5900, 5901, 8000, 8008, 8080, 8081, 8443, 8888, 9100,
       9200, 9443, 10000
     ) or source.port in (
       20, 21, 22, 23, 25, 53, 67, 68, 69, 80, 110, 123, 137, 138, 139, 143, 161, 162, 179, 443, 445, 465, 514, 515,
       587, 636, 989, 990, 993, 995, 1025, 1026, 1080, 1194, 1433, 1434, 1521, 1701, 1723, 1812, 1813, 2082, 2083, 2086,
       2087, 2095, 2096, 2121, 2483, 2484, 3306, 3389, 3478, 3497, 3544, 3689, 3784, 3785, 389, 3998, 5060, 5061, 5190,
       5222, 5223, 5228, 5432, 5500, 554, 5631, 5632, 5800, 5801, 5900, 5901, 8000, 8008, 8080, 8081, 8443, 8888, 9100,
       9200, 9443, 10000)
     )
  ] by destination.ip
  [network where host.os.type == "linux" and event.action == "network_flow" and event.type == "connection" and
   source.packets == 1 and flow.final == false and not (
     cidrmatch(destination.ip, "127.0.0.0/8", "169.254.0.0/16", "224.0.0.0/4", "::1") or destination.port in (
       20, 21, 22, 23, 25, 53, 67, 68, 69, 80, 110, 123, 137, 138, 139, 143, 161, 162, 179, 443, 445, 465, 514, 515,
       587, 636, 989, 990, 993, 995, 1025, 1026, 1080, 1194, 1433, 1434, 1521, 1701, 1723, 1812, 1813, 2082, 2083, 2086,
       2087, 2095, 2096, 2121, 2483, 2484, 3306, 3389, 3478, 3497, 3544, 3689, 3784, 3785, 389, 3998, 5060, 5061, 5190,
       5222, 5223, 5228, 5432, 5500, 554, 5631, 5632, 5800, 5801, 5900, 5901, 8000, 8008, 8080, 8081, 8443, 8888, 9100,
       9200, 9443, 10000
     ) or source.port in (
       20, 21, 22, 23, 25, 53, 67, 68, 69, 80, 110, 123, 137, 138, 139, 143, 161, 162, 179, 443, 445, 465, 514, 515,
       587, 636, 989, 990, 993, 995, 1025, 1026, 1080, 1194, 1433, 1434, 1521, 1701, 1723, 1812, 1813, 2082, 2083, 2086,
       2087, 2095, 2096, 2121, 2483, 2484, 3306, 3389, 3478, 3497, 3544, 3689, 3784, 3785, 389, 3998, 5060, 5061, 5190,
       5222, 5223, 5228, 5432, 5500, 554, 5631, 5632, 5800, 5801, 5900, 5901, 8000, 8008, 8080, 8081, 8443, 8888, 9100,
       9200, 9443, 10000)
     )
  ] by source.ip
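
The query repeats the same exclusion list four times, so the join is easy to lose sight of. The sketch below restates it offline in Python with the exclusions factored out. It is an illustration under assumptions: events are dicts with flattened ECS-style keys, `@timestamp` is epoch seconds, the port set is abbreviated (the rule uses the full list above), and all function names are hypothetical.

```python
import ipaddress

# Same networks as the cidrmatch() call in the rule.
EXCLUDED_NETS = [ipaddress.ip_network(n) for n in
                 ("127.0.0.0/8", "169.254.0.0/16", "224.0.0.0/4", "::1/128")]

# Abbreviated for the sketch; the rule lists many more well-known ports.
COMMON_PORTS = {20, 21, 22, 23, 25, 53, 80, 443, 3389, 8080}


def is_excluded(ev):
    """True when the event hits an excluded network or a common port."""
    dst = ipaddress.ip_address(ev["destination.ip"])
    return (any(dst in net for net in EXCLUDED_NETS)
            or ev.get("destination.port") in COMMON_PORTS
            or ev.get("source.port") in COMMON_PORTS)


def is_connection_start(ev):
    """First stage: a connection attempted or accepted by a named process."""
    return (ev.get("event.action") in ("connection_accepted", "connection_attempted")
            and ev.get("event.type") == "start"
            and bool(ev.get("process.name"))  # process.name : "*" treated as "field is present"
            and not is_excluded(ev))


def is_single_packet_flow(ev):
    """Second stage: a non-final flow record with exactly one source packet."""
    return (ev.get("event.action") == "network_flow"
            and ev.get("event.type") == "connection"
            and ev.get("source.packets") == 1
            and ev.get("flow.final") is False
            and not is_excluded(ev))


def find_knocks(events, maxspan=10.0):
    """Join stage 1 destination.ip to stage 2 source.ip per host within maxspan seconds."""
    pending = {}  # (host.id, ip) -> first-stage event
    hits = []
    for ev in sorted(events, key=lambda e: e["@timestamp"]):
        if ev.get("host.os.type") != "linux":
            continue
        host = ev.get("host.id")
        if is_connection_start(ev):
            pending[(host, ev["destination.ip"])] = ev
        elif is_single_packet_flow(ev):
            first = pending.pop((host, ev["source.ip"]), None)
            if first is not None and ev["@timestamp"] - first["@timestamp"] <= maxspan:
                hits.append((first, ev))
    return hits
```

Keeping the exclusions in one place also avoids the kind of drift where one copy of the port list is edited and the others are not.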

Aegrah commented 1 year ago

The following PRs were created based on rootkit research:

- https://github.com/elastic/endpoint-rules/pull/2891
- https://github.com/elastic/endpoint-rules/pull/2892
- https://github.com/elastic/detection-rules/pull/3202

Aegrah commented 1 year ago

Reptile

Upon installation, the following new rule triggered:

Tainted Kernel Module Load

We did not capture the installation of the Reptile module, because it was loaded through a loader rather than through command-line utilities. To address this gap, the following logic was created:

host.os.type:linux and event.dataset:"system.syslog" and process.name:kernel and 
message:"loading out-of-tree module taints kernel."
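
For triage it can help to pull the module name out of the matched line. The sketch below assumes the kernel prefixes this notice with the module name followed by a colon, and that the syslog `message` field carries the line unmodified apart from an optional leading timestamp. The function name is hypothetical.

```python
import re

# "<module>: loading out-of-tree module taints kernel."
TAINT_RE = re.compile(
    r"(?P<module>\S+): loading out-of-tree module taints kernel\."
)


def tainting_module(message):
    """Return the module name from a taint notice, or None if it does not match."""
    m = TAINT_RE.search(message)
    return m.group("module") if m else None
```

The kernel emits this particular notice once per boot, so it identifies the first out-of-tree module loaded rather than every one.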

Additionally, we do see the loaded-kernel-module event through auditd_manager. To capture this, we can propose the following logic (although it looks noisy, it produced only 11 hits in my stack over the last year, all of them true-positive kernel module loads, 9 of the 11 malicious):

driver where host.os.type == "linux" and event.action == "loaded-kernel-module"

Reptile persists through udev. It creates a rule in /lib/udev/rules.d/ that tells systemd-udevd to execute /lib/udev/reptile on boot. We don't get any logs of the execution, so the only way to capture this (I think) is to capture the file creation. However, many benign software packages also use this method. I created the following EQL logic, which could become a new_terms KQL rule:

file where host.os.type == "linux" and event.action == "creation" and file.path : "/lib/udev/*" and not process.executable == null and not (
   process.name in ("dockerd", "dpkg", "systemd-hwdb", "podman", "buildah", "exe")
)
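
As a rough illustration of what a new_terms variant would add, the sketch below alerts only the first time a given value creates a file under /lib/udev/. It is an offline approximation, not the rule engine's behavior. Using `process.executable` as the new-terms field is an assumption made for this example, events are assumed to be dicts with flattened ECS-style keys, and the function names are hypothetical.

```python
from fnmatch import fnmatchcase

ALLOWED = {"dockerd", "dpkg", "systemd-hwdb", "podman", "buildah", "exe"}


def is_udev_creation(ev):
    """Same predicate as the EQL query, with ':' approximated case-insensitively."""
    path = (ev.get("file.path") or "").lower()
    return (ev.get("host.os.type") == "linux"
            and ev.get("event.action") == "creation"
            and fnmatchcase(path, "/lib/udev/*")
            and ev.get("process.executable") is not None
            and ev.get("process.name") not in ALLOWED)


def new_terms_alerts(history, incoming, field="process.executable"):
    """Alert on matching events whose `field` value was not seen in `history`."""
    seen = {ev[field] for ev in history if is_udev_creation(ev)}
    alerts = []
    for ev in incoming:
        if is_udev_creation(ev) and ev[field] not in seen:
            alerts.append(ev)
            seen.add(ev[field])  # only the first occurrence alerts
    return alerts
```

The effect is that package managers and other recurring writers fall silent after their first appearance in the history window, while a one-off installer still alerts.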

While setting up a reverse connection, the following rules triggered:

A network event by a kernel worker thread is abnormal, hence the alert:

Network Activity Detected via Kworker

Reptile leverages port knocking, so the rule we created for the previous rootkit triggered:

Potential Linux Reverse Connection through Port Knocking

The rootkit sets up a network connection as root, hence this rule triggered:

Suspicious Network Connection Attempt by Root

Commands are executed through kworker as well, hence the following alert triggered:

Shell Command Execution via Kworker

Reptile also creates a file under /dev/ptmx through kworker. The following alert should have triggered, but did not because the rule was changed to a sequence. I will revert it to the following logic:

Suspicious File Creation via kworker

file where event.action == "creation" and process.name : "kworker*" and 
not (process.name : "kworker*kcryptd*" or file.path : ("/var/log/*", "/var/crash/*"))
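
The wildcard exclusions are easy to misread, so here is a small offline sketch of the same predicate for checking sample events by hand. It assumes events are dicts with flattened ECS-style keys and approximates the case-insensitive EQL `:` operator with `fnmatch`. The helper names are hypothetical.

```python
from fnmatch import fnmatchcase


def wildcard(value, *patterns):
    """Case-insensitive wildcard match, approximating the EQL ':' operator."""
    value = (value or "").lower()
    return any(fnmatchcase(value, p.lower()) for p in patterns)


def suspicious_kworker_creation(ev):
    """File creation by a kworker thread, minus the known-benign cases."""
    name = ev.get("process.name")
    return (ev.get("event.action") == "creation"
            and wildcard(name, "kworker*")
            and not (wildcard(name, "kworker*kcryptd*")
                     or wildcard(ev.get("file.path"), "/var/log/*", "/var/crash/*")))
```

For example, a creation event for /dev/ptmx with a process name such as `kworker/0:1` passes the predicate, while the same event from a kcryptd worker does not.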

Reverse connection gaps

We did not capture the session escalation to root for the kworker process. To address this gap, the following logic was created:

process where host.os.type == "linux" and event.action == "session_id_change" and event.type == "change" and process.name : "kworker*"

Reptile functionality

Reptile can grant root by running /reptile/reptile_cmd root. This is not detected. I am looking at logic similar to the following to capture those events:

sequence by host.id, process.entity_id with maxspan=1s
[process where host.os.type == "linux" and event.action == "uid_change" and event.type == "change" and user.name == "root" and process.parent.name in ("bash", "dash", "sh", "tcsh", "csh", "zsh", "ksh", "fish") and not process.name in ("bash", "dash", "sh", "tcsh", "csh", "zsh", "ksh", "fish")]
[process where host.os.type == "linux" and event.action == "exec" and event.type == "start" and  user.name == "root" and process.name in ("bash", "dash", "sh", "tcsh", "csh", "zsh", "ksh", "fish")]

Alternatively, a new_terms variant of:

process where host.os.type == "linux" and event.action == "uid_change" and event.type == "change" and user.name == "root" and process.parent.name in ("bash", "dash", "sh", "tcsh", "csh", "zsh", "ksh", "fish") and not process.name in ("bash", "dash", "sh", "tcsh", "csh", "zsh", "ksh", "fish", "sudo", "apt", "squid")

Reptile can hide and unhide files based on a prefix through /reptile/reptile_cmd hide/show. We cannot see any logs or syscalls other than the execution of the binary, so we cannot create a detection for this.

While testing Reptile's IP connection hiding feature, I ended up segfaulting the kernel module, which generated segfault logs in syslog. This could be an interesting BBR. When writing the logic, the only segfaults in my stacks came from testing the Looney Tunables CVE, compiling Metasploit shells for the wrong architecture a while ago, and the Reptile segfault itself:

host.os.type:linux and event.dataset:"system.syslog" and process.name:kernel and message:segfault

Adding a threshold to this logic, so that it fires on a high number of hits within a short interval, could make an interesting buffer overflow detection rule (DR).

Edit

It seems impossible to use the message field within a threshold rule (or a sequence rule). Will need to double check.
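
Independent of what the rule types allow, the threshold idea itself can be sketched offline as a sliding-window count per host. The threshold of 5 and the 60-second window are placeholder values chosen for this example, not tuned settings. Events are assumed to be dicts with flattened ECS-style keys and `@timestamp` in epoch seconds, and the function name is hypothetical.

```python
from collections import defaultdict, deque


def segfault_bursts(events, threshold=5, window=60.0):
    """Return host.ids with at least `threshold` kernel segfault lines in `window` seconds."""
    recent = defaultdict(deque)  # host.id -> timestamps still inside the window
    flagged = set()
    for ev in sorted(events, key=lambda e: e["@timestamp"]):
        if not (ev.get("host.os.type") == "linux"
                and ev.get("event.dataset") == "system.syslog"
                and ev.get("process.name") == "kernel"
                and "segfault" in (ev.get("message") or "").lower()):
            continue
        ts = ev["@timestamp"]
        q = recent[ev["host.id"]]
        q.append(ts)
        # Drop timestamps that have fallen out of the window.
        while q and ts - q[0] > window:
            q.popleft()
        if len(q) >= threshold:
            flagged.add(ev["host.id"])
    return flagged
```

A burst of crashes from the same host in a short span is the pattern of interest here, since isolated segfaults are comparatively common.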

Aegrah commented 1 year ago

reveng_rtkit

Upon installation, the following DRs triggered:

Kernel Module Load via insmod
Attempt to Clear Kernel Ring Buffer
Tainted Out-Of-Tree Kernel Module Load
Tainted Kernel Module Load
Kernel Driver Load

Compiling and running the user_mode section of the rootkit triggered no alerts, but that was to be expected.

Other than that, no new alerts were generated and no additional behavior was detected.

Aegrah commented 1 year ago

Finally, I experimented with:

But I did not find any additional coverage gaps that we are capable of detecting. Many of the actions conducted by the rootkits remain undetectable, as we see no logs, even with auditd_manager monitoring at the syscall level.

Next up, a Meta on eBPF rootkits could be interesting, but I am afraid we do not capture the data needed to detect these either.

brokensound77 commented 11 months ago

is this complete?

Aegrah commented 11 months ago

The last PR related to this Meta was merged a few minutes ago, so I will close out the Meta! @brokensound77
