Sysinternals / SysmonForLinux

MIT License

Build for RHEL9 #159

Open ForsetiJan opened 6 months ago

ForsetiJan commented 6 months ago

We have installed sysmonforlinux-1.3.2-0.el8.x86_64.rpm on our AlmaLinux9 machines; however, without any clear cause, it makes the machine halt completely anywhere between 1 and 6 hours after start.

The only fix we have found is building SysmonForLinux on AlmaLinux9 and using the resulting RPM. So that we can keep using the Microsoft repo, I would like to request that a RHEL9 build be published in the repo.

MarioHewardt commented 6 months ago

Hi - thanks for reporting this. Can you add the tail end of the syslog (specifically the sysmon entries)? Also, when you say the machine halts, what are the specific symptoms?

ForsetiJan commented 6 months ago

The system completely halts/freezes; the only way to recover is a system hard reset. The tail doesn't tell us anything out of the ordinary unfortunately.

MarioHewardt commented 6 months ago

What Sysmon config was used? A couple of additional questions:

  1. During the time window, can you check what Sysmon CPU and mem usage is? Does memory creep up?
  2. How busy is the system?

ForsetiJan commented 6 months ago

config.txt

  1. I will have to get back to you on the resource usage; I am not the one managing these systems.
  2. Near idle; it's a clone of an NTP server without any clients connected to it.

We have been looking for the source of the behavior for a while as it initially seemed to occur when our vulnerability scan ran; however, this was not consistent. In addition we've observed odd behavior where the same config and sysmonforlinux version works fine on RHEL9 but no luck on AL9. At the time we suspected a VMware template issue, but that wasn't it either.

The only solution to have survived past the 24 hour mark has been the RPM built on AL9.

Edit: SysAdmin got back to me. Memory and CPU usage do not gradually increase. CPU usage does spike at the moment the system halts, however.

MarioHewardt commented 6 months ago

" In addition we've observed odd behavior where the same config and sysmonforlinux version works fine on RHEL9 but no luck on AL9"

Can you elaborate on this?

I've been running Sysmon (using RHEL9 package) on Alma 9.3 for about 24hrs and I can't see the behavior you are seeing. It was default configuration though and I've since restarted with the config you provided.

ForsetiJan commented 6 months ago

We have both RHEL9 and AL9 machines in our environment and it seems that this bug does not affect the RHEL9 machines (we have reverted the install on them for now though). We have no explanation for this either!

Potentially there is some interaction with the hypervisor? We are running it on VMware vSphere.

When you mention the RHEL9 package are you referring to sysmonforlinux-1.3.2-0.el8.x86_64.rpm as provided in the Microsoft repository?

MarioHewardt commented 6 months ago

Yes, that is the package I installed. I also installed the dependency: sysinternalsebpf-1.3.0-0.el8.x86_64.rpm. Both came from https://packages.microsoft.com/rhel/9.0/prod/Packages/s/

MarioHewardt commented 6 months ago

In terms of the hypervisor, Sysmon for Linux doesn't do anything special or odd that would cause problems with the hypervisor. It would be good to know what state the Sysmon process is in at the time of the freeze (or right before it) via metrics (CPU consumption, memory consumption), as well as to get a core dump of the Sysmon process.
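
For anyone collecting the metrics requested above, here is a minimal sketch of a sampler that records CPU ticks and resident memory from `/proc` and syncs each line to disk, so the last samples survive a hard reset. It assumes the process name is `sysmon`; the function names, log path and interval are hypothetical and not part of Sysmon for Linux.

```python
import os
import time


def parse_stat(stat_text):
    """Return (comm, utime_ticks, stime_ticks) from /proc/<pid>/stat content."""
    # comm is wrapped in parentheses and may contain spaces, so split on the
    # first '(' and the last ')'.
    lpar = stat_text.index("(")
    rpar = stat_text.rindex(")")
    comm = stat_text[lpar + 1:rpar]
    fields = stat_text[rpar + 2:].split()
    # fields[0] is the state (field 3); utime is field 14, stime is field 15.
    return comm, int(fields[11]), int(fields[12])


def parse_rss_kib(status_text):
    """Return VmRSS in KiB from /proc/<pid>/status content, or None if absent."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None


def find_pids(name):
    """Return the PIDs whose /proc/<pid>/comm equals name."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # process exited while we were scanning
    return pids


def monitor(name="sysmon", interval=10, samples=None, log_path="sysmon_usage.log"):
    """Append one line per matching process every `interval` seconds."""
    taken = 0
    with open(log_path, "a") as log:
        while samples is None or taken < samples:
            for pid in find_pids(name):
                try:
                    with open(f"/proc/{pid}/stat") as f:
                        _, utime, stime = parse_stat(f.read())
                    with open(f"/proc/{pid}/status") as f:
                        rss = parse_rss_kib(f.read())
                except OSError:
                    continue
                log.write(f"{int(time.time())} pid={pid} "
                          f"cpu_ticks={utime + stime} rss_kib={rss}\n")
            log.flush()
            os.fsync(log.fileno())  # force the samples to disk before a possible freeze
            taken += 1
            time.sleep(interval)
```

Calling `monitor()` with no arguments runs until interrupted; the difference in `cpu_ticks` between consecutive lines shows whether Sysmon itself is responsible for a CPU spike.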

matias624 commented 6 months ago

Hi,

We are having this same halting problem on Rocky Linux 9.3 (Blue Onyx) Virtual machine running on VMware. On Rocky Linux 8.9 no problems.

Installed sysmon version from https://packages.microsoft.com/rhel/9.0/prod/Packages/s/: sysmonforlinux.x86_64 1.3.2-0.el8 @packages-microsoft-com-prod

The Virtual Machine completely freezes after the sysmon process has been running for some time. The logs show nothing before the freeze happens.

We have been testing this on one virtual machine, and here is a graph of the CPU spikes (1 month): [image: Screenshot 2024-01-24 105935]

You can see in the graph when the spikes occur; the only way to recover is a system reboot.

And the memory graph (1 month): [image]

We have been changing memory and cpu core resources. Patterns we have noticed are: more cpus = more usage during halt. And the more memory, possibly a small reduction in memory usage during halt.

With 16 vCPUs and 32 GB the halting stopped happening. We have been looking for a clear breakpoint at which the halting stops, but so far 6 vCPUs and 14 GB have not stopped it.

ForsetiJan commented 6 months ago

Glad, but unfortunate, that we are not the only ones.

@matias624 Have you tried running a binary built on AL9? That fixed it for us.

I attached it to save you the hassle. It is based on 1.3.2: sysmonforlinux-0.0.0-0.el9.x86_64.rpm.zip

MarioHewardt commented 6 months ago

I'm also interested if the pattern is the same in terms of running with a local build instead of from package. If you can let us know that'd be great.

MarioHewardt commented 6 months ago

Also, @matias624, which Sysmon configuration are you running and how long does it take before system freezes?

matias624 commented 6 months ago

Hi,

We have been using this configuration on test machine:

<Sysmon schemaversion="4.70">
  <EventFiltering>
    <!-- Event ID 1 == ProcessCreate. Log all newly created processes -->
    <RuleGroup name="" groupRelation="or">
      <ProcessCreate onmatch="exclude"/>
    </RuleGroup>
    <!-- Event ID 3 == NetworkConnect Detected. Log all network connections -->
    <RuleGroup name="" groupRelation="or">
      <NetworkConnect onmatch="exclude"/>
    </RuleGroup>
    <!-- Event ID 5 == ProcessTerminate. Log all processes terminated -->
    <RuleGroup name="" groupRelation="or">
      <ProcessTerminate onmatch="exclude"/>
    </RuleGroup>
    <!-- Event ID 9 == RawAccessRead. Log all raw access read -->
    <RuleGroup name="" groupRelation="or">
      <RawAccessRead onmatch="exclude"/>
    </RuleGroup>
    <!-- Event ID 10 == ProcessAccess. Log all open process operations -->
    <RuleGroup name="" groupRelation="or">
      <ProcessAccess onmatch="exclude"/>
    </RuleGroup>
    <!-- Event ID 11 == FileCreate. Log every file creation -->
    <RuleGroup name="" groupRelation="or">
      <FileCreate onmatch="exclude"/>
    </RuleGroup>
    <!-- Event ID 23 == FileDelete. Log all files being deleted -->
    <RuleGroup name="" groupRelation="or">
      <FileDelete onmatch="exclude"/>
    </RuleGroup>
  </EventFiltering>
</Sysmon>
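
As a quick sanity check on what this configuration captures, the following is a rough sketch that lists each event type with its `onmatch` mode and rule count. It relies on the reading suggested by the comments in the config: an `exclude` filter with no rules excludes nothing, so every event of that type is logged. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET


def summarize_rules(config_xml):
    """Map each event element under EventFiltering to (onmatch, rule_count)."""
    root = ET.fromstring(config_xml)
    summary = {}
    filtering = root.find("EventFiltering")
    if filtering is None:
        return summary
    for group in filtering.findall("RuleGroup"):
        for event in group:
            # len(event) counts child elements, i.e. the individual filter rules.
            summary[event.tag] = (event.get("onmatch"), len(event))
    return summary


def logs_everything(onmatch, rule_count):
    """True when the filter is an empty exclude list (nothing is filtered out)."""
    return onmatch == "exclude" and rule_count == 0


SAMPLE = """
<Sysmon schemaversion="4.70">
  <EventFiltering>
    <RuleGroup name="" groupRelation="or">
      <ProcessCreate onmatch="exclude"/>
    </RuleGroup>
    <RuleGroup name="" groupRelation="or">
      <NetworkConnect onmatch="exclude"/>
    </RuleGroup>
  </EventFiltering>
</Sysmon>
"""

for tag, (onmatch, count) in summarize_rules(SAMPLE).items():
    print(tag, onmatch, count, "logs everything:", logs_everything(onmatch, count))
```

Under that reading, all seven event types in the config above are unfiltered, so the event volume depends entirely on how busy the machine is.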

We tried a local build on a new VM and it froze today, after 4 days.

The Virtual machine with package version installed and with 6vcpu and 18Gb ram is still running without freezes.

MarioHewardt commented 6 months ago

Thanks for the config. I installed Alma 9.3 on VMware and ran Sysmon using the default config. It's been running for days without any freezes. I will try Rocky 9.3, but can you tell me the core and memory settings on your VM? Since I haven't been able to reproduce this, I'm curious whether it has something to do with the VM settings.

matias624 commented 6 months ago

So far, every configuration below 6 vCPUs and 18 GB of RAM has frozen the package-installed machine in our VMware environment on Rocky 9.3. VMware tools have been installed via the open-vm-tools package.

The locally built version froze with the 1 vCPU / 4 GB and 2 vCPU / 8 GB setups within the past 24 hours.

MarioHewardt commented 6 months ago

I've been running with the environment below for several days now without a repro. The next step would be to get a kernel dump of the system when the freeze happens and share that with us. To add a little bit of context on Sysmon for Linux: it consists of a user mode part and an eBPF part. Barring any bugs in the kernel and/or eBPF verifier, it shouldn't be possible for Sysmon to completely hang the OS. There is likely something else going on with these systems.

Environment:

  1. VMware 17.5
  2. Rocky 9.3
  3. Package install of Sysmon for Linux
  4. 2 vCPU, 16 GB RAM
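
On the kernel dump: a hang only leaves a dump behind if a crash kernel is loaded and the kernel is set to panic on lockups. Here is a rough sketch that checks this on an affected VM. It assumes the standard Linux paths `/sys/kernel/kexec_crash_loaded`, `/proc/sys/kernel/softlockup_panic` and `/proc/sys/kernel/hung_task_panic` are present; the helper names are hypothetical.

```python
from pathlib import Path

# Kernel knobs that decide whether a freeze can end in a crash dump.
CHECKS = {
    "kexec_crash_loaded": "/sys/kernel/kexec_crash_loaded",
    "softlockup_panic": "/proc/sys/kernel/softlockup_panic",
    "hung_task_panic": "/proc/sys/kernel/hung_task_panic",
}


def read_flag(path):
    """Return the integer stored at path, or None if missing or unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def dump_readiness(values):
    """Return a list of reasons why a hang might not yield a kernel dump."""
    notes = []
    if values.get("kexec_crash_loaded") != 1:
        notes.append("no crash kernel loaded; a panic would not write a dump")
    if values.get("softlockup_panic") != 1:
        notes.append("soft lockups do not trigger a panic")
    if values.get("hung_task_panic") != 1:
        notes.append("hung tasks do not trigger a panic")
    return notes


current = {name: read_flag(path) for name, path in CHECKS.items()}
for note in dump_readiness(current):
    print("warning:", note)
```

An empty result only means the preconditions are in place; a hard freeze that stops the lockup detectors as well may still need a dump triggered from the hypervisor side.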