V3ntus / nixpkgs

Package request: Wazuh-agent #1

Open V3ntus opened 5 months ago

V3ntus commented 5 months ago

Moved from: https://github.com/NixOS/nixpkgs/issues/230623

sjdwhiting commented 5 months ago

Huzzuh! (Wazuh?)

Ok so yea, I think I mentioned already that I'm also learning Nix as I go here.

What command are you running to manually start the service?

V3ntus commented 5 months ago

Right now, after the ExecStartPre script finishes copying the build result to /var/ossec, you can run /var/ossec/bin/wazuh-control start. But since the service is broken, I would clone the repo and run nix-build ./pkgs/tools/security/wazuh/default.nix inside it. See https://github.com/nealfennimore/nixpkgs/commit/b2e53bde9c411ef03345bed5934fcc9da2d7d6c4#commitcomment-140938027 to reproduce.

V3ntus commented 5 months ago

5125b09a46f8442965591e5e97cea504309dd7ca should make the Wazuh service build successfully. My VM ran out of space so I'll try again tomorrow

sjdwhiting commented 5 months ago

Tested it this morning. It builds but does not start. After getting it up and running the manual way yesterday, I reverted back to setting it up in configuration.nix to test.

For clarity, I'm sourcing the wazuh-agent branch here as my nixpkgs and then I am simply adding the following to my configuration.nix

services.wazuh.agent.enable = true;
services.wazuh.agent.managerIP = <IP>;

Here is the error:

[sebastian@nixos:~]$ systemctl status wazuh-agent.service 
× wazuh-agent.service - Wazuh agent
     Loaded: loaded (/etc/systemd/system/wazuh-agent.service; enabled; preset: enabled)
     Active: failed (Result: exit-code) since Thu 2024-04-18 10:02:14 CDT; 10s ago
    Process: 96965 ExecStartPre=/nix/store/q17hsdfh1rz63qp3xidr4b4jj0qzxqbl-unit-script-wazuh-agent-pre-start/bin/wazuh-agent-pre-start (code=exited, status=200/CHDIR)
         IP: 0B in, 0B out
        CPU: 457us

Apr 18 10:02:14 nixos systemd[1]: Starting Wazuh agent...
Apr 18 10:02:14 nixos (re-start)[96965]: wazuh-agent.service: Changing to the requested working directory failed: No such file or directory
Apr 18 10:02:14 nixos systemd[1]: wazuh-agent.service: Control process exited, code=exited, status=200/CHDIR
Apr 18 10:02:14 nixos systemd[1]: wazuh-agent.service: Failed with result 'exit-code'.
Apr 18 10:02:14 nixos systemd[1]: Failed to start Wazuh agent.

It is upset trying to change directories so I'll be looking at that next.

V3ntus commented 5 months ago

It might be related to this? I don't think this line is important anyways https://github.com/V3ntus/nixpkgs/blob/fa43f33ebdb5a091a6012afb87d09bfdbde5b7dd/nixos/modules/services/security/wazuh/wazuh.nix#L98

Yeah I think it's trying to cd into the working directory, but in your case, it hasn't run the preStart yet that contains the code to create that directory and copy stuff into it. Thus the directory doesn't exist and the service fails to start
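
For anyone else who hits status=200/CHDIR, a quick way to confirm this is to check whether the unit's WorkingDirectory= exists on disk. Below is a minimal sketch; check_working_directory is a hypothetical helper, not something in the module, and it only reads the unit file text (it does not resolve special values such as ~).

```shell
# Hypothetical helper (not part of the Wazuh module): report whether the
# WorkingDirectory= configured in a unit file exists on disk.
check_working_directory() {
    unit_file=$1
    # Use the last WorkingDirectory= line. A leading "-" (which tells systemd
    # to tolerate a missing directory) is stripped before the check.
    dir=$(sed -n 's/^WorkingDirectory=-\{0,1\}//p' "$unit_file" | tail -n 1)
    if [ -z "$dir" ]; then
        echo "no WorkingDirectory set"
    elif [ -d "$dir" ]; then
        echo "ok: $dir"
    else
        echo "missing: $dir"
        return 1
    fi
}
```

Usage would be something like check_working_directory /etc/systemd/system/wazuh-agent.service. A "missing" result here matches the failure in the log above, since systemd changes into the working directory before the pre-start script gets a chance to create it.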

sjdwhiting commented 5 months ago

Yea, I think that makes sense. I added some debugging statements and they don't ever run, so it's definitely like preStart isn't actually running. I'm going to try killing that line.

I got mixed up for a bit by my system config too. Forgot to run nix flake update and was very confused by why my changes weren't showing up.

sjdwhiting commented 5 months ago

That did the trick, kind of. Now the preStart script is running into permission errors:

mkdir: cannot create directory ‘/var/ossec’: Permission denied

V3ntus commented 5 months ago

Weird. Guess I'll have to wipe my VM and start fresh with debugging

V3ntus commented 5 months ago

Starting from a fresh VM on a NixOS host, removing the WorkingDirectory option fixed that first error of not being able to cd into it, but nothing about denied permissions.

I am still getting wazuh-execd did not start, and it's possible it could be this? https://github.com/wazuh/wazuh/issues/15640

Edit: Nope, ps is available to systemd [image]

sjdwhiting commented 5 months ago

I'm getting the same error. I also made a small change to my fork that alleviated permission errors. It uses systemd.tmpfiles.rules.

So the issue seems to be that wazuh-control can't sort out the pid of wazuh-execd in the context of the systemd service, since starting the binary manually works. And each attempt to start the service results in another instance of wazuh-execd.

V3ntus commented 5 months ago

I read up on systemd tmpfiles a bit, sounds like a worthy solution for the state directory permission issues you were having. I suppose we can consider the state directory volatile then?

Seems like the appropriate debugging to narrow down the cause of wazuh-control not sorting out the process must happen here: https://github.com/wazuh/wazuh/blob/3bf19121e8604c99566fc5e78267648a5161b062/src/init/wazuh-client.sh#L165-L182 And here: https://github.com/wazuh/wazuh/blob/3bf19121e8604c99566fc5e78267648a5161b062/src/init/wazuh-client.sh#L195-L222

sjdwhiting commented 5 months ago

The info on the implementation of tmpfiles seems a bit vague and poorly documented. I came across a number of posts where people used them just fine for a persistent state directory and said it persists on reboot. Downside is if you manually deleted it, restarting the service doesn't restore it and, from what I can tell, neither does running nixos-rebuild switch.
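
To make the tmpfiles idea concrete, here is a rough shell approximation of what a "d" (create directory) entry does: create the path if it is missing and apply the mode and owner. This is only a sketch; ensure_state_dir is a hypothetical helper, the wazuh:wazuh owner is an assumed name, and the actual rule used in the fork is not shown in this thread.

```shell
# Hypothetical helper: create a state directory with a fixed mode, roughly
# what a tmpfiles "d" entry does. Changing ownership needs root, so the
# owner argument is optional.
ensure_state_dir() {
    dir=$1
    mode=${2:-0750}
    owner=${3:-}    # e.g. "wazuh:wazuh" (assumed names); empty leaves it as-is
    mkdir -p "$dir" && chmod "$mode" "$dir" || return 1
    if [ -n "$owner" ]; then
        chown "$owner" "$dir" || return 1
    fi
}
```

As root this would be called as ensure_state_dir /var/ossec 0750 wazuh:wazuh. If the directory was deleted by hand, re-applying the rules manually with systemd-tmpfiles --create may also be worth trying before resorting to something like this.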

So I've just been using sed to slowly add debug statements into the wazuh-control script and suddenly it started working... I have no idea. Maybe we need to insert some sleep/wait statements to slow it down?

Looking at the number of times it iterated, I'm pretty sure it is some sort of race condition and the insertion of extra statements gave it enough breathing room to work.

Screenshot 2024-04-19 at 3 03 20 PM

V3ntus commented 5 months ago

LOL wow nice job! I have no idea how sed caused it to work, it adds hardly any overhead. That's awesome though!

sjdwhiting commented 5 months ago

Well not sed itself but the execution of the debug statements I put in using it. Maybe those fractions of milliseconds added up haha.
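
If it really is a race, polling for the pid file with a timeout would be less fragile than fixed sleeps. A minimal sketch follows, assuming the pid files live under ${DIR}/var/run/ as the debug line in the preStart script suggests; wait_for_pidfile is a hypothetical helper and not part of wazuh-control.

```shell
# Hypothetical helper: wait until a file matching a glob appears, or give up.
wait_for_pidfile() {
    pattern=$1           # glob, e.g. /var/ossec/var/run/wazuh-execd-*.pid
    attempts=${2:-50}    # number of polls before giving up (0.1s apart)
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # Unquoted on purpose so the shell expands the glob; paths that
        # contain spaces are not handled.
        for f in $pattern; do
            if [ -e "$f" ]; then
                printf '%s\n' "$f"
                return 0
            fi
        done
        sleep 0.1
        i=$((i + 1))
    done
    return 1
}
```

It prints the first matching file and returns 0, or returns 1 once the attempts run out, so the caller can tell "slow to start" apart from "never started".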

V3ntus commented 5 months ago

Well if you'd like, definitely recommend eventually putting an issue up on https://github.com/wazuh/wazuh/issues. If anything, just to get them aware of this weird behavior and what their input is.

V3ntus commented 5 months ago

Weird logs from wazuh-agentd while doing some experimenting: [image]

preStart script looks like this:

cp -rf ${pkg}/* ${stateDir}
touch /tmp/wazuhpwd
sed -i '204i ls $\{DIR}/var/run/$\{pfile}-*.pid 2>&1 | tee /tmp/wazuhpwd; echo $? >> /tmp/wazuhpwd' ${stateDir}/bin/wazuh-control

find ${stateDir} -type f -exec chmod 644 {} \;
find ${stateDir} -type d -exec chmod 750 {} \;
chmod u+x ${stateDir}/bin/*
chmod u+x ${stateDir}/active-response/bin/*
chown -R ${wazuhUser}:${wazuhGroup} ${stateDir}

V3ntus commented 5 months ago

Somehow /var/ossec/etc/client.keys is being overwritten or used as a log: [image]

sjdwhiting commented 5 months ago

That is pretty odd. This is my current preStart which I updated today. Took some trial and error but replaced the debugging statements with sleep statements that seem to have kept it working. I also checked out that file on my machine and no such issues.

        cp -rf ${pkg}/* ${stateDir}
        sed -i '12i sleep 0.1s;' ${stateDir}/bin/wazuh-control

        sed -i '209i sleep 0.2s ' ${stateDir}/bin/wazuh-control

        find ${stateDir} -type f -exec chmod 644 {} \;
        find ${stateDir} -type d -exec chmod 750 {} \;
        chmod u+x ${stateDir}/bin/*
        chmod u+x ${stateDir}/active-response/bin/*
        chown -R ${wazuhUser}:${wazuhGroup} ${stateDir}

V3ntus commented 5 months ago

I still find it kinda funny that adding more sleep statements has been making this work, knowing that wazuh-control already has sleep statements in that same loop lol.

I tried to replicate with your previous working iteration, but I'll try with your new iteration.

FWIW, I'm now using a NixOS host and building VMs declaratively referencing this article: https://nix.dev/tutorials/nixos/nixos-configuration-on-vm.html

Script I'm running:

# The VM automatically creates the qcow disk to persist state, so remove it for a clean run
rm -rf ./result nixos.qcow2
# -I nixpkgs=...      point this to your nixpkgs repo path
# -I nixos-config=... see below for example config
# --show-trace        debugging flag, optional
# (comments are kept off the continued lines because text after a trailing "\" breaks the continuation)
nix-build '<nixpkgs/nixos>' \
    -A vm \
    -I nixpkgs=$PWD/nixpkgs \
    -I nixos-config=./configuration.nix \
    --show-trace && \
    ./result/bin/run-nixos-vm -nographic && \
    reset  # run the QEMU VM, then reset the terminal after the VM shuts off

Example VM config:

{ config, pkgs, ... }:

{
  boot.loader.systemd-boot.enable = true;
  boot.loader.efi.canTouchEfiVariables = true;

  # Wazuh stuff I added
  services.wazuh.agent.enable = true;
  services.wazuh.agent.managerIP = "192.168.2.11";

  users.users.joe = {  # change as desired
    isNormalUser = true;
    extraGroups = [ "wheel" ]; # Enable ‘sudo’ for the user.
    packages = with pkgs; [
      git
    ];
    initialPassword = "password";  # change as desired
  };

  system.stateVersion = "23.11";
}

Seems like a proper, immutable testing environment instead of dealing with manual VM stuff. The VM build doesn't take long either.

sjdwhiting commented 5 months ago

Nice, that seems like a solid approach. I have a flake based setup which is spread across multiple files. I'm using my remote git repo as the source for my nixpkgs at the moment.

inputs = {
    nixpkgs = {
      url = "github:sjdwhiting/nixpkgs/wazuh-agent";
    };

    flake-utils.url = "github:numtide/flake-utils";

    users-flake.url = "../../users";
    users-flake.inputs.nixpkgs.follows = "nixpkgs";

    packages-flake.url = "../../systemPackages";
    packages-flake.inputs.nixpkgs.follows = "nixpkgs";

  };

And then I have the configuration.nix which has the Wazuh services set up. So after I push to the remote repo, I just run a rebuild. Only downside to that is that I have to push everything to remote, so it means a lot more commits and pushes.

But once you get this running, we need to decide what is next to get this to a minimum level of functionality. My gut says we need to start assessing what default Wazuh checks do and don't work on a NixOS system. I have a feeling that in a lot of instances, they will work since NixOS often symlinks to the nix store from the normal location for a file, so Wazuh will still find things.

V3ntus commented 5 months ago

Awesome! Yeah for sure. I can't remember if I posted a related comment, but I agree, we do need to look through the Wazuh default config (it's currently set up for debian/common FHS) and make sure we're linking stuff properly. Still waiting on journald support from Wazuh, that's the big blocker for that task. I believe it's going through their QA currently?

Edit: It's a PR in review https://github.com/wazuh/wazuh/pull/23137

V3ntus commented 5 months ago

I believe I've identified the root cause of wazuh-control not seeing the spawned processes. [image]

I was able to narrow it down to our usage of the systemd path option. Specifying pkgs.busybox was not the solution after all, we only needed to give it /run/current-system/sw:

systemd.services.wazuh-agent = mkIf cfg.agent.enable {
    path = [
      "/run/current-system/sw"
    ];
    ...
}

Boom, it's registered! [image]

View the last commit with the fixes here: bcd82816653d8a2910b2cddb5b959228b733e60e
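
A quick way to sanity-check a path setting like this is to test which external tools resolve under the PATH the unit ends up with (on NixOS it shows up in the generated unit file's Environment= line). The sketch below uses a hypothetical helper, missing_tools; the tool names passed to it are up to the caller, and ps and sleep are just the two mentioned in this thread.

```shell
# Hypothetical helper: print each tool that cannot be found in a given PATH.
missing_tools() {
    search_path=$1
    shift
    for tool in "$@"; do
        # Subshell so the caller's PATH stays untouched. Shell builtins and
        # functions always count as found, so this is only meaningful for
        # external programs.
        ( PATH=$search_path; command -v "$tool" >/dev/null 2>&1 ) || printf '%s\n' "$tool"
    done
}
```

For example, missing_tools /run/current-system/sw/bin ps sleep should print nothing on a system where both are installed system-wide; any name it does print is a candidate for why a script behaves differently under systemd than in an interactive shell.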

V3ntus commented 5 months ago

I'm putting the branch up for review in this PR: https://github.com/NixOS/nixpkgs/pull/308041

Should allow it to be globally available for others to test and work with while we wait for Wazuh to implement journald support.

sjdwhiting commented 5 months ago

Good work! We should definitely keep it globally available. I don't see a compelling reason to wait for journald support. Yes, it does limit some of the functionality of the agent but there are other ways to ship journald logs to Wazuh. The Wazuh manager is also a SIEM so in the meantime, someone could ship the journald logs using something like fluent-bit/fluentd or filebeat.

sjdwhiting commented 5 months ago

I noticed the build is failing on that PR. I'm getting the same error testing on my machine. Looking into it a bit this morning, will update if I find anything.

sjdwhiting commented 5 months ago

Fixed it: https://github.com/V3ntus/nixpkgs/pull/2

V3ntus commented 5 months ago

Interesting, wasn't getting that error, but that makes sense to me. Thanks for spotting it!

sjdwhiting commented 5 months ago

Ok, new issue. The way we are handling the /var/ossec/ folder today is causing it to be wiped out on a reboot which is problematic because the wazuh-agent stores its key at /var/ossec/etc/client.keys.

My guess is that this is because the preStart script copies the files over, so it's overwriting them. Maybe we could use something like rsync instead of cp to avoid wiping out client.keys.

V3ntus commented 5 months ago

That is a big design flaw, good catch. Ideally we do want a mutable /var/ossec but some of these can be kept as immutable and declarative, such as the configuration pieces. rsync would work, but in the case of needing to update the entirety of /var/ossec when the Wazuh package is updated, how do we determine what we can and cannot overwrite?

sjdwhiting commented 5 months ago

I got it to work here: https://github.com/V3ntus/nixpkgs/pull/3

While it is possible to take a declarative approach to client.keys, I don't think that makes sense. If someone wanted to use the same config on multiple machines, they would need some way to register dynamically. Plus, in a large environment, you wouldn't want to manually register them all.

In my opinion, this is a good approach for any dynamically created files that we want to have persistence which shouldn't be many since the manager is responsible for storage of actual data.

Long term, I think we will have to just wait and see what breaks and update the service as needed, either using that same rsync command or something else, to protect those files.
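
For reference, the general idea of protecting runtime files during the copy can be sketched without rsync. This is not the command from the PR; refresh_state_dir is a hypothetical helper that uses plain cp with a backup and restore of etc/client.keys, which is roughly what an rsync exclude would achieve for that one file.

```shell
# Hypothetical helper: refresh a state directory from a package tree while
# keeping a file that is created at runtime (here etc/client.keys).
refresh_state_dir() {
    pkg=$1
    state=$2
    keep=etc/client.keys
    backup=
    # Save the existing key file, if any, before the copy overwrites it.
    if [ -f "$state/$keep" ]; then
        backup=$(mktemp)
        cp -p "$state/$keep" "$backup"
    fi
    mkdir -p "$state"
    # "/." copies the contents of the package tree, including dotfiles.
    cp -rf "$pkg"/. "$state"/
    # Put the saved key file back over whatever the package shipped.
    if [ -n "$backup" ]; then
        mkdir -p "$(dirname "$state/$keep")"
        cp -pf "$backup" "$state/$keep"
        rm -f "$backup"
    fi
}
```

On a fresh install there is nothing to back up, so the package contents are copied as-is and the agent can register normally afterwards. Extending it to more files would mean turning keep into a list, at which point rsync with excludes is probably the cleaner tool.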

sjdwhiting commented 5 months ago

Today I tested the group functionality. So I created a NixOS group on the manager, enrolled my NixOS machine to it, and then placed a custom agent.conf file into the relevant directory on the manager which would have been /var/ossec/etc/shared/NixOS.

I verified that it pushed to my machine then reverified its presence after both a reboot and rebuild.

Since I didn't protect that file in any way, my assumption is that the manager likely replaced it as soon as it saw it was missing. That said, I think we don't have to worry too much about most of the files in there since the manager will likely fix those issues. Time will tell though!

On another note, what do we have left to actually get this merged? I've read through some of the docs on contributing and I get the feeling we are just waiting on someone to review it?

V3ntus commented 5 months ago

Awesome! Good to know that it'll persist configurations through syncing.

I'm not quite sure, I'm definitely waiting and looking for a review from the NixOS community just to make sure things look good.

Also, just so we're on the same page, being listed as a maintainer for wazuh on nixpkgs is ideally opt-in. I think it would be beneficial for wazuh to have maintainers listed, so I'll put my name on there, but obviously I did not want to put anyone else's name if they were unwilling. Relevant doc: https://github.com/NixOS/nixpkgs/blob/master/maintainers/README.md

V3ntus commented 4 months ago

Well that wasn't supposed to happen. https://github.com/NixOS/nixpkgs/pull/308041#issuecomment-2096097574

I'll work on squashing and cleaning up this branch, and resubmitting a new PR. And because I don't 100% trust my git-fu, I made a backup branch just in case lol. https://github.com/V3ntus/nixpkgs/tree/wazuh-agent-pre-rebase

V3ntus commented 4 months ago

New PR up at https://github.com/NixOS/nixpkgs/pull/309573

sjdwhiting commented 4 months ago

Lol.

Yea, I'm down to be on the maintainers list. Is it safe to open up a PR or are we going to blow it up again?

sjdwhiting commented 4 months ago

https://github.com/V3ntus/nixpkgs/pull/4

PR for adding me to the maintainers as well.

V3ntus commented 4 months ago

> Yea, I'm down to be on the maintainers list. Is it safe to open up a PR or are we going to blow it up again?

Should be good! Looked to be a flaky workflow that caused it after a terrible go at a force push on my end.

V3ntus commented 4 months ago

Didn't see this. Probably would have to look into using this process with the recent reviews on the Nixpkgs PR. https://documentation.wazuh.com/current/user-manual/reference/unattended-installation.html

V3ntus commented 3 months ago

First 4.9.0 alpha is out which includes the added journald support. Hoping I can get traction back to testing. https://github.com/wazuh/wazuh/releases/tag/v4.9.0-alpha1

SirMysterion commented 1 month ago

Hi, I tried to run your PR as I am very interested in getting this working for myself as well. I had issues with pkgs/by-name/wa/wazuh/package.nix, line 156: workingDirectory = "${builtins.currentSystem}-src"; Is it because I'm using flakes that I just can't seem to get the builtins to work? Updating it to workingDirectory = "v${version}-src"; has worked for me so far to get v4.7.3 installed and working.

I've been trying to update the agent to v4.9.0-rc1 with no success yet. They seem to have added a new upstream http-requests tarball and removed some of the libdb dependencies for the later build flags.