Azure / Azure-Sentinel

Cloud-native SIEM for intelligent security analytics for your entire enterprise.
https://azure.microsoft.com/en-us/services/azure-sentinel/
MIT License

syslog-ng fails to start after installing the AMA on RHEL with default RHEL syslog-ng configuration - issue with defined source name expected from MS being s_src when it is instead s_sys #9678

Closed rekoilgzs closed 6 months ago

rekoilgzs commented 9 months ago

Describe the bug

syslog-ng on RHEL (at least on the tested versions 8.9 and 9.3) names its local source s_sys in /etc/syslog-ng/syslog-ng.conf, whereas the configuration file installed by the AMA (/etc/syslog-ng/conf.d/azuremonitoragent-tcp.conf) expects s_src. As a result, once the AMA is installed through Azure Arc, syslog-ng cannot start (or fails to come up after a reboot) and only provides this error message:

Job for syslog-ng.service failed because the control process exited with error code. See "systemctl status syslog-ng.service" and "journalctl -xeu syslog-ng.service" for details.

Upon investigation, it seems that on RHEL (at least in the tested versions 8.9 and 9.3) syslog-ng is configured to use s_sys in syslog-ng.conf (located in /etc/syslog-ng) rather than s_src, which is the name used by both the AMA-installed configuration file (/etc/syslog-ng/conf.d/azuremonitoragent-tcp.conf) and the CEF installer script cef_installer.py. This naming mismatch between the two files causes syslog-ng to fail to start.
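
A minimal sketch of how this kind of mismatch could be detected, assuming Python 3.9+ is available on the host. The two config strings are shortened, hypothetical stand-ins for the real files (the destination names and the port are placeholders, not copied from the AMA file), and the `undefined_sources` helper is illustrative only:

```python
import re

# Matches a source definition such as: source s_sys { system(); internal(); };
SOURCE_DEF = re.compile(r"^\s*source\s+([A-Za-z_][\w.-]*)\s*\{", re.MULTILINE)
# Matches a source reference inside a log path such as: source(s_src);
SOURCE_REF = re.compile(r"\bsource\s*\(\s*([A-Za-z_][\w.-]*)\s*\)")


def strip_comments(text: str) -> str:
    """Drop '#' comments so commented-out statements are not counted.

    Simplification: a '#' inside a quoted string is also treated as a comment.
    """
    return "\n".join(line.split("#", 1)[0] for line in text.splitlines())


def undefined_sources(config_texts: list[str]) -> set[str]:
    """Return source names that are referenced in a log path but never defined."""
    defined: set[str] = set()
    referenced: set[str] = set()
    for text in config_texts:
        clean = strip_comments(text)
        defined.update(SOURCE_DEF.findall(clean))
        referenced.update(SOURCE_REF.findall(clean))
    return referenced - defined


# Hypothetical, heavily shortened stand-in for a RHEL-style syslog-ng.conf.
main_conf = """
source s_sys { system(); internal(); };
destination d_mesg { file("/var/log/messages"); };
log { source(s_sys); destination(d_mesg); };
"""

# Hypothetical stand-in for azuremonitoragent-tcp.conf (placeholder destination).
ama_conf = """
destination d_ama { network("127.0.0.1" port(28330)); };
log { source(s_src); destination(d_ama); };
"""

print(sorted(undefined_sources([main_conf, ama_conf])))  # ['s_src']
```

On a real host one would read /etc/syslog-ng/syslog-ng.conf and every file under /etc/syslog-ng/conf.d/ and pass their contents in. The regexes are deliberately simple and will not cope with every valid syslog-ng construct.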

OF NOTE: I've tested on both RHEL and Ubuntu and this issue doesn't affect Ubuntu as it uses the source configuration name of s_src as expected by the Microsoft installation.

The immediate fix is to standardize the log source name across both configuration files, to either s_sys or s_src. Keep in mind, though, that after you run the CEF installer [cef_installer.py](https://github.com/Azure/Azure-Sentinel/blob/master/DataConnectors/CEF/cef_installer.py) you will have to review the syslog-ng configuration file again, as the script comments out s_sys as the source and inserts s_src again in order to open the TCP and UDP ports. Of note: the CEF installer script doesn't solve the problem on its own either, as all of the local system's logging configuration is still set up to use s_sys and the script doesn't comment all of that out. While you can comment out all local logging, you will no longer get any data written to disk, so that's not ideal.

To Reproduce

Steps to reproduce the behavior:

  1. Install RHEL (tested on both versions 8.9 and 9.3).
  2. Install and start syslog-ng while removing rsyslog.
  3. Arc-connect the machine and assign it to a Linux DCR to push the AMA extension to the host.
  4. syslog-ng will not start. You will need to standardize the source name in both configuration files (syslog-ng.conf and azuremonitoragent-tcp.conf) to either s_src or s_sys so you can start syslog-ng.
  5. Run the Python CEF installer script.
  6. syslog-ng will not start. Again, standardize the source name in both configuration files (syslog-ng.conf and azuremonitoragent-tcp.conf) to either s_src or s_sys so you can start syslog-ng.

Expected behavior

syslog-ng should work out of the box with the installation of the AMA as well as the installation of the CEF Installer python file.

Screenshots

Step 2: Default syslog-ng.conf file (image: original-syslog-ng_conf)

Step 3: Default azuremonitoragent-tcp.conf file pushed when you install the AMA by connecting a syslog/Linux DCR in Azure (image: azuremonitoragent-tcp_conf)

Step 4: Updates made to syslog-ng.conf by the CEF installer Python script. syslog-ng won't start, as all the local disk logging is still configured using s_sys, which is not defined (image: After_CEF_script_run-syslog-ng_conf)

Additional context

My manual options for resolution are listed below, but I'm worried about future changes breaking this, so I'd like advice from Microsoft on a permanent resolution.

  1. Update ALL references to s_sys to s_src in the syslog-ng.conf file so that it's compatible with the AMA and the CEF installer, then hope syslog-ng doesn't update the base config file.
  2. Update all references to s_src to s_sys within the AMA configuration file, and then again in syslog-ng.conf after running the CEF Python script, and hope that MS doesn't update the azuremonitoragent-tcp.conf file.

github-actions[bot] commented 9 months ago

Thank you for submitting an Issue to the Azure Sentinel GitHub repo! You should expect an initial response to your Issue from the team within 5 business days. Note that this response may be delayed during holiday periods. For urgent, production-affecting issues please raise a support ticket via the Azure Portal.

v-muuppugund commented 9 months ago

Hi @rekoilgzs, thanks for flagging this issue. We will investigate and get back to you with an update by 02/02/2024. Thanks!

rekoilgzs commented 9 months ago

What I ended up doing to resolve this was to skip running the CEF Python script and instead add one line to each of the two config files. NOTE: I don't think this will universally solve the out-of-the-box configuration failures; however, I'm providing my solution for getting it functioning.

At the very bottom of /etc/syslog-ng/syslog-ng.conf, I added the listening line that is normally added by the CEF Python script, without commenting out any of the file's contents: source s_src { udp( port(514)); tcp( port(514));};

In the middle of the AMA configuration file at /etc/syslog-ng/conf.d/azuremonitoragent-tcp.conf, within the "log { " brackets, I added the following line to account for the local system's logging, which one would likely still want to send up through the AMA: source(s_sys);

This seems like a win-win solution. The local system can still log locally to /var/log/ through the s_sys configuration left intact in the main /etc/syslog-ng/syslog-ng.conf file, while remote syslogs are not saved to /var/log/ because s_src isn't configured for local logging.

The Log Analytics Workspace would receive the remote syslog, because we added s_src to the base syslog-ng configuration file, and adding s_sys to the AMA's logging configuration file at /etc/syslog-ng/conf.d/azuremonitoragent-tcp.conf ensures that the local syslogs are also pushed up through the AMA.
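
The two one-line changes described in this comment can be sketched as text transformations. This is only an illustration under assumptions: the helper names are hypothetical, the sample strings are shortened stand-ins for the real files, and the sketch assumes the AMA file contains a literal source(s_src); reference inside its log block.

```python
import re

# The listener line quoted in the comment above.
LISTENER = "source s_src { udp( port(514)); tcp( port(514));};"
# The extra reference added inside the AMA file's log block.
LOCAL_SOURCE_REF = "source(s_sys);"


def add_listener(main_conf: str) -> str:
    """Append the s_src listener to the main config unless s_src is already defined."""
    if re.search(r"^\s*source\s+s_src\s*\{", main_conf, re.MULTILINE):
        return main_conf
    return main_conf.rstrip("\n") + "\n" + LISTENER + "\n"


def add_local_source(ama_conf: str) -> str:
    """Insert source(s_sys); right after the first source(s_src); reference."""
    if LOCAL_SOURCE_REF in ama_conf:
        return ama_conf
    return re.sub(
        r"source\s*\(\s*s_src\s*\)\s*;",
        lambda m: m.group(0) + " " + LOCAL_SOURCE_REF,
        ama_conf,
        count=1,
    )


# Hypothetical, shortened stand-ins for the two files.
main = (
    "source s_sys { system(); internal(); };\n"
    "log { source(s_sys); destination(d_mesg); };\n"
)
ama = "log { source(s_src); destination(d_ama); };\n"

print(add_listener(main), end="")
print(add_local_source(ama), end="")
```

Both helpers return their input unchanged when the line is already present, so applying them repeatedly has no further effect. They do not address the concern that the AMA file may later be overwritten.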

v-muuppugund commented 9 months ago

Hi @rekoilgzs, has the issue been resolved with the above changes?

v-muuppugund commented 9 months ago

Hi @rekoilgzs, gentle reminder: has the issue been resolved with the above changes?

rekoilgzs commented 9 months ago

This seems to be the simplest solution to get everything up and working; however, the AMA with syslog-ng on RHEL still doesn't work out of the box without these tweaks. I was simply providing my resolution in case it helps guide a permanent fix.

My recommendation would be (roughly):

  1. Identify which versions of RHEL with syslog-ng use s_sys instead of s_src (as the AMA syslog-ng configuration file expects) within syslog-ng.conf. I don't have access to licenses to test this thoroughly.
  2. Update your AMA extension's installation of the azuremonitoragent-tcp.conf file to account for versions containing s_sys instead of s_src on installation (likely by checking for RHEL and the identified versions and tweaking the file accordingly).
  3. Update your CEF Python installer to also account for versions containing s_sys instead of s_src on installation (likely by checking for RHEL and the identified versions and tweaking the file accordingly).
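
As a hedged sketch of a different approach from the per-version check suggested above, an installer could read the local source name out of the existing syslog-ng.conf. The sketch assumes the local source is the one whose body calls the system() driver; the `local_source_name` helper is hypothetical and is not part of the AMA or the CEF installer:

```python
import re
from typing import Optional

# Captures the name and body of a top-level source statement.
# Simplification: bodies containing nested braces are not handled.
SOURCE_BLOCK = re.compile(
    r"^\s*source\s+([A-Za-z_][\w.-]*)\s*\{(.*?)\}\s*;",
    re.MULTILINE | re.DOTALL,
)


def local_source_name(conf_text: str) -> Optional[str]:
    """Return the name of the first source that uses the system() driver."""
    for name, body in SOURCE_BLOCK.findall(conf_text):
        if re.search(r"\bsystem\s*\(", body):
            return name
    return None


rhel_style = "source s_sys { system(); internal(); };\n"
ubuntu_style = "source s_src { system(); internal(); };\n"
listener_only = "source s_src { udp( port(514)); tcp( port(514));};\n"

print(local_source_name(rhel_style))     # s_sys
print(local_source_name(ubuntu_style))   # s_src
print(local_source_name(listener_only))  # None
```

A generated configuration file could then reference whatever name is returned instead of hard-coding s_src. Configurations written from scratch may not use system() at all, in which case this returns None and a manual review is still needed.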

This proposed resolution method requires a lot of checking specifically for syslog-ng on RHEL (and potentially even specific versions of RHEL / syslog-ng), so I don't know that this is necessarily the best method.

To close this issue out, however, this should really work out of the box without someone having to trace through config files to identify and resolve these issues. This is going to be an issue for everyone else using affected versions of RHEL with syslog-ng.

v-muuppugund commented 9 months ago

Hi @rekoilgzs , will check on these and get back to you.

czanik commented 9 months ago

The problem is that there is no standardized naming scheme for the syslog-ng configuration. All distros have a different name for local logs. The three most popular names are:

This is also the order of popularity of platforms to run syslog-ng, at least as far as we can estimate at syslog-ng upstream.

Of course there are other distros with different names (even if the above cover over 90% of users), and many enterprises roll out their own configurations, written from scratch.

rekoilgzs commented 9 months ago

So the solution is either to rewrite the installation to account for the systems above, and/or to modify the Microsoft Learn article to state that, unless one uses Debian/Ubuntu (or another operating system that the script has been updated to account for), users will need to review and update the source name for their local logs, or syslog-ng will not start after installation of the AMA?

I have some concerns with the latter strategy.

  1. It's not a good client onboarding experience. syslog-ng doesn't fail immediately after the AMA's installation; instead, the service fails to come up after a service restart or a system reboot. This could lead an org to push the AMA to a group of systems without realizing that their logging would go down in the future. This seems even more potentially disastrous with Defender for Cloud's auto-provisioning of the AMA, as it could unexpectedly wipe out local logging on a large fleet of systems.
  2. Even after the configuration file is manually fixed, I worry that /etc/syslog-ng/conf.d/azuremonitoragent-tcp.conf will be automatically updated by Microsoft, as it just was on Ubuntu when it changed over from using sockets to omfwd, and that the update will break the fix I put in place without notification to the org.
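
Because the failure described in the first concern only surfaces at the next restart, one way to catch it earlier is to run syslog-ng's own syntax check right after the AMA files are deployed. A minimal sketch, assuming the syslog-ng binary is on the PATH and using its --syntax-only and -f options; the `config_is_valid` wrapper and its `run` parameter are hypothetical names introduced here:

```python
import subprocess
from typing import Callable


def config_is_valid(
    cfg: str = "/etc/syslog-ng/syslog-ng.conf",
    run: Callable = subprocess.run,
) -> tuple[bool, str]:
    """Ask syslog-ng to parse the config without starting the daemon.

    Returns (True, "") on success, or (False, <error text>) on failure.
    The `run` parameter exists only so the call can be replaced in tests.
    """
    cmd = ["syslog-ng", "--syntax-only", "-f", cfg]
    try:
        result = run(cmd, capture_output=True, text=True, check=False)
    except OSError as exc:
        # Covers a missing binary as well as permission problems.
        return False, f"could not run syslog-ng: {exc}"
    return result.returncode == 0, result.stderr.strip()


if __name__ == "__main__":
    ok, message = config_is_valid()
    print("config OK" if ok else f"config check failed: {message}")
```

Running a check like this on a schedule, or after every extension update, would also surface the second concern sooner, since a regenerated azuremonitoragent-tcp.conf that breaks the configuration would be reported before the next reboot.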

v-muuppugund commented 9 months ago

Hi @rekoilgzs / @czanik, I will check on the above issues/options and get back to you with an update.

v-muuppugund commented 8 months ago

Hi @rekoilgzs / @czanik, I have worked on this and need some more time to share detailed updates. I will update you.

v-muuppugund commented 8 months ago

Hi @rekoilgzs / @czanik, I will add the details and make the changes so that others won't be impacted. I will update you.

v-muuppugund commented 8 months ago

Hi @rekoilgzs / @czanik, I have verified that on RHEL s_src is not required; s_sys works.

I will be verifying the following for issue closure:

  1. Fedora
  2. openSUSE / SLES and FreeBSD
  3. Debian/Ubuntu

v-muuppugund commented 8 months ago

Hi @rekoilgzs / @czanik, I am currently ingesting the logs for Fedora and openSUSE. I will update you.

v-muuppugund commented 8 months ago

Hi @rekoilgzs / @czanik, on Fedora I am facing issues during data ingestion and am investigating. I followed the steps for the DCR, but data is still not being ingested. I am working on it and will update you.

v-muuppugund commented 7 months ago

Hi @rekoilgzs / @czanik, the VM was deleted for compliance reasons, and I am recreating it and redoing the setup. I will update you.

v-muuppugund commented 6 months ago

Hi @rekoilgzs ,As the original issue has been resolved, we are closing your issue (https://github.com/Azure/Azure-Sentinel/issues/9678) as per our standard operating procedures. If you still need support for this issue, feel free to re-open at any time. Thank you for your co-operation!