Azure / nxtools

Azure Automanage Machine Configuration Linux DSC resources
Other
13 stars 11 forks source link

nxService reporting wrong status on Ubuntu Server 22.04 LTS #54

Open eehret opened 5 months ago

eehret commented 5 months ago

Details of the scenario you tried and the problem that is occurring

Our hardening configurations have some instances where we need to ensure that a given service is enabled and running. We've found that on Ubuntu Server 22.04 LTS, the nxService resource is reporting that these services are not running when in fact they are.

Verbose logs showing the problem

I don't know how to cause the GC agent to use verbose logs, but here is what it's outputting for one of the example resources that is returning the wrong status:

[2024-04-30 19:14:51.811] [PID 122390] [TID 122391] [DSCEngine] [INFO] [00000000-0000-0000-0000-000000000000] Job 1cf52836-dd40-4af6-a6c5-399d56f7e813 : Displaying messages from the engine:
         WMI channel 1
         ResourceID: [nxService]auditd_running
         Message : [atlas-sb-apps-01]: LCM:  [ Start  Resource ]  [[nxService]auditd_running]
[2024-04-30 19:14:51.811] [PID 122390] [TID 122391] [DSCEngine] [INFO] [00000000-0000-0000-0000-000000000000] Job 1cf52836-dd40-4af6-a6c5-399d56f7e813 : Getting a registration instance for nxService
[2024-04-30 19:14:51.811] [PID 122390] [TID 122391] [DSCEngine] [INFO] [00000000-0000-0000-0000-000000000000] Job 1cf52836-dd40-4af6-a6c5-399d56f7e813 : Moving the resource [nxService]auditd_running of class >[2024-04-30 19:14:51.811] [PID 122390] [TID 122391] [DSCEngine] [INFO] [00000000-0000-0000-0000-000000000000] Job 1cf52836-dd40-4af6-a6c5-399d56f7e813 : Getting Metaconfiguration details.
[2024-04-30 19:14:51.811] [PID 122390] [TID 122391] [DSCEngine] [INFO] [00000000-0000-0000-0000-000000000000] Job 1cf52836-dd40-4af6-a6c5-399d56f7e813 : Executing operations for PS DSC resource nxService with>[2024-04-30 19:14:51.811] [PID 122390] [TID 122391] [DSCEngine] [INFO] [00000000-0000-0000-0000-000000000000] Job 1cf52836-dd40-4af6-a6c5-399d56f7e813 : Method Exec_PSProvider started with parameters
         Class name: nxService
         Resource ID: [nxService]auditd_running
         Flags: 0
         Execution Mode: 12582912                                                                                                                                                                                         DSC resource Namespace: null
[2024-04-30 19:14:51.811] [PID 122390] [TID 122391] [DSCEngine] [INFO] [00000000-0000-0000-0000-000000000000] Job 1cf52836-dd40-4af6-a6c5-399d56f7e813 : Displaying messages from the engine:
         WMI channel 1
         ResourceID: [nxService]auditd_running
         Message : [atlas-sb-apps-01]: LCM:  [ Start  Test     ]  [[nxService]auditd_running]                                                                                                                    [2024-04-30 19:14:51.811] [PID 122390] [TID 122391] [PSPROVIDER] [INFO] [00000000-0000-0000-0000-000000000000] getPSModulePath(). psModulePath:/var/lib/GuestConfig/Configuration/Linux-Common-CIS-Sections3to6->[2024-04-30 19:14:51.913] [PID 122390] [TID 122391] [DSCEngine] [INFO] [00000000-0000-0000-0000-000000000000] Job 1cf52836-dd40-4af6-a6c5-399d56f7e813 : Activity Loading default keywords
CurrentOperation [atlas-sb-apps-01] Get-DscResource
StatusDescription Processing                                                                                                                                                                                      PercentComplete 0
 SecondsRemaining 0
[2024-04-30 19:14:52.211] [PID 122390] [TID 122391] [DSCEngine] [INFO] [00000000-0000-0000-0000-000000000000] Job 1cf52836-dd40-4af6-a6c5-399d56f7e813 : Activity Getting module list
CurrentOperation [atlas-sb-apps-01] Get-DscResource
StatusDescription Processing
 PercentComplete 0
 SecondsRemaining 0
[2024-04-30 19:14:56.514] [PID 122390] [TID 122391] [DSCEngine] [INFO] [00000000-0000-0000-0000-000000000000] Job 1cf52836-dd40-4af6-a6c5-399d56f7e813 : Activity Creating resource list
CurrentOperation [atlas-sb-apps-01] Get-DscResource
StatusDescription Processing
 PercentComplete 0
 SecondsRemaining 0
[2024-04-30 19:14:58.611] [PID 122390] [TID 122391] [DSCEngine] [INFO] [00000000-0000-0000-0000-000000000000] Job 1cf52836-dd40-4af6-a6c5-399d56f7e813 : Activity Loading default keywords
CurrentOperation [atlas-sb-apps-01] Get-DscResource
StatusDescription Processing
 PercentComplete 0
 SecondsRemaining 0
[2024-04-30 19:14:58.615] [PID 122390] [TID 122391] [DSCEngine] [INFO] [00000000-0000-0000-0000-000000000000] Job 1cf52836-dd40-4af6-a6c5-399d56f7e813 : Activity Getting module list
CurrentOperation [atlas-sb-apps-01] Get-DscResource
StatusDescription Processing
 PercentComplete 0
 SecondsRemaining 0
[2024-04-30 19:15:02.312] [PID 122390] [TID 122391] [DSCEngine] [INFO] [00000000-0000-0000-0000-000000000000] Job 1cf52836-dd40-4af6-a6c5-399d56f7e813 : Activity Creating resource list
CurrentOperation [atlas-sb-apps-01] Get-DscResource
StatusDescription Processing
 PercentComplete 0
 SecondsRemaining 0
[2024-04-30 19:15:11.511] [PID 122390] [TID 122391] [DSCEngine] [INFO] [00000000-0000-0000-0000-000000000000] Job 1cf52836-dd40-4af6-a6c5-399d56f7e813 : Displaying messages from the engine:
         WMI channel 1
         ResourceID: [nxService]auditd_running
         Message : [atlas-sb-apps-01]: LCM:  [ End    Test     ]  [[nxService]auditd_running]  in 0.9564 seconds.
[2024-04-30 19:15:11.511] [PID 122390] [TID 122391] [DSCEngine] [INFO] [00000000-0000-0000-0000-000000000000] Job 1cf52836-dd40-4af6-a6c5-399d56f7e813 : Displaying messages from the engine:
         WMI channel 1
         ResourceID: [nxService]auditd_running
         Message : [atlas-sb-apps-01]: LCM:  [ Start  Set      ]  [[nxService]auditd_running]
[2024-04-30 19:15:11.511] [PID 122390] [TID 122391] [PSPROVIDER] [INFO] [00000000-0000-0000-0000-000000000000] getPSModulePath(). psModulePath:/var/lib/GuestConfig/Configuration/Linux-Common-CIS-Sections3to6->[2024-04-30 19:15:11.616] [PID 122390] [TID 122391] [DSCEngine] [INFO] [00000000-0000-0000-0000-000000000000] Job 1cf52836-dd40-4af6-a6c5-399d56f7e813 : Activity Loading default keywords
CurrentOperation [atlas-sb-apps-01] Get-DscResource
StatusDescription Processing
 PercentComplete 0
 SecondsRemaining 0
[2024-04-30 19:15:11.811] [PID 122390] [TID 122391] [DSCEngine] [INFO] [00000000-0000-0000-0000-000000000000] Job 1cf52836-dd40-4af6-a6c5-399d56f7e813 : Activity Getting module list
CurrentOperation [atlas-sb-apps-01] Get-DscResource
StatusDescription Processing
 PercentComplete 0
 SecondsRemaining 0
[2024-04-30 19:15:14.913] [PID 122390] [TID 122391] [DSCEngine] [INFO] [00000000-0000-0000-0000-000000000000] Job 1cf52836-dd40-4af6-a6c5-399d56f7e813 : Activity Creating resource list
CurrentOperation [atlas-sb-apps-01] Get-DscResource
StatusDescription Processing
 PercentComplete 0
 SecondsRemaining 0
[2024-04-30 19:15:16.613] [PID 122390] [TID 122391] [DSCEngine] [INFO] [00000000-0000-0000-0000-000000000000] Job 1cf52836-dd40-4af6-a6c5-399d56f7e813 : Activity Loading default keywords
CurrentOperation [atlas-sb-apps-01] Get-DscResource
StatusDescription Processing
 PercentComplete 0
 SecondsRemaining 0
[2024-04-30 19:15:16.824] [PID 122390] [TID 122391] [DSCEngine] [INFO] [00000000-0000-0000-0000-000000000000] Job 1cf52836-dd40-4af6-a6c5-399d56f7e813 : Activity Getting module list
CurrentOperation [atlas-sb-apps-01] Get-DscResource
StatusDescription Processing
 PercentComplete 0
 SecondsRemaining 0
[2024-04-30 19:15:20.912] [PID 122390] [TID 122391] [DSCEngine] [INFO] [00000000-0000-0000-0000-000000000000] Job 1cf52836-dd40-4af6-a6c5-399d56f7e813 : Activity Creating resource list
CurrentOperation [atlas-sb-apps-01] Get-DscResource
StatusDescription Processing
 PercentComplete 0
 SecondsRemaining 0
[2024-04-30 19:15:25.910] [PID 122390] [TID 122391] [DSCEngine] [INFO] [00000000-0000-0000-0000-000000000000] Job 1cf52836-dd40-4af6-a6c5-399d56f7e813 : Displaying messages from the engine:
         WMI channel 1
         ResourceID: [nxService]auditd_running
         Message : [atlas-sb-apps-01]: LCM:  [ End    Set      ]  [[nxService]auditd_running]  in 0.6925 seconds.
[2024-04-30 19:15:25.911] [PID 122390] [TID 122391] [DSCEngine] [INFO] [00000000-0000-0000-0000-000000000000] Job 1cf52836-dd40-4af6-a6c5-399d56f7e813 : Method Exec_PSProvider ended successfully
[2024-04-30 19:15:25.911] [PID 122390] [TID 122391] [DSCEngine] [INFO] [00000000-0000-0000-0000-000000000000] Job 1cf52836-dd40-4af6-a6c5-399d56f7e813 : Displaying messages from the engine:
         WMI channel 1
         ResourceID: [nxService]auditd_running
         Message : [atlas-sb-apps-01]: LCM:  [ End    Resource ]  [[nxService]auditd_running]
[2024-04-30 19:15:25.911] [PID 122390] [TID 122391] [DSCEngine] [INFO] [00000000-0000-0000-0000-000000000000] Job 1cf52836-dd40-4af6-a6c5-399d56f7e813 : Activity LCM
CurrentOperation [atlas-sb-apps-01] Set
StatusDescription Applying Configuration
 PercentComplete 6
 SecondsRemaining 41

Then later, we see:

[2024-04-30 19:40:51.713] [PID 122390] [TID 122391] [Reports] [INFO] [1cf52836-dd40-4af6-a6c5-399d56f7e813] Creating assignment report for job '1cf52836-dd40-4af6-a6c5-399d56f7e813'. Resource '[nxService]auditd_running' phrase 'Service 'auditd' is not running or does not exist.' reasoncode 'nxService:nxService:NotRunningDNE'.  

Meanwhile, systemctl shows that it actually is running:

azops.eric.ehret@022gc.onmicrosoft.com@atlas-sb-apps-01:~$ sudo systemctl status auditd
● auditd.service - Security Auditing Service
     Loaded: loaded (/lib/systemd/system/auditd.service; enabled; vendor preset: enabled)
     Active: active (running) since Wed 2024-05-01 10:00:50 UTC; 2h 35min ago
       Docs: man:auditd(8)
             https://github.com/linux-audit/audit-documentation
   Main PID: 469 (auditd)
      Tasks: 14 (limit: 9507)
     Memory: 73.5M
        CPU: 37.755s
     CGroup: /system.slice/auditd.service
             ├─469 /sbin/auditd
             └─473 /opt/microsoft/auoms/bin/auomscollect

Suggested solution to the issue

I don't know. This same configuration works for Ubuntu 20.04 LTS. I have not yet determined the reason it fails on 22.04 LTS.

The DSC configuration that is used to reproduce the issue (as detailed as possible)

    nxService auditd_running {
      Name = "auditd"
      Enabled = $true
      State = "Running"
    }

The operating system the target node is running

Ubuntu Server 22.04 LTS

Version and build of PowerShell the target node is running

Not sure, I guess it's 7.2.0-preview3, judging by this information I found in the powershell_check_supported_os.sh script:

#!/bin/bash
#
# This script checks if the PowerShell 7.2.0-preview3 / .NET 6 is supported by the OS
#

Version of the DSC module that was used

1.4.0

eehret commented 4 months ago

I believe I have found the root cause of this one.

On Ubuntu 22+, the output of systemctl is including some white space before the service unit ID.

For example , using the same command syntax that would be used by the code here , on Ubuntu 18 we see output like the following:

 azops.eric.ehret@022gc.onmicrosoft.com@epic-dv-wb-01:~$ sudo systemctl list-units --type=service --no-legend --all --no-pager cron.service
cron.service loaded active running Regular background program processing daemon

Whereas on Ubuntu 22+ , we see output like this:

azops.eric.ehret@022gc.onmicrosoft.com@atlas-sb-apps-01:~$ sudo systemctl list-units --type=service --no-legend --all --no-pager cron.service
  cron.service loaded active running Regular background program processing daemon

I did the following to test how the nxService code (and its private dependency) would process this, I did the following:

PS C:\Users\EEHRET> $services = "  cron.service loaded active running Regular background program processing daemon"
PS C:\Users\EEHRET> $id, $Load, $Active, $Status, $Description = $services -split '\s+',5
PS C:\Users\EEHRET> $id

PS C:\Users\EEHRET> $Load
cron.service
PS C:\Users\EEHRET> $Active
loaded
PS C:\Users\EEHRET> $Status
active
PS C:\Users\EEHRET> $Description
running Regular background program processing daemon

As you can see, the columns are offset due to the white space at the beginning of the systemctl output.

I believe an easy fix to this would be to simply trim whitespace from the beginning and ends of the systemctl output before attempting to split out the columns into each variable.

eehret commented 4 months ago

It appears that the above issue can also be avoided by adding an additional parameter to the --list-units command: '--plain'.

Here is the before and after:

azops.eric.ehret@022gc.onmicrosoft.com@atlas-sb-apps-01:~$ sudo systemctl list-units --type=service --no-legend --all --no-pager cron.service
  cron.service loaded active running Regular background program processing daemon
azops.eric.ehret@022gc.onmicrosoft.com@atlas-sb-apps-01:~$ sudo systemctl list-units --type=service --no-legend --all --plain --no-pager cron.service
cron.service loaded active running Regular background program processing daemon