Azure / azure-linux-extensions

Linux Virtual Machine Extensions for Azure
Apache License 2.0
308 stars 254 forks source link

DSC Extension for Linux - Stuck in progress #1671

Open eusebyu opened 1 year ago

eusebyu commented 1 year ago

The DSC extension reports RHEL Vms as in progress all the time in the Automation Account.

It fails on TestDscConfiguration when trying to write to sys.stdout. For some reason it seems that sys.stdout is null (None).

/var/opt/microsoft/omsconfig/omsconfig.log

023/02/02 16:20:17: DEBUG: /opt/microsoft/omsconfig/Scripts/python3/TestDscConfiguration.py(32):
Starting script logic for /opt/microsoft/omsconfig/Scripts/python3/TestDscConfiguration.py runing with python sys.version_info(major=3, minor=6, micro=8, releaselevel='final', serial=0)
2023/02/02 16:20:17: ERROR: null(0): EventId=1 Priority=ERROR Job 4DB14D64-6DF4-4E6C-B897-340F4A86D1DC :
DSC Engine Error :
         Error Message FindFirstFile Failed.
        Error Code : 1
2023/02/02 16:20:17: DEBUG: /opt/microsoft/omsconfig/Scripts/python3/TestDscConfiguration.py(112):
End of script logic for /opt/microsoft/omsconfig/Scripts/python3/TestDscConfiguration.py runing with python sys.version_info(major=3, minor=6, micro=8, releaselevel='final', serial=0)
2023/02/02 16:21:01: DEBUG: /opt/microsoft/omsconfig/Scripts/python3/PerformRequiredConfigurationChecks.py(125):
Starting Main method for /opt/microsoft/omsconfig/Scripts/python3/PerformRequiredConfigurationChecks.py runing with python sys.version_info(major=3, minor=6, micro=8, releaselevel='final', s
erial=0)
2023/02/02 16:21:01: ERROR: null(0): EventId=1 Priority=ERROR Job 4DB14D64-6DF4-4E6C-B897-340F4A86D1DC :
DSC Engine Error :
         Error Message FindFirstFile Failed.
        Error Code : 1
2023/02/02 16:21:01: DEBUG: /opt/microsoft/omsconfig/Scripts/python3/PerformRequiredConfigurationChecks.py(127):
End of Main method for /opt/microsoft/omsconfig/Scripts/python3/PerformRequiredConfigurationChecks.py runing with python sys.version_info(major=3, minor=6, micro=8, releaselevel='final', ser
ial=0)

/var/opt/omi/log/dsc.log

2023/02/02 16:12:33: DEBUG: /opt/microsoft/dsc/Scripts/python3/PerformRequiredConfigurationChecks.py(125):
Starting Main method for /opt/microsoft/dsc/Scripts/python3/PerformRequiredConfigurationChecks.py runing with python sys.version_info(major=3, minor=6, micro=8, releaselevel='final', serial=
0)
2023/02/02 16:12:33: DEBUG: /opt/microsoft/dsc/Scripts/python3/PerformRequiredConfigurationChecks.py(118):
2023/02/02 16:12:33: WARNING: null(0): EventId=2 Priority=WARNING Job A09ED1A6-6DBD-462A-872C-0111E15DC8B1 : Starting PerformRequiredConfigurationChecks DSC operation.
2023/02/02 16:12:34: WARNING: null(0): EventId=2 Priority=WARNING Job A09ED1A6-6DBD-462A-872C-0111E15DC8B1 :
Displaying messages from built-in DSC resources:
         WMI channel 1
         ResourceID:
         Message : [vm-3479016]:                            [] Starting consistency engine.
2023/02/02 16:12:34: WARNING: null(0): EventId=2 Priority=WARNING Job A09ED1A6-6DBD-462A-872C-0111E15DC8B1 :
Displaying messages from built-in DSC resources:
         WMI channel 1
         ResourceID:
         Message : [vm-3479016]:                            [] A pending configuration exists. DSC will process a set request on the pending configuration.

Exception: (<class 'AttributeError'>, AttributeError("'NoneType' object has no attribute 'write'",), <traceback object at 0x7f90328b37c8>)
  File "/opt/microsoft/dsc/Scripts/python3/client.py", line 167, in <module>
    trace ('socket: '+str(sys.argv[1]))
  File "/opt/microsoft/dsc/Scripts/python3/client.py", line 21, in trace
    sys.stdout.write (text + '\n')

Exiting - closing socket
error on socket: (104) "Connection reset by peer"
unable to read value
2023/02/02 16:12:35: DEBUG: /opt/microsoft/dsc/Scripts/python3/PerformRequiredConfigurationChecks.py(120):
2023/02/02 16:12:35: DEBUG: /opt/microsoft/dsc/Scripts/python3/PerformRequiredConfigurationChecks.py(127):
End of Main method for /opt/microsoft/dsc/Scripts/python3/PerformRequiredConfigurationChecks.py runing with python sys.version_info(major=3, minor=6, micro=8, releaselevel='final', serial=0)

This has worked in the past. Now suddenly it stopped working. This behavior is consistent on all newly provisioned VMs (RHEL 8). I also used a very simple nx script which just touches a file and checks if exists.

OS: RHEL 8.7 DSC: 1.2.4 OMI: 1.6.11 Python: 3.6.8

When I run /opt/microsoft/dsc/bin/ConsistencyInvoker, I am receiving a generic error:

{
    OwningEntity=OMI:CIMOM
    MessageID=OMI:MI_Result:1
    Message=Completed processing test operation. The operation returned False.
    MessageArguments={}
    PerceivedSeverity=7
    ProbableCause=0
    ProbableCauseDescription=Unknown
    CIMStatusCode=1
    OMI_Code=1
    OMI_Category=0
    OMI_Type=MI
    OMI_ErrorMessage=A general error occurred, not covered by a more specific error code.
}
eehret commented 1 year ago

I believe you've run into the same issue I noticed. If I'm right, I think you'll notice that your omid process crashes every time your DSC is running. omi version 1.6.10 or higher are incompatible with the DSCForLinux VM extension. If your OMI gets upgraded, as yours has, DSC will break due to OMI crashing.

Sadly there was a CVE in OMI that had to get patched, but it seems the DSC team didn't keep their solution compatible with OMI 1.6.10 and higher.

In our case if we downgrade OMI back to 1.6.9 (the version that ships with the current DSC solution) then it starts working again. But of course due to the CVE we can't actually operate like that....

I have already reported this problem to the product group responsible for DSCForLinux. They're working on a fix.