xService Cannot Manage Failure Actions

RandomNoun7 commented 4 years ago

The xService resource cannot currently manage a service's Failure Actions.

The traditional way to set these actions is via sc.exe.

I propose that xService be modified to read the REG_BINARY field FailureActions registry value for the service.

The contents of the field are described in this Stack Overflow Post and Documented by Microsft on the MS Docs Site

I have a start to this functionality working that allows me to read the values from the registry and to compare against a desired state.

The next steps will be to implement settings the values in the registry, and then refactoring the code to make it cleaner and more maintainable.

The new proposed properties for the resource are:

ResetPeriodSeconds - The time after which to reset the failure count to zero if there are no failures, in seconds. FailureCommand - The command line of the process for the CreateProcess function to execute in response to the SC_ACTION_RUN_COMMAND service controller action. RebootMessage - The message to be broadcast to server users before rebooting in response to the SC_ACTION_REBOOT service controller action. Failure1Action Failure1Delay Failure2Action Failure2Delay Failure3Action Failure3Delay

The FailureAction represent 3 possible slots for failure actions. The first two define what to do after first and second failure and the delay in milliseconds to wait before performing the action

The third slot is the action to be performed on subsequent failures that occur within the ResetPeriodSeconds time period.

Valid actions are documented on the MSDocs Site as: NONE REBOOT RESTART RUN_COMMAND

This issue is a restatement of https://github.com/PowerShell/PSDscResources/issues/83 and https://github.com/PowerShell/PSDscResources/issues/69

RandomNoun7 commented 4 years ago

So far in the course of testing, I have discovered two new things that will have to be accounted for.

The Recovery Option tab of the services properties dialog can only show you 3 recovery options, but in fact you can have a larger number of them stored in the registry. I describe the behavior of the recovery options taking into account the larger number of allowed options in this comment on the PR.

I have also found that there is another property I need to manage called FailureActionsOnNonCrashFailures which corresponds to the checkbox in the Recovery Options tab labeled Enable Actions For Stops With Errors. I describe the behavior in more detail in this comment on the PR.

Dealing with the extra registry key for that checkbox is the easier of the two issues.

To account for the possibility of an unkown number of failure actions that may need to be managed, I think I will have to change strategy slightly and remove the 6 numbered failure action parameters, and switch to a single failureActionsCollection parameter. This new parameter would take an array of hashtables. Each hash table would be required to have two keys action and delaySeconds.

I could then iterate over the collection and ensure that each is properly reflected in the registry key data.

PlagueHO commented 4 years ago

Thanks for working on this @RandomNoun7 - it is much appreciated. I have been watching your draft PR with interest :grin:

RandomNoun7 commented 4 years ago

Well, annoyingly, and as hard as this may be to believe, I am only just now running into the fact that DSC really can't deal with hashtables on their own. Now I need to decide if I want to deal with [Microsoft.Management.Infrastructure.CimInstance[]] in all of my code, deal with converting back and forth all the time, or just go back to only supporting 3 slots for failure actions so I don't have to deal with arrays at all.

That last option wuold certainly make my life easier, but I'm not sure I can bring myself to do it because I think there are instances with fresh Windows installs where there are already more than 3 failure actions stored in the registry, and not supporting that configuration in a new resource parameter and saying "If you want to use this you will have to accept data loss" is just not great.

PlagueHO commented 4 years ago

Hi @RandomNoun7 - yes, this is a side affect of being dependent on CIM for the MOF. There are two patterns available to solve this:

Use a Embedded Instances (like how xWebSite does it): https://github.com/dsccommunity/xWebAdministration/blob/master/source/DSCResources/MSFT_xWebSite/MSFT_xWebSite.schema.mof#L10
Use a sub resource (e.g. create a new resource called xServiceFailureAction). @rchaganti wrote a good article on this over on PowerShell Magazine here: https://www.powershellmagazine.com/2017/05/23/psdsc-doing-it-right-resource-granularity/

Personally, I feel the 2nd option is the way to go here. I'm certain @gaelcolas also has thoughts on this as well.

RandomNoun7 commented 4 years ago

I'm not sure the second option would help me avoid the need for an array? In order to construct the registry key properly I have to know all of the failure actions that will go into the key, their properties, and the order they go in, and whether there is a restart command or a reboot message, all at the same time.

If I have it right, the second option would be to create a resource called xServiceFailureAction that would represent a single failure action like Restart the service after 100 seconds, and the service the action belongs to, and then the user would write a configuration that would declare some number of this resource and assign them to a specified service.

Unfortunately, I don't see how any one instance of that resource would have the information it needs to properly construct the registry key.

Given that this the first time I'm implementing a feature of this kind in DSC please do let me know if I have that wrong.

RandomNoun7 commented 4 years ago

I'v hit a bit of speedbump managing the check box for Enable actions for stops with errors. Any thoughts or ideas are welcome. https://stackoverflow.com/questions/61463543/setting-a-windows-services-enable-actions-for-stops-with-errors-check-box-via

RandomNoun7 commented 4 years ago

I have found someone that wrote a PInvoke Implementation that seems to manage this flag. I want to test it and if it works and the author consents, to largely change strategy and go with PInvoke for this instead. https://gist.github.com/jborean93/889288b56087a2c5def7fa49b6a8a0ad

PlagueHO commented 4 years ago

Hi @RandomNoun7 - I think it is OK to use PInvoke method if no other alternative is available (I've used it in CertificateDsc). It is a little bit more difficult to create unit tests for though (it is possible but our CI pipelines don't know to run tests against C#) - so we end up having to rely on integration tests (which is OK if unavoidable).

It seems strange that there is no other way of doing this - seems to be a bit of an oversight somewhwere. I did a bit of searching but couldn't pull anything up - I sort of expected it to be exposed in CIM/WMI.

It appears that the PInvoke code is doing some escalation of privileges (could be wrong). That may not play well with the DSC LCM (or it might be just fine). Just something to be aware of.

PlagueHO commented 4 years ago

One thing I'd suggest (that I should have done in CertificateDsc) is put the C# code in a separate .cs file and import that in the Add-Type rather than embedding the code. That way we can potentially run it through linting and other tools.

RandomNoun7 commented 4 years ago

I like the idea of separating out the code file. If I could find another method besides PInvoke I would definitely use it. I'm still searching to figure out how, if not that registry key, this setting is being persisted so I could just manage that, but searching online and a lot of time in PROCMON aren't turning it up. It really seems like the whole idea of this service failure actions business is a half baked afterthough on Microsofts part and I wish I could talk to someone to understand why and what on earth is going on with this feature.

PlagueHO commented 4 years ago

It does seem odd that there isn't a straight forward/easy way of doing this. I'll ask around internally to see if I can find anyone who might be in the know here. But failing that we may have to go the PInvoke way...

RandomNoun7 commented 4 years ago

It occurs to me that one way to preserve the unit testability would be to use PInvoke solely for managing that one flag. I could even rip out everything in that code not needed for that specific purpose to make the .cs file itself a litle easier to understand.

I am curious to know that comes of your asking around internally though.

I should mention as well though that I think I was a little unkind in my criticism of how failure actions are implemented. As with most things in software development I'm certain that a lot of smart people were solving hard problems in the best way available given constraints unknown to use now.

PlagueHO commented 4 years ago

I was wondering: could you call SC.EXE to do this flag? To Set:

& sc.exe @('qfailureflag', $ServiceName, 1) # True
& sc.exe @('qfailureflag', $ServiceName, 0) # False

To Get:

& sc.exe @('qfailureflag', $ServiceName)

Now, to find the right internal community to ask :) hehe

But I completely agree - this may just have been something that slipped under the radar and/or just wasn't high enough demand (e.g. SC.EXE can handle it). I've occasionally come across small gaps like that and wondered how they slipped through. I'm thinking that in a lot of cases GPO's handled them OK so it was just left as is. Another example is configuring a Network Proxy settings - the only way I could do that was encoding/decoding binary strings and dumping them into the registry (not unlike what you're doing here). Usually the hardest part is not the writing of the setting it is actually reading it.

RandomNoun7 commented 4 years ago

I was about to reply that the oversight extends to sc.exe itself, and that it doesn't work either, but then I took a really close look at your code snippets up there.

All of the documentation I had seen up to this point had led me to believe that the syntax for setting this flag was literally something like & sc.exe failureflag WSearch flag=1. I see now above though that you aren't passing flag=1 but just 1. I tested that change and it worked exactly as expected.

Much my furstration up to this point has partially been due to the fact that I believed sc.exe was also not working properly, so it's not fun to discover that I'v just been using it incorrectly like this, but there it is. It's doubly frustrating that sc has been reporting success and exiting zero this whole time despite the fact that I'v been using incorrect syntax.

So, with that said, absolutely yes, we can invoke sc for this one thing and that would probably be better than PInvoke for that single purpose. I would still be in favor of the registry approach for the other stuff just so I don't have to be in the business of parsing sc's strings to do the rest of it.

One thing that I'll have to keep in mind is that sc.exe is now known to report success and zero exit code when it hasn't actually done anything, so immediately after using it I'll have to query the registry real quick to make sure the setting actually took, but that seems to be reliable at this point.

RandomNoun7 commented 4 years ago

I would still love to know what sc is doing besides just setting that registry key, but this at least gives me a path forward to get the job done.

dsccommunity / xPSDesiredStateConfiguration

xService Cannot Manage Failure Actions #678