PowerShell / DSC

This repo is for the DSC v3 project
MIT License
194 stars 24 forks source link

Handle resources that require restarts #50

Open SteveL-MSFT opened 1 year ago

SteveL-MSFT commented 1 year ago

Summary of the new feature / enhancement

Some changes to the system may require a reboot before additional changes can be applied. dsc itself shouldn't perform a reboot, but we may need a standardized way for resources to return that a reboot is required like _rebootRequested: true and then dsc just reports that as a standard result. Then something else performs the reboot when ready and re-runs the config until it completes or hits another reboot.

Proposed technical implementation details (optional)

No response

mgreenegit commented 1 year ago

Do we have a spec to document output requirements? If not, maybe that is another ask to PM?

michaeltlombardi commented 1 year ago

My understanding/preference for the output requirements of a DSC Resource is that it must return a JSON blob that is valid by its own schema.

So, using an arbitrary and idealized example:

{
    "$schema": "https://aka.ms/dsc/schemas/resource_manifest",
    "manifestVersion": "1.0",
    "type": "TSToy.Example/gotstoy",
    "version": "0.1.0",
    "get": {
        // truncated
    },
    "set": {
        // truncated
    },
    "schema": {
        "embedded": {
            "$schema": "https://aka.ms/dsc/schemas/resource_definition",
            "title": "Golang TSToy Resource",
            "type": "object",
            "required": [
                "scope"
            ],
            "properties": {
                // truncated property list
                "_rebootRequired": {
                    "$ref": "https://aka.ms/dsc/schemas/keywords/rebootRequired"
                }
            }
        }
    }
}

I think for now we can just define that resources that need to report on reboots need to have a _rebootRequired key with a boolean value, but in the long term if people can use a reference to a well-defined keyword, that will help a lot. I've been thinking about this a bit wrt to other keywords, like _ensure and _sensitive too. Having the published schema with some well-known keywords would make authoring easier and help ensure consistency across the resource implementations, rather than relying on people to correctly implement those things by hand.

jambar42 commented 1 year ago

We have a few scenarios where we want the reboot behavior to be variable based on the current context.

Currently when a resource needs a reboot, we set $Global:DSCMachineStatus = 1 within the SET block.

Outside of reboots, variable resource execution timing would be great too. We don't really need to make sure the drivers are up to date every 15 min, once per week or once per month would be fine. We do however want the machines to be autocorrected for other things on a faster cadence, and in some instances, immediately. EX: Someone enabled sql extended stored procedures on a db node, turn it off immediately. Someone adds a new local administrator manually, remove it immediately. The config definition is the source of truth.

In summary, I feel rebootRequired is not descriptive enough to get robust behavior when using DSC for ongoing daily DevOps management. Maybe also some additional metadata like disruptiveSet subscribedWindow preSetResourceInvoke postSetResourceInvoke would also be useful.

Hope this feedback helps. Please reach out if you'd like some examples of how we've solved some of these challenges.

SteveL-MSFT commented 1 year ago

The actual reboot will need to be initiated by a higher level agent (Machine Config, Ansible, Chef/Puppet, etc...) or even a script using task scheduler/cron. DSC should aggregate the reboot information to a top level _rebootRequired as part fo the output config (probably under a metadata section to keep it aligned with ARM).

For disruptive changes, like reseting hardware, this is probably better discussed in a new issue, created https://github.com/PowerShell/DSC/issues/103

For cluster operations, we might just need a specific discussion with that team since it seems very domain specific.

Flighting is also better suited for higher level agents like Machine Config to handle as DSC wouldn't have knowledge of other systems.

ThomasNieto commented 1 year ago

Even if a higher level agent is managing the reboots there is a need for a configuration/resource to indicate that the reboot must occur first before continuing applying the configuration.

SteveL-MSFT commented 6 months ago

WG discussed this and current proposal is for resources that require a reboot to emit _rebootRequired=true as part of their output and dsc will inform the user that a reboot is required. If the user ignores that and reruns the config, it will be up to the resource to know if a reboot has been satisfied or do nothing except re-emit _rebootRequired=true. dsc will simply re-run config after the reboot and as resources are expected to be idempotent, those resources that are in desired state would be no-op until config execution gets past the reboot part.

SteveL-MSFT commented 6 months ago

Based on a new scenario that came up, I'm renaming this to restart instead of reboot to handle other cases like restarting a service, window manager, or even computer system.

In this case, my proposal is if a restart is required, we handle 3 cases (more can be added later, like clusters):

_restartRequired:
  process:
    name: explorer.exe
    id: 1234
_restartRequired:
  service:
    name: sshd
    id: 1234
_restartRequired:
  computer:
    name: MyComputer

Resources that can handle restarting should have a property that indicates a restart should be performed if needed:

_restartIfNeeded: true

The default value is false and returns the _restartRequired as above.

michaeltlombardi commented 6 months ago

Is there a case for a single resource returning an array of processes or services that need to be restarted?

SteveL-MSFT commented 6 months ago

Is there a case for a single resource returning an array of processes or services that need to be restarted?

I can't think of one, but we can allow a choice of an array if that situation comes up.

SteveL-MSFT commented 1 month ago

I think it may be better to have this returned as metadata than a property