dsccommunity / FailoverClusterDsc

This module contains DSC resources for deployment and configuration of Windows Server Failover Cluster.
MIT License

xCluster: Automatic Offline Cluster Node Removal breaks Cluster Aware Updating and S2D workflows #257

Closed by jambar42 2 years ago

jambar42 commented 3 years ago

Details of the scenario you tried and the problem that is occurring

When a Storage Spaces Direct cluster node is rebooted by Cluster Aware Updating, it is left in an offline state until the storage repair jobs are complete. If the DSC resource runs during this time, it removes the node from the failover cluster, which breaks the storage repair. The only way to get the node back into the cluster is to run Clear-ClusterNode.

Verbose logs showing the problem

Suggested solution to the issue

Please add a boolean option to xCluster that controls whether offline nodes are removed automatically.

The DSC configuration that is used to reproduce the issue (as detailed as possible)

The operating system the target node is running

OsName               : Microsoft Windows Server 2019 Datacenter
OsOperatingSystemSKU : DatacenterServerEdition
OsArchitecture       : 64-bit
WindowsVersion       : 1809
WindowsBuildLabEx    : 17763.1.amd64fre.rs5_release.180914-1434
OsLanguage           : en-US
OsMuiLanguages       : {en-US}

Version and build of PowerShell the target node is running

Name                           Value
----                           -----
PSVersion                      5.1.17763.1852
PSEdition                      Desktop
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0...}
BuildVersion                   10.0.17763.1852
CLRVersion                     4.0.30319.42000
WSManStackVersion              3.0
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1

Version of the DSC module that was used

1.16.0

jambar42 commented 3 years ago

I'll work on this item sometime over the next month.

jambar42 commented 3 years ago

https://github.com/dsccommunity/xFailOverCluster/blob/f4c289ae2e09d49c0a69bb081ab55f27c3cdd69e/source/DSCResources/MSFT_xCluster/MSFT_xCluster.psm1#L232

^^^ offending line of code

nickgw commented 2 years ago

@johlju I was coming to create an issue for this because my org has run into it as well. Do you have an opinion on whether we should scrap automatically kicking downed nodes, or add a switch that lets us turn that behavior off?

The second option maintains current functionality, but imo automatically kicking downed nodes was a bad idea in the first place.

johlju commented 2 years ago

I think I'd rather see a switch named KeepDownedNodesInCluster; when it is $true, the resource does not remove nodes. That way we don't make a breaking change.

nickgw commented 2 years ago

@johlju Made a new PR with KeepDownedNodesInCluster as a parameter!
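
For anyone landing on this thread later, a configuration that opts out of the automatic removal might look roughly like the sketch below. It assumes a module release that actually ships the KeepDownedNodesInCluster parameter discussed above, and it uses the xCluster naming from this thread. The configuration name, resource instance name, cluster name, and IP address are placeholders, not values taken from this issue. Prerequisites such as installing the Failover Clustering feature are left out for brevity.

```powershell
# Sketch only: assumes an xFailOverCluster release that includes the
# KeepDownedNodesInCluster parameter proposed in this thread.
Configuration ExampleKeepDownedNodes   # hypothetical configuration name
{
    param
    (
        [Parameter(Mandatory = $true)]
        [System.Management.Automation.PSCredential]
        $DomainAdministratorCredential
    )

    Import-DscResource -ModuleName xFailOverCluster

    Node 'localhost'
    {
        xCluster 'JoinOrCreateCluster'   # hypothetical instance name
        {
            Name                          = 'Cluster01'          # placeholder
            StaticIPAddress               = '192.168.100.20/24'  # placeholder
            DomainAdministratorCredential = $DomainAdministratorCredential

            # As proposed above: when $true, the resource does not remove
            # nodes that are down (for example during a CAU reboot on S2D).
            KeepDownedNodesInCluster      = $true
        }
    }
}
```

Leaving the parameter unset should keep the existing behavior, which is the reason a switch was preferred over removing the node cleanup outright.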