dsccommunity / SharePointDsc

The SharePointDsc PowerShell module provides DSC resources that can be used to deploy and manage a SharePoint farm
MIT License

Public Updates (PU's) #184

Closed ryanspletzer closed 7 years ago

ryanspletzer commented 8 years ago

This is more of a general question: what would be the preferred approach for handling public updates (PUs) for SharePoint as they come out?

Or should these DSC resources just be focused on setup / config of SharePoint, with such PUs handled through separate configuration?

BrianFarnhill commented 8 years ago

Hey @ryanspletzer

So the question of patching has been thrown around by my team internally since the very first versions of xSharePoint. Unfortunately, at the moment there isn't anything we can do with xSharePoint to handle installation of a specific update, but it is absolutely on the radar. We had decided to hold off on making decisions on how we handle this until we have some public documentation on the new "zero-downtime patching" feature for SharePoint 2016 that has been discussed, so we can make sure whatever we implement for xSharePoint lines up with whatever that needs.

Now, that being said, I think there is value in getting a discussion going here around how this could work and what people would want to see in a patching resource. The challenges that I know we are going to face are mostly to do with scheduling: when is the right time for an individual server to drop itself out while installing binaries, and when is the right time to bring the farm down to do the PSConfig activities? There is also the fact that we would be making changes that can't be reversed, so we need to make it easy for customers to be in control of how and when a patch occurs, so they can ensure they have an appropriate roll-back in place in case the patch goes south - and I'm not quite sure what that looks like.

So let's open the floor to some discussion here - how do people think we can handle patching with xSharePoint? Should we even be trying to do this? All thoughts on the topic welcome.

camiloborges commented 8 years ago

That is a good question. Short answer: I think we should have support for it.

Applying a PU is a common and required activity that is part of BAU, just as much as creating a user profile property or reconfiguring the search topology, and since DSC provides an abstraction layer we really need to have this resource available. The thing is that there has to be a way to recreate the environment with exactly the same process, which brings the need for version management, and every maintenance activity should be kept as a repeatable transaction. What the heck do I mean?

Contoso environment change history: (0) Farm is created with a custom configuration and rolled out to production. After that, (1) a DM solution requires managed properties created, as well as mappings. Shortly after that, (2) the Jan 2016 PU is deployed to the farm. (3) PowerBI is implemented.

Version history:
1.0.0 - Farm built
1.0.1 - Created managed property and mappings
1.1.2 - PU is deployed to the farm
1.1.3 - PowerBI

If there is a need to rebuild the environment, the solution is to replay the sequence. If desired, the environment can have version 2, which incorporates all changes, but then I am going a step too far.

BrianFarnhill commented 8 years ago

I'm actually gonna have to disagree with you here @camiloborges

One of the primary concepts of desired state configuration is that you only need to define the state you want the servers/services to be in, not how you get there. So in theory, I should be able to roll out straight to v1.1.3 of your example there without going through the other stages, because that's the version that is my "desired state". It's a bit of a shift in thinking about how we do scripting and deployment, but ultimately that's how it needs to be. It's why we need to write resources that will check for whatever the current state is and move it to the desired state.

So if we are going to look at how patching is incorporated in to xSharePoint, we need to look at how we can do that in such a way that we define "here is the patch level I want" and that is all we define - the DSC engine and our resources need to figure out what it takes to get it there. So in my mind that's a matter of taking the farm build number and comparing that to the patch we want to install (but this then means that where a target CU needs a specific service pack or something beforehand, we need to make sure that a meaningful error comes out to let someone know that) and then installing from there.

But as an overall stance, I don't want to put anything in that creates a need to version DSC scripts and replay them in a sequence to restore a farm - it should be about the desired state only.

camiloborges commented 8 years ago

The philosophy can wait, so let me answer it with DSC as it is designed ;)

Having the patch as a separate resource can and should happen - there is a reason for it even in a brand new implementation.

I hardly ever streamlined patches, probably because I was never really an infra guy :), so I could have a need to install one anyway, just like we install SP and the prereqs. So, it could be that I need to have:

xSPInstallPrereqs InstallPrerequisites
{
    InstallerPath = "C:\SPInstall\Prerequisiteinstaller.exe"
    OnlineMode = $true
}

xSPInstall InstallBinaries
{
    BinaryDir = "C:\SPInstall"
    ProductKey = $ProductKey
}

xSPProductUpdate PU
{
    BinaryDir = "C:\SPInstall"
    PUVersion = "Jan2016"
}

So, back to philosophy: I do agree that this is what DSC was designed to do. And yes, I am kinda challenging a fundamental principle of it :)

Perhaps I am leveraging a real world application of the word 'desired' :) A desired state might be, and often is, a series of desired states ("all VMs in a VM host", "join network", "install SharePoint").

Also, progressive changes can lead to different results from a brand new complete configuration, so having an ability to replay the configuration is a nice thing to have. It doesn't even need to be seen as versions, but as a transaction log of all that was applied to a machine.

BrianFarnhill commented 8 years ago

OK, so a few points to respond to there - first, I agree that slipstreaming is not the answer. For doing a brand new installation I think it is, but that's a one-off and doesn't really address how we look at patching (because in the real world people don't build new farms at the higher patch level to do database attaching as an upgrade mechanism).

To your point about replaying a configuration - you name me a scenario in SharePoint where it matters that you deploy configurations in sequence (and something being a prerequisite doesn't count, because DSC handles that) and I'll look at how that can be removed, because like I said, the whole point of DSC is just "describe the state you want and the resources will get you there". Replaying things in a sequence is an old school way of looking at this. Let's take for example something that needs a rollback strategy - in DSC you just go to the previous configuration that was applied (granted, there are plenty of our own resources here that need more work in that space rather than just provisioning). But when you apply the previous configuration, the system shouldn't care that it was a configuration ahead, or that it came from a previous configuration - it just enacts what is in the configuration it is given. So we really should be working to that goal - if you can think of a specific scenario that can't be done like that though, I'm all ears mate :)

Now - the argument about how DSC should work sort of side tracked things a little here, so back to the main issue at hand. Taking the approach of creating a xSPProductUpdate resource (I'm reserving the right to name that better before we do this!) there are a bunch of considerations that would need to go in to this:

  1. When is a server allowed to install the binaries? What happens if multiple servers are installing binaries at the same time and then get caught needing to do a restart and you then lose all of your front ends to simultaneous (or near enough) reboots?
  2. How do you manage the fact that there are pending database upgrades? Again this will cause an outage in 2013 while the updates are applied so we need to be mindful of how we trigger this.

Realistically once we look at those two issues, we are a long way towards figuring out how this could realistically work. So take something like this:

xSPProductUpdate Jan2016CU
{
    InstallerPath = "C:\SP\Updates\CU.exe"
    BinaryInstallWindow = "8:00pm to 11:00pm"
    DatabaseUpgradeWindow = "1:00am to 5:00am"
}

Ignoring the "how we parse the time span strings" bit, there are some things to think about here - if you are doing a push configuration that doesn't automatically refresh and reapply the configurations, do we run it right away? I think probably - in which case you make the window properties optional and if you find a server that has its build version beneath the current farm we run the set method. This could be an approach that works.
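To make that concrete, here is a minimal sketch of what the test side of such a resource might look like. Everything here is illustrative - the resource doesn't exist yet, and the parameter names and the `Get-ServerBuildVersion` helper are assumptions for the sake of discussion:

```powershell
# Hypothetical sketch only - xSPProductUpdate does not exist yet, and these
# names are assumptions, not an actual implementation
function Test-TargetResource
{
    param
    (
        [Parameter(Mandatory = $true)]
        [System.String]
        $InstallerPath
    )

    # Assume the update package carries its target build in its file version
    $targetBuild = [System.Version](Get-Item -Path $InstallerPath).VersionInfo.FileVersion

    # Assume we can read the build this server is currently at (e.g. from the
    # installed SharePoint binaries) - hypothetical helper
    $currentBuild = Get-ServerBuildVersion

    # Report non-compliant until the patch is on. The install window check
    # would live in the set method, so this keeps returning false until the
    # window opens and the setter is allowed to run.
    return ($currentBuild -ge $targetBuild)
}
```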

You then also need to account for the fact that you don't know when a server will be running a consistency check to see if it is in line with the configuration. If you're checking at 15 minute intervals then a wide potential install window is going to be fine, but if you're running longer than that then you need to specify a big wide window to ensure that it gets caught and runs the installer.

Then there are issues around how multiple servers are handled - to the point I made earlier, how do we ensure that all servers aren't patched at once? I know that a lot of customers tend to do their patching based on servers that are within a single fault domain at a time (so that's usually a physical host or a common rack of servers on-prem, or within specified fault domains in a hosting provider's environment like Azure IaaS). This is different to how we look at DSC configurations, where we define roles based on topology - you're a WFE or an app server or a search box or whatever - and the physical host doesn't play a role from a configuration perspective. But I know customers who will do the binary installation based on fault domains, because in theory there is one of each topology role in each fault domain, so if I start patching everything in fault domain 1 I know I have a second fault domain with at least one of every role left to keep things running while the binaries are done.

The other thing we need to look at here as well is how we test that an update is installed already - I don't want to get in to the habit of maintaining a list of every build number of every update to determine that "January 2016" maps to 15.0.4787.1000. We need to look at whether or not the package has a build number associated with it, so we can compare that to the farm output to determine if the patch level matches.
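One possible way to avoid that lookup table - a sketch only, assuming the package file carries a usable file version (which would need verifying per package type):

```powershell
# Sketch: derive the target build from the package file itself rather than
# maintaining a "January 2016 = 15.0.4787.1000" style lookup table
$targetBuild = [System.Version](Get-Item -Path "C:\SP\Updates\CU.exe").VersionInfo.FileVersion

# Compare against what the farm reports (noting that the farm build number
# has its own reliability caveats, discussed later in this thread)
$farm = Get-SPFarm
if ($farm.BuildVersion -lt $targetBuild)
{
    Write-Verbose -Message "Farm is at $($farm.BuildVersion), below target $targetBuild"
}
```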

So taking all of that in to consideration, do we need to take an approach to this where we define the sequence of servers specifically? We already have to define specific server lists for the search topology, and we are adding an optional server list for the distributed cache provisioning, so it's not a new concept in xSharePoint. Something like this maybe?

xSPProductUpdate Jan2016CU
{
    InstallerPath = "C:\SP\Updates\CU.exe"
    InstallWindow = "8:00pm to 11:00pm"
    InstallOrder = @(
        "Server1",
        "Server2",
        "Server3"
    )
}

So in this model we look at running the binary installation one server at a time, triggering the installs remotely from the one server. That addresses the sequencing issues and the fault domain point - but then we need an approach to how we determine when to reboot servers. When you are acting alone in DSC you can set the global "I need to reboot" flag, and if the LCM says that it will allow an automatic reboot it will just do it; otherwise it sits in a "pending reboot" state. When we are doing the installs remotely this creates other issues for us, because we can't just reboot a server right away. We then also need to consider how the PSConfig stuff is handled on each server if we do it this way as well.

Overall I think my gut feeling on this is probably closer to the first one, where patching is done per server and we tell it when it's allowed to do the different bits. This means that the servers will report that they are out of compliance until a patch is installed (if I deploy the new configuration with the xSPProductUpdate resource at 2pm and the install window doesn't start until 8pm the resource will report false from the test method and then continue to do so until we unblock the setter).

That was a much longer response than I had planned - but hopefully you now see the considerations we need to decide upon here. All of this also needs to keep in mind that there will be more information coming on how the "zero downtime patching" in 2016 is to be achieved, and we need to make sure we can accommodate that once 2016 goes RTM. I'm not foreseeing it being some massively different process, but it is again something we need to make sure is accommodated.

caadam commented 8 years ago

DSC enables us to quickly and efficiently deploy boxes and entire farms. This kind of 'Infrastructure as Code' concept enables us to treat servers like cattle and not pets. We shouldn't be encouraging or enabling people to keep servers & farms for long durations. If a server or farm is faulting or needs patching, use DSC to build a new farm, migrate the DBs etc, TEST TEST TEST and do a DNS cutover.

camiloborges commented 8 years ago

Let's ignore the philosophy and the DevOps and DSC principles. I don't argue with purists :)

Going back to the focus of the conversation, which is how to push product patches to an environment.

Brian, your whole thinking is completely cool, but is it really achievable (or worthwhile) to have truly zero downtime? The priority list is a nice idea, and I think you handled a similar situation with the search topology? Or was it the sample multi-server farm where you injected a thread dependency? That might be good enough; you can perhaps add a delay between servers if you like, or order the sequence so you have the potential for minimum downtime.

If we are to ignore zero downtime we can just have 2 checks for each box in the sequence.


ykuijs commented 8 years ago

I am a bit torn between two thoughts:

BrianFarnhill commented 8 years ago

@camiloborges I agree that the priority list is a nice idea, but I do think we are stretching ourselves here, at least for a first cut of something. So perhaps we say "here is how the patch thing works, expect downtime in the window you specify on the resource" or something like that?

@ykuijs If customers are doing it manually, I think the question we need to ask is why - what would it take for them to trust a script in a DSC resource to do it vs. doing it manually? If we understand why customers wouldn't want to automate it, then it might help us know what the blockers to adoption would be.

camiloborges commented 8 years ago

@BrianFarnhill - That is it. Better simple and dependent on correct usage and guidance than perfect and never finished :) Go with the fact there will be downtime. I've yet to see a client that doesn't expect it anyway. :D

I have ideas around the question you asked @ykuijs but I will give him time to reply :P

ykuijs commented 8 years ago

Since updating is a tricky process, customers would like to have full control over the process. Where you can change less impactful settings (like Farm Admins or Outgoing email settings) outside of a maintenance window, DSC is very useful. Patching is much more difficult. Doing this manually or via a custom script, you have control over the exact process and the sequence of patching.

Is it possible to specify within DSC that two servers cannot run at the same time? If so, we can create two resources that depend on each other: One for installing the binaries, which can run at once on all servers and one for the upgrade process, which has to run one server at a time.
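(For what it's worth, WMF 5.0 does ship built-in WaitForAll / WaitForAny resources for exactly this kind of cross-node ordering. A rough sketch - node names are hypothetical, and it assumes an xSPProductUpdate-style resource existed:)

```powershell
# Sketch: use the built-in WaitForAll resource (WMF 5.0) to hold the upgrade
# step on Server2 until Server1 reports its update resource as compliant.
# Node names and the referenced resource are illustrative only.
Configuration PatchSequence
{
    Import-DscResource -ModuleName PSDesiredStateConfiguration

    Node "Server2"
    {
        WaitForAll WaitForServer1Upgrade
        {
            ResourceName     = "[xSPProductUpdate]Jan2016CU"
            NodeName         = "Server1"
            RetryIntervalSec = 60
            RetryCount       = 120
        }
    }
}
```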

I would prefer to use as much DSC built-in logic as possible, and not implement our own logic to control the upgrade process. The process Brian described before tends to go the other way, which sounds to me a little like wrapping a patching script in a DSC resource - and I would like to avoid that as much as possible.

camiloborges commented 8 years ago

Hey @BrianFarnhill, so, what about having the PU resource behave slightly differently on one named server?

xSPVtcBFProductUpdate PU
{
    Version = "Jan2015"
    Binaries = "Path"
    PSConfigServer = "namedserver.contoso.com"
}

then the process only executes PSConfig on the named server.

Can't get simpler, I think

ykuijs commented 8 years ago

Two questions:

  1. How do we determine the version on the server? Where/how can you check Version=Jan2015?
  2. Why specify a PSConfigServer? PSConfig needs to run on each server in the farm. It's just that the first server on which PSConfig runs will upgrade all the content databases, etc.

Just thought about the answer to my own question: since SP2010, PSConfig automatically checks if an upgrade process is running on another server and, if so, waits until that one is completed. Not sure if the corresponding PowerShell cmdlets do the same btw.

On a side note: does anyone know if Set-TargetResource has a maximum time it can spend running? The upgrade process can be a long running one, and if DSC sets a maximum time this might result in a timeout.

camiloborges commented 8 years ago
  1. (a) SharePoint.Core.dll in the server GAC? Get the latest version? (b) Nowhere - it should really be the build version. It could also optionally be grabbed from the package itself, but that would be a little harder :)
  2. My bad :) I simply forgot about this detail. Thanks for calling it out.

We can't have PSConfig running on more than one server before the first one is finished, can we?

ykuijs commented 8 years ago

Exactly, SharePoint only registers the build numbers. But keep in mind that build numbers can be very deceptive:
https://blogs.technet.microsoft.com/stefan_gossner/2014/08/18/sharepoint-patching-demystified/
https://blogs.technet.microsoft.com/stefan_gossner/2014/10/23/common-question-why-does-the-version-number-on-the-servers-in-farm-page-not-change-after-installing-october-cu/

Which files are updated, and how to detect that an update has been installed, totally depends on what exactly is included in the patch. As Stefan mentions, you cannot trust the "Servers in Farm" build number. The best way is to check the Patch Status page.

ykuijs commented 8 years ago

You can run PSConfig on multiple servers at the same time; however, the actual upgrade process can only run on one server at a time. Therefore PSConfig places a lock on the config DB when it starts upgrading, and beforehand checks if another server already has such a lock. If so, it will wait to upgrade until the lock is released.
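For reference, the build-to-build upgrade each server ends up running is the usual PSConfig invocation, something along these lines (exact switches may vary by scenario and version):

```powershell
# Build-to-build upgrade run on each server after the binaries are installed;
# PSConfig itself takes the config DB upgrade lock described above
& "PSConfig.exe" -cmd upgrade -inplace b2b -wait -force `
    -cmd applicationcontent -install `
    -cmd installfeatures
```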

BrianFarnhill commented 8 years ago

I've tagged this as in progress, as @ykuijs is leading up the work in this space.

ykuijs commented 7 years ago

Now that PR391 is closed, this one can be closed.