Avoiding conflicts between servers through a DSC configuration Database

BrianFarnhill commented 8 years ago

One of the issues we've had from day 1 of SharePointDsc has been how to avoid conflicts between resources running on different servers. Right now we have an "after the fact" approach that involves catching the error messages from SharePoint that talk about "update conflicts" and then flagging to the DSC engine that a reboot is pending as a way to loop back around and try again, but this isn't great. It also won't help us for things like two servers trying to create a farm at the same time if we merge SPCreateFarm and SPJoinFarm as discussed in #455. We need a new approach here.

After spitballing some ideas with a few of my colleagues, the best one we came up with was to use a database to manage "locks". Here's a rundown of how the whole process would work:

1. A new resource (called SPDscConfigDatabase or something) would be used to provision a configuration database. It would take a property naming the server that would be responsible for provisioning the database.
2. This resource would create registry entries to store the connection string and details about the config DB.
3. When a resource that requires a lock runs, it would check for the registry entries created by the resource above and then attempt to create an entry in a table where a primary key constraint ensures that no two servers can create the same lock entry at the same time. That way we can make sure things only happen on one server at a time.
4. After the resource finishes (pass or fail), it removes the lock entry from the DB.
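
To make the locking step concrete, here is a minimal sketch of a primary-key-based lock. It is an illustration under stated assumptions, not the proposed implementation: it uses Python with an in-memory SQLite database as a stand-in for the SQL Server config DB, and the table name `locks`, its columns, and the helper names are all hypothetical.

```python
import sqlite3


def acquire_lock(conn, lock_name, server):
    """Try to take the named lock; return True if this server now holds it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO locks (lock_name, owner) VALUES (?, ?)",
                (lock_name, server),
            )
        return True
    except sqlite3.IntegrityError:
        # Primary key violation: another server already holds this lock.
        return False


def release_lock(conn, lock_name, server):
    """Remove the lock entry, but only if this server owns it."""
    with conn:
        conn.execute(
            "DELETE FROM locks WHERE lock_name = ? AND owner = ?",
            (lock_name, server),
        )


def run_with_lock(conn, lock_name, server, action):
    """Run action under the lock; release it whether action passes or fails."""
    if not acquire_lock(conn, lock_name, server):
        return False
    try:
        action()
    finally:
        release_lock(conn, lock_name, server)
    return True


# Hypothetical schema: one row per held lock, keyed by the lock name.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE locks (lock_name TEXT PRIMARY KEY, owner TEXT NOT NULL)"
)
```

The database enforces the exclusivity: the second `INSERT` for the same `lock_name` fails no matter how close together the two servers run, so no extra coordination between servers is needed.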

This could help us solve things like the SPFarm resource, as well as then giving us some options for running resources that would typically just run on one server (like SPWebApplication) on multiple servers which can improve the high availability of configuration controlled through SharePointDsc.

To help with backwards compatibility, we could also make sure that if the registry entries aren't set by the new config DB resource, we skip the lock check and just run the resource. We can also wrap a lot of this up in helper methods to minimise the amount of change we need to make to the code throughout all the other resources.
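
The backwards-compatibility fallback could look roughly like the following helper. This is only a sketch in Python: `settings` stands in for whatever the registry lookup returns (`None` when the config DB resource was never applied), and `try_lock` / `release` are hypothetical callables for the lock operations.

```python
def run_resource(action, settings, try_lock, release):
    """Run a resource's action, taking a lock only when one is configured.

    settings is None when no registry entries exist, i.e. the config DB
    resource has not been used on this server.
    """
    if settings is None:
        # No lock store configured: keep today's behaviour and just run.
        action()
        return "ran-unlocked"
    if not try_lock(settings):
        # Another server holds the lock; leave it for the next DSC pass.
        return "skipped-locked"
    try:
        action()
    finally:
        # Release on pass or fail so a failure never strands the lock.
        release(settings)
    return "ran-locked"
```

Keeping this in one helper means individual resources would only need to swap a direct call for `run_resource(...)`.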

Now this is definitely a big enough change to make me want to include it in the v2.0 discussions (issue #456) rather than try to make it fit the current model. But I wanted to throw it out there for some discussion to see what people think of the approach.

ykuijs commented 8 years ago

Not sure exactly what you mean here. Is the configuration database you are talking about in step 1 the SharePoint ConfigDB or a custom database?

SharePoint itself is using configDB locking, for example during patching. Can't we reuse this logic?

BrianFarnhill commented 8 years ago

It would be a separate one, so that we could use it before a server is joined to a farm. Touching the config DB directly is also unsupported. I did think about using things like the SPFarm property bag, but that can't guarantee isolation and integrity when multiple servers try to set values, whereas a custom DB would be able to.

MikeLacher448 commented 8 years ago

I see the issue you're trying to solve, I'm just concerned with adding external dependencies. Especially for customers (like mine), that strictly follow the rule of 'thou shall not use the SharePoint SQL server for non-SharePoint databases', this would require a separate SQL instance just for this database (assuming they don't have a general purpose SQL server to use).

BrianFarnhill commented 8 years ago

@MikeLacher448 Ahhhh, actually that's a good point, I know when we run risk assessments this would trigger that rule as well. OK so that complicates the SQL option a little - is there something else we could look at using that would work in place of SQL though? Or is this something we would need to just cop on the chin as it really could be considered a SharePoint database (sort of). This might be something I could bring up with the product group actually to get some advice on.

camiloborges commented 8 years ago

Hello

Question for you guys: how do you manage 3rd party product databases for such clients? Are they hosted in a different SQL instance? In my experience they usually end up in the same instance. The same rule should apply here.


BrianFarnhill commented 8 years ago

@camiloborges What "should" happen and what "does" happen are different things. Microsoft's advice has always been to not share the SQL instance used for SharePoint with other databases (typically because of the non-standard settings we apply, like MAXDOP). We can't go against that advice here just because people usually do it anyway; we need an option that aligns with existing advice (or product group endorsement to modify that advice).

mariusv76 commented 8 years ago

My 2 cents. When creating a farm currently with DSC you also specify the port for central admin, which creates CA on the server where the farm is created. Does it mean the SPFarm resource will need to provision central admin on a specific server? Do we need to add another property to indicate which server will need to host CA or a property to indicate the current server? Or do we need an additional resource that provisions CA on a specific server (SPCentralAdmin)?

I would actually prefer the control of where and which server provisions the farm to reside in the DSC script and then using the WaitForAll resource for the remaining servers.

This will allow for no external dependency in the SharePointDSC resources. The dependency is on the script. The arch/cons should hopefully know which server should be in which role and create the DSC script in accordance with their design.

If we keep the SPCreateFarm resource, it can check whether the farm exists, create it if not or join the server if the farm exists, also check for CA on the current server. (If possible). The SPJoinFarm will continue to function as always.

My understanding of DSC is that the "State" is maintained by the system (SharePoint) and not the dsc resources, the resources confirm/checks/verifies that the system is in the correct state, if not it instructs the system to correct it. Thus SharePointDSC should not be invasive on any of the SharePoint servers or any other component of the environment.

johlju commented 8 years ago

Instead of a database, couldn't it be an optional "witness" share somewhere? The setup account needs to have write access to it. If not, an error should be thrown (if the optional parameter is used). In that "witness" share there could be a file in a specific format which tells the status of all the dependencies for other nodes. Until the specified dependencies are met, the node waits (-Wait could be optional as well).

Just a conceptual idea.

BrianFarnhill commented 8 years ago

@mariusv-msft Yea good call on the central admin thing, so we still need to control which one goes first. The problem with the depends-on approach though (as I've seen it with my customers) is that if you use SPCreateFarm on one server and then WaitForAll and SPJoinFarm on the others, the minute you lose that first server, you actually lose DSC on the entire farm. The WaitForAll will never pass, and since it will pretty much always be near the top of the dependency chain, the other servers will stop testing for and correcting anything else because of that failed dependency. That's why I'm looking to make things a little more autonomous between the servers.

To your point about being "invasive" to SharePoint, I 100% agree. That's why we do our best to stick to using out of the box PowerShell cmdlets and that sort of thing, so we are just 'steering' SharePoint with the DSC resources. But from my point of view, what we need here is something to help better coordinate that steering between servers without relying on things like WaitForAll, because of the dependencies it creates.

Perhaps if we switched to a "SPFarm" resource we need to add a flag to indicate if that server should be hosting central admin? That way we could essentially control which servers are eligible to create the farm, based on the fact that if the farm doesn't exist and a server is set to not host central admin, it can wait for another server that will host it to do that creation before it then joins in. Could that approach be better, do you think?
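
The central admin flag idea reduces to a small decision rule, sketched here in Python. The names `FarmAction`, `decide_farm_action`, and `run_central_admin` are hypothetical and only illustrate the rule as described; they are not taken from the module.

```python
from enum import Enum


class FarmAction(Enum):
    CREATE = "create"
    JOIN = "join"
    WAIT = "wait"


def decide_farm_action(farm_exists: bool, run_central_admin: bool) -> FarmAction:
    """Pick what a server should do on this DSC pass."""
    if farm_exists:
        # Any server may join once the farm is there.
        return FarmAction.JOIN
    if run_central_admin:
        # Only servers that will host central admin may create the farm.
        return FarmAction.CREATE
    # No farm yet and not eligible to create it: wait and re-test later.
    return FarmAction.WAIT
```

Note that two servers both flagged to host central admin would both get `CREATE` while the farm is missing, so this rule on its own narrows the race but does not remove the need for some form of lock.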

@johlju I did have the idea of a witness share as well, but my issue with this approach is that we then need to implement something to make that share highly available (with the goal of this approach really being that we want things to continue gracefully when single points fail). So we can't just set up a share on one of the SP servers or anything, as if that server drops out we lose the witness and can't make decisions. That was the thinking that drove me to SQL, since it generally will have HA and DR stuff configured for the rest of the farm anyway, so we could leverage that.

ykuijs commented 8 years ago

I have thought about using a share as well, but this doesn't feel very robust.

I totally agree with the comment made by @MikeLacher448, as Microsoft we always recommend not to place non-SharePoint databases on the same instance as SharePoint. And even though this database is used for SharePoint, it isn't a real SharePoint database.

It is however possible to create custom databases which are considered SharePoint databases. For another solution we have created in our team, we are using the SharePoint database provisioning methods to create and manage our own database. This might be a solution we could use as well. The only downside is that this isn't possible until SharePoint has been installed and a farm created. So we cannot use the database for the SPFarm resources.

So if we figure out the best way to solve the SPFarm issue, the SPFarm resource can create the DSC ConfigDB from within SharePoint. We can then use this database to achieve server locking or even determine which server to run the resources on.

ykuijs commented 8 years ago

@BrianFarnhill What if you want to host the CA on multiple servers? Then you still run the risk of two servers trying to create the farm. Would a parameter "DSCServer" be an option? That server is always used for creating a farm.

And if we can somehow prevent or detect that an admin configures two servers as DSCServer, I think we are there. Yes, we are still relying on one server to create a farm, but if you have an issue with that server that early in the install process, you have another issue :-)

johlju commented 8 years ago

The share witness should not be on a SharePoint server. It should preferably be on a SAN or a DFS share, or Storage Server, or a regular single server with a share - whatever the user decides is the best option for them. I think we need to look outside of the SharePoint farm for help with this. It could as easily be a database witness in a different SQL AlwaysOn environment if the user wants that. I'm thinking if you want uptime, you need the infrastructure for it.

BrianFarnhill commented 8 years ago

@ykuijs Would love to see that solution you came up with for the provisioning - wanna email me the details? If that's suitable we could use that for the database and just solve SPFarm like you suggest. We could also look at something like creating a near empty stub database while the farm is being provisioned to indicate that the provisioning is in progress, or something like that to solve SPFarm and then clean it up afterwards. Either way I feel we can solve SPFarm on its own if that database option is a go-er, so flick me the details and I'll have a look.

ykuijs commented 8 years ago

@BrianFarnhill Did some testing over the past few days. I just shared the information with you

MikeLacher448 commented 7 years ago

Two possible ways of testing whether a farm already exists, so that your SPFarm resource can determine if it should join or create:

1 - Query SQL for the existence of the farm config DB.
2 - Try to hit the topology service on all the other servers in the farm. If just one responds, assume the farm is built and try to join. I'm having a hard time thinking of a good way to get the names of all farm servers passed to the resource without requiring an extra parameter containing this info.

Neither of these approaches addresses the issue of needing to put a lock somewhere so that not every server tries to create a farm at once.

BrianFarnhill commented 7 years ago

@MikeLacher448 Yea I've been thinking it will end up being number 1 because then we can do it without other parameters. The scenario gets interesting when you think about the database existing because the farm is being built but isn't ready yet though (think about 2 servers hitting the resources a few seconds apart). But I think this is likely how we'll need to make the SPFarm resource work and that gets us in to having a farm so we can do some stuff with locks within the farm.

luigilink commented 7 years ago

Hello All, I have another scenario with a customer. On an existing SharePoint farm, one SharePoint server temporarily loses its connection to SQL. At the same time, the DSC configuration tries to create a new SPConfigurationDatabase because Get-SPFarm returns $null. Can we also check in another way that the server is already in a farm, like the registry for instance?

mrpullen commented 7 years ago

How about creating a set of stored procedure(s) that evaluate whether the SPFarm config database exists? If it exists, we would return true and the server could join the farm. If it didn't exist, it would engage the mutex / semaphore and perform the lock (which would keep other systems from executing the stored procedure).

Then this single node would be the owner, it would be able to go ahead and configure the config db. After it has done that it can release the mutex, letting the other servers "get in" and join the farm.

http://www.sqlservercentral.com/articles/Mutex/70908/

Then just clean up the stored procedures.
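
The check-then-create-under-a-mutex flow can be sketched as follows. This is a Python illustration only: `threading.Lock` stands in for the SQL-side mutex, threads stand in for servers, and the class and method names are hypothetical.

```python
import threading


class FarmState:
    """Stand-in for the SQL side: a mutex guarding the 'config DB exists' check."""

    def __init__(self):
        self._mutex = threading.Lock()
        self.config_db_exists = False

    def create_or_join(self, server):
        # Only one caller at a time may evaluate and act on the check.
        with self._mutex:
            if not self.config_db_exists:
                # First server through becomes the owner and creates the farm
                # while still holding the mutex.
                self.config_db_exists = True
                return "create"
        # Everyone who gets in after the owner released the mutex joins.
        return "join"


farm = FarmState()
results = {}


def worker(name):
    results[name] = farm.create_or_join(name)


# Four hypothetical servers hitting the resource at the same time.
threads = [
    threading.Thread(target=worker, args=(f"SP0{i}",)) for i in range(1, 5)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Whichever server wins the mutex first creates the farm; the rest see the config DB flag already set and join.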

BrianFarnhill commented 7 years ago

@mrpullen I have some logic on the go in one of my branches where we check for some registry keys that @luigilink pointed out as well as checking for the existence of the database on SQL that seems to be working quite well so far. I need to do some more testing of my code to make sure it holds up in some more complex scenarios but we might be able to do this without that. I do like that idea though, so if I can't get my stuff working without it I'll look in to those as an option too.
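
Combining a local registry check with a database existence check might be sketched like this in Python. It is an assumption-laden illustration, not the logic from the branch mentioned above: `registry_joined` stands for "the local registry says this server is already in a farm", and `config_db_exists` is `None` when SQL cannot be reached.

```python
from typing import Optional


def decide(registry_joined: bool, config_db_exists: Optional[bool]) -> str:
    """Decide the farm action from local state plus what SQL reports."""
    if registry_joined:
        # The server is already a farm member, even if SQL is briefly
        # unreachable, so never try to create a new farm here.
        return "already-joined"
    if config_db_exists is None:
        # SQL unreachable: we cannot tell create from join, so try again later.
        return "retry-later"
    return "join" if config_db_exists else "create"
```

The registry check is what protects against the temporary SQL outage scenario: a joined server never reaches the create branch.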

ykuijs commented 7 years ago

@BrianFarnhill We have implemented the SPFarm resource, which is using the custom database, but we still haven't got a solution for other resources. Do we still need action on this issue?

BrianFarnhill commented 7 years ago

@ykuijs Very much so. The SPFarm resource now stops itself from running in parallel on multiple servers, but I think something that uses tags in the SPFarm property bag will mean we can run any resource from multiple servers at the same time. That can then lead to eliminating the need for a "farm server" to run all the single instance resources, and we can then have higher availability of configuration.

ykuijs commented 7 years ago

@BrianFarnhill Have you looked at the solution I shared with you several months ago, with the custom SP database? That would be an ideal solution to solve this issue, but I'm not sure if it will work well enough in PowerShell.

BrianFarnhill commented 7 years ago

@ykuijs I did - I couldn't get anyone to sign off on it being a "SharePoint database" though which means we couldn't put it on the same server as the rest of the SharePoint DBs. This is what led me back to the property bag idea on the SPFarm.

ykuijs commented 7 years ago

If the database is created via SharePoint (using a class that inherits from SPDatabase), SharePoint considers it a SharePoint database. That is the way you can create custom databases, for example for a custom service application.

BrianFarnhill commented 7 years ago

@ykuijs If you can pull that off in code then make it so! I thought we found a technical blocker on that one though? If not, then yes that's the way to go I think.

ykuijs commented 7 years ago

@BrianFarnhill Let me dive into this. We might need a custom DLL for this. But your addition for MS Project Server is also doing that, so we can reuse that method.

stale[bot] commented 6 years ago

This issue has been automatically marked as stale because it has not had activity from the community in the last 30 days. It will be closed if no further activity occurs within 10 days. If the issue is labelled with any of the work labels (e.g bug, enhancement, documentation, or tests) then the issue will not auto-close.

ykuijs commented 6 years ago

Ok, I have been testing my proposed solution and as it turns out it does have some issues: You can load the DLL on-the-fly or create a type dynamically and create a new custom SharePoint database, based on the new type you define in the DLL (inherited from SPDatabase). All SharePointDsc resources that use this DLL/type will be able to use the custom database.

However: Since the DLL hasn't been deployed on all servers, default SharePoint processes like timer jobs or PowerShell cmdlets like Get-SPDatabase will start throwing errors. They are unable to retrieve the type of the database.

The only way of getting around this issue is to deploy the DLL to the GAC of all servers, but I would rather not do that. So I think this method isn't usable, unfortunately.

BrianFarnhill commented 6 years ago

@ykuijs Does that put you back into using the SPFarm property bag as the data store then? It's a little more basic but it's universally available.

ykuijs commented 6 years ago

@BrianFarnhill I think it does. I do not see any other option at the moment.

dsccommunity / SharePointDsc

Avoiding conflicts between servers through a DSC configuration Database #457