ProjectNami / projectnami

WordPress powered by Microsoft SQL Server
http://projectnami.org

High Availability Multi-Site Installation #305

Closed Vidocq24601 closed 6 years ago

Vidocq24601 commented 6 years ago

Hi Patrick,

Could you provide high-level guidance on how you would approach ensuring high availability for the following scenario?

Specifically, what are your recommendations to avoid downtime in the event of an Azure outage that impacts an entire region?

How could something like Traffic Manager be deployed with Project Nami in a way that retains the most up-to-date data and also serves websites in the event of a region failure? Related: what is the optimal method to synchronize identical WebApps and databases across regions? Is this the optimal failover strategy in the event of an outage, or is there a superior method to handle such a circumstance?

Thank you in advance for all of your efforts. I look forward to understanding the best failover strategy for a multisite PN deployment.

patrickebates commented 6 years ago

Having personal experience with no fewer than three critical outages related to the South Central US datacenter (including the one you referenced... last week was so much fun...), I understand your concern all too well. That said, the level of detail you are requesting goes far beyond the support we would normally provide. So at the very least I will touch on some items you haven't.

First, I suggest becoming familiar with the default geo-redundant paired datacenter for the one(s) you currently use. For example, South Central US and North Central US are one pair, with geo-redundant Storage options operating between them.

Second, I suggest becoming familiar with SQL Database geo-replication: https://docs.microsoft.com/en-us/azure/sql-database/sql-database-geo-replication-overview

Third, use the Azure Storage plugin to get the Media library off the server itself: https://wordpress.org/plugins/windows-azure-storage/

I think you have the right idea regarding Traffic Manager, but due to how WordPress operates you would probably need a failover setup rather than load balancing. To actually cut over to the other datacenter, you will most likely be looking at manually failing over both the SQL database and the Storage account.
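To illustrate the difference between failover and load balancing mentioned above, here is a minimal Python sketch of priority-style routing: all traffic goes to the most-preferred endpoint that is currently healthy, and the secondary only receives traffic when the primary is down. This is a toy model, not Azure SDK code. The `Endpoint` class, the `pick_endpoint` function, and the endpoint names are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Endpoint:
    name: str       # hypothetical label for a regional WebApp
    priority: int   # lower number = more preferred
    healthy: bool   # outcome of the most recent health probe


def pick_endpoint(endpoints: Sequence[Endpoint]) -> Optional[Endpoint]:
    """Return the healthy endpoint with the lowest priority number.

    Returns None when no endpoint is healthy.
    """
    healthy = [e for e in endpoints if e.healthy]
    return min(healthy, key=lambda e: e.priority, default=None)


if __name__ == "__main__":
    # Assumed layout: primary in one region, standby in its paired region.
    endpoints = [
        Endpoint("south-central-us", priority=1, healthy=False),
        Endpoint("north-central-us", priority=2, healthy=True),
    ]
    chosen = pick_endpoint(endpoints)
    print(chosen.name if chosen else "no healthy endpoint")
```

Note that this only models where web traffic is sent. The standby site is only useful once the database and storage behind it have also been failed over, which is the manual step described above.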

Finally, that leaves you vulnerable to something that happened last week. It's not widely known, but a substantial amount of the Azure headend is located in South Central US. Depending on just how big the impact is there, you may encounter a situation where portions of the Azure portal simply won't operate. I had to deal with that myself last week: I was unable to trigger failover in some services, unable to deploy code changes to enable workarounds in others, etc.