autopilotpattern / wordpress

A robust and highly-scalable implementation of WordPress in Docker using the Autopilot Pattern
GNU General Public License v2.0

Multi-data center support #27

Open misterbisson opened 8 years ago

misterbisson commented 8 years ago

Data center awareness

WordPress + HyperDB supports running in multiple data centers. The HyperDB config includes comments on how to configure it for data center awareness:

/**
 * Network topology / Datacenter awareness
 *
 * When your databases are located in separate physical locations there is
 * typically an advantage to connecting to a nearby server instead of a more
 * distant one. The read and write parameters can be used to place servers into
 * logical groups of more or less preferred connections. Lower numbers indicate
 * greater preference.
 *
 * This configuration instructs HyperDB to try reading from one of the local
 * slaves at random. If that slave is unreachable or refuses the connection,
 * the other slave will be tried, followed by the master, and finally the
 * remote slaves in random order.
 * Local slave 1:   'write' => 0, 'read' => 1,
 * Local slave 2:   'write' => 0, 'read' => 1,
 * Local master:    'write' => 1, 'read' => 2,
 * Remote slave 1:  'write' => 0, 'read' => 3,
 * Remote slave 2:  'write' => 0, 'read' => 3,
 *
 * In the other datacenter, the master would be remote. We would take that into
 * account while deciding where to send reads. Writes would always be sent to
 * the master, regardless of proximity.
 * Local slave 1:   'write' => 0, 'read' => 1,
 * Local slave 2:   'write' => 0, 'read' => 1,
 * Remote slave 1:  'write' => 0, 'read' => 2,
 * Remote slave 2:  'write' => 0, 'read' => 2,
 * Remote master:   'write' => 1, 'read' => 3,
 *
 * There are many ways to achieve different configurations in different
 * locations. You can deploy different config files. You can write code to
 * discover the web server's location, such as by inspecting $_SERVER or
 * php_uname(), and compute the read/write parameters accordingly. An example
 * appears later in this file using the legacy function add_db_server().
 */
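The priority tables in the comment above could be derived from a single server list instead of being written out by hand for each location. The following is a minimal sketch under stated assumptions: `dc_priorities()`, the `role` and `dc` keys, and the host names are hypothetical and are not part of HyperDB.

```php
<?php
// Minimal sketch: derive HyperDB-style 'read'/'write' priorities per data center.
// dc_priorities(), the 'role'/'dc' keys, and the host names are hypothetical.

function dc_priorities( array $servers, $local_dc ) {
    // Is the master in the same data center as this web server?
    $master_is_local = false;
    foreach ( $servers as $server ) {
        if ( 'master' === $server['role'] && $server['dc'] === $local_dc ) {
            $master_is_local = true;
        }
    }

    $priorities = array();
    foreach ( $servers as $server ) {
        $is_local  = ( $server['dc'] === $local_dc );
        $is_master = ( 'master' === $server['role'] );

        if ( $is_master ) {
            // The master is read after local slaves, or last when it is remote.
            $read = $is_local ? 2 : 3;
        } elseif ( $is_local ) {
            $read = 1; // Local slaves are always preferred for reads.
        } else {
            // Remote slaves rank behind a local master, but ahead of a remote one.
            $read = $master_is_local ? 3 : 2;
        }

        $priorities[ $server['host'] ] = array(
            'write' => $is_master ? 1 : 0, // Writes only ever go to the master.
            'read'  => $read,
        );
    }
    return $priorities;
}

$servers = array(
    array( 'host' => 'db1.east', 'dc' => 'east', 'role' => 'master' ),
    array( 'host' => 'db2.east', 'dc' => 'east', 'role' => 'slave' ),
    array( 'host' => 'db3.west', 'dc' => 'west', 'role' => 'slave' ),
);

// Web servers in "west" see the master as remote.
print_r( dc_priorities( $servers, 'west' ) );
```

Each resulting pair could then be merged into the array passed to HyperDB's `$wpdb->add_database()`, alongside the host and credentials.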

MySQL is not the only service that needs data center awareness, though.

Given the current implementation, it might be necessary to ignore performance issues with Memcached and NFS transactions over the WAN. However, a better implementation would:

  1. Resolve cross-data center Memcached questions. This could involve implementing Facebook's mcrouter and replicated pools, or ditching Memcached for Couchbase, which provides a Memcached-compatible interface with cross-data center replication.
  2. Resolve cross-data center NFS questions. Object storage could be used as an exclusive alternative to filesystem storage, eliminating the need for NFS. It's possible that https://syncthing.net could provide sufficiently fast replication and sufficiently good conflict resolution. It's also possible that Nginx could be configured to force all HTTP POST requests to WP instances in a primary data center, to substantially reduce the risk of conflicts due to slow replication across the WAN. That would require that Nginx instances in the non-primary data center be able to connect to WP instances in the primary DC.
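The POST-routing idea in item 2 boils down to a small decision rule. The sketch below expresses only that rule in PHP for illustration; in practice it would be written as Nginx configuration. `choose_datacenter()` and the data center names are hypothetical, and treating every method other than GET, HEAD, and OPTIONS as a write is an assumption that goes slightly beyond the POST-only rule described above.

```php
<?php
// Sketch of the routing rule only; a real deployment would express this in Nginx.
// choose_datacenter() and the data center names are hypothetical.

function choose_datacenter( $method, $local_dc, $primary_dc ) {
    // Methods that are not expected to modify state can be served locally.
    $safe_methods = array( 'GET', 'HEAD', 'OPTIONS' );
    if ( in_array( strtoupper( $method ), $safe_methods, true ) ) {
        return $local_dc;
    }
    // POST and other state-changing methods go to the primary data center.
    return $primary_dc;
}

echo choose_datacenter( 'GET', 'west', 'east' ), "\n";
echo choose_datacenter( 'POST', 'west', 'east' ), "\n";
```

Note that some WordPress GET requests can also trigger writes, so a rule like this would reduce conflicts rather than eliminate them.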

Requirements for full active-active data center support

Story: The application will be deployed in data centers in two different regions connected by a WAN. Browsers may reach either data center with approximately equal frequency. Operators will specify one data center for the primary database instance, and the application will route requests internally to the correct primary instance in the correct DC.

Story: We need a minimal footprint of the application running in a remote data center so that we can quickly recover if the primary data center fails. The replica data center is not handling any end-user requests under normal use, and there is no provision for automatic fail-over. This approach seeks to reduce challenges by eliminating activity in the replica data center that would cause frustration due to slow performance of requests over the WAN or inconsistency due to writes in separate DCs (Memcached and NFS).

cdsalmons commented 8 years ago

By extending the database class with a drop-in, global tables can be registered to refine what needs to sync actively:

`/**

// Set up the list of known global database tables.
add_global_table( 'blogs' );
add_global_table( 'blog_versions' );
add_global_table( 'registration_log' );
add_global_table( 'signups' );
add_global_table( 'site' );
add_global_table( 'sitecategories' );
add_global_table( 'sitemeta' );
add_global_table( 'usermeta' );
add_global_table( 'users' );
add_global_table( 'bp_activity_sitewide' );
add_global_table( 'bp_activity_user_activity' );
add_global_table( 'bp_activity_user_activity_cached' );
add_global_table( 'bp_friends' );
add_global_table( 'bp_groups' );
add_global_table( 'bp_groups_groupmeta' );
add_global_table( 'bp_groups_members' );
add_global_table( 'bp_groups_wire' );
add_global_table( 'bp_messages_messages' );
add_global_table( 'bp_messages_notices' );
add_global_table( 'bp_messages_notices' );
add_global_table( 'bp_messages_recipients' );
add_global_table( 'bp_messages_threads' );
add_global_table( 'bp_messages_threads' );
add_global_table( 'bp_notifications' );
add_global_table( 'bp_user_blogs' );
add_global_table( 'bp_user_blogs_blogmeta' );
add_global_table( 'bp_user_blogs_comments' );
add_global_table( 'bp_user_blogs_posts' );
add_global_table( 'bp_xprofile_data' );
add_global_table( 'bp_xprofile_fields' );
add_global_table( 'bp_xprofile_groups' );
add_global_table( 'bp_xprofile_wire' );
add_global_table( 'bp_activity' );
add_global_table( 'bp_activity_meta' );

require_once WP_CONTENT_DIR . '/db-config.php';

// Derive DATACENTER from the server address. $dc_ips (IP prefix => data center
// name) is presumably defined in db-config.php.
if ( !defined( 'DATACENTER' ) ) {
    foreach ( $dc_ips as $dc_ip => $dc ) {
        if ( substr( $_SERVER['SERVER_ADDR'], 0, strlen( $dc_ip ) ) == $dc_ip ) {
            define( 'DATACENTER', $dc );
            break;
        }
    }
}

if ( file_exists( WP_CONTENT_DIR . '/db-list.php' ) ) {
    require_once WP_CONTENT_DIR . '/db-list.php';
}`
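The `DATACENTER` lookup above can be tried in isolation. This is a minimal sketch, assuming `$dc_ips` maps IP prefixes to data center names as the loop suggests; `detect_datacenter()` and the prefixes shown are hypothetical.

```php
<?php
// Sketch: map a server address to a data center by IP prefix.
// detect_datacenter() and the prefixes below are hypothetical.

function detect_datacenter( $server_addr, array $dc_ips ) {
    foreach ( $dc_ips as $prefix => $dc ) {
        // Compare only the first strlen( $prefix ) characters of the address.
        if ( 0 === strncmp( $server_addr, $prefix, strlen( $prefix ) ) ) {
            return $dc;
        }
    }
    return null; // No prefix matched.
}

// The trailing dot keeps '10.1.' from also matching addresses such as 10.10.0.15.
$dc_ips = array(
    '10.1.' => 'east',
    '10.2.' => 'west',
);

var_dump( detect_datacenter( '10.2.0.15', $dc_ips ) );
var_dump( detect_datacenter( '10.10.0.15', $dc_ips ) );
```

Because the match is a plain string prefix, ending each prefix on an octet boundary avoids accidental matches across subnets.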

Then add an abstraction layer that extends wpdb:

class m_wpdb extends wpdb {

Once the class is extended and its options are defined, add a constructor that connects to the database server (or Docker container) and selects the correct database:

`public function __construct( $dbuser, $dbpassword, $dbname, $dbhost ) {
    register_shutdown_function( array( $this, '__destruct' ) );

    if ( WP_DEBUG && WP_DEBUG_DISPLAY ) {
        $this->show_errors();
    }

    $this->init_charset();

    $this->dbuser = $dbuser;
    $this->dbpassword = $dbpassword;
    $this->dbname = $dbname;
    $this->dbhost = $dbhost;

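    // Try to connect to the database. Both handles start out on the global
    // server; the fourth argument (new_link) forces a separate connection each time.
    // Note: mysql_connect() was removed in PHP 7.0, so this code requires PHP 5.
    $global = $this->_get_global_read();
    $this->dbhglobal = @mysql_connect( $global['host'], $global['user'], $global['password'], true );
    $this->dbh = @mysql_connect( $global['host'], $global['user'], $global['password'], true );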

    if ( !$this->dbhglobal ) {
        $this->_bail_db_connection_error();
    }

    $this->set_charset( $this->dbhglobal );
    $this->ready = true;
    $this->select( $global['name'], $this->dbhglobal );
}`

Then return the global database information:

`/**
 * Returns global database information
 *
 * @access private
 * @global array $db_servers The array of databases.
 * @return boolean|array The global database information on success, otherwise FALSE.
 */
private function _get_global_read() {
    global $db_servers;

    if ( is_array( $db_servers['global'] ) ) {
        if ( count( $db_servers['global'] ) > 1 ) {
            $dc = defined( 'DATACENTER' ) ? DATACENTER : false;
            foreach ( $db_servers['global'] as $global ) {
                if ( $global['dc'] == $dc && $global['read'] > 0 ) {
                    return $global;
                }
            }

            // If still here we can't find a local readable global database so return first readable one
            foreach ( $db_servers['global'] as $global ) {
                if ( $global['read'] > 0 ) {
                    return $global;
                }
            }

            // Nope, none of those either so exit.
            return false;
        } else {
            return $db_servers['global'][0];
        }
    }

    return false;
}`
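To make the selection order concrete, here is a standalone sketch of the same local-first lookup. The shape of `$db_servers['global']` (the `host`, `dc`, and `read` keys) is inferred from the method above; `pick_global_read()` and the sample hosts are hypothetical. Unlike the method above, this sketch checks the `read` flag even when only one server is listed.

```php
<?php
// Sketch: the shape of the server list is inferred from _get_global_read();
// pick_global_read() and the sample hosts are hypothetical.

function pick_global_read( array $globals, $dc ) {
    // Prefer a readable server in the local data center.
    foreach ( $globals as $global ) {
        if ( $global['dc'] === $dc && $global['read'] > 0 ) {
            return $global;
        }
    }
    // Otherwise fall back to the first readable server anywhere.
    foreach ( $globals as $global ) {
        if ( $global['read'] > 0 ) {
            return $global;
        }
    }
    return false; // Nothing readable at all.
}

$globals = array(
    array( 'host' => 'db1.east', 'dc' => 'east', 'read' => 0 ), // write-only master
    array( 'host' => 'db2.east', 'dc' => 'east', 'read' => 1 ),
    array( 'host' => 'db3.west', 'dc' => 'west', 'read' => 1 ),
);

$local = pick_global_read( $globals, 'west' );
echo $local['host'], "\n";
```

A web server whose data center has no readable entry falls through to the first readable server in the list, so the order of the list determines the remote fallback.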

If this is the direction you want to go, let me know and I'll send you the rest of the multi-db code

cdsalmons commented 8 years ago

Another option that could work is RethinkDB. I was thinking that, since it can pub/sub geospatial JSON, it could inject new keys in real time into Consul and/or Memcached.

This video is what got my wheels turning