CORE-POS / IS4C

Cooperative Operational Retail Environment
http://www.core-pos.com
GNU General Public License v2.0

Thoughts on Running Two Stores with CORE POS #673

Closed gohanman closed 8 years ago

gohanman commented 8 years ago

WFC now officially has a working, two-site CORE installation [minor bugs I haven't found yet notwithstanding]. I think it merits a recap and analysis.

Backend Architecture

The server setup does largely follow the proposed HQ Model. I tend to think of the main store as HQ and any additional stores as Satellites.

HQ has a web server running Office and a database server. There's relatively little that differs at HQ from a typical single-store location; I believe it's just one scheduled task and two additional webservice calls.

The Satellite store also has a web server running Office and a database server. The Satellite database server is simply a MySQL slave replicating the HQ database. This component is entirely optional, but I think it would be foolish to skip. Regardless of how the Office installations work, replication is by far the most bandwidth-efficient way to maintain offsite backups if the HQ database is more than a couple GBs.

The Satellite Office installation talks to the HQ database server. This Office instance is less useful for end users and in hindsight that's pretty obvious. In a typical page request after accounting for caching, the browser's going to send one HTTP request to the web server, the web server is going to make several requests to the database, and then send one HTTP response back to the browser. So talking to the webserver that's closest to its database is ultimately faster than talking to the webserver that's closest to the client browser (where "closest" means network distance; if the servers aren't physically sited at each store results could be quite different).

I still think this is the right approach from a data standpoint. Trying to write changes to both databases and periodically synchronize them is a lot of work and cannot be done perfectly without running smack into the CAP theorem. There are different avenues to potentially improve things, but as it stands the Satellite Office installation exists primarily to manage scheduled tasks that are aware of its store ID as well as to interact directly with the HQ Office installation via HTTP requests.

Database Theory Interlude

One way to make the Satellite Office installation better for end users would be to lean much more heavily on the [optional] pool of read-only databases. The vast majority of queries aren't writes. Managing the database connections appropriately would take a lot of development work, but if done correctly a Satellite Office installation could direct most of its queries to the local replica. The usageStats table would have to go away so there isn't automatically a write attached to every request. There are ample FOSS web server log analysis tools, though, that could almost certainly provide much better metrics about what pages/tools are being used.
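
For illustration, a minimal sketch of what that connection routing could look like; the class name, the naive SELECT check, and the idea of handing it two PDO handles are all assumptions of mine, not anything that exists in CORE today:

```php
<?php
// Hypothetical sketch: route read-only queries to the local replica and
// everything else to the HQ master. Names and details are illustrative only.
class SplitConnection
{
    private $replica; // local, read-only copy of the HQ database
    private $master;  // the HQ database itself

    public function __construct(PDO $replica, PDO $master)
    {
        $this->replica = $replica;
        $this->master  = $master;
    }

    public function query($sql, array $params = array())
    {
        // Very naive classification: statements that start with SELECT go to
        // the nearby replica; anything else is treated as a write and goes to HQ.
        $isRead = preg_match('/^\s*SELECT\b/i', $sql) === 1;
        $pdo    = $isRead ? $this->replica : $this->master;

        $stmt = $pdo->prepare($sql);
        $stmt->execute($params);

        return $stmt;
    }
}
```

Replication lag is the obvious caveat: a page that writes to HQ and immediately re-reads through the replica could briefly see stale data.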

The other, radically different approach would be to never synchronize the databases at all. Both stores would write to their own databases, and changes would be shared as batches in some importable/exportable format that each store could apply as they see fit. I think this would ultimately be easier to implement and understand than periodic syncing - which IMO becomes progressively more difficult as it expands to more and more tables, yet difficult for end users to comprehend if it doesn't include all the tables.
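
As a sketch of what such a batch might look like, here's a hypothetical export step that dumps recently modified product records to a JSON file the other store could review and apply; the column names, cutoff handling, and file format are placeholders, not a proposal for the real format:

```php
<?php
// Hypothetical sketch: export product records changed since the last export
// as a JSON "change batch". All names and details here are illustrative.
$pdo = new PDO('mysql:host=localhost;dbname=ops', 'user', 'pass');

$lastExport = '2016-03-01 00:00:00'; // would be tracked somewhere persistent
$stmt = $pdo->prepare('SELECT * FROM products WHERE modified > ?');
$stmt->execute(array($lastExport));

$batch = array(
    'source_store' => 1,
    'generated'    => date('c'),
    'items'        => $stmt->fetchAll(PDO::FETCH_ASSOC),
);

// The receiving store would decide which of these records to apply.
file_put_contents('product-changes.json', json_encode($batch));
```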

Data Flows: Products

Each Office installation is responsible for pushing changes to its own lanes. So when either Office installation updates a product it needs to:

- push the updated record to its own store's lanes, just as a single-store installation would, and
- notify the other store's Office installation via a webservice call so that it can push the change to its lanes.

So far this works quite well. It's near instantaneous, although I worry a little about scale. The way it currently works, one store notifies the other of a change, waits for that store to update its own lanes, and gets the other store's response. It'd be ideal if the updating store sent a notification and immediately closed the connection without waiting for a response. I'm just not sure whether the remote web server would continue executing the requested script to completion or cut it off prematurely.
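
One way to approximate that fire-and-forget behavior is sketched below with a hypothetical notification URL and payload: the sender gives up on the response after a short timeout, and the receiving script would need ignore_user_abort() if it's expected to finish its lane pushes after the sender hangs up. Whether that actually holds on a given web server setup is exactly the uncertainty above.

```php
<?php
// Sending side (sketch): post the change notification but only wait a very
// short time instead of blocking until the other store has pushed to all of
// its lanes. The URL and payload are hypothetical.
$ch = curl_init('http://satellite.example.com/ws/ProductUpdate');
curl_setopt_array($ch, array(
    CURLOPT_POST           => true,
    CURLOPT_POSTFIELDS     => json_encode(array('upc' => '0001234567890')),
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_TIMEOUT_MS     => 250, // give up quickly; don't wait for lane pushes
));
curl_exec($ch);
curl_close($ch);

// Receiving side (sketch): keep executing even if the sender disconnects early.
ignore_user_abort(true);
// ... push the product change to this store's lanes ...
```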

Data Flows: Members

It's a different webservice call but this is nearly identical to products.

Data Flows: Transactions

This is where it matters whether or not the Satellite database server exists. I have one [named] database on that server that is not replicated from HQ, and the Satellite store lanes all interact with it. This is safe from data issues so long as all operations on dtransactions are INSERTs.

As before, this is optional. Satellite lanes could write directly to the HQ database. In my mind the main advantages of this approach are:

Once the data is on the Satellite database server it still needs to get back to the HQ database. This is handled by the SatelliteStore plugin. The plugin includes three sync-related tasks: one is purely SQL-based, and I have not tested it at all. The other two are separate send and receive steps using a Redis server in between. That's what I'm using.

The Satellite store(s) serialize their data as JSON and push it into Redis lists. This is really fast, and the built-in key-value storage makes it very easy for sending stores to keep track of the last record ID that they sent (this does depend on sequential keys, so completely replacing a Satellite store database would involve seeding increment columns carefully). The HQ store runs the receive task to pop records off the Redis lists and save them to the HQ database.
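
The exchange is roughly the following shape; the key names, the store_row_id column, and the connection details are illustrative stand-ins rather than what the SatelliteStore plugin literally uses:

```php
<?php
// Satellite side (sketch): push new transaction rows onto a Redis list and
// remember the highest record ID sent so far.
$redis = new Redis();
$redis->connect('hq-redis.example.com', 6379);
$pdo = new PDO('mysql:host=localhost;dbname=translog', 'user', 'pass'); // local, non-replicated

$lastSent = (int)$redis->get('store2:dtransactions:last_sent');
$stmt = $pdo->prepare('SELECT * FROM dtransactions WHERE store_row_id > ? ORDER BY store_row_id');
$stmt->execute(array($lastSent));
foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
    $redis->rPush('store2:dtransactions', json_encode($row));
    $lastSent = $row['store_row_id'];
}
$redis->set('store2:dtransactions:last_sent', $lastSent);

// HQ side (sketch): pop records off the list and INSERT them into the HQ
// dtransactions table.
while (($json = $redis->lPop('store2:dtransactions')) !== false) {
    $row = json_decode($json, true);
    // ... build and run the INSERT against the HQ database here ...
}
```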

Admittedly it introduces an extra layer of technology to set up and manage, but in practice there's very little of either. I find it an easy service to interact with, both from a developer standpoint and from an administrator standpoint when checking on sync status.

Representing Items

This has been included in earlier releases, but the model looks much like Harvest's. Each item has one record per store. By default this encourages items to be identical, but they can currently vary on any field. Legacy tools that aren't aware of this dual record tend to make messes, although I can't imagine that's fixable. Incomplete items - i.e., records exist for store 1 but not store 2, or vice versa - are not immediately apparent from the UI but probably should be. At minimum they should be more easily detectable and have an auto-correct to create any missing records.
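
As a sketch of what detection could start from, assuming the per-store records live in a products table keyed by upc and store_id (my assumption about the schema, not a precise description of it):

```php
<?php
// Sketch: list UPCs that only have a products record at one of the two stores.
// Assumes a products table with upc and store_id columns.
$pdo = new PDO('mysql:host=localhost;dbname=ops', 'user', 'pass');

$sql = 'SELECT upc, COUNT(DISTINCT store_id) AS stores
        FROM products
        GROUP BY upc
        HAVING COUNT(DISTINCT store_id) < 2';
foreach ($pdo->query($sql) as $row) {
    echo $row['upc'] . ' exists at only ' . $row['stores'] . " store(s)\n";
    // An auto-correct step could copy the existing record here and
    // INSERT it with the missing store_id.
}
```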

Miscellaneous

There was surprisingly little coding involved in the rollout. I count only 11 commits in the last week amounting to 3 bugs and 4 features (Redis sync, store-specific parameters, the member sync notification webservice, and SPINS file-naming for two stores). It helps obviously that many new features are built around my needs, but nothing major had to be added or overhauled at the last minute (/knock on wood).

The whole store got tagged and signed via CORE. The simple list builder in the ShelfAudit plugin was really useful for making lists of items that could be copy/pasted into advanced search to print tags or signs.

It's Datacap and iSC250s across the board. I've periodically noticed the old signature bug - the customer signs and presses done, but the signature doesn't show up on screen - so that problem likely is not the IDTech's fault. I don't have any particular leads on how to fix it, but knowing it isn't hardware narrows things a bit.

I very nearly did the whole deployment without touching a C# compiler. The pre-built NewMagellan was missing a couple lines to reduce the bit depth of the iSC250 signature bitmaps. The basic feature still worked at the lanes, but shipping the images back to the server broke because the raw image exceeded the database field's capacity.

I may add to this as other stuff occurs to me.

gohanman commented 8 years ago

Driver I'm running @ the new store: https://github.com/CORE-POS/NewMagellan/releases/tag/1458230314

gohanman commented 8 years ago

A few more adjustments have trickled in. The StoreBatchMap table wasn't being used correctly to apply batches only to selected stores when batches were forced (it did behave when batches ran via scheduled task).

The default store selection, mostly but not exclusively for reports, is now determined via client IP address. Provided the stores are using different subnets, this works quite smoothly. I considered creating a user preferences mechanism so logged-in users could define their own default store, but that seemed like it would have more overhead to use and manage.

I added a Symfony component to handle IP comparisons, so that's another modern buzzword to go on a list somewhere. I've been pretty liberal with grabbing packages from Composer for multi-store mode. I think they're all still optional in that nothing will flat-out crash without them, but much of the cross-store communication stuff won't work without the dependencies installed.
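
For reference, the IP comparison piece is roughly this shape; the subnet-to-store mapping and function name below are made up for the example, and IpUtils::checkIp is the Symfony HttpFoundation helper doing the actual work:

```php
<?php
use Symfony\Component\HttpFoundation\IpUtils;

require 'vendor/autoload.php'; // Composer autoloader

// Sketch: pick a default store ID based on which subnet the client is on.
// The subnets and store IDs are hypothetical.
function defaultStoreForClient($clientIp)
{
    $subnets = array(
        1 => '192.168.1.0/24', // HQ
        2 => '192.168.2.0/24', // Satellite
    );
    foreach ($subnets as $storeID => $cidr) {
        if (IpUtils::checkIp($clientIp, $cidr)) {
            return $storeID;
        }
    }

    return 1; // unknown subnet: fall back to HQ
}

$clientIp = isset($_SERVER['REMOTE_ADDR']) ? $_SERVER['REMOTE_ADDR'] : '127.0.0.1';
echo defaultStoreForClient($clientIp);
```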

gohanman commented 8 years ago

I'm debating having an entirely UI-less setting: 38811f96735276f33521451a84ce890058f89065

With satellite lanes shipping data to a different [named] database than HQ lanes, they default to that different [named] database with Database::mDataConnect. On at least two occasions so far, though, the satellite lanes have needed to refer to a table that's in a different [named] database.

I think the downside to a UI-less setting is pretty obvious. But I don't know where to put it. Logically, mAlternative belongs on the Necessities tab with the other database server settings, but it's not at all necessary in most circumstances. Adding a multistore tab for exactly one setting seems silly. And burying it in some other random place isn't going to be intuitive either.