Closed: gohanman closed this issue 9 years ago.
Since this is long-term planning, I'll express that I have long-term concerns about the plan of keeping all transaction data at all stores. I think it is fine for everyone using CORE now, but my co-op's long-term vision involves the potential to have 5 or more stores. That's a lot of data to keep replicated across that number of stores. We should think about whether alternative architectures are possible.
David Chaplin-Loebell, IT Director Office: (215) 843-2350 x127
On Wed, Apr 23, 2014 at 3:10 PM, Andy Theuninck notifications@github.com wrote:
In the "very long term planning" category, I rewrote Fannie's "Stores" configuration and dropped all the old references to those config variables. I don't believe anyone was actually relying on this.
I added a new table named "Stores" containing information necessary for the local store to connect to remote stores. I added a server-side column to dtransactions named "store_id". Right now I just have a default value assigned to the column rather than configuring store_id at a lane level. The lanes are currently unaware of what store they belong to; they just ship data to the server and the server tags it with the appropriate store_id.
Right now there isn't any notion of an "HQ" or master store server. I'm sure there will be exceptions, but I'm picturing in the general case:
- Stores will push changes to operational data out to each other (manually or via some form of replication).
- Stores will pull new transaction data from each other.
Keeping all transaction data at all stores seems ideal from a backup standpoint. It should also be more resilient to network failure. The store will always be up to date within its polling interval. Besides store_id, the new(ish) store_row_id column is required to keep track of which remote records have been pulled. Indexes on those columns are recommended.
d215cae: https://github.com/CORE-POS/IS4C/commit/d215cae5693013436c21609a1797d2779dd73e30
Community-owned food markets open to everyone. www.weaversway.coop
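The pull mechanism described above (server-side store_id tagging plus the store_row_id high-water mark) can be sketched in miniature. This is a hypothetical in-memory illustration, not CORE's actual task code; the row shapes and names are assumptions:

```python
def pull_new_transactions(local_rows, remote_rows, store_id):
    """Pull transaction records from one remote store that haven't
    been seen locally yet, tagging them with that store's id.

    local_rows / remote_rows stand in for dtransactions rows; each is
    a dict with a 'store_row_id' (and, locally, a 'store_id').
    """
    # Highest store_row_id already pulled for this store (0 if none).
    # This lookup is why indexes on store_id and store_row_id help.
    last_seen = max(
        (r["store_row_id"] for r in local_rows if r["store_id"] == store_id),
        default=0,
    )
    # Fetch only rows past the high-water mark and tag their origin.
    new_rows = [dict(r, store_id=store_id)
                for r in remote_rows
                if r["store_row_id"] > last_seen]
    return local_rows + new_rows
```

Re-running the pull is idempotent: rows at or below the high-water mark are simply skipped.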
How would you picture the data flows in that scenario? Would there be one master server - at a store or even a separate office - that does have all transaction data for all stores? I think that would work provided the pull behavior is configurable per-store. E.g., Store 1 pulls data from Stores 2 through 5 on a regular basis; Stores 2 through 5 do not pull any data.
Allowing for alternative architectures is a very good idea. Frankly, since I have no experience with multiple stores I really doubt I'll get it perfectly right on the first try. Building with the assumption that flexibility will be required makes sense. Opinions from people who have more experience with this scenario are also quite helpful, of course.
Yeah, making pull behavior configurable per-store probably addresses most concerns; that makes the replication topology pretty flexible. If you decided you needed an HQ server you could potentially implement one (it's just a store with no activity or lanes of its own, but which pulls data from all other stores). If you wanted to have most stores have all data but you had a satellite store with a slow network link which didn't pull from larger stores, you could do that.
What about a "parent" store that is responsible for a satellite store? That store might want to pull data from the satellite but not from other large stores. It might make sense to be able to specify which other stores a store pulls data from: all, none, or a specific list.
Another possibility to think about (again, slow network links) is whether a parent store might want to pull from a satellite store, and then be responsible for replicating that store's data elsewhere. This might be too complex to worry about, though.
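The per-store pull setting floated above (all, none, or a specific list) could be resolved along these lines; the config shape here is purely illustrative:

```python
def stores_to_pull(store_id, topology):
    """Resolve which other stores a given store should pull data from.

    topology maps store_id -> pull setting: "all", "none", or an
    explicit list of store_ids (a hypothetical config shape).
    """
    setting = topology[store_id]
    if setting == "all":
        return [s for s in topology if s != store_id]
    if setting == "none":
        return []
    return list(setting)
```

With topology = {1: "all", 2: "none", 3: "none"}, store 1 behaves like the HQ case: it pulls from stores 2 and 3 while they pull nothing.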
I'm reassured to hear an HQ defined the way I imagined it: a store that just happens to have zero lanes.
A simple pair of boolean "push" and "pull" flags in the Stores table would likely cover a lot of the different ways stores interact. The basic parent relationship described would work.
Chaining would be more complicated. Say parent store (1) pulls data from satellite store (2). Now when HQ store (3) polls the parent store, it needs to ask for data from both store_id=1 and store_id=2. The Stores data structure would need to capture almost the whole topology so a store could understand what's going on multiple network hops (tiers?) away.
The more I think about this the more I think chaining is overkill and/or could be handled by tools outside of CORE if it were needed.
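For what it's worth, the topology knowledge the chaining case needs amounts to a transitive closure over the pull graph, which is simple enough to compute; whether it belongs inside CORE is the real question. A sketch, with an assumed pulls mapping:

```python
def reachable_store_ids(store_id, pulls):
    """All store_ids whose data a store ends up holding, following
    pull relationships transitively.

    pulls maps store_id -> list of store_ids it pulls from.  Polling
    parent store 1 (which pulls from satellite 2) means requesting
    data for store_id=1 and store_id=2, per the example above.
    """
    seen, stack = set(), [store_id]
    while stack:
        s = stack.pop()
        if s in seen:
            continue
        seen.add(s)
        stack.extend(pulls.get(s, []))
    return sorted(seen)
```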
@gohanman I like where you're going with this.
For the Wedge, we're relying on HQ receiving opdata from any/all satellite stores overnight. It's similar to how we batch import our online sales already. For day-to-day reports, satellite operations VPN into HQ for master reports. Should the internet go down, the reporting server is duplicated at satellite locations, but with only site-specific data. HQ has the ability to VPN into satellite locations to view same-day reports from the site-specific reporting server.
We're putting off pushing data back to site-specific reporting servers for now. I'm sure it will come up though.
Ideally, I'd like to keep the VPN (or equivalent) connection up permanently and duplicate transaction data in real-ish time. I don't know what kind of interval will be realistic performance-wise, but I'd like to have syncing at least every half hour. That should be the only required cross-site communication on the transaction side. Reporting and other calculations based on transaction data can happen independently.
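A fixed-interval sync loop like the one described could be as simple as the following sketch (a real deployment would more likely hang this off cron or Fannie's task scheduler; the callable is a stand-in):

```python
import time

def run_sync_loop(sync_once, interval_seconds=1800, max_cycles=None):
    """Run one round of cross-site transaction syncing on a fixed
    interval (1800s matches the half-hour target mentioned above).

    sync_once is any callable doing a single pull/push round;
    max_cycles bounds the loop, which is handy for testing.
    """
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        sync_once()
        cycles += 1
        if max_cycles is not None and cycles >= max_cycles:
            break
        time.sleep(interval_seconds)
    return cycles
```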
That sounds like the right approach.
One thing to think about is what happens if the link is down for an extended period of time-- is there a way to get resynced via sneakernet? We've run into trouble with our current POS because we've had extended (multi-day) downtimes for the store-to-store link, and there was no practical way to avoid syncing the backlog once the link came back up, which caused additional problems of its own.
In theory, sure. You could disable the pull task(s), export the data, transport it to the other store(s), and import. Or you could insert a placeholder record with a particular store_row_id to control the point where syncing resumes and then manually fill in the skipped records later.
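The placeholder trick could look something like this; the field names are illustrative, not the actual dtransactions schema:

```python
def make_resume_placeholder(store_id, resume_row_id):
    """Build a placeholder record that advances the pull high-water
    mark, so the regular pull task resumes after resume_row_id
    instead of replaying the whole sneakernet-imported backlog."""
    return {
        "store_id": store_id,
        "store_row_id": resume_row_id,
        "placeholder": True,  # skipped rows can be filled in manually later
    }

def next_pull_start(local_rows, store_id):
    """store_row_id the next pull resumes after (0 if nothing pulled)."""
    return max((r["store_row_id"] for r in local_rows
                if r["store_id"] == store_id), default=0)
```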
New stores table and pull tasks are in master as of release 0.9.13.
I believe this is essentially done in terms of transaction data and getting it from store servers to its final destination(s). The operational data side will likely be a lot more complicated with decisions about which products (and even subfields) get pushed out to which locations.
I may have stumbled into another potential piece of the puzzle: using web services for intra-server communication over HTTP. If there's a Fannie(ish) web server at each store, they can transmit requests and responses directly to one another. This could be helpful for push style notifications to supplement database-level changes.
Say for instance product data is being synchronized through some form of replication. One store edits a product. It can then push out a notification saying "Hey, I changed this record". The remote store server is aware of the change without resorting to constant polling, and since this is happening at a higher level than the database there's more flexibility to take action. Maybe the remote server pushes the change out to its own lanes or maybe has some kind of queue for a person to review. The important thing is actual program logic gets executed at that point.
My mockup uses JSON primarily because it interacts easily with JavaScript for more dynamic UIs, but also because I dislike SOAP. b63720a4bbf3587442c002daed641cad5d6d89dd
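As a rough idea of the message flow (the message schema here is made up for illustration, not what the mockup commit actually defines):

```python
import json

def make_change_notification(table, upc, origin_store):
    """Build a JSON "hey, I changed this record" message."""
    return json.dumps({
        "type": "record.changed",
        "table": table,
        "key": {"upc": upc},
        "origin_store": origin_store,
    })

def handle_notification(raw, actions):
    """Dispatch a received notification to real program logic, e.g.
    push the change out to local lanes or queue it for review."""
    msg = json.loads(raw)
    handler = actions.get(msg["type"])
    return handler(msg) if handler else None
```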
I'm playing around with this, and I notice that both stores create a "Current location" entry that is set to store one, so both say they are store one. How do I make one know that it is store 2? Are the store numbers always local?
It's checking stores.dbHost to figure out which entry is itself. Swapping the database info for stores 1 & 2 should cause it to switch. It only creates an entry if one doesn't exist. That may be a bad idea. I haven't thought out what actual "install instructions" would look like for a new store.
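The self-identification step amounts to matching the local database host against the rows in the Stores table, roughly:

```python
def find_local_store(stores, local_db_host):
    """Pick out the local store's row from the Stores table by
    matching dbHost; returns None if no row matches (rather than
    auto-creating an entry, which may or may not be desirable).

    stores stands in for the Stores table rows as dicts.
    """
    for store in stores:
        if store["dbHost"] == local_db_host:
            return store
    return None
```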
Products handling, RFC and such:
This has been floating in limbo long enough. Having seen a bunch of ideas, I think it's time to start sketching a plan. I like bits and pieces of everything I've seen. I think this covers the requirements:
Plan, take one:
The products table will allow, but not require, per-store entries - i.e., primary key (upc, store_id). This gives maximum flexibility in per-store customization. To add sanity back to the mix, there will be a supplementary table ProductRules with structure (upc, columnName, customizable). Maybe this table actually drives what's available in the UI or maybe it's just for auditing purposes, but either way CORE can prepopulate this table on installation with sensible defaults. Rather than a complete free-for-all there's at least a recommendation what should and should not vary.
The lane table will be built from the products table by coalescing the record with that store's ID and the master record with store_id zero. What I find nifty about this approach is the master record is optional. You can have master records for every item and only have store-specific records as needed, or you can just have records for each store. Either way coalescing should build a proper, one-record-per-UPC table for the given store. Again, there should definitely be an out-of-the-box recommendation but I like knowing it's possible to revise the approach in the future if needed with minimal headache.

I'm not particularly fond of using "inventory" as a term for this. LaneProducts or StoreProducts seems more descriptive. The lane itself has relatively few references to the products table so a rename won't be a significant rewrite. I don't think there's any good reason to have this table differ structurally from products other than the unique-UPC restriction.
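The coalescing step could be expressed roughly like this (an in-memory stand-in for what would presumably be a SQL join with COALESCE; the names are illustrative):

```python
def build_lane_products(products, store_id, master_id=0):
    """Coalesce store-specific and master product records into the
    one-record-per-UPC table a lane needs.

    products rows are keyed by (upc, store_id); a store-specific
    record wins over the master (store_id == 0) record, and the
    master record is optional, per the plan above.
    """
    by_key = {(p["upc"], p["store_id"]): p for p in products}
    lane = {}
    for upc in {p["upc"] for p in products}:
        rec = by_key.get((upc, store_id)) or by_key.get((upc, master_id))
        if rec is not None:
            lane[upc] = rec
    return lane
```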
I'm very tempted to simply use the above LaneProducts / StoreProducts for the reporting table requirement. This does require some significant work overhauling reports, but I don't think there's any way to avoid significant work somewhere in the system.
Inadvertently restarted discussion in #376.
Resurrected in #528