CORE-POS / IS4C

Cooperative Operational Retail Environment
http://www.core-pos.com
GNU General Public License v2.0

Local, numflag, and extended attributes #833

Closed: gohanman closed this issue 7 years ago

gohanman commented 7 years ago

This isn't a new problem, but I can foresee it becoming more of one in the coming year. The way product attributes are handled and stored in transaction logs is not ideal. I'll outline the status quo and then get into the problem and possible solutions.

Status Quo

The products table has a column named numflag that's treated as a series of 31 boolean flags (the last bit is ignored to sidestep CPU architecture differences). As of version 1.5, bits 1 through 16 are assigned by users based on their needs and bits 17 through 31 are reserved for the CORE-POS project to cover common use cases and future growth. The products table also has a column named local that can be zero (non-local), one (local), or higher values if the user has multiple gradations of local.
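
For reference, testing one of these flags is just a bitwise AND; a minimal SQL sketch, using a hypothetical bit number:

-- Sketch: find items with (hypothetical) user-assigned bit 3 set;
-- bit N corresponds to 1 << (N - 1)
SELECT upc
FROM products
WHERE (numflag & (1 << 2)) <> 0;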

Where things get into trouble is at the lane: when an item rings up, the value of products.local is written to dtransactions.numflag.
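
That means a point-in-time local report currently has to read the status back out of numflag in the log; roughly, as a sketch:

-- Sketch: local status at time of sale currently lives in
-- dtransactions.numflag ('I' rows are item lines)
SELECT upc,
  numflag AS local_at_sale
FROM dtransactions
WHERE trans_type = 'I';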

Problem(s)

The big one is point-in-time accuracy. Since the value of products.local is recorded in the transaction log, albeit in an unexpected place, we can determine whether an item was considered local at the time it was sold. With the other attributes stored as flags in numflag, we can only determine the item's current status. For some types of attributes or items that may be fine, if the flag represents something that logically cannot change. But if the attribute does change periodically, then capturing only the current status is not sufficient. I expect produce to be the biggest example, as sourcing changes, but I'm sure there are other instances, too.

Considerations

Before getting into solutions, I think there are a couple of points worth considering. The most important is that expanding bigArchive can be a very time-expensive operation. That doesn't mean it can't be the answer, but the deployment planning and downtime involved in ALTERs are worth weighing.

The bit flag format, while convoluted from a developer standpoint, is highly effective at avoiding ALTERs. It's a reasonable test bed for trying out an attribute and seeing whether its popularity holds up over time enough to merit a separate column.

Solutions

The straightforward one is to add a dtransactions.local column and store products values in their corresponding columns. The downsides from my perspective are the expense of the operation and the lack of a long-term solution; the add-a-column conversation may well come up again.
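
In MySQL terms that is something like the following sketch (the TINYINT type choice is illustrative):

-- Sketch of the straightforward option; type choice is illustrative
ALTER TABLE dtransactions ADD COLUMN local TINYINT NOT NULL DEFAULT 0;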

JSON columns are somewhat interesting. Most RDBMSes have or are implementing a similar feature. I think it's a non-starter right now given the MySQL 5.7.8+ requirement, but a catch-all "attributes" column that can store data more readably than a bitstring is appealing. I would want to know more about the underlying storage. It could waste colossal amounts of space storing the same markup over and over, and may even bog down lane => server communication noticeably.
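
For comparison, the native-JSON version would be roughly this sketch (hypothetical; requires MySQL 5.7.8+, and the column and key names are illustrative):

-- Hypothetical sketch of a catch-all attributes column (MySQL 5.7.8+ only)
ALTER TABLE dtransactions ADD COLUMN attributes JSON;
SELECT upc FROM dtransactions
WHERE JSON_EXTRACT(attributes, '$.local') = 1;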

We could assign one of the CORE-reserved bits in numflag to "local". Stores with more than one gradation of local would have to use some of the user bits as additional local flags. The main hang-ups would be migration scripts and lane-side understanding of the flags so receipts can appropriately mark local items.
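
As a sketch, if (hypothetically) bit 17 were designated as the local flag:

-- Hypothetical: bit 17 as the CORE-reserved 'local' flag; the
-- specific bit assignment is illustrative, not an actual designation
UPDATE products SET numflag = numflag | (1 << 16) WHERE local > 0;
SELECT upc FROM products WHERE (numflag & (1 << 16)) <> 0;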

We could create an attribute snapshot table in trans_archive. I'm picturing it as daily - e.g., on dateID 20170116 the item UPC 1234567890123 had numflag value 64. This does result in some data loss, since edits can and will happen midday, but I doubt those edits are reliably happening at the exact moment an item actually changes anyway; a daily snapshot ought to be about as accurate as is practical. I like how this scales in terms of flexibility to add columns. If the JSON route ever makes sense, the snapshotting would provide de-duplication (i.e., we sold an item 10 times today but only stored its attribute string representation once).


At the moment the attribute snapshot table is my favorite option but I want to spend some more time thinking about it.

gohanman commented 7 years ago

Theoretical snapshot table:

Column          Use
prodSnapshotID  Standard generic ID column
dateID          Either part of primary key or uniqueness constraint with upc
upc             Other half of primary key or uniqueness constraint; the record represents this item on this day
numflag         The value of products.numflag on that day
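
A minimal DDL sketch of that table (MySQL syntax; the table name is hypothetical and the uniqueness constraint is one of the two options above):

-- Sketch only: table name and constraint choice are illustrative
CREATE TABLE prodSnapshot (
    prodSnapshotID INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    dateID INT NOT NULL,
    upc VARCHAR(13) NOT NULL,
    numflag INT NOT NULL DEFAULT 0,
    UNIQUE KEY (dateID, upc)
);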

I don't see much value in reproducing the entire products table here. Tracking changes is already handled by prodUpdate. The role of this table would be purely in reporting that needs to find an item's status at a point in time quickly and can't afford to step through prodUpdate sequentially.

I also omitted storeID intentionally. I don't think it's worth spending the space when these attributes don't vary on a per-store basis.

To make this effective, dlog tables need a date ID column to join against. That does open the same can of worms about adding new columns to bigArchive, but in this case I think a date ID can be calculated in the view if the column doesn't exist in the underlying table. There's likely a performance hit that way, but at least the DB admin has more flexibility to choose trade-offs.
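
Roughly, the view-calculated version might look like this sketch (the view name is hypothetical; the `datetime` column is assumed from dtransactions):

-- Sketch: compute a date ID in the view when the underlying table
-- lacks a dedicated column; view name is hypothetical
CREATE VIEW dlog_dateID AS
SELECT t.upc,
  t.numflag,
  CAST(DATE_FORMAT(t.`datetime`, '%Y%m%d') AS UNSIGNED) AS dateID
FROM dtransactions AS t;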

gohanman commented 7 years ago

Another amalgam of ideas:

Attach a JSON attributes column to products. This could be a new column or just renaming numflag as attributes. Migrating data ought to be automated either way. Until MySQL 5.7 adoption is more widespread this would have to be a TEXT column. These attributes could then be archived as a daily, per-item snapshot rather than stored directly in transaction logs.

This checks several boxes.

Local could be handled as a JSON attribute in theory, but I don't see any real upside. Leaving local's existing behavior unchanged reduces the number of moving parts here and that does seem worthwhile.

EDIT: Actually, this isn't the greatest idea. The point of the daily snapshot table would be to join against transaction tables to get item status at a given point in time. With JSON stored as TEXT it wouldn't be very easy to group/filter/etc based on attributes.

gohanman commented 7 years ago

Sketch - converting existing bit flags to JSON attributes:

// Convert a products.numflag bitmask into an associative array keyed
// by flag description, suitable for JSON encoding
function flagsToJSON($numflag, ProdFlagsModel $flags)
{
    $json = array();
    foreach ($flags->find() as $flag) {
        // test whether this flag's bit is set; bits are numbered from 1
        $isSet = ((1 << ($flag->bit_number() - 1)) & $numflag) != 0;
        $json[$flag->description()] = $isSet;
    }

    // or json_encode($json)
    return $json;
}

gohanman commented 7 years ago

I think I finally figured this out.

One table holds the history of product attributes.

Column       Type         Purpose
attributeID  INT          primary key, auto incremented
upc          VARCHAR(13)  index; which item these attributes apply to
modified     DATETIME     might as well track that
attributes   TEXT         A JSON-encoded attribute collection

and then there's a mapping table

Column       Type         Purpose
dateID       INT          join against transaction data
upc          VARCHAR(13)  join against transaction data
attributeID  INT          the attribute entry that applied to the item that day
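
A minimal DDL sketch of the pair (MySQL syntax; the key and index choices are illustrative):

-- Sketch only; key/index choices are illustrative
CREATE TABLE Attributes (
    attributeID INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    upc VARCHAR(13) NOT NULL,
    modified DATETIME,
    attributes TEXT,
    KEY (upc)
);

CREATE TABLE AttributeMap (
    dateID INT NOT NULL,
    upc VARCHAR(13) NOT NULL,
    attributeID INT NOT NULL,
    PRIMARY KEY (dateID, upc)
);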

The mapping table would have to get populated daily with something like

INSERT INTO AttributeMap (dateID, upc, attributeID)
SELECT DATE_FORMAT(CURDATE(), '%Y%m%d'),
  upc,
  MAX(attributeID)
FROM Attributes
GROUP BY upc

This setup lets you join transaction data against the attributes that applied on a given day without storing the same JSON over and over every day in cases where the attributes haven't changed.
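
As a sketch, a report against that structure might look like this (assumes the dlog view exposes a dateID column as discussed earlier):

-- Sketch: pull the attributes that applied to each item on the day it sold
SELECT d.upc,
  a.attributes
FROM dlog AS d
  INNER JOIN AttributeMap AS m ON m.dateID = d.dateID AND m.upc = d.upc
  INNER JOIN Attributes AS a ON a.attributeID = m.attributeID
WHERE d.dateID = 20170116;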

If the JSON part proves unwieldy, it should be fairly easy to widen the attribute history and give things dedicated columns without running into the scale problems that crop up in widening bigArchive.

gohanman commented 7 years ago

The implementation here runs parallel to the existing prodFlags and numflag handling. The JSON version of the attributes is populated or updated whenever the existing flags are updated.