gohanman closed this issue 7 years ago
Theoretical snapshot table:
Column | Use |
---|---|
prodSnapshotID | Standard generic ID column |
dateID | Either part of primary key or uniqueness constraint with upc |
upc | Other half of primary key or uniqueness constraint. The record represents this item on this day |
numflag | The value of products.numflag on that day |
I don't see much value in reproducing the entire `products` table here. That's already handled by `prodUpdate` in terms of tracking changes. The role of this table would be purely in reporting that needs to find status-at-time quickly and can't afford to step through `prodUpdate` sequentially.
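As a concrete illustration of the status-at-time lookup, here's a minimal SQLite sketch in Python. The snapshot columns follow the table above; the `dlog` shape and the sample data are invented for the example:

```python
import sqlite3

# In-memory sketch: join a transaction log against the daily snapshot
# to recover an item's flag status on the day it sold. Column names
# follow the proposal; sample data is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE prodSnapshot (
    prodSnapshotID INTEGER PRIMARY KEY,
    dateID INTEGER,
    upc TEXT,
    numflag INTEGER,
    UNIQUE (dateID, upc)
);
CREATE TABLE dlog (dateID INTEGER, upc TEXT, quantity REAL);
""")
conn.executemany(
    "INSERT INTO prodSnapshot (dateID, upc, numflag) VALUES (?, ?, ?)",
    [(20170115, "1234567890123", 0), (20170116, "1234567890123", 64)],
)
conn.execute("INSERT INTO dlog VALUES (20170116, '1234567890123', 2.0)")

# Status-at-time: the snapshot row with the matching dateID gives the
# item's flags on the sale date, no sequential walk through prodUpdate.
row = conn.execute("""
    SELECT d.upc, s.numflag
    FROM dlog AS d
    JOIN prodSnapshot AS s ON s.upc = d.upc AND s.dateID = d.dateID
""").fetchone()
print(row)  # ('1234567890123', 64)
```

The point is that the lookup is a single indexed join rather than a scan of a change-history table.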
I also omitted `storeID` intentionally. I don't think it's worth spending space on attributes that don't vary on a per-store basis.
To make this effective, `dlog` tables need to have a date ID column to join against. That does open the same can of worms as adding new columns to `bigArchive`, but in this case I think a date ID can be calculated in the view if the column doesn't exist in the underlying table. There's likely going to be a performance hit that way, but at least the DB admin has more flexibility to choose trade-offs.
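For illustration, the date ID a view would derive is just the transaction datetime collapsed to a `YYYYMMDD` integer; a quick Python sketch (the function name is mine):

```python
from datetime import datetime

def date_id(dt: datetime) -> int:
    """Collapse a transaction datetime to the YYYYMMDD integer
    dateID used to join against the snapshot table."""
    return dt.year * 10000 + dt.month * 100 + dt.day

print(date_id(datetime(2017, 1, 16, 14, 30)))  # 20170116
```

A SQL view would do the equivalent with something like `DATE_FORMAT(datetime, '%Y%m%d')`, which is the slower path compared to a stored, indexed column.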
Another amalgam of ideas:
Attach a JSON attributes column to `products`. This could be a new column or just renaming `numflag` to `attributes`. Migrating data ought to be automated either way. Until MySQL 5.7 adoption is more widespread this would have to be a `TEXT` column. These attributes could then be archived as a daily, per-item snapshot rather than stored directly in transaction logs.
This checks several boxes. `bigArchive` can stay the way it is. Local could be handled as a JSON attribute in theory, but I don't see any real upside. Leaving local's existing behavior unchanged reduces the number of moving parts here, and that does seem worthwhile.
EDIT: Actually, this isn't the greatest idea. The point of the daily snapshot table would be to join against transaction tables to get item status at a given point in time. With JSON stored as `TEXT`, it wouldn't be very easy to group/filter/etc. based on attributes.
Sketch - converting existing bit flags to JSON attributes:
```php
function flagsToJSON($numflag, ProdFlagsModel $flags)
{
    $json = array();
    foreach ($flags->find() as $flag) {
        // true when this flag's bit is set in the packed integer
        $set = (bool)((1 << ($flag->bit_number() - 1)) & $numflag);
        $json[$flag->description()] = $set;
    }

    // return json_encode($json) if the caller wants the TEXT value
    return $json;
}
```
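For anyone who wants to sanity-check the masking, the same bit extraction in Python; the flag descriptions here are hypothetical stand-ins for `prodFlags` rows:

```python
def flags_to_attributes(numflag, flag_defs):
    """Expand a packed numflag integer into a {description: bool} map.
    flag_defs is a list of (bit_number, description) pairs standing in
    for the prodFlags table; bit numbers are 1-based as in CORE."""
    return {desc: bool(numflag & (1 << (bit - 1))) for bit, desc in flag_defs}

# Hypothetical flag assignments, purely for illustration
defs = [(1, "organic"), (2, "glutenFree"), (7, "local")]
print(flags_to_attributes(64, defs))
# 64 == 1 << 6, so only bit 7 ("local") is set
```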
I think I finally figured this out.
One table holds the history of product attributes.
Column | Type | Purpose |
---|---|---|
attributeID | INT | primary key, auto incremented |
upc | VARCHAR(13) | index; which item these attributes apply to |
modified | DATETIME | might as well track that |
attributes | TEXT | A JSON-encoded attribute collection |
and then there's a mapping table
Column | Type | Purpose |
---|---|---|
dateID | INT | join against transaction data |
upc | VARCHAR(13) | join against transaction data |
attributeID | INT | the attribute entry that applied to the item that day |
The mapping table would have to get populated daily with something like
```sql
INSERT INTO AttributeMap
SELECT DATE_FORMAT(CURDATE(), '%Y%m%d'),
       upc,
       MAX(attributeID)
FROM Attributes
GROUP BY upc
```
This setup lets you join transaction data against the attributes that applied on a given day without storing the same JSON over and over every day in cases where the attributes haven't changed.
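A small SQLite sketch of that de-duplication, in Python; table names mirror the proposal and the sample data is invented:

```python
import sqlite3, json

# Sketch of the two-table design: Attributes holds each distinct JSON
# blob once; AttributeMap points each (dateID, upc) at the entry that
# applied that day.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Attributes (
    attributeID INTEGER PRIMARY KEY,
    upc TEXT, modified TEXT, attributes TEXT
);
CREATE TABLE AttributeMap (dateID INTEGER, upc TEXT, attributeID INTEGER);
""")
conn.execute(
    "INSERT INTO Attributes VALUES (1, '1234567890123', '2017-01-10', ?)",
    (json.dumps({"local": True}),),
)
# Two different days map to the same attribute row: the JSON is stored once.
conn.executemany("INSERT INTO AttributeMap VALUES (?, ?, ?)",
                 [(20170115, "1234567890123", 1), (20170116, "1234567890123", 1)])

rows = conn.execute("""
    SELECT m.dateID, a.attributes
    FROM AttributeMap AS m
    JOIN Attributes AS a ON a.attributeID = m.attributeID
    ORDER BY m.dateID
""").fetchall()
for day, attrs in rows:
    print(day, json.loads(attrs))
```

Transaction tables would join on `(dateID, upc)` the same way, so repeated days cost one small mapping row each instead of a full copy of the JSON.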
If the JSON part proves unwieldy, it should be fairly easy to widen the attribute history and give things dedicated columns without running into the scale problems that crop up in widening bigArchive.
Implementation here is parallel to the existing `prodFlags` and `numflag`. The JSON version of the attributes is populated or updated when the existing flags are updated.
This isn't a new problem but I can foresee it becoming more of a problem in the coming year. The way product attributes are handled and stored in transaction logs is not ideal. I'll outline the status quo and then get into the problem and possible solutions.
Status Quo
The `products` table has a column named `numflag` that's treated as a series of 31 boolean flags (the last bit is ignored to sidestep CPU architecture differences). As of version 1.5, bits 1 through 16 are assigned by users based on their needs and bits 17 through 31 are reserved for the CORE-POS project to cover common use cases and future growth. The `products` table has an additional column named `local` that can be zero (non-local), one (local), or higher values if the user has multiple gradations of local.

Where things get into trouble is that when an item rings up at the lane, the value in `products.local` is written to `dtransactions.numflag`.

Problem(s)
The last one is the big one. Since the value of `products.local` is recorded in the transaction log, albeit in an unexpected place, we can determine whether the item was considered local at the time it was sold. With the other attributes stored as flags in `numflag` we can only determine the item's current status. For some types of attributes or items that may be fine if the flag represents something that logically cannot change. But if the attribute does change periodically then only capturing the current status is not sufficient. I expect produce to be the biggest example as sourcing changes, but I'm sure there are other instances, too.

Considerations
Before solutions, I think there are a couple points worth considering. The most important is that expanding `bigArchive` can be a very time-expensive operation. That doesn't mean it can't be the answer, but deployment planning and downtime during `ALTER`s is worth considering.

The bit flag format, while convoluted from a developer standpoint, is highly effective at avoiding `ALTER`s. It's a reasonable test bed to try an attribute and see if its popularity holds over time enough to merit a separate column.

Solutions
The straightforward one is to add a `dtransactions.local` column and store `products` values in their corresponding columns. The downsides from my perspective are the expense of the operation and the lack of a long-term solution. The add-a-column conversation may well come up again.

JSON columns are somewhat interesting. Most RDBMSes have or are implementing a similar feature. I think it's a non-starter right now with the MySQL 5.7.8+ requirement, but a catch-all "attributes" column that can store data more readably than a bitstring is appealing. I would want to know more about the underlying storage. It could waste colossal amounts of space storing the same markup over and over and over and may even bog down lane => server communication noticeably.
We could assign one of the CORE-reserved bits in `numflag` to "local". Stores with more than one gradation of local would have to use some of the user bits as additional local flags. The main hang-ups would be migration scripts and lane-side understanding of the flags to appropriately mark local items on receipts.
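A hypothetical sketch of that encoding in Python; the choice of bit 17 as the reserved "local" bit and bits 1 and 2 as the user bits is invented purely for illustration:

```python
LOCAL_BIT = 17        # assumed CORE-reserved "local" bit
GRADE_BITS = (1, 2)   # assumed user bits for gradations beyond 1

def local_to_flags(local):
    """Pack a multi-valued products.local into hypothetical numflag bits."""
    flags = 0
    if local > 0:
        flags |= 1 << (LOCAL_BIT - 1)
    if local > 1:
        # encode the extra gradation (2, 3, ...) in the user bits
        extra = local - 1
        for i, bit in enumerate(GRADE_BITS):
            if extra & (1 << i):
                flags |= 1 << (bit - 1)
    return flags

print(local_to_flags(0))  # 0: not local
print(local_to_flags(1))  # only the reserved local bit set
```

The migration script and the lane-side receipt logic would both need to agree on this mapping, which is exactly the coordination cost mentioned above.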
We could create an attribute snapshot table in `trans_archive`. I'm picturing it as daily - e.g., on dateID 20170116 the item UPC 1234567890123 had numflag value 64. This does result in some data loss since edits can and will happen midday, but I doubt those edits are reliably happening at the exact moment an item changes anyway. It ought to be about as close as possible. I like how this scales in terms of flexibility to add columns. If the JSON route ever makes sense, the snapshotting would provide de-duplication (i.e., we sold an item 10 times today but only stored its attribute string representation once).

At the moment the attribute snapshot table is my favorite option, but I want to spend some more time thinking about it.