dcramer / peated

https://peated.com
Apache License 2.0

Expand Bottles to Aggregate Variants #210

Open dcramer opened 1 month ago

dcramer commented 1 month ago

We're going to take a clean pass at bottling and solve this once and for all.

The plan is the following:

In general, I think what we're trying to do here is implicitly create "series" but in a more structured manner than something like Whiskybase does.

For bottle attributes, one of the biggest things we have to determine is which attributes enforce that it's a variant. This is primarily going to be single-cask focused.

Here's a list with the help of ChatGPT. Some of these are deterministic via the name, others are not whatsoever.

Things that show up in the name and are non-deterministic:

Some unknowns:

Other things we should consider:

Realistically most of these are going to be focused on the variants. The parent probably only includes a few key items (which cannot change between children):

dcramer commented 1 month ago

For the variants, I think Ardbeg is a really strong litmus test, specifically this bottle:

https://peated.com/bottles/40876

Another important thing we must consider: casks are sometimes important and sometimes not. Every SMWS bottle is single cask, so we need to make sure the case of "a single variant" is really smooth and looks no different than a single non-variant bottle flow (e.g. Laphroaig 10).

One last thing that needs to be thought about now that I'm writing this out:

What about special releases of typical labelings? IMO they should probably go under a variant. e.g. the 225th Anniversary Edition.

dcramer commented 1 month ago

a note about proof/abv/etc:

These are not primary variant factors, so we may also need to determine what makes a variant unique, and what data should be approximate.

For example, if the proof is "120-130", we don't actually care. That's not enough to create a new variant, it's just variable (as is expected, tbqh). So do we even store proof? ABV is an important thing to some degree, but how do we deal w/ the fact that it varies?
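One possible way to treat ABV as approximate rather than identifying is to store an observed range and widen it as new values are reported. This is only a sketch: `AbvRange`, `observeAbv`, and `abvToProof` are hypothetical names, not anything in Peated. It also assumes US proof, which is twice the ABV percentage, so proof could be derived instead of stored.

```typescript
// Hypothetical shape: ABV is kept as an observed range, not as an identity field.
interface AbvRange {
  min: number;
  max: number;
}

// Widen the stored range to include a newly reported ABV value.
function observeAbv(range: AbvRange | null, abv: number): AbvRange {
  if (range === null) return { min: abv, max: abv };
  return { min: Math.min(range.min, abv), max: Math.max(range.max, abv) };
}

// US proof is twice the ABV percentage, so it can be derived on demand.
function abvToProof(abv: number): number {
  return abv * 2;
}
```

Under this sketch a "120-130" proof bottling would simply be one record whose ABV range is 60 to 65, with no new variant created.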

dcramer commented 1 month ago

The now most critically important question: what do I call them?

Thinking 'edition' for now.

dcramer commented 1 month ago

Working branch is feat/editions.

Going to keep the 'bottle' table for the aggregations, and use the new 'bottle_edition' table - first to copy all the existing data into it - and eventually to be the canonical reference for full bottle details.

Every existing bottle will at minimum have one row in bottle_edition, and then we'll collapse a bunch of bottles into each other.

dcramer commented 1 month ago

Rethinking this with fresh eyes this morning.

  1. We'll keep bottle as is, and expand its attributes.
  2. We'll add (likely) a parentId to bottle.

This should make it cleaner to actually get this change done. Right now, looking at renaming things and breaking up variants from bottles, it's just too many changes, and it's not completely objective.

For the bottle details page, this means you'll still be able to permalink every bottle, and we'll simply add an "Editions" (tbd) section on it that shows the other bottles. That section will show on the parent bottle as well as on every other edition.

We'll also still need the 'edition' (nullable) string column on the bottles table.
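A rough sketch of how the "Editions" section could be populated from a parentId column. Only `parentId` and the nullable `edition` string come from the plan above; the `BottleRow` shape and the `relatedEditions` helper are hypothetical, and the sketch assumes a single level of nesting (a parent never has its own parent).

```typescript
// Hypothetical row shape; only parentId and edition are taken from the plan.
interface BottleRow {
  id: number;
  name: string;
  edition: string | null; // nullable, free-form descriptor
  parentId: number | null; // null for a parent or a standalone bottle
}

// Bottles to list in the "Editions" section of a given bottle's page:
// the parent plus all siblings, excluding the bottle being viewed.
function relatedEditions(bottle: BottleRow, all: BottleRow[]): BottleRow[] {
  const rootId = bottle.parentId ?? bottle.id;
  return all.filter(
    (b) => b.id !== bottle.id && (b.id === rootId || b.parentId === rootId),
  );
}
```

A standalone bottle with no children gets an empty list, so its page would look the same as it does today.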

dcramer commented 1 month ago

Pushed singleCask and caskStrength flags (and name detections).
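For illustration, name detection for these two flags could look something like the following. This is an assumption about the general approach, not the code that was pushed; `detectCaskFlags` and the exact patterns are hypothetical.

```typescript
interface CaskFlags {
  singleCask: boolean;
  caskStrength: boolean;
}

// Infer flags from phrases in the bottle name (case-insensitive).
// The patterns are illustrative and deliberately narrow.
function detectCaskFlags(name: string): CaskFlags {
  return {
    singleCask: /\bsingle\s+(cask|barrel)\b/i.test(name),
    caskStrength: /\b(cask|barrel)\s+(strength|proof)\b/i.test(name),
  };
}
```

A name with neither phrase leaves both flags false, which only means "not stated in the name", not "known to be false".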

Working on getting edition in now, and migrating BottleAlias.name to be a mirror of Bottle.fullName, which means Bottle.fullName will become less used in the UI (e.g. when we want to break up the "Laphroaig", "12-year-old", and "225th Anniversary" components).

dcramer commented 1 month ago

I'm realizing my primary issue is likely from trying to generate a unique label as a string.

Let's take this random 40 year:

Tomatin 40-year-old

The vintage year matters, more so than it does with many others. Do you have to duplicate the vintage year into the edition now? That's silly. What you want to do is just fill out the bottle information in as much detail as you can, and have the system understand if it's a duplicate or not.

The problem is two things:

  1. Human-readable names
  2. Missing information that could exist in the future

The first issue I think can be addressed through generated names. We can look at all bottles in a series on a write, and generate a descriptive name (particularly with the subtext field). Or we can be dumb about it for now and just do some rule-based heuristics for the display name.
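A minimal sketch of the "look at all bottles in a series" idea: only surface a detail in the display name when it actually differs between bottles in the series. The `EditionInfo` shape, `displayName`, and the output format are all hypothetical.

```typescript
// Hypothetical subset of bottle fields relevant to naming.
interface EditionInfo {
  name: string;
  edition: string | null;
  vintageYear: number | null;
  releaseYear: number | null;
}

type VariableField = "vintageYear" | "releaseYear";

// True when the series contains more than one distinct value for the field.
function varies(series: EditionInfo[], field: VariableField): boolean {
  return new Set(series.map((b) => b[field])).size > 1;
}

// Append a year to the name only when it distinguishes this bottle
// from others in the same series.
function displayName(bottle: EditionInfo, series: EditionInfo[]): string {
  const parts = [bottle.name];
  if (bottle.edition) parts.push(bottle.edition);
  if (bottle.vintageYear !== null && varies(series, "vintageYear")) {
    parts.push(`(${bottle.vintageYear} Vintage)`);
  }
  if (bottle.releaseYear !== null && varies(series, "releaseYear")) {
    parts.push(`(${bottle.releaseYear} Release)`);
  }
  return parts.join(" ");
}
```

With this approach nobody has to copy the vintage year into the edition by hand; it shows up in the label only once a second vintage exists in the series.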

The second issue is likely just going to need dupe detection. There are various techniques we can use to identify duplicates, help merge them, and help avoid future duplicates. Mostly this comes down to making it very easy to identify potential matches in the bottle search and add-bottle flows.

So I think the next step, after I clean up some data, is likely to figure out the unique constraint solution.
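One simple starting point for both the unique constraint and dupe detection is a normalized key built from the fields that define identity. Which fields belong in that key is exactly the open question here, so the `BottleIdentity` shape below is a guess, and `normalize`, `identityKey`, and `findPotentialDuplicates` are hypothetical helpers.

```typescript
// Hypothetical set of identity fields; the real set is still undecided.
interface BottleIdentity {
  brand: string;
  name: string;
  edition: string | null;
  vintageYear: number | null;
}

// Lowercase and collapse punctuation/whitespace so "40-Year-Old"
// and "40 year old" compare equal.
function normalize(value: string): string {
  return value.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
}

// Missing values become empty segments, so they stay part of the key.
function identityKey(b: BottleIdentity): string {
  return [
    normalize(b.brand),
    normalize(b.name),
    normalize(b.edition ?? ""),
    b.vintageYear ?? "",
  ].join("|");
}

// Existing bottles whose key matches the candidate exactly.
function findPotentialDuplicates<T extends BottleIdentity>(
  candidate: BottleIdentity,
  existing: T[],
): T[] {
  const key = identityKey(candidate);
  return existing.filter((b) => identityKey(b) === key);
}
```

Exact key matching only catches formatting differences. It does nothing for the "missing information" case, where one entry has a vintage year and the other does not, so that would still need fuzzier matching in the search and add flows.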

I'll try to keep edition one field for now, and continue to overload it with batch/series/etc information.

dcramer commented 1 month ago

With fresh eyes this morning, I have a mental model for how to deemphasize editions in the database (thus removing a lot of the noise for beginners). The core concern I still need to solve to pull this off, though, is the approach to naming editions.

Right now there are a lot of variables in play, but effectively we need the Bottle.name to become the bottling series, and the Bottle.edition to become the descriptor of the individual bottle.

I want to take a common scenario that poses the UX problem I'm having:

Angel's Envy Cask Strength 2020

However, in this case, 2020 is also the Release Year. I wanted to avoid filling in duplicate details - we already have some silliness with the name vs statedAge. Maybe we should just ignore the release year field as a goal right now though? Force filling in the edition for these variable details, try to push the user to enter the right information, and then build some tooling to improve over time.
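As an example of the kind of tooling that could help here, a trailing four-digit year can be split off the full name and suggested as the edition. This is a sketch under a big assumption: that a trailing year is the release year, which holds for this Angel's Envy bottle but could just as well be a vintage on another label. `splitTrailingYear` and `SplitName` are hypothetical.

```typescript
interface SplitName {
  name: string;
  edition: string | null;
  releaseYear: number | null;
}

// Split "Series Name 2020" into a series name and a year-based edition.
// Names without a trailing 19xx/20xx year are returned unchanged.
function splitTrailingYear(fullName: string): SplitName {
  const trimmed = fullName.trim();
  const match = trimmed.match(/^(.*\S)\s+((?:19|20)\d{2})$/);
  if (!match) return { name: trimmed, edition: null, releaseYear: null };
  return { name: match[1], edition: match[2], releaseYear: Number(match[2]) };
}
```

For "Angel's Envy Cask Strength 2020" this yields the name "Angel's Envy Cask Strength" with "2020" as both the edition and the release year, so the user enters the year once and the duplicate field is filled in for them.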

dcramer commented 1 month ago

Two open scenarios that are more tricky:

  1. Tomatin 12-year-old, and Tomatin 12-year-old Sherry Cask. Are these separate bottles or just separate editions? I lean towards the former, but where do we draw the line?

  2. Diageo Special Releases - often these are normal bottles with a limited release. They clearly seem to imply an edition of a bottle. Is the edition "Diageo Special Releases 2023", as an example?

dcramer commented 1 month ago

Here's a thought exercise:

Exercise:

Ok those two work, but we're still stuck here:

dcramer commented 1 month ago

Some more challenges:

Kilchoman Spring Release 2010

What's the series name? Spring Release?

dcramer commented 1 month ago

One obvious rule we can add:

This doesn't solve for "what about the release/batch/edition". We could continue to keep that as a separate field.

None of this helps us with deduping yet, or creating those series concepts.

dcramer commented 1 month ago

Need to determine if finish is a worthwhile field to add. It's really equivalent to edition for how we want to utilize it in a lot of ways, but I don't know that we'd want to aggregate different finishes together within the same series.

dcramer commented 2 weeks ago

After living with this for a couple weeks, I'm not sold on this editions field. Sure, it's hypothetically better for deduping things, but it feels more tedious from a manual input point of view. You're sitting there, staring at a bottle label, and you just have to ask yourself "wtf is the name and wtf is the edition". That's not fun.

I may revert it and combine edition back into name. Doesn't mean we can't still pull out some of the things above.