Quaffel / get-me-drunk-efficiently

Webservice providing cocktail recommendations based on how drunk you want to be

Handle "whole"/"piece" unit as a whole item #50

Closed Quaffel closed 2 years ago

Quaffel commented 2 years ago

When fetching data from Wikidata, we normalize the volumes ourselves. This currently also applies to amounts given in the unit with the symbol "1", which represents "wholes", such as "1 banana" or "1 lemon slice". In Wikidata, this unit (Q199 ("1")) is used whenever no unit is specified.

This leads to some interesting issues:

I performed a small analysis to figure out how significant this issue is. A query on all of the ingredients that make use of this unit yields the following results:

| Ingredient | Occurring amounts | Affected cocktails |
| --- | --- | --- |
| zest | 1 | Boulevardier |
| single espresso | 1 | Espresso Martini |
| zest | 1 | Hanky-Panky cocktail |
| brandy | 4 | Horse's Neck (x2) |
| barley malt syrup | 0.75 | Jane's Mudders Milk (x2) |
| egg yolk | 3 | Jane's Mudders Milk |
| cola | 0.5 | Kalimotxo |
| red wine | 0.5 | Kalimotxo |
| olive | 1 | Klingon Grok |
| blackberry | 2 | Bramble |
| raspberry | 5 | Lillet Wild Berry |
| blueberry | 5 | Lillet Wild Berry |
| Lillet | 1 | Lillet Wild Berry |
| strawberry | 3 | Lillet Wild Berry |
| lemonade | 4 | Lillet Wild Berry |
| mint leaf | 2 | Lillet Wild Berry |
| mint sprig | 6, 4 | Mojito, Mint Julep |
| olive | 1 | Mugato Martini |
| egg white | 1, 1, 1, 1, 1 | Clover Club Cocktail, New York Sour (x2), Pink Lady, Pisco Sour, Ramos Gin Fizz |
| pitted cherry | 1 | Randy Yeoman |
| maraschino cherry | 2 | Rob Roy |
| sugar cube | 3, 1 | Rüdesheimer Kaffee, Sazerac |
| cherry | 1 | The Great Bird of the Galaxy |
| chicken egg | 1 | The Red Hour (x2) |
| onion | 0.5 | Vampiro (x2) |
| orange slice | 0.5, 0.5, 0.5 | Americano, Whiskey Sour, Negroni |
| lime wedge | 4, 2, 1 | Caipirinha, Tequila and Tonic, Tschunk (x2) |
| lime slice | 1, 1, 1 | The Paradise Syndrome, Cuban Sunset, Kirk-a-kola |
| Ice Cube | 1 | Randy Yeoman |
| sugar cube | 1 | Champagne Cocktail |
| frozen banana | 1 | Mudder's Milk |
| lemon disk | 1 | Nikolaschka |

Entries marked with (x2) were returned twice by the query instead of just once. It is unclear to me under which circumstances such duplicates occur.
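For illustration only, this is one way the (x2) markers could be derived from raw result rows. It is a minimal sketch, not the procedure actually used for the analysis: the function name `summarize` and the `(ingredient, amount, cocktail)` tuple layout are assumptions made for this example.

```python
from collections import Counter


def summarize(rows):
    """Collapse repeated (ingredient, amount, cocktail) rows.

    Rows that occur n > 1 times are kept once, with "(xn)" appended
    to the cocktail name. First-seen order is preserved.
    """
    counts = Counter(rows)  # counts identical tuples, keeps insertion order
    summary = []
    for (ingredient, amount, cocktail), n in counts.items():
        label = cocktail if n == 1 else f"{cocktail} (x{n})"
        summary.append((ingredient, amount, label))
    return summary


rows = [
    ("brandy", "4", "Horse's Neck"),
    ("brandy", "4", "Horse's Neck"),
    ("zest", "1", "Boulevardier"),
]
for entry in summarize(rows):
    print(entry)
```

This only makes duplicates visible; it does not explain why the query returns them.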

Quaffel commented 2 years ago

Improved Wikidata's database for the following entries:

SPARQL query for the Wikidata Query Service to retrieve all units used in all items that we consider cocktails:

SELECT DISTINCT ?unitLabel ?ingredientLabel ?cocktailLabel
WHERE 
{
  # Retrieves all entities that are at least one of the following:
  # - an instance of "cocktail"
  # - a subclass of "cocktail"
  # - an instance of a subclass of "cocktail"
  ?cocktail wdt:P31?/wdt:P279* wd:Q134768.

  # Filter out classes (instances of Q16889133; "classes")
  FILTER NOT EXISTS {
    ?cocktail wdt:P31 wd:Q16889133
  }

  ?cocktail p:P186 ?ingredientStatement.
  ?ingredientStatement ps:P186 ?ingredient;
                            pqv:P1114/wikibase:quantityAmount ?ingredientAmount;
                            pqv:P1114/wikibase:quantityUnit ?unit.

  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

SPARQL query to retrieve all ingredients that have no specified unit (i.e., Q199 ("1")):

SELECT DISTINCT ?unitLabel ?ingredientLabel ?cocktailLabel
WHERE 
{
  ?cocktail wdt:P31?/wdt:P279* wd:Q134768.

  FILTER NOT EXISTS {
    ?cocktail wdt:P31 wd:Q16889133
  }

  ?cocktail p:P186 ?ingredientStatement.
  ?ingredientStatement ps:P186 ?ingredient;
                            pqv:P1114/wikibase:quantityUnit ?unit.

  VALUES ?unit { wd:Q199 }

  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
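
To run either query outside the Query Service UI, something along these lines should work. This is a rough sketch using only the Python standard library; the helper names (`build_url`, `parse_bindings`, `run_query`) and the User-Agent string are placeholders invented for this example and are not part of this project.

```python
import json
import urllib.parse
import urllib.request

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_url(query):
    """Build a GET URL that asks the endpoint for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return f"{WDQS_ENDPOINT}?{params}"


def parse_bindings(payload):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in payload["results"]["bindings"]
    ]


def run_query(query):
    """Send the query and return the flattened rows (needs network access)."""
    request = urllib.request.Request(
        build_url(query),
        # Placeholder User-Agent; replace with something that identifies you.
        headers={"User-Agent": "example-cocktail-analysis/0.1 (placeholder)"},
    )
    with urllib.request.urlopen(request) as response:
        return parse_bindings(json.load(response))
```

Calling `run_query` with the text of one of the queries above would then return one dict per result row, keyed by `unitLabel`, `ingredientLabel`, and `cocktailLabel`.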

Quaffel commented 2 years ago

Now that no cocktail uses "wholes" to describe the amount of a fluid ingredient, we should stop converting them to volumes and display them as whole items instead. Treating "whole ingredients" as not contributing to the overall fluid volume is a simplification (ice cubes melt and sugar cubes dissolve), but it should be good enough for our purposes.
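
As a rough illustration of the proposed behavior (not the project's actual code), the normalization step could skip wholes like this. Only the Wikidata ID Q199 comes from this issue; the `Amount` class, the unit symbols, and the `ML_PER_UNIT` table are assumptions made for this sketch.

```python
from dataclasses import dataclass

WHOLE_UNIT = "Q199"  # Wikidata item "1", used when no unit is specified

# Hypothetical conversion table, keyed by unit symbol for readability.
ML_PER_UNIT = {"ml": 1.0, "cl": 10.0, "l": 1000.0}


@dataclass(frozen=True)
class Amount:
    value: float
    unit: str  # WHOLE_UNIT for wholes, otherwise a key of ML_PER_UNIT


def normalize(amount):
    """Return (value, kind): millilitres for volumes, the raw count for wholes."""
    if amount.unit == WHOLE_UNIT:
        return amount.value, "pieces"  # displayed as-is, never converted
    return amount.value * ML_PER_UNIT[amount.unit], "ml"


def total_volume_ml(amounts):
    """Sum fluid volumes only; wholes are ignored (the simplification above)."""
    return sum(value for value, kind in map(normalize, amounts) if kind == "ml")


drink = [Amount(4, "cl"), Amount(1, WHOLE_UNIT), Amount(20, "ml")]
print([normalize(a) for a in drink])
print(total_volume_ml(drink))
```

Keeping the kind next to the value lets the display layer render "1 sugar cube" and "40 ml" differently without a second lookup.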