fac-13 / research

This is where we document our learning
5 stars 0 forks source link

Concatenate String Values with Aggregate Functions in PostgreSQL #33

Open haydnba opened 6 years ago

haydnba commented 6 years ago

Concatenate String Values with Aggregate Functions in PostgreSQL

The problem:

The proper way to handle a many-to-many relationship in a relational database management system is a pair of separate Primary-Keyed tables linked by a third mediating table associating their keys as Foreign-Keys e.g.

Pizzas to Toppings - one pizza may have many toppings, and one topping may be on many pizzas, so:

CREATE TABLE pizza (
    id SERIAL PRIMARY KEY NOT NULL,
    name VARCHAR(64),
    price NUMERIC NOT NULL
);

CREATE TABLE topping (
    id SERIAL PRIMARY KEY NOT NULL,
    name VARCHAR(128),
    type VARCHAR(64)
);

CREATE TABLE pizza_topping (
    pizza_id INT REFERENCES pizza(id),
    topping_id INT REFERENCES topping(id)
);

If we want to pull all the pizzas with their toppings from the database we will need to use the linking table to associate each pizza record to its toppings.

Using a JOIN in our query we will expect the cartesian product of the two Primary-Keyed tables ('pizza' and 'topping') as expressed by the linking table 'pizza_topping' - something like:

American     | Pepperoni | 10.49
American     | Pepper    | 10.49
Farmhouse    | Ham       | 9.75
Farmhouse    | Mushroom  | 9.75
Four-Seasons | Ham       | 12.39
Four-Seasons | Mushroom  | 12.39
Four-Seasons | Artichoke | 12.39
Four-Seasons | Pepper    | 12.39
Hawaian      | Ham       | 9.99
Hawaian      | Pineapple | 9.99

and so on...

The result above would be produced by the following query:

SELECT pizza.name, topping.name, pizza.price
FROM pizza
INNER JOIN pizza_topping
ON pizza_topping.pizza_id = pizza.id
INNER JOIN topping
ON pizza_topping.topping_id = topping.id
ORDER BY pizza.name

But this means that you are duplicating your records. If you don't want a new results row for every unique combination of pizza and topping, but just for every unique pizza BUT you still want to list every topping specific to each pizza record, you need to use an aggregate function within a subquery as follows:

SELECT pizza.name, 
(SELECT array_agg(topping.name) 
FROM topping 
INNER JOIN pizza_topping
ON pizza_topping.topping_id = topping.id
WHERE pizza_topping.pizza_id = pizza.id),
pizza.price
FROM pizza
ORDER BY pizza.name

The above query should return something like the following:

American     | [Pepperoni, Pepper]                | 10.49
Farmhouse    | [Ham, Mushroom]                    | 9.75 
Four-Seasons | [Ham, Mushroom, Artichoke, Pepper] | 12.39
Hawaian      | [Ham, Pineapple]                   | 9.99

The critical elements of the query are

  1. the use of the array_agg() (it could be string_agg instead - you would need to supply a delimiter) within the subquery
  2. the WHERE clause which associates the subquery aggregation to each current pizza Primary Key - if this clause were omitted then the subquery would return an array of every topping record for each pizza and not just those associated to the given pizza

See this, and this on StackOverflow for more examples.

Play around with this SQL Fiddle.

helenzhou6 commented 6 years ago

Wow thanks so much for this - was just trying to figure my head around why you need linking tables (rather than just using JOIN and this clears it up nicely 😊 Have some complimentary emojis: πŸ‘ΎπŸ¦„πŸ‰

haydnba commented 6 years ago

Thanks for such lovely emojis!!!