dwyl / learn-postgresql

🐘 Learn how to use PostgreSQL and Structured Query Language (SQL) to store and query your relational data. 🔍

PG extensions, a list #115

Open · ndrean opened 1 week ago

ndrean commented 1 week ago

After watching this video:


https://www.postgresql.org/docs/17/contrib.html

https://github.com/dhamaniasad/awesome-postgres?tab=readme-ov-file#extensions

[TODO] trunk ?

ndrean commented 1 week ago

:pushpin: PostgREST


PostgREST is a standalone web server that automatically turns your PostgreSQL database into a RESTful API. It reads your database schema and creates a full set of CRUD endpoints without requiring you to write any code.

It sits between the Phoenix backend and the PostgreSQL database and exposes endpoints. It may need an SSL connection between PostgreSQL and PostgREST, whilst the Phoenix server and the PostgREST container are probably in the same VPC. A special role will be set up for PostgREST in PostgreSQL.

Key features

JWT Workflow with PostgREST and Phoenix

Explanations

It adds an extra layer of safety between Phoenix and PostgreSQL with a JWT token saved in the session cookie.

- Roles and Authorization in PostgreSQL:
  - Roles (admin, user, etc.) and authorization scopes (e.g., RLS policies) are defined and enforced in PostgreSQL.
- User Login:
  - When a user logs in, Phoenix generates a "user_token" stored in a secure, HTTP-only session cookie and saved in the database, associated with the user's profile.
  - When using PostgREST, a separate JWT should be generated for database access. This JWT typically includes claims such as user_id, role (e.g., admin, user), and exp (expiration time in UNIX timestamp format).
  - The JWT protects the database access between Phoenix and PostgREST for each request.
  - When the LiveView mounts, it uses a `live_session` with an `on_mount` callback to authenticate the user based on the data stored in the session.
  - This callback typically uses the "user_token" (generated by the code deployed using `mix phx.gen.auth`) to fetch the current user from the database.
- Token(s) Storage:
  - The "user_token" is stored in a secure, HTTP-only session cookie and in the database.
  - If using a JWT for PostgREST, it should be stored securely, typically in memory or a secure storage mechanism, and sent with each request to PostgREST.
- Database Query via PostgREST:
  - When Phoenix needs to query the database on behalf of the user:
    - It retrieves the JWT from the session cookie or memory, knowing the "user_id".
    - It sends an HTTP request to PostgREST, passing the JWT in the Authorization header.
    - PostgREST validates the JWT using the shared secret and checks the exp claim. If valid, it queries PostgreSQL using the JWT claims.

[TODO] What if the JWT has expired? Should Phoenix do a proactive check on `exp` to avoid a 401 from PostgREST?
```mermaid
flowchart TD
    subgraph Phoenix["Phoenix App (4000)"]
        PC[PostgrestClient]
    end

    subgraph PostgREST["PostgREST Server (3000)"]
        PRM[Pool Connection Manager]
        PRM -->|manages| CP[Connection Pool]
    end

    subgraph PostgreSQL["PostgreSQL Server (5432)"]
        PG[(Database)]
    end

    PC -->|HTTP Requests| PRM
    CP -->|DB Connections| PG

    style Phoenix fill:#a8e6cf
    style PostgREST fill:#dcedc1
    style PostgreSQL fill:#ffd3b6
```
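The database side of this setup needs dedicated roles. A minimal sketch, assuming the conventional role names from the PostgREST docs (`authenticator`, `web_anon`, `web_user`) and a hypothetical `employees` table:

```sql
-- Login role PostgREST connects as; it can switch to other roles but owns nothing.
CREATE ROLE authenticator NOINHERIT LOGIN PASSWORD 'change-me';

-- Role assumed for unauthenticated requests.
CREATE ROLE web_anon NOLOGIN;
GRANT web_anon TO authenticator;
GRANT USAGE ON SCHEMA public TO web_anon;
GRANT SELECT ON public.employees TO web_anon;

-- Role assumed when a JWT carries "role": "web_user".
CREATE ROLE web_user NOLOGIN;
GRANT web_user TO authenticator;
GRANT USAGE ON SCHEMA public TO web_user;
GRANT SELECT, INSERT, UPDATE ON public.employees TO web_user;
```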
ndrean commented 1 week ago

:pushpin: __pgvector__

Save and query your embeddings with Postgres

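A minimal sketch, assuming a toy `items` table with 3-dimensional vectors (`<->` is pgvector's Euclidean-distance operator):

```sql
CREATE EXTENSION vector;

CREATE TABLE items (
  id bigserial PRIMARY KEY,
  embedding vector(3)  -- real embeddings are typically hundreds of dimensions
);

INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');

-- 5 nearest neighbours by L2 distance
SELECT id, embedding <-> '[3,1,2]' AS distance
FROM items
ORDER BY embedding <-> '[3,1,2]'
LIMIT 5;
```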

ndrean commented 1 week ago

:pushpin: PostGIS

PostGIS extends the capabilities of the PostgreSQL relational database by adding support for storing, indexing, and querying geospatial data.

PostGIS features include:

Spatial Data Storage: Store different types of spatial data such as points, lines, polygons, and multi-geometries, in both 2D and 3D.

Spatial Indexing: Quickly search and retrieve spatial data based on its location.

Spatial Functions: A wide range of spatial functions that allow you to filter and analyze spatial data, measuring distances and areas, intersecting geometries, buffering, and more.

Geometry Processing: Tools for processing and manipulating geometry data, such as simplification, conversion, and generalization.

Raster Data Support: Storage and processing of raster data, such as elevation data and weather data.

Geocoding and Reverse Geocoding: Functions for geocoding and reverse geocoding.

Integration: Access and work with PostGIS using third party tools such as QGIS, GeoServer, MapServer, ArcGIS, Tableau.

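As a sketch of the basics (the `places` table, coordinates, and distance are made-up examples; `ST_MakePoint`, `ST_DWithin` and the GiST index are standard PostGIS features):

```sql
CREATE EXTENSION postgis;

CREATE TABLE places (
  id   serial PRIMARY KEY,
  name text,
  geom geometry(Point, 4326)  -- lon/lat points in WGS 84
);

INSERT INTO places (name, geom) VALUES
  ('Eiffel Tower', ST_SetSRID(ST_MakePoint(2.2945, 48.8584), 4326));

-- spatial index for fast location queries
CREATE INDEX places_geom_idx ON places USING GIST (geom);

-- places within 5 km of a point (casting to geography makes ST_DWithin use metres)
SELECT name
FROM places
WHERE ST_DWithin(
  geom::geography,
  ST_SetSRID(ST_MakePoint(2.35, 48.85), 4326)::geography,
  5000
);
```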

ndrean commented 6 days ago

:pushpin: LISTEN/NOTIFY as a PubSub


Only connected clients that LISTEN will receive a message. This is not a message queue.

When a NOTIFY is executed inside a transaction, the notification is not delivered unless the transaction is committed.

Postgrex Notification with Elixir

Example

Your Postgres server is running and exposes port 5432. Open a terminal to run an Elixir session (`iex`), and paste:

```elixir
Mix.install([:postgrex])

{:ok, _} =
  Supervisor.start_link(
    [
      {Postgrex, name: :pg, database: "postgres", username: "postgres", password: "postgres"},
      {Postgrex.Notifications, name: :pg_notify, database: "postgres", username: "postgres", password: "postgres"}
    ],
    strategy: :one_for_one
  )

defmodule Channel do
  def listen(channel, listen_ref) do
    receive do
      {:notification, _pid, ^listen_ref, ^channel, payload} ->
        IO.inspect(payload: payload)
        # listen(channel, listen_ref)
    end
  end
end

{:ok, listen_ref} = Postgrex.Notifications.listen(:pg_notify, "channel")
```

Add:

```elixir
Channel.listen("channel", listen_ref)
```

In another terminal, open a `psql` client and target the same database:

```sh
psql -h localhost -U postgres -d postgres
```

and paste either:

```sql
NOTIFY channel, 'first';
-- or equivalently
SELECT pg_notify('channel', 'first');
```

In your Elixir terminal, you should see:

```elixir
[payload: "first"]
```

Back in the Postgres client, send several messages:

```sql
NOTIFY channel, 'second';
NOTIFY channel, 'third';
```

In the Elixir session, type again:

```elixir
Channel.listen("channel", listen_ref)
# [payload: "second"]
```

and again:

```elixir
Channel.listen("channel", listen_ref)
# [payload: "third"]
```

Now from Elixir, do:

```elixir
Postgrex.query!(:pg, ~s|NOTIFY channel, 'From another user!'|, [])
```

Then:

```elixir
Channel.listen("channel", listen_ref)
# [payload: "From another user!"]
```

and check in the Postgres client after a `LISTEN channel;`:

```
Asynchronous notification "channel" with payload "From another user!" received from server process with PID 46395.
```

Trigger a saved function to notify on INSERT, UPDATE, DELETE, or TRUNCATE in Postgres

Postgres doc [TODO]
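A minimal sketch of this pattern (the function, channel, and table names are hypothetical; note that TRUNCATE triggers must be statement-level, so this row-level trigger covers INSERT/UPDATE/DELETE):

```sql
CREATE OR REPLACE FUNCTION notify_row_change() RETURNS trigger AS $$
BEGIN
  -- broadcast the table name and operation on the "row_changes" channel
  PERFORM pg_notify(
    'row_changes',
    json_build_object('table', TG_TABLE_NAME, 'op', TG_OP)::text
  );
  RETURN COALESCE(NEW, OLD);
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER employees_notify
AFTER INSERT OR UPDATE OR DELETE ON employees
FOR EACH ROW EXECUTE FUNCTION notify_row_change();
```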

ndrean commented 5 days ago

:pushpin: citext

The citext extension in Postgres provides a case-insensitive data type for text. This is particularly useful in scenarios where the case of text data should not affect queries, such as usernames or email addresses, or any form of textual data where case-insensitivity is desired.


Review of "standard" quotations and how it works with "uppercasing" String values use _single quotes_: ```sql INSERT INTO products (product_name, price) VALUES ('Orange juice', 1); ``` Identifiers use _double quotes_ __if__ they contain a space, and it becomes _case sensitive_: ```sql CREATE TABLE "Employee Records" (employee_id serial PRIMARY KEY); ``` so the table "Employee Records" is not the table "Employee records". Most of the time, you don't have a space in your identifier, so you can use unquoted identifiers, and it becomes _case insensitive_: ```sql CREATE TABLE "Employee" (employee_id serial PRIMARY KEY); # identical create table employee (employee_id serial primary key); ``` so that both work: ```sql SELECT * FROM employee; select * from Employee; ```
CITEXT Example

## Base example

```sql
CREATE TABLE employees (
  employee_id serial PRIMARY KEY,
  email TEXT UNIQUE
);
```

Our `UNIQUE` constraint is case sensitive:

```sql
INSERT INTO employees (email) VALUES ('ME.Too@mail.com'), ('me.too@mail.com');
SELECT COUNT(*) FROM employees;
-- => 2
```

This query returns nothing as the comparison is case sensitive:

```sql
SELECT * FROM employees WHERE email = 'me.Too@mail.com';
-- => 0 rows
```

## Using `citext`

The migration:

```sql
CREATE EXTENSION IF NOT EXISTS citext;
```

Instead of `TEXT` as the field type, we use `CITEXT`:

```sql
DROP TABLE employees;
CREATE TABLE employees (
  employee_id serial PRIMARY KEY,
  email CITEXT UNIQUE
);
INSERT INTO employees (email) VALUES ('Me.Too@mail.com');
```

We can query regardless of the letter case:

```sql
SELECT COUNT(*) FROM employees WHERE email = 'me.too@mail.com';
-- => 1
```

You can recover the case sensitivity when _casting_ into `TEXT`:

```sql
SELECT COUNT(*) FROM employees WHERE email::text = 'me.too@mail.com';
-- => 0
```

Our `UNIQUE` constraint works as expected:

```sql
INSERT INTO employees (email) VALUES ('me.too@mail.com');
-- => ERROR: duplicate key value violates unique constraint "employees_email_key"
```

We can filter with or without casting:

```sql
INSERT INTO employees (email) VALUES ('you.Too@mail.com');
SELECT COUNT(*) FROM employees WHERE email::text LIKE '%too%';
-- => 0
SELECT COUNT(*) FROM employees WHERE email LIKE '%too%';
-- => 2
```
ndrean commented 5 days ago

:pushpin: PSQL


psql is a terminal-based front-end to PostgreSQL. It enables you to type in queries interactively, issue them to PostgreSQL, and see the query results.

psql command line tutorial and cheat sheet
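A few common invocations and meta-commands for reference (connection parameters are assumptions):

```sh
# connect to a local server as the postgres role
psql -h localhost -U postgres -d postgres

# run a single query and exit
psql -h localhost -U postgres -d postgres -c 'SELECT version();'

# inside psql:
#   \l          list databases
#   \dt         list tables
#   \d mytable  describe a table
#   \dx         list installed extensions
#   \x          toggle expanded output
#   \q          quit
```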

ndrean commented 5 days ago

:pushpin: PgBouncer

pgbouncer is a PostgreSQL connection pooler. Any target application can be connected to pgbouncer as if it were a PostgreSQL server, and pgbouncer will create a connection to the actual server, or it will reuse one of its existing connections.

The aim of pgbouncer is to lower the performance impact of opening new connections to PostgreSQL.

This can happen when many applications reach the database with more connections than it is configured to support, as is common in a Kubernetes environment.

In order not to compromise transaction semantics for connection pooling, pgbouncer supports several pooling modes when rotating connections:

- Session pooling: a server connection is assigned to a client for the lifetime of the client connection.
- Transaction pooling: a server connection is assigned to a client only for the duration of a transaction.
- Statement pooling: a server connection is returned to the pool after every statement; multi-statement transactions are disallowed.

A minimal config sketch follows the list.
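A minimal `pgbouncer.ini` sketch using transaction pooling (hostnames, ports, and file paths are assumptions):

```ini
[databases]
; clients connect to "mydb" on pgbouncer; it forwards to the real server
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20
```

The application then connects to port 6432 instead of 5432, unchanged otherwise.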

ndrean commented 5 days ago

:pushpin: earthdistance


In the migration:

```sql
CREATE EXTENSION cube;
CREATE EXTENSION earthdistance;
```
Example

Paris is `(48.9, 2.3)` and London is `(51.5, -0.1)`. The distance in km (from meters cast into integer) is:

```sql
SELECT (earth_distance(ll_to_earth(48.9, 2.3), ll_to_earth(51.5, -0.1)) / 1000)::integer AS result;
```

gives:

```
 result
--------
    336
```

If you are getting the data from a table, you would use something like this:

```sql
WITH p1 AS (SELECT ll_to_earth(48.9, 2.3) AS value),
     p2 AS (SELECT ll_to_earth(51.5, -0.1) AS value)
SELECT (earth_distance(p1.value, p2.value) / 1000)::INTEGER AS result FROM p1, p2;
```
ndrean commented 5 days ago

:pushpin: pg_search


ndrean commented 5 days ago

:pushpin: Postgres as a CRON server: PG_CRON


Install pg_cron

If you build a Debian-based image, you have to install the postgresql-17-cron package in your image.

```dockerfile
FROM postgres:17.1-bookworm

RUN apt-get update && \
    apt-get -y install postgresql-17-cron && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
```

[TODO] Command ?? to run Postgres and set the "postgresql.conf" config for cron.database_name = 'postgres' and shared_preload_libraries = 'pg_cron'.
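One way to do this, as a sketch: the official postgres image accepts server settings as command-line flags, so the image and container names below are hypothetical but the `-c` flags are standard:

```sh
docker run -d --name pg-cron \
  -e POSTGRES_PASSWORD=postgres \
  my-postgres-cron-image \
  postgres -c shared_preload_libraries=pg_cron -c cron.database_name=postgres
```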

and put this in the migration:

```sql
CREATE EXTENSION pg_cron;
GRANT USAGE ON SCHEMA cron TO ???; -- <--- which role?
```

[TODO]: test this
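For reference, scheduling a job once the extension is installed (the job name and command are examples; `cron.schedule`, `cron.job` and `cron.unschedule` are pg_cron's documented API):

```sql
-- run VACUUM every night at 03:00
SELECT cron.schedule('nightly-vacuum', '0 3 * * *', 'VACUUM');

-- inspect and remove jobs
SELECT jobid, jobname, schedule, command FROM cron.job;
SELECT cron.unschedule('nightly-vacuum');
```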

ndrean commented 5 days ago

:pushpin: Postgres as HTTP client


Install

Add to the Debian image:

```sh
apt install postgresql-17-http
```

And migration:

```sql
CREATE EXTENSION http;
```

[TODO]: test this
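For reference, a sketch of querying with the pgsql-http extension (the echo URLs are placeholders; `http_get` and `http_post` return a row with `status`, `content_type`, `headers` and `content`):

```sql
-- simple GET
SELECT status, content_type
FROM http_get('https://postman-echo.com/get');

-- POST a JSON body
SELECT status, content::json
FROM http_post('https://postman-echo.com/post', '{"name": "test"}', 'application/json');
```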

ndrean commented 5 days ago

:pushpin: Pivoting tables, aka "cross tables", with tablefunc


The migration:

```sql
CREATE EXTENSION tablefunc;
```
Example of pivoting tables

```sql
CREATE TABLE ProductSales (
  Productname VARCHAR(50),
  Year INT,
  Sales INT
);

INSERT INTO ProductSales VALUES
  ('A', 2017, 100), ('A', 2018, 150), ('A', 2019, 300),
  ('A', 2020, 500), ('A', 2021, 450), ('A', 2022, 675),
  ('B', 2017, NULL), ('B', 2018, 900), ('B', 2019, 1120),
  ('B', 2020, 750), ('B', 2021, 1500), ('B', 2022, 1980);
```

As an example, one field is NULL. We use `COALESCE` to turn NULL values into 0.

```sql
SELECT *
FROM crosstab('SELECT Productname, year::TEXT, COALESCE(sales, 0)::INT AS sales FROM ProductSales ORDER BY Productname, year;')
AS ProductSales(
  Productname VARCHAR(50),
  "2017" INT, "2018" INT, "2019" INT, "2020" INT, "2021" INT, "2022" INT
);
```

```
 productname | 2017 | 2018 | 2019 | 2020 | 2021 | 2022
-------------+------+------+------+------+------+------
 A           |  100 |  150 |  300 |  500 |  450 |  675
 B           |    0 |  900 | 1120 |  750 | 1500 | 1980
```

If we don't use `crosstab`, we can use `FILTER` from PostgreSQL, which can be used with an aggregate function. Here we use `MAX` as a product only appears once per year.

```sql
SELECT Productname,
  COALESCE(MAX(sales) FILTER (WHERE year = 2017), 0) AS "2017",
  COALESCE(MAX(sales) FILTER (WHERE year = 2018), 0) AS "2018",
  COALESCE(MAX(sales) FILTER (WHERE year = 2019), 0) AS "2019",
  COALESCE(MAX(sales) FILTER (WHERE year = 2020), 0) AS "2020",
  COALESCE(MAX(sales) FILTER (WHERE year = 2021), 0) AS "2021",
  COALESCE(MAX(sales) FILTER (WHERE year = 2022), 0) AS "2022"
FROM ProductSales
GROUP BY Productname
ORDER BY Productname;
```

```
 productname | 2017 | 2018 | 2019 | 2020 | 2021 | 2022
-------------+------+------+------+------+------+------
 A           |  100 |  150 |  300 |  500 |  450 |  675
 B           |    0 |  900 | 1120 |  750 | 1500 | 1980
```

If we want to add a total per year, we do a `UNION ALL` and a `CASE` to "filter" the `SUM` (the `ELSE 0` avoids potential NULLs, in the same spirit as `COALESCE`):

```sql
SELECT *
FROM crosstab('SELECT Productname, year::TEXT, COALESCE(sales, 0)::INT AS sales FROM ProductSales ORDER BY Productname, year;')
AS ProductSales(
  Productname VARCHAR(50),
  "2017" INT, "2018" INT, "2019" INT, "2020" INT, "2021" INT, "2022" INT
)
UNION ALL
SELECT 'Total',
  SUM(CASE WHEN Year = 2017 THEN Sales ELSE 0 END),
  SUM(CASE WHEN Year = 2018 THEN Sales ELSE 0 END),
  SUM(CASE WHEN Year = 2019 THEN Sales ELSE 0 END),
  SUM(CASE WHEN Year = 2020 THEN Sales ELSE 0 END),
  SUM(CASE WHEN Year = 2021 THEN Sales ELSE 0 END),
  SUM(CASE WHEN Year = 2022 THEN Sales ELSE 0 END)
FROM ProductSales;
```

```
 productname | 2017 | 2018 | 2019 | 2020 | 2021 | 2022
-------------+------+------+------+------+------+------
 A           |  100 |  150 |  300 |  500 |  450 |  675
 B           |    0 |  900 | 1120 |  750 | 1500 | 1980
 Total       |  100 | 1050 | 1420 | 1250 | 1950 | 2655
```

(Note: B's 2017 sales is NULL in the data, so it pivots to 0 and is ignored by `SUM` in the total row.)
ndrean commented 3 days ago

:pushpin: pg_background


Install

```sql
CREATE EXTENSION pg_background;
```
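A sketch of usage (`pg_background_launch` and `pg_background_result` are the extension's documented functions; the queries are examples):

```sql
-- fire-and-forget: run a command in a background worker
SELECT pg_background_launch('VACUUM ANALYZE');

-- launch a query and collect its result
SELECT *
FROM pg_background_result(pg_background_launch('SELECT count(*) FROM employees'))
AS (count bigint);
```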
ndrean commented 2 days ago

Use cases for a few PostgreSQL functions and clauses

Examples

## Base table

```sql
CREATE TABLE employees (
  id SERIAL PRIMARY KEY,
  first_name VARCHAR(50),
  last_name VARCHAR(50),
  department VARCHAR(50),
  salary DECIMAL(10, 2)
);

-- Sample data with clear salary variations per department
INSERT INTO employees (first_name, last_name, department, salary) VALUES
  ('John', 'Smith', 'Engineering', 95000.00),
  ('Mary', 'Johnson', 'Engineering', 120000.00),
  ('Sarah', 'Brown', 'Marketing', 75000.00),
  ('Michael', 'Davis', 'Marketing', 82000.00),
  ('Emma', 'Taylor', 'HR', 65000.00),
  ('William', 'Anderson', 'HR', 63000.00),
  ('Lisa', 'Thomas', 'Sales', 90000.00),
  ('Robert', 'Martin', 'Sales', 95000.00);
```

## Add a column with a computed full name: `CONCAT`

```sql
SELECT id, first_name, last_name, department, salary,
       CONCAT(first_name, ' ', last_name) AS full_name
FROM employees;
```

```
 id | first_name | last_name | department  |  salary   | full_name
----+------------+-----------+-------------+-----------+------------------
  1 | John       | Smith     | Engineering |  95000.00 | John Smith
  2 | Mary       | Johnson   | Engineering | 120000.00 | Mary Johnson
  3 | Sarah      | Brown     | Marketing   |  75000.00 | Sarah Brown
  4 | Michael    | Davis     | Marketing   |  82000.00 | Michael Davis
  5 | Emma       | Taylor    | HR          |  65000.00 | Emma Taylor
  6 | William    | Anderson  | HR          |  63000.00 | William Anderson
  7 | Lisa       | Thomas    | Sales       |  90000.00 | Lisa Thomas
  8 | Robert     | Martin    | Sales       |  95000.00 | Robert Martin
```

If you want to update your table, you run a migration:

```sql
ALTER TABLE employees ADD COLUMN full_name VARCHAR(101);
```

and then populate the new column:

```sql
UPDATE employees SET full_name = CONCAT(first_name, ' ', last_name);
```

## Add a column for the employee's salary range: `CASE`

We want to add a column as an enum "Low", "Medium" and "High" to each employee based on the salary:

```sql
SELECT CONCAT(first_name, ' ', last_name) AS full_name,
       CASE
         WHEN salary < 80000 THEN 'Low'
         WHEN salary >= 80000 AND salary < 100000 THEN 'Medium'
         ELSE 'High'
       END AS salary_range
FROM employees
ORDER BY salary_range;
```

```
    full_name     | salary_range
------------------+--------------
 Mary Johnson     | High
 Emma Taylor      | Low
 William Anderson | Low
 Sarah Brown      | Low
 Robert Martin    | Medium
 Michael Davis    | Medium
 Lisa Thomas      | Medium
 John Smith       | Medium
```

## Calculate a distribution: number of employees per salary range: `CASE`

```sql
SELECT COUNT(CASE WHEN salary < 80000 THEN 1 END) AS "# Low",
       COUNT(CASE WHEN salary >= 80000 AND salary < 100000 THEN 1 END) AS "# Medium",
       COUNT(CASE WHEN salary >= 100000 THEN 1 END) AS "# High"
FROM employees;
```

```
 # Low | # Medium | # High
-------+----------+--------
     3 |        4 |      1
```

You can add a row giving the aggregated full names of all the employees in a given range with `STRING_AGG` and `UNION`:

```sql
-- First row: employee names grouped by salary range
SELECT 'Employees' AS "Type",
       STRING_AGG(CASE WHEN salary < 80000 THEN CONCAT(first_name, ' ', last_name) END, ', ') AS "Low",
       STRING_AGG(CASE WHEN salary >= 80000 AND salary < 100000 THEN CONCAT(first_name, ' ', last_name) END, ', ') AS "Medium",
       STRING_AGG(CASE WHEN salary >= 100000 THEN CONCAT(first_name, ' ', last_name) END, ', ') AS "High"
FROM employees
UNION ALL
-- Second row: count of employees in each salary range
SELECT 'Count' AS "Type",
       COUNT(CASE WHEN salary < 80000 THEN 1 END)::TEXT AS "Low",
       COUNT(CASE WHEN salary >= 80000 AND salary < 100000 THEN 1 END)::TEXT AS "Medium",
       COUNT(CASE WHEN salary >= 100000 THEN 1 END)::TEXT AS "High"
FROM employees;
```

```
   Type    |                    Low                     |                        Medium                         |     High
-----------+--------------------------------------------+-------------------------------------------------------+--------------
 Employees | Sarah Brown, Emma Taylor, William Anderson | John Smith, Michael Davis, Lisa Thomas, Robert Martin | Mary Johnson
 Count     | 3                                          | 4                                                     | 1
```

## Filtering with `LIKE` or `IN`

```sql
SELECT e.*, e.first_name || ' ' || e.last_name AS full_name
FROM employees e
WHERE e.first_name IN ('Mary', 'Robert');
```

and also:

```sql
SELECT *
FROM employees
WHERE (first_name LIKE '%Mary%' OR first_name LIKE '%Robert%');
```

```
 id | first_name | last_name | department  |  salary   |   full_name
----+------------+-----------+-------------+-----------+---------------
  2 | Mary       | Johnson   | Engineering | 120000.00 | Mary Johnson
  8 | Robert     | Martin    | Sales       |  95000.00 | Robert Martin
```