LukePrior / nbn-upgrade-map

Interactive map showing premises eligible for the NBN FTTP upgrade program.
https://nbn.lukeprior.com/
MIT License

Reduce DB image size #255

Closed lyricnz closed 1 year ago

lyricnz commented 1 year ago

Maybe update the base image from debian-buster (10) to debian-bookworm (12).

lyricnz commented 1 year ago

Current DB image is 3.73GB:

lukeprior/nbn-upgrade-map-db:202308      f54a8193136a   2 weeks ago    3.73GB

Using dive https://github.com/wagoodman/dive :

❯ CI=true dive lukeprior/nbn-upgrade-map-db:latest
  Using default CI config
Image Source: docker://lukeprior/nbn-upgrade-map-db:latest
Fetching image... (this can take a while for large images)
Analyzing image...
  efficiency: 89.7670 %
  wastedBytes: 400430348 bytes (400 MB)
  userWastedPercent: 10.9345 %
Inefficient Files:
Count  Wasted Space  File Path
    2        321 MB  /data/address_principals.csv.gz
    3         34 MB  /var/lib/postgresql/15/main/pg_wal/000000010000000000000001
    2         14 MB  /var/lib/postgresql/15/main/base/5/16697
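
To see how much of the reported waste comes from each listed file, here is a minimal sketch that parses the three "Inefficient Files" rows above and compares them with the `wastedBytes` total. The rows and the total are copied from the dive output; the regular expression and the helper name `parse_rows` are my own assumptions, not part of dive.

```python
import re

# Rows copied from the "Inefficient Files" table in the dive output above.
DIVE_ROWS = """\
    2        321 MB  /data/address_principals.csv.gz
    3         34 MB  /var/lib/postgresql/15/main/pg_wal/000000010000000000000001
    2         14 MB  /var/lib/postgresql/15/main/base/5/16697
"""

# Assumed row layout: count, wasted size in MB, file path.
ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+MB\s+(\S+)\s*$")


def parse_rows(text):
    """Return (count, wasted_mb, path) tuples for each table row."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            count, wasted_mb, path = match.groups()
            rows.append((int(count), int(wasted_mb), path))
    return rows


rows = parse_rows(DIVE_ROWS)
listed_mb = sum(wasted for _, wasted, _ in rows)
total_wasted_mb = 400430348 / 1_000_000  # wastedBytes from the same report

print(f"listed files: {listed_mb} MB of {total_wasted_mb:.0f} MB wasted")
for _, wasted, path in rows:
    print(f"{wasted / total_wasted_mb:6.1%}  {path}")
```

On these numbers the CSV download alone accounts for roughly 80% of the wasted space, so it looks like the first thing to fix.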

Layers: (screenshot not preserved)

lyricnz commented 1 year ago

Also consider changing the base image to something smaller, and maybe upgrading to postgres 16.

lyricnz commented 1 year ago

Also, duplication could be reduced by using stage-0 base (for postgresql + common config), then two further stages using the same base, for the download/import, and for the minimal final stage.
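
A toy model of why a minimal final stage helps, not Docker's actual storage logic: every layer's content stays in the image even if a later layer deletes it, while a final stage that copies only the end state carries just what is still visible. The layer contents and all sizes except the 321 MB CSV are made up for illustration, and I am assuming the CSV is removed in a later layer.

```python
# Toy model: each layer maps path -> size in MB; None marks a deletion.


def image_size(layers):
    """Stored size: every layer's content is kept, even if later deleted."""
    return sum(
        size for layer in layers for size in layer.values() if size is not None
    )


def visible_files(layers):
    """Files present in the final filesystem after applying layers in order."""
    state = {}
    for layer in layers:
        for path, size in layer.items():
            if size is None:
                state.pop(path, None)  # deletion hides the file, frees nothing
            else:
                state[path] = size
    return state


# Hypothetical single-stage build: download, import, then delete the download.
single_stage = [
    {"/usr/lib/postgresql": 300},               # base (made-up size)
    {"/data/address_principals.csv.gz": 321},   # download (size from dive)
    {"/var/lib/postgresql/data": 2500},         # import (made-up size)
    {"/data/address_principals.csv.gz": None},  # cleanup in a later layer
]

# Simplified multi-stage result: the final stage holds only the visible files.
final_stage = [visible_files(single_stage)]

print(image_size(single_stage), image_size(final_stage))
```

In this model the final stage is smaller by exactly the size of the deleted download, which is the kind of saving the stage split is meant to capture.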

lyricnz commented 1 year ago

For example, we don't need GIS in the final image - in fact (given we use a CSV) it could even be a more lightweight database overall.

lyricnz commented 1 year ago

Reduced image to 3.01GB


lyricnz commented 1 year ago

PR using postgis/postgis:15-3.4-alpine for the import, and postgres:16-alpine for the runtime.

https://github.com/LukePrior/nbn-upgrade-map/pull/257

lyricnz commented 1 year ago

Test run (using docker-compose):

❯ docker-compose -f extra/docker/docker-compose.yaml --profile test up
[+] Running 2/2
 ✔ Container docker-db-1   Created                                                                                                                                          41.4s
 ✔ Container docker-app-1  Created                                                                                                                                           0.9s
Attaching to docker-app-1, docker-db-1
docker-db-1   |
docker-db-1   | PostgreSQL Database directory appears to contain a database; Skipping initialization
docker-db-1   |
docker-db-1   |
docker-db-1   | 2023-09-17 05:22:00.454 UTC [1] LOG:  starting PostgreSQL 16.0 on x86_64-pc-linux-musl, compiled by gcc (Alpine 12.2.1_git20220924-r10) 12.2.1 20220924, 64-bit
docker-db-1   | 2023-09-17 05:22:00.455 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
docker-db-1   | 2023-09-17 05:22:00.456 UTC [1] LOG:  listening on IPv6 address "::", port 5432
docker-db-1   | 2023-09-17 05:22:00.459 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
docker-db-1   | 2023-09-17 05:22:00.469 UTC [24] LOG:  database system was shut down at 2023-09-16 20:32:05 UTC
docker-db-1   | 2023-09-17 05:22:00.485 UTC [1] LOG:  database system is ready to accept connections
docker-app-1  | 2023-09-17 05:22:01,061 INFO MainThread Checking for externally updated geojson results...
docker-app-1  | 2023-09-17 05:23:16,400 INFO MainThread ...done
docker-app-1  | 2023-09-17 05:23:16,437 INFO MainThread Creating DB index...
docker-db-1   | 2023-09-17 05:23:17.442 UTC [29] ERROR:  canceling autovacuum task
docker-db-1   | 2023-09-17 05:23:17.442 UTC [29] CONTEXT:  while scanning block 58505 of relation "gnaf_cutdown.address_principals"
docker-db-1   |     automatic vacuum of table "postgres.gnaf_cutdown.address_principals"
docker-app-1  | 2023-09-17 05:23:17,520 INFO MainThread Checking for unprocessed suburbs...
docker-app-1  | 2023-09-17 05:23:17,520 INFO MainThread Checking for announced suburbs that haven't been updated in 21 days...
docker-app-1  | 2023-09-17 05:23:17,521 INFO MainThread Checking for all suburbs...
docker-app-1  | 2023-09-17 05:23:17,539 INFO MainThread Processing Hulongine, WA
docker-app-1  | 2023-09-17 05:23:17,539 INFO MainThread Fetching all addresses for Hulongine, WA
docker-app-1  | 2023-09-17 05:23:17,549 INFO MainThread Fetched 18 addresses from database
docker-app-1  | 2023-09-17 05:23:17,552 INFO MainThread Loaded 18 addresses from output file
docker-app-1  | 2023-09-17 05:23:17,553 INFO MainThread Submitting 18 requests to add NBNco data...
docker-app-1  | 2023-09-17 05:23:24,740 INFO nbn_0 Completed 18 requests
docker-app-1  | 2023-09-17 05:23:24,748 INFO MainThread Completed. Tally of tech types: {'SATELLITE': 17, 'WIRELESS': 1}
docker-app-1  | 2023-09-17 05:23:24,748 INFO MainThread Location ID types: {'LOC': 16, 'Other': 2}
docker-app-1  | 2023-09-17 05:23:24,756 INFO MainThread Writing results to results/WA/hulongine.geojson
docker-app-1  | 2023-09-17 05:23:27,163 INFO MainThread Updating progress.json
docker-app-1 exited with code 0

lyricnz commented 1 year ago

Result: a roughly 20% smaller image, less "code", and a newer DB version (PostgreSQL 15 -> 16, without PostGIS).
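
As a rough check of the 20% figure, assuming the 3.73GB and 3.01GB sizes reported earlier in this thread are the before and after values (the helper name `percent_reduction` is just for illustration):

```python
def percent_reduction(before_gb, after_gb):
    """Size reduction as a percentage of the original size."""
    return (before_gb - after_gb) / before_gb * 100


before_gb = 3.73  # DB image size reported at the start of the thread
after_gb = 3.01   # size reported after the changes

saved_gb = before_gb - after_gb
print(f"saved {saved_gb:.2f} GB ({percent_reduction(before_gb, after_gb):.1f}% smaller)")
```

That works out to about 19.3%, so "20%" is a fair rounding.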

lyricnz commented 1 year ago

A PostGIS image for PostgreSQL 16 was just released, so I will try that now. When uncached, the build should be slightly more efficient due to layer reuse, with no change to the output.

lyricnz commented 1 year ago

PR ready.

LukePrior commented 1 year ago

Thanks

lyricnz commented 1 year ago

@LukePrior care to run the GHA to update the image on dockerhub (as used by CI)?

lyricnz commented 1 year ago

Thanks