LukePrior / nbn-upgrade-map

Interactive map showing premises eligible for the NBN FTTP upgrade program.
https://nbn.lukeprior.com/
MIT License
124 stars 11 forks source link

Change postgres column types to use less space #269

Closed lyricnz closed 1 year ago

lyricnz commented 1 year ago

Following on from the work in https://github.com/LukePrior/nbn-upgrade-map/issues/255 and related to (but maybe obsoleted by?) the work in https://github.com/LukePrior/nbn-upgrade-map/issues/268 we might be able to save a little space by using shorter columns, and/or a numeric.

Current schema is

  gnaf_pid text NOT NULL,
  address text NOT NULL,
  locality_name text NOT NULL,
  postcode text NULL,
  state text NOT NULL,
  latitude numeric(10,8) NOT NULL,
  longitude numeric(11,8) NOT NULL

However:

lyricnz commented 1 year ago

I think postgres is fairly efficient - the storage of a text field is only the number of characters+1. So we'd save 1 byte per row by changing gnaf_pid, and 3 bytes per row changing postcode to a shortint.

At 14.7m rows, this would save ~57MB on data

lyricnz commented 1 year ago

Postgres doesn't have a one-byte integer (to use for a State enum) so changing the state to an int would only save 1-2 bytes. Probably not worth the hassle and extra code. Changing it to a CHAR(3) would do the same.

Edit: could use a CHAR type (a single character)

lyricnz commented 1 year ago

Updating gnaf_pid to CHAR(14) and postcode to SMALLINT:

❯ docker images
REPOSITORY                     TAG       IMAGE ID       CREATED          SIZE
<none>                         <none>    86ed372521d9   39 minutes ago   2.97GB
lukeprior/nbn-upgrade-map-db   latest    8a2b859be4ba   4 days ago       3.01GB

Or, more accurately 2972585117 vs 3014560925 (1.4% saving)

lyricnz commented 1 year ago

FWIW it seems (experimentally) that using CHAR(n) columns types actually consumes n+1 bytes of storage, which is the same as TEXT, so no point in changing gnaf_pid to CHAR(14). Ditto for state as CHAR(3).

lyricnz commented 1 year ago

We could save a couple of bytes per row using a CHAR for state and mapping it to/from client side, but this doesn't seem worth the bother.