CDCgov / MicrobeTrace

The Visualization Multitool for Molecular Epidemiology and Bioinformatics
https://microbetrace.cdc.gov/
Apache License 2.0
88 stars 38 forks

Transition to GeoBuf #248

Open AABoyles opened 4 years ago

AABoyles commented 4 years ago

In our ongoing struggle to keep MicrobeTrace as small as possible, we could win some kilobytes by transitioning the JSON files in data/ from GeoJSON to TopoJSON.

DanielJDufour commented 4 years ago

I would also suggest considering GeoBuf (https://github.com/mapbox/geobuf) :-)

DanielJDufour commented 4 years ago

As an experiment, I converted the data. Here are the results:

5552 -rw-r--r--   1 danieldufour  staff   2.7M Jun 28 07:15 counties.json
1024 -rw-r--r--   1 danieldufour  staff   509K Jun 28 07:54 counties.json.br
1792 -rw-r--r--   1 danieldufour  staff   864K Jun 28 07:55 counties.json.gz
1376 -rw-r--r--   1 danieldufour  staff   685K Jun 28 07:54 counties.pbf
4416 -rw-r--r--   1 danieldufour  staff   2.2M Jun 28 07:54 counties.topo.json
1024 -rw-r--r--   1 danieldufour  staff   467K Jun 28 07:54 counties.topo.json.br
1400 -rw-r--r--   1 danieldufour  staff   697K Jun 28 07:55 counties.topo.json.gz
 512 -rw-r--r--   1 danieldufour  staff   255K Jun 28 07:09 countries.json
 256 -rw-r--r--   1 danieldufour  staff    69K Jun 28 07:54 countries.json.br
 256 -rw-r--r--   1 danieldufour  staff   103K Jun 28 07:55 countries.json.gz
 152 -rw-r--r--   1 danieldufour  staff    73K Jun 28 07:54 countries.pbf
 424 -rw-r--r--   1 danieldufour  staff   210K Jun 28 07:54 countries.topo.json
 136 -rw-r--r--   1 danieldufour  staff    66K Jun 28 07:54 countries.topo.json.br
 176 -rw-r--r--   1 danieldufour  staff    86K Jun 28 07:55 countries.topo.json.gz
 368 -rw-r--r--   1 danieldufour  staff   184K Jun 28 07:15 land.json
 128 -rw-r--r--   1 danieldufour  staff    61K Jun 28 07:54 land.json.br
 256 -rw-r--r--   1 danieldufour  staff    72K Jun 28 07:55 land.json.gz
  64 -rw-r--r--   1 danieldufour  staff    30K Jun 28 07:54 land.pbf
 376 -rw-r--r--   1 danieldufour  staff   185K Jun 28 07:54 land.topo.json
 128 -rw-r--r--   1 danieldufour  staff    61K Jun 28 07:54 land.topo.json.br
 152 -rw-r--r--   1 danieldufour  staff    72K Jun 28 07:55 land.topo.json.gz
 104 -rw-r--r--   1 danieldufour  staff    50K Jun 28 07:15 stars.json
  16 -rw-r--r--   1 danieldufour  staff   7.4K Jun 28 07:54 stars.json.br
  24 -rw-r--r--   1 danieldufour  staff   9.7K Jun 28 07:55 stars.json.gz
  48 -rw-r--r--   1 danieldufour  staff    22K Jun 28 07:54 stars.pbf
  88 -rw-r--r--   1 danieldufour  staff    41K Jun 28 07:54 stars.topo.json
  16 -rw-r--r--   1 danieldufour  staff   7.4K Jun 28 07:54 stars.topo.json.br
  24 -rw-r--r--   1 danieldufour  staff   9.5K Jun 28 07:55 stars.topo.json.gz
2352 -rw-r--r--   1 danieldufour  staff   1.1M Jun 28 07:09 states.json
 512 -rw-r--r--   1 danieldufour  staff   243K Jun 28 07:55 states.json.br
 896 -rw-r--r--   1 danieldufour  staff   407K Jun 28 07:55 states.json.gz
 488 -rw-r--r--   1 danieldufour  staff   242K Jun 28 07:54 states.pbf
1944 -rw-r--r--   1 danieldufour  staff   971K Jun 28 07:54 states.topo.json
 512 -rw-r--r--   1 danieldufour  staff   219K Jun 28 07:55 states.topo.json.br
 680 -rw-r--r--   1 danieldufour  staff   340K Jun 28 07:55 states.topo.json.gz
5064 -rw-r--r--   1 danieldufour  staff   2.5M Jun 28 07:15 tracts.csv
1280 -rw-r--r--   1 danieldufour  staff   595K Jun 28 07:55 tracts.csv.br
1792 -rw-r--r--   1 danieldufour  staff   879K Jun 28 07:55 tracts.csv.gz
2312 -rw-r--r--   1 danieldufour  staff   1.1M Jun 28 07:15 zipcodes.csv
 640 -rw-r--r--   1 danieldufour  staff   273K Jun 28 07:55 zipcodes.csv.br
 896 -rw-r--r--   1 danieldufour  staff   392K Jun 28 07:55 zipcodes.csv.gz
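
The thread does not show the commands used to produce these files. As a rough sketch of how such a size comparison could be reproduced, the snippet below measures a payload before and after gzip using only Python's standard library. The helper name `encoded_sizes` and the synthetic GeoJSON payload are assumptions for illustration, not part of MicrobeTrace. Brotli is left out because it is not in the standard library.

```python
import gzip
import json
from typing import Dict


def encoded_sizes(payload: bytes) -> Dict[str, int]:
    """Return the byte count of a payload as-is and after gzip compression."""
    return {
        "raw": len(payload),
        "gzip": len(gzip.compress(payload, compresslevel=9)),
    }


# Hypothetical stand-in for a file in data/: a small GeoJSON FeatureCollection.
# The repeated keys are the kind of redundancy that gzip removes well.
features = [
    {
        "type": "Feature",
        "properties": {"name": f"region-{i}"},
        "geometry": {"type": "Point", "coordinates": [i * 0.5, i * 0.25]},
    }
    for i in range(500)
]
payload = json.dumps({"type": "FeatureCollection", "features": features}).encode("utf-8")

sizes = encoded_sizes(payload)
print(sizes)
```

To measure a real file, the same helper can be given the bytes read from that file instead of the synthetic payload.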

DanielJDufour commented 4 years ago

It appears that geobuf (.pbf) generally beats topojson; however, topojson + brotli often beats geobuf, sometimes by a substantial margin. For example, stars.topo.json.br is 7.4K whereas stars.pbf is 22K. What do you think? How often is MicrobeTrace served with gzip or brotli?
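
To make "generally" and "often" concrete, here is a small sketch that tallies the comparison using the sizes from the listing above. The values are the human-readable sizes as printed, in KB, with 2.2M approximated as 2253K, so the result is only as precise as that rounding. The dictionary layout and variable names are illustrative.

```python
# Sizes in KB, copied from the listing above (2.2M approximated as 2253K).
SIZES_KB = {
    "counties": {"pbf": 685, "topo": 2253, "topo_br": 467},
    "countries": {"pbf": 73, "topo": 210, "topo_br": 66},
    "land": {"pbf": 30, "topo": 185, "topo_br": 61},
    "stars": {"pbf": 22, "topo": 41, "topo_br": 7.4},
    "states": {"pbf": 242, "topo": 971, "topo_br": 219},
}

# Datasets where geobuf is smaller than uncompressed TopoJSON.
pbf_beats_topo = [name for name, s in SIZES_KB.items() if s["pbf"] < s["topo"]]

# Datasets where brotli-compressed TopoJSON is smaller than geobuf.
topo_br_beats_pbf = [name for name, s in SIZES_KB.items() if s["topo_br"] < s["pbf"]]

print("geobuf < topojson:", pbf_beats_topo)
print("topojson + brotli < geobuf:", topo_br_beats_pbf)
```

By these numbers, geobuf is smaller than plain TopoJSON for all five datasets, and TopoJSON with brotli is smaller than geobuf for every dataset except land.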

AABoyles commented 4 years ago

Thanks for running this experiment, @DanielJDufour!

I doubt anyone is being served MicrobeTrace with brotli (or even gzip, for that matter). It works with the little Express server we wrote back when we used Heroku for our main deployment, and that leveraged brotli (and fell back to gzip for incompatible browsers). However, MicrobeTrace in production today is served by a completely different server, which basically only serves the uncompressed static files (unless they've made big architectural changes lately). Given that, switching to geobuf could be a win (especially if we decide we need to shrink the footprint again for some reason).
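
For readers unfamiliar with the "brotli, then gzip, then uncompressed" fallback described here, the sketch below shows one way a server could choose among precompressed files from the request's Accept-Encoding header. It is not the Express server mentioned in this comment. The function names are hypothetical, the header parsing is deliberately simplified, and the `.br`/`.gz` suffixes follow the file naming in the listing earlier in the thread.

```python
from typing import Optional, Set, Tuple


def accepted_encodings(header: str) -> Set[str]:
    """Parse an Accept-Encoding header into the set of encodings with q > 0."""
    accepted = set()
    for part in header.split(","):
        token, _, params = part.strip().partition(";")
        token = token.strip().lower()
        if not token:
            continue
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed weight as "not acceptable"
        if q > 0:
            accepted.add(token)
    return accepted


def pick_variant(path: str, accept_encoding: str) -> Tuple[str, Optional[str]]:
    """Return (file to serve, Content-Encoding value or None)."""
    accepted = accepted_encodings(accept_encoding)
    if "br" in accepted:
        return path + ".br", "br"
    if "gzip" in accepted:
        return path + ".gz", "gzip"
    return path, None  # fall back to the uncompressed static file


print(pick_variant("data/counties.json", "gzip, deflate, br"))
print(pick_variant("data/counties.json", "gzip"))
print(pick_variant("data/counties.json", ""))
```

A server that only ever takes the last branch, as described for production here, gains nothing from the precompressed files, which is why the uncompressed size of each format is the number that matters in that setup.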

DanielJDufour commented 4 years ago

Hi, @jaywokim. Shall I submit a PR for using Geobuf?

jaywokim commented 4 years ago

Yes, please. That would be great.