Overview
In a previous attempt (#3624), we tried to ingest the streams as Vector Tiles stored in S3. However, that approach led to a number of unknowns (#3625, #3626, #3627). All of this is avoided if we reuse our existing pipeline of importing the data into the database, even though this is by far the largest dataset we have ever imported (~32GB compressed, ~110GB uncompressed).
While importing this dataset is cumbersome, the streams run remarkably well, with the same performance as the smaller datasets. The existing pipeline also saves generated tiles to S3, which further improves caching and performance.
Closes #3625 Closes #3642
Demo
https://github.com/user-attachments/assets/eab8113b-3c3e-41e5-b88f-32c0b670388e
Notes
We'll need to figure out how to streamline this for local development, since installing the data locally is currently quite cumbersome.
Testing Instructions
This should be tested on staging, where it is already deployed, since installing the data locally would take an inordinate amount of time.