OvertureMaps / io-site

HTTP/1.1 -> HTTP/2 for parquet resources - hosting #122

Closed H-Plus-Time closed 1 month ago

H-Plus-Time commented 1 month ago

TLDR: the random access + blob storage latency problem rears its ugly head, again.

So, the download step is looking pretty good (nice to see a little of my wasm work put to good use :smile:), but it appears to be running up against the limitations of combining browsers (where HTTP/1.1 is typically capped at 6 concurrent connections per origin) with blob storage providers (which all cap out at HTTP/1.1, with the exception of GCS on HTTP/3). Incidentally, this is the same reason pmtiles are best served from a range-request-aware CDN: even ignoring caching, the difference in protocol makes a huge difference.
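As a rough illustration of why the connection cap matters for many small range requests, here is a back-of-envelope sketch. The request count, per-request latency, and both concurrency limits are illustrative assumptions, not measurements from this site; the model also ignores bandwidth and connection setup.

```python
import math

def request_waves(num_requests: int, max_concurrent: int) -> int:
    """Sequential 'waves' needed when at most max_concurrent requests are in flight."""
    return math.ceil(num_requests / max_concurrent)

def estimated_seconds(num_requests: int, max_concurrent: int, latency_s: float) -> float:
    """Latency-only estimate: every wave costs one round trip to the blob store."""
    return request_waves(num_requests, max_concurrent) * latency_s

# Hypothetical workload: 120 range requests at ~100 ms each.
NUM_REQUESTS = 120
LATENCY_S = 0.1

# Assumed limits: 6 connections per origin over HTTP/1.1,
# 100 multiplexed streams over a single HTTP/2 connection.
http1_waves = request_waves(NUM_REQUESTS, 6)    # 20 waves
http2_waves = request_waves(NUM_REQUESTS, 100)  # 2 waves

http1_estimate = estimated_seconds(NUM_REQUESTS, 6, LATENCY_S)
http2_estimate = estimated_seconds(NUM_REQUESTS, 100, LATENCY_S)
```

Under these assumptions the latency-bound portion of the download shrinks by about an order of magnitude once requests can be multiplexed.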

Pointing the catalogue at a CloudFront distribution of the geoparquet files would be enough, though it would probably be worthwhile nudging infra to mirror the releases to a public GCS bucket.

That should help tremendously with the time aspect (eliminating the HEAD request in object-store-wasm for each file will be helpful too, though the impact is a lot less noticeable over HTTP/2).
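One common way to avoid a separate HEAD request, sketched here as an assumption rather than a description of what object-store-wasm does, is a suffix range request: `Range: bytes=-N` fetches the last N bytes without knowing the file size, and the `Content-Range` response header reports the total size. A Parquet file ends with a 4-byte little-endian footer length followed by the magic bytes `PAR1`, so the same response also locates the metadata. The helper names below are hypothetical.

```python
import struct
from typing import Optional

PARQUET_MAGIC = b"PAR1"

def suffix_range_header(num_bytes: int) -> dict:
    """Header asking for the last num_bytes of an object (no size needed)."""
    return {"Range": f"bytes=-{num_bytes}"}

def total_size_from_content_range(value: str) -> Optional[int]:
    """Extract the object size from e.g. 'bytes 900-999/1000'; None if the server sent '*'."""
    total = value.rsplit("/", 1)[1].strip()
    return None if total == "*" else int(total)

def footer_length(tail: bytes) -> int:
    """Read the Parquet footer (metadata) length from the final 8 bytes of the file."""
    if len(tail) < 8 or tail[-4:] != PARQUET_MAGIC:
        raise ValueError("tail does not end with the Parquet magic bytes")
    return struct.unpack("<I", tail[-8:-4])[0]
```

If the footer is longer than the tail that was fetched, one follow-up range request for the remaining bytes is still needed, so the suffix size is a tuning choice.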

kylebarron commented 1 month ago

Thanks for the advice @H-Plus-Time!

Bonkles commented 1 month ago

Yeah, this is great. I was wondering why we were so bound up re: the HTTP fetches out to the geoparquet, but this makes sense.

We do have some other code in the works to speed up the downloads, including client-side bounding box filtering before we try to assemble the dataset. In early testing, this means we need to consult a dozen files or so at most, even with every theme/type turned on.
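The idea of bounding box filtering can be sketched as follows. This is a minimal illustration, not the site's actual code: the catalogue layout, file names, and coordinates are all hypothetical, and it assumes each file's extent is known up front as a lon/lat rectangle.

```python
from typing import Dict, List, NamedTuple

class BBox(NamedTuple):
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

def intersects(a: BBox, b: BBox) -> bool:
    """True if the two rectangles overlap or touch."""
    return (
        a.min_lon <= b.max_lon
        and a.max_lon >= b.min_lon
        and a.min_lat <= b.max_lat
        and a.max_lat >= b.min_lat
    )

def files_to_fetch(catalogue: Dict[str, BBox], query: BBox) -> List[str]:
    """Keep only the files whose extent overlaps the requested area."""
    return [name for name, extent in catalogue.items() if intersects(extent, query)]

# Hypothetical catalogue: file name -> extent of the features it contains.
catalogue = {
    "part-0000.parquet": BBox(-180.0, -90.0, -90.0, 0.0),
    "part-0001.parquet": BBox(-90.0, 0.0, 0.0, 90.0),
    "part-0002.parquet": BBox(0.0, 0.0, 90.0, 90.0),
}

# Hypothetical viewport somewhere in western Europe.
viewport = BBox(2.0, 48.0, 3.0, 49.0)
selected = files_to_fetch(catalogue, viewport)
```

Only the files in `selected` would then need any range requests at all, which is what cuts the request count before protocol limits come into play.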

We've already set up a CDN to serve the tiles out of, so we can investigate similar for the geoparquet.

Bonkles commented 1 month ago

Since we already have existing work here to reduce the fileset we're pulling from, and I have created an internal issue to track creating the CF distro, closing this. :)