Unless I'm measuring something wrong, the current approach with geomedea incurs more bandwidth, but with far fewer requests and lower latency.
It's not entirely surprising that geomedea might request more data.
In FGB, there is a single buffer of uncompressed features. Since there is no compression, the index tells us exactly where each feature is in the file. Using this, I implemented smart feature batching: feature requests will only merge adjacent features into a single request if they are "close enough".
To take advantage of compression, geomedea groups features into pages, so you have to download an entire page even if you only need one feature from it. Because geomedea's features live in compressed pages, request batching would work a little differently: it can still be done, but it'd be "page batches" rather than "feature batches" (see the sketch below). I haven't implemented this yet, but it should be doable in a non-breaking way.
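To illustrate the idea, here's a minimal sketch of the range coalescing that both flavors of batching boil down to. This is not geomedea's or FGB's actual API; the function name and the `max_gap` threshold are made up for illustration:

```rust
use std::ops::Range;

/// Merge sorted byte ranges (feature or page extents) whose gap is small
/// enough that downloading a few wasted bytes beats paying for another
/// round trip.
fn coalesce_ranges(mut ranges: Vec<Range<u64>>, max_gap: u64) -> Vec<Range<u64>> {
    ranges.sort_by_key(|r| r.start);
    let mut merged: Vec<Range<u64>> = Vec::new();
    for range in ranges {
        match merged.last_mut() {
            // Extend the previous request when the gap is "close enough".
            Some(prev) if range.start.saturating_sub(prev.end) <= max_gap => {
                prev.end = prev.end.max(range.end);
            }
            _ => merged.push(range),
        }
    }
    merged
}

fn main() {
    // Two nearby pages (100-byte gap) merge into one request; the distant
    // page stays a separate request.
    let ranges = vec![0..4096, 4196..8192, 1_000_000..1_004_096];
    assert_eq!(
        coalesce_ranges(ranges, 1024),
        vec![0..8192, 1_000_000..1_004_096]
    );
}
```

The bytes inside a merged gap are exactly the "wasted" bytes: downloaded but never decoded.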
Could you do me a favor? Run with `RUST_LOG=debug` and give me the lines matching `Finished using an HTTP client. used_bytes`,
e.g.:

```
Finished using an HTTP client. used_bytes=839712, wasted_bytes=293690, req_count=4
Finished using an HTTP client. used_bytes=17, wasted_bytes=0, req_count=1
```
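If it's easier, something like this one-liner should pull them out (assuming the binary reads the standard `RUST_LOG` env filter; adjust the command to however you run the app):

```sh
RUST_LOG=debug cargo run 2>&1 | grep 'Finished using an HTTP client'
```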
wasted_bytes should correspond to the bytes that could be saved by more clever page-batching.
I had a go at "more clever page-batching" here: https://github.com/michaelkirk/geomedea/pull/12
I was looking at the network traffic for your existing FGB integration, and I feel like there must be a bug in the FGB client. It makes no sense to issue all those small nearby requests (4 bytes?!).
I'm looking into that now.
After updating to the latest `417d4f43cd35aa98aea19a0b17632c8309b50466`:
These two cases are now competitive with fgb, so I'm almost definitely going to switch to this. :)
With the new property encoding...
Elephant reads 6.3MB over 23 requests:

```
Finished using an HTTP client. used_bytes=156716, wasted_bytes=0, req_count=5
Finished using an HTTP client. used_bytes=3776001, wasted_bytes=1490472, req_count=17
Finished using an HTTP client. used_bytes=17, wasted_bytes=0, req_count=1
```

Bristol reads 2.5MB over 25 requests:

```
Finished using an HTTP client. used_bytes=144956, wasted_bytes=1344, req_count=9
Finished using an HTTP client. used_bytes=1078601, wasted_bytes=473493, req_count=15
Finished using an HTTP client. used_bytes=17, wasted_bytes=0, req_count=1
```
So the new encoding isn't giving that huge of an advantage, but it still opens the way to doing something nicer later with delta encoding.
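For context, here's a minimal sketch of what delta-encoding coordinates could look like. This is an assumption about a possible future direction, not geomedea's actual format; it just shows why consecutive coordinates along a trip shape compress well as deltas:

```rust
/// Store each value as its difference from the previous one. Consecutive
/// coordinates along a line are close together, so the deltas are small
/// integers that varint-encode or compress much better than the absolutes.
fn delta_encode(values: &[i64]) -> Vec<i64> {
    let mut prev = 0;
    values
        .iter()
        .map(|&v| {
            let delta = v - prev;
            prev = v;
            delta
        })
        .collect()
}

/// Invert the encoding by accumulating the deltas back into absolutes.
fn delta_decode(deltas: &[i64]) -> Vec<i64> {
    let mut acc = 0;
    deltas
        .iter()
        .map(|&d| {
            acc += d;
            acc
        })
        .collect()
}

fn main() {
    // Hypothetical scaled longitudes along a shape: big absolute values,
    // tiny deltas after the first.
    let xs = vec![51_501_234, 51_501_240, 51_501_251, 51_501_255];
    let deltas = delta_encode(&xs);
    assert_eq!(deltas, vec![51_501_234, 6, 11, 4]);
    assert_eq!(delta_decode(&deltas), xs);
}
```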
I'm going to merge this in now and continue to play with encoding / perf later on. It's a huge improvement for very little work, so thanks so much for the new format, adding WASM support, and these page batching fixes!
Here's Elephant & Castle with https://github.com/flatgeobuf/flatgeobuf/pull/376
tl;dr: there was a bad bug in the HTTP fetch implementation, triggered by those 1.05MB requests. It hadn't come up with the shape of my own data and requests, so thanks for helping to uncover it.
With the bug fix, the two formats seem to be in the same ballpark of network transfer for your queries.
Edit for completeness: here's the same with geomedea (one more request, 15% fewer bytes transferred):
CC @michaelkirk, I'm trying out geomedea for the use case I described in Discord!
Bristol doesn't have many GTFS trip shapes intersecting the area, while E&C in London has loads.