digidem / mapeo-core-next

The upcoming version of Mapeo Core
MIT License

Should we store protobufs gzipped? #701

Closed: gmaclennan closed this issue 2 weeks ago

gmaclennan commented 3 weeks ago

I realized recently, when working on Mapbox vector tiles (which are encoded as protobufs), that gzip compression can save quite a bit of space. In the "wild", protobufs are mainly used in network transports, which are normally gzipped (or otherwise compressed) anyway.

If we gzip-compressed Mapeo records at rest (e.g. on disk), we could both save disk space and reduce network traffic. This would be a breaking change, so it should ideally happen before the MVP launch. It would add overhead to reading and writing data, but gzip compression tends to be very fast. It would only be a bottleneck for reading, since write performance is not an issue: syncing would not require gzipping, because the data would arrive already gzipped.
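A minimal sketch of the at-rest scheme described above, assuming a hypothetical `CompressedRecordStore` (an in-memory dict standing in for disk; it is not part of mapeo-core-next) and Python's standard `gzip` module in place of whatever the project would actually use. Locally created records are compressed once on write, records received through sync are stored as-is without re-compressing, and every read pays the decompression cost.

```python
import gzip
from typing import Dict


class CompressedRecordStore:
    """Hypothetical store that keeps encoded records gzipped at rest."""

    def __init__(self) -> None:
        # In-memory stand-in for on-disk storage: record id -> gzipped bytes.
        self._blobs: Dict[str, bytes] = {}

    def write_local(self, record_id: str, encoded: bytes) -> None:
        # Locally created records pay the compression cost once, on write.
        self._blobs[record_id] = gzip.compress(encoded)

    def ingest_synced(self, record_id: str, gzipped: bytes) -> None:
        # Records from a peer already arrive gzipped, so store them unchanged.
        self._blobs[record_id] = gzipped

    def export_for_sync(self, record_id: str) -> bytes:
        # Sync sends the stored (already compressed) bytes as-is.
        return self._blobs[record_id]

    def read(self, record_id: str) -> bytes:
        # Every read decompresses: this is the potential bottleneck.
        return gzip.decompress(self._blobs[record_id])


# Usage: device A creates a record; device B receives it via sync.
device_a = CompressedRecordStore()
device_b = CompressedRecordStore()
record = b"\x0a\x05hello"  # protobuf field 1 (wire type 2) = "hello"
device_a.write_local("obs1", record)
device_b.ingest_synced("obs1", device_a.export_for_sync("obs1"))
print(device_b.read("obs1") == record)  # True
```

The design choice here is that compression happens only at the edge where a record is first created, so sync stays a plain byte copy.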

Is this worth trying to do before MVP?

EvanHahn commented 2 weeks ago

I think there are two ways we could gzip data:

I'm not 100% confident in my response here, but I don't think this is worth doing for the MVP.

gmaclennan commented 2 weeks ago

That's really helpful, thanks Evan. As you say, I think our data structure is quite different from vector tile data, which has lots of repeated text data (that's where I was seeing significant size reductions from gzipping protobufs). Based on this data, I think we can safely close this issue and consider it investigated and ruled out. It's good to know that it's not something we need.
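To illustrate why repeated text matters, here is a rough sketch. These are not measurements of Mapeo or vector tile data. It hand-encodes two protobuf-style payloads using hypothetical helpers `encode_varint` and `string_field` (no `.proto` schema or protobuf library involved) and compares gzip ratios for one string repeated 200 times versus 200 distinct pseudo-random strings.

```python
import gzip
import random


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def string_field(field_number: int, text: str) -> bytes:
    """Encode one length-delimited (wire type 2) protobuf string field."""
    data = text.encode("utf-8")
    key = encode_varint((field_number << 3) | 2)
    return key + encode_varint(len(data)) + data


def gzip_ratio(payload: bytes) -> float:
    """Compressed size divided by original size (lower is better)."""
    return len(gzip.compress(payload)) / len(payload)


# Hypothetical payload with repeated text: one string field repeated 200 times.
repetitive = b"".join(string_field(1, "highway=residential") for _ in range(200))

# Hypothetical payload of distinct values: 200 pseudo-random 20-char hex strings.
rng = random.Random(701)  # fixed seed so the run is reproducible
distinct = b"".join(
    string_field(1, f"{rng.getrandbits(80):020x}") for _ in range(200)
)

print(f"repetitive: {len(repetitive)} bytes -> ratio {gzip_ratio(repetitive):.2f}")
print(f"distinct:   {len(distinct)} bytes -> ratio {gzip_ratio(distinct):.2f}")
```

The repetitive payload shrinks to a small fraction of its size. Each random hex character carries 4 bits of information, so gzip cannot shrink the distinct strings much below half their size.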