EarthFrequencies / earth-frequencies-data

Open data describing the radio frequency allocation for each region and country.

feat(proto): use protobuf format #16

Closed jrmlhermitte closed 2 years ago

jrmlhermitte commented 2 years ago

Description

This is a prototype proposal to serialize this data with Protocol Buffers (protobuf). Compared with JSON, the protobuf encoding is considerably smaller.

For example, I tested this on the Canadian allocations data and found that the protobuf output is 28.3% of the size of the same data as single-line JSON. The drawback of protobuf is that the field names are stripped from the payload (which is where most of the savings come from), so their definitions have to be stored elsewhere, in a .proto file. I don't think we're limited by space right now, but it's worth considering and documenting as we design the API and explore our options.

| Encoding method | Size (bytes) | Size relative to single-line JSON |
| --- | --- | --- |
| JSON, no indentation, no newline | 144550 | 100% |
| JSON, newline only | 150766 | 104.3% |
| JSON, newline + 1-space indentation | 199154 | 138% |
| JSON, newline + 2-space indentation | 247542 | 171% |
| protobuf | 40919 | 28.3% |
| protobuf, Base64 encoded | 54560 | 37.7% |
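
The JSON whitespace rows and the Base64 row above can be approximated with the Python standard library alone. This is a minimal sketch, not the script used for the measurements: `sample` is a made-up record loosely shaped like the allocation data (it is not the Canadian file), `json_size` and `base64_size` are hypothetical helper names, and compact separators are assumed for every JSON variant so that only newlines and indentation differ.

```python
import base64
import json

# Made-up record, only loosely shaped like the allocation data (assumption).
block = {
    "band": {"lower": 8.3, "upper": 9.0},
    "allocations": [
        {"service": "example service", "primary": True, "footnotes": ["x1"]}
    ],
}
sample = {
    "name": "Example",
    "region": "xx",
    "year": 2020,
    "allocation_blocks": [block] * 50,
}


def json_size(obj, indent=None):
    """Return the UTF-8 byte size of obj serialized as JSON.

    indent=None gives one line, indent=0 gives newlines only,
    and indent=1 or indent=2 add that many spaces per nesting level.
    """
    text = json.dumps(obj, indent=indent, separators=(",", ":"))
    return len(text.encode("utf-8"))


def base64_size(n_bytes):
    """Size of padded Base64 output: 4 characters per 3 input bytes, rounded up."""
    return 4 * ((n_bytes + 2) // 3)


baseline = json_size(sample)
rows = [
    ("JSON, no indentation, no newline", baseline),
    ("JSON, newline only", json_size(sample, indent=0)),
    ("JSON, newline + 1-space indentation", json_size(sample, indent=1)),
    ("JSON, newline + 2-space indentation", json_size(sample, indent=2)),
]
for label, size in rows:
    print(f"{label:<40}{size:>8}{100 * size / baseline:>8.1f}%")

# The formula agrees with the real encoder on a payload of the protobuf size.
assert base64_size(40919) == len(base64.b64encode(bytes(40919)))
```

The absolute numbers depend entirely on the input, so they will not match the table. The Base64 row does follow from the 4-for-3 rule: `base64_size(40919)` gives 54560, about a third more than the raw protobuf bytes.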

Pros and Cons for using protobuf

Pros

Cons

The proto definition

syntax = "proto3";

package frequencies;

message FrequencyBand {
    double lower = 1;
    double upper = 2;
}

message FrequencyAllocation {
    string service = 1;
    bool primary = 2;
    repeated string footnotes = 3;
}

message FrequencyAllocationBlock {
    FrequencyBand band = 1;
    repeated FrequencyAllocation allocations = 2;
}

message FrequencyAllocations {
    string name = 1;
    string region = 2;
    optional string parent_region = 3;
    uint32 year = 4;
    bytes meta = 5;
    repeated FrequencyAllocationBlock allocation_blocks = 6;
}
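
To illustrate why the payload shrinks once field names are dropped, here is a minimal sketch that hand-encodes a `FrequencyBand` in the protobuf wire format using only the standard library. It is not how the prototype serializes data (a real project would use code generated by `protoc`). The helper name `encode_frequency_band`, the band values, and the JSON key names are assumptions made for illustration.

```python
import json
import struct


def encode_frequency_band(lower: float, upper: float) -> bytes:
    """Hand-encode a FrequencyBand message (fields 1 and 2, both double)."""
    out = b""
    # proto3 serializers omit scalar fields left at their default value (0.0).
    if lower != 0.0:
        # Tag byte = (field_number << 3) | wire_type; doubles use wire type 1
        # (fixed 64-bit), followed by the value as 8 little-endian bytes.
        out += bytes([(1 << 3) | 1]) + struct.pack("<d", lower)
    if upper != 0.0:
        out += bytes([(2 << 3) | 1]) + struct.pack("<d", upper)
    return out


# Illustrative values, not taken from the real data set.
encoded = encode_frequency_band(8.3, 9.0)
as_json = json.dumps({"lower": 8.3, "upper": 9.0}, separators=(",", ":"))

print(len(encoded))                  # 18: two 1-byte tags plus two 8-byte doubles
print(len(as_json.encode("utf-8")))  # 25: the key names travel with every record
```

Each field name is replaced by a one-byte tag, which is why a reader needs the .proto definition to interpret the bytes. The savings grow with longer names such as `allocation_blocks` and with repeated messages, where JSON repeats every key for every element.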