This is a prototype suggestion to use the protobuf format for serializing this data. Serializing to protocol buffers gives significant compression: in a test on the Canadian allocations data, the protobuf encoding came out to 28.3% of the size of plain JSON. The drawback of protobufs is that they strip the field names (which is where most of the compression comes from), so the field definitions have to be stored separately in a .proto file.
I don't think we're limited by space right now, but it's worth considering and documenting as we design the API and explore our options.
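To see why stripping the field names buys most of the compression, note that a JSON array repeats every key in every record. A toy illustration (the record shape and values here are made up, not the real allocations data):

```python
import json

# Hypothetical records standing in for the allocations data;
# every record repeats the same three field names.
records = [{"region": "CA-ON", "year": 2021, "amount": 1.5}] * 1000
payload = json.dumps(records, separators=(",", ":"))

# Bytes spent on the quoted keys and their colons alone.
keys_only = len('"region":"year":"amount":') * 1000
print(len(payload), keys_only)  # key bytes are over half the payload
```

Protobuf replaces each key with a one-byte field tag, which is why the savings scale with how verbose the field names are.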
Encoding method                        Size (bytes)   Size relative to one-line JSON
JSON, no indentation, no newline       144550         100%
JSON, newline only                     150766         104.3%
JSON, newline + 1-space indentation    199154         138%
JSON, newline + 2-space indentation    247542         173%
protobuf                               40919          28.3%
protobuf, Base64-encoded               54560          37.7%
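The last row of the table follows directly from the Base64 encoding overhead: Base64 emits 4 output characters for every 3 input bytes (with padding), inflating any binary payload by a factor of about 4/3. A quick check, which assumes nothing about the payload contents, only its length:

```python
import base64

# A placeholder blob the same length as the measured protobuf output;
# only the length matters for the Base64 size.
blob = bytes(40919)
encoded = base64.b64encode(blob)
print(len(encoded))  # 54560, i.e. ~4/3 of 40919
```

That 4/3 factor is exactly the jump from 28.3% to 37.7% in the table.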
Pros and cons of using protobuf
Pros
- significantly smaller footprint
- one universal definition file describes the data; there is no ambiguity and no language-specific choices. (JSON + JSON Schema is also language-agnostic, but the schema files tend to be harder to read.)
- protos are forwards and backwards compatible, which makes it easy to evolve the API
- tools exist to generate typed models from the proto files for the most common languages
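The compatibility point can be made concrete with the wire format itself: field names never appear on the wire, only field numbers, and decoders skip numbers they don't recognize. A minimal hand-rolled sketch (varint fields only, not a real protobuf library):

```python
def encode_varint(n):
    # Protobuf varint: 7 bits per byte, high bit set on continuation bytes.
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)
        else:
            out.append(b)
            return bytes(out)

def decode_varint(buf, i):
    shift = result = 0
    while True:
        b = buf[i]; i += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, i
        shift += 7

def encode_field(field_num, value):
    # Field key = (field number << 3) | wire type; wire type 0 = varint.
    return encode_varint(field_num << 3) + encode_varint(value)

# A "v2" message carries fields 1 and 2; a "v1" decoder only knows field 1.
msg = encode_field(1, 150) + encode_field(2, 7)

def decode_v1(buf):
    known = {}
    i = 0
    while i < len(buf):
        key, i = decode_varint(buf, i)
        field_num = key >> 3
        value, i = decode_varint(buf, i)  # all fields are varints here
        if field_num == 1:
            known[field_num] = value
        # unknown field numbers are simply skipped
    return known

print(decode_v1(msg))  # {1: 150}
```

The old decoder reads the new message without error, which is why adding fields to a .proto is a non-breaking change.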
Cons
- the data is no longer self-contained; we need the proto file to understand it
- the data is no longer easy to read
- model changes require updating the proto definitions, which makes changing the data slightly more inconvenient
- an additional library is needed to read them; for JS this means a 788KB unpacked library, on top of the generated JS code, which for this data takes approximately 35KB
The proto definition
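The actual definition file isn't reproduced here; a hypothetical sketch of its shape (the message and field names below are illustrative only, the real schema would mirror the JSON fields of the allocations data):

```proto
// Hypothetical sketch -- message and field names are placeholders.
syntax = "proto3";

message Allocation {
  string region = 1;
  uint32 year = 2;
  double amount = 3;
}

message AllocationList {
  repeated Allocation allocations = 1;
}
```

Typed models would then be generated with protoc, e.g. via its --python_out or --js_out options.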