alan-turing-institute / uatk-spc

Synthetic Population Catalyst
https://alan-turing-institute.github.io/uatk-spc/
MIT License

Reconsider proto3 #42

Closed dabreegster closed 1 year ago

dabreegster commented 1 year ago

https://alan-turing-institute.github.io/uatk-spc/code_walkthrough.html#protocol-buffers points out a few problems with proto3, mainly the lack of support for required fields. While trying to adjust numeric enum values, I've also discovered that every enum must include an entry with value 0, usually UNKNOWN.
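To make the enum problem concrete, here is a minimal Rust sketch of what the mandatory 0 entry means for a consumer. The `Activity` variants and the `validate` helper are hypothetical and not taken from the SPC schema; the point is only that once UNKNOWN is representable, every reader has to handle it.

```rust
/// Shape forced by proto3: the first entry must be 0, conventionally UNKNOWN.
/// (Hypothetical variants, for illustration only.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ActivityProto3 {
    Unknown = 0,
    Retail = 1,
    School = 2,
}

/// Shape we would prefer to expose: no "unset" state at all.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Activity {
    Retail,
    School,
}

/// The check every consumer ends up repeating when UNKNOWN can appear in the data.
pub fn validate(raw: ActivityProto3) -> Result<Activity, String> {
    match raw {
        ActivityProto3::Unknown => Err("activity was never set".to_string()),
        ActivityProto3::Retail => Ok(Activity::Retail),
        ActivityProto3::School => Ok(Activity::School),
    }
}
```

With required fields and enums that don't need a 0 entry, the second enum could be used directly and `validate` would disappear.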

Both of these create friction with what we want to express in SPC. First, we don't want any concept of binary compatibility between old and new versions of code or data. When there's a schema change, we regenerate all data and release it under a versioned directory. A consumer can match the version of the data to the version of the schema they're using. Second, part of the benefit of SPC is not having to grapple with possibly missing or invalid values. SPC deals with all of that and delivers cleaner output with documented invariants.
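As a rough sketch of the version matching described above: a consumer could refuse to read data whose versioned directory doesn't match the schema it was built against. The `v2/...` path layout, `SCHEMA_VERSION`, and both function names are assumptions made up for this example, not SPC's actual release layout.

```rust
/// Hypothetical: the schema version this consumer was compiled against.
pub const SCHEMA_VERSION: u32 = 2;

/// Find the first path component of the form "v<number>", e.g. "v2" in
/// "v2/england/area.pb" (an assumed layout), and return the number.
pub fn data_version(path: &str) -> Option<u32> {
    path.split('/')
        .find_map(|part| part.strip_prefix('v')?.parse::<u32>().ok())
}

/// Accept the file only when its directory version equals SCHEMA_VERSION.
pub fn check_compatible(path: &str) -> Result<(), String> {
    match data_version(path) {
        Some(v) if v == SCHEMA_VERSION => Ok(()),
        Some(v) => Err(format!(
            "data is v{}, but this code expects v{}",
            v, SCHEMA_VERSION
        )),
        None => Err(format!("no version directory found in {}", path)),
    }
}
```

Since there's no compatibility story between versions, a hard failure on mismatch is the whole contract; nothing needs to be tolerated at the field level.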

Also, we'll be reading the data in JavaScript soon for web maps. Whatever we use needs good support there. (https://github.com/protobufjs/protobuf.js/ for proto3, as an example).

I'll try out a few options and leave notes below.

dabreegster commented 1 year ago

Proto2

https://github.com/alan-turing-institute/uatk-spc/tree/proto2 quickly adds required fields and enums without a 0 value. Muuuch better. I haven't tried JS support yet. This is probably the simplest option to switch to.

It'd be great if we could use enums as map keys, to simplify venues_per_activity and flows_per_activity, but proto2 doesn't support this.
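One possible workaround, sketched in plain Rust rather than with any generated protobuf code: keep integer keys on the wire (protobuf map keys can be integral or string types, but not enums) and convert to an enum-keyed map after decoding. The `Activity` variants, their numeric values, and `by_activity` are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical activity enum; the numeric values stand in for schema values.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Activity {
    Retail = 1,
    School = 2,
}

impl Activity {
    /// Map a raw integer key back to the enum, if it names a known variant.
    pub fn from_i32(raw: i32) -> Option<Activity> {
        match raw {
            1 => Some(Activity::Retail),
            2 => Some(Activity::School),
            _ => None,
        }
    }
}

/// Convert an integer-keyed map (as decoded from the wire) into an
/// enum-keyed one, failing on the first key that isn't a known activity.
pub fn by_activity<T>(raw: HashMap<i32, T>) -> Result<HashMap<Activity, T>, String> {
    raw.into_iter()
        .map(|(key, value)| {
            Activity::from_i32(key)
                .map(|activity| (activity, value))
                .ok_or_else(|| format!("unknown activity key {}", key))
        })
        .collect()
}
```

This keeps the schema simple but pushes a validation step onto every consumer, which is exactly the kind of thing we'd rather SPC handled once.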

dabreegster commented 1 year ago

Flatbuffers

https://google.github.io/flatbuffers/md__schemas.html mentions support for required.

https://google.github.io/flatbuffers/flatbuffers_guide_use_rust.html Manipulating the data from Rust is funky: everything goes through accessors because of endianness. Not a fan.

I don't see any support for map types at all.

dabreegster commented 1 year ago

Cap'n Proto

https://capnproto.org/language.html

Enums have to start from 0 (and the numeric value may not be exposed or shouldn't be used).

It's got type-safe maps with any kind of keys! The encoding is internally just a list.
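To illustrate what "internally just a list" implies for a reader, here is a small plain-Rust sketch, assuming the map arrives as a list of key/value entries. `Entry`, `find`, and `index_entries` are hypothetical names and not part of any Cap'n Proto crate.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// One key/value pair, as it would appear in a list-encoded map.
pub struct Entry<K, V> {
    pub key: K,
    pub value: V,
}

/// Look a key up directly in the list: a linear scan, O(n) per lookup.
pub fn find<'a, K: PartialEq, V>(entries: &'a [Entry<K, V>], key: &K) -> Option<&'a V> {
    entries.iter().find(|e| &e.key == key).map(|e| &e.value)
}

/// Build a hash index once after reading, for repeated lookups.
/// If a key appears twice, the later entry wins.
pub fn index_entries<K: Eq + Hash, V>(entries: Vec<Entry<K, V>>) -> HashMap<K, V> {
    entries.into_iter().map(|e| (e.key, e.value)).collect()
}
```

For maps as small as one entry per activity, the linear scan is probably fine; the index only matters if the same map is queried many times.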

No required fields (deliberately). https://capnproto.org/faq.html