Add support for standard library types

ibokuri commented 1 year ago

General

If you'd like to see a certain std type gain support in Getty, please leave a comment and I'll add it to the hit list.

Also, feel free to work on any of the types listed below. If you have any questions, you can ask them on our Discord or in this issue.

The Hit List

[x] ArrayHashMap (includes Auto, String)
[x] ArrayHashMapUnmanaged (includes Auto, String)
[x] ArrayListAligned (includes ArrayList)
[x] ArrayListAlignedUnmanaged (includes ArrayListUnmanaged)
[x] IntegerBitSet (includes half of StaticBitSet)
[x] ArrayBitSet (includes half of StaticBitSet)
[x] DynamicBitSetUnmanaged
[x] DynamicBitSet
[x] BoundedArray
[x] BufMap
[x] BufSet
- Thanks @polykernel!
[ ] ComptimeStringMap
[x] BoundedEnumMultiset (includes EnumMultiset)
- Should be serialized as a Getty Map; see https://github.com/getty-zig/getty/issues/120#issuecomment-1657369596
- Thanks @polykernel!
[x] IndexedArray (includes EnumArray)
- Thanks @polykernel!
[x] IndexedSet
- Thanks @polykernel!
[x] IndexedMap
- Thanks @polykernel!
[x] LinearFifo
- Thanks @polykernel!
[x] HashMap (includes Auto, String)
[x] HashMapUnmanaged (includes Auto, String)
[x] SinglyLinkedList
[x] TailQueue
[ ] MultiArrayList
- Serialization support is done.
- Deserialization support broke due to https://github.com/ziglang/zig/commit/2c639d657002ac66749d08c4977cbb201d113ce1.

[ ] net.Address
- Serialization support is done.
- Deserialization support broke due to a "@memcpy arguments alias" panic in std's net.zig:

$ zig build test
run deserialization test: error: thread 1007393 panic: @memcpy arguments alias
/Users/jason/.asdf/installs/zig/master/lib/std/net.zig:553:70: 0x102138407 in resolve (deserialization test)
        @memcpy(result.sa.addr[16 - index ..][0..index], ip_slice[0..index]);
                                                                 ^
/Users/jason/.asdf/installs/zig/master/lib/std/net.zig:85:54: 0x102138dc7 in resolveIp6 (deserialization test)
    return Address{ .in6 = try Ip6Address.resolve(buf, port) };
                                                 ^
/Users/jason/.asdf/installs/zig/master/lib/std/net.zig:58:23: 0x10213900f in resolveIp (deserialization test)
    if (resolveIp6(name, port)) |ip6| return ip6 else |err| switch (err) {
                  ^
/Users/jason/Projects/Personal/getty/src/de/blocks/net_address.zig:63:50: 0x10213a413 in test.deserialize - std.net.Address (deserialization test)
            .want = std.net.Address.resolveIp(ipv6, 0) catch return error.UnexpectedTestError,
                                             ^
/Users/jason/.asdf/installs/zig/master/lib/test_runner.zig:99:29: 0x1020cb44b in mainServer (deserialization test)
            test_fn.func() catch |err| switch (err) {
                        ^
/Users/jason/.asdf/installs/zig/master/lib/test_runner.zig:33:26: 0x1020c4d57 in main (deserialization test)
    return mainServer() catch @panic("internal test runner failure");
                     ^
/Users/jason/.asdf/installs/zig/master/lib/std/start.zig:598:22: 0x1020c48f7 in main (deserialization test)
        root.main();

[ ] net.Ip4Address (?, might be covered by net.Address)
[ ] net.Ip6Address (?, might be covered by net.Address)
[ ] net.AddressList
[x] PackedIntArrayEndian (includes PackedIntArray)
[x] PackedIntSliceEndian (includes PackedIntSlice)
[x] PriorityDequeue
- Thanks @polykernel!
[x] PriorityQueue
- Thanks @polykernel!
[x] SegmentedList
- Thanks @polykernel!
[x] SemanticVersion
[x] Uri
- Thanks @polykernel!

polykernel commented 1 year ago

How should a data structure with multiple possible serializations be serialized? The motivating example is PriorityQueue: one possible serialization takes elements in the order returned by popping the queue, while another iterates over the queue with an iterator. Furthermore, for data structures such as EnumMultiset, there is no single obvious serialized form (e.g., a multiset can be serialized as a map or a list).

ibokuri commented 1 year ago

I usually try to follow these rules:

Do what most people would expect, or whatever would serve as a reasonable default.
The value being serialized shouldn't be modified.

So for PriorityQueue, I'd prefer iterating over it instead of popping values off to avoid modifying the queue.
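For illustration, here is a minimal sketch of the non-mutating approach. It assumes a Zig release from around the time of this thread (roughly 0.11), where std.PriorityQueue exposes init(allocator, context), add, iterator, and count. The compare function and the sumWithoutPopping helper are made up for the example and are not part of Getty.

```zig
const std = @import("std");

// Comparison function for a min-queue of u32 values.
fn compare(_: void, a: u32, b: u32) std.math.Order {
    return std.math.order(a, b);
}

const Queue = std.PriorityQueue(u32, void, compare);

// Hypothetical helper (not a Getty API): visits every element through the
// queue's iterator, so nothing is removed. Elements arrive in the queue's
// internal storage order, which is not necessarily priority order.
fn sumWithoutPopping(queue: *Queue) u32 {
    var sum: u32 = 0;
    var it = queue.iterator();
    while (it.next()) |elem| sum += elem;
    return sum;
}

test "iterating does not modify the queue" {
    var queue = Queue.init(std.testing.allocator, {});
    defer queue.deinit();

    try queue.add(3);
    try queue.add(1);
    try queue.add(2);

    try std.testing.expectEqual(@as(u32, 6), sumWithoutPopping(&queue));
    // All three elements are still in the queue afterwards.
    try std.testing.expectEqual(@as(usize, 3), queue.count());
}
```

Popping would give a sorted sequence but would require either mutating the caller's queue or cloning it first, which is why the iterator is the cheaper default here.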

As for EnumMultiset, serializing it as a sequence seems appropriate to me. I usually think of sets as sequences and the doc comment for EnumMultiset states that it's backed by a dense array.

ibokuri commented 1 year ago

@polykernel, after thinking for a bit, I feel like serializing EnumMultisets as maps (something like {"enum_foo": 1}, where 1 is the number of enum_foos in the set) makes more sense. Logically, serializing them as sequences seems nice, but practically speaking it almost always results in a ton of unnecessary tokens and parsing time for everybody.

Thoughts on representing EnumMultisets as maps instead?

polykernel commented 1 year ago

I think it is sensible to represent EnumMultisets as maps by default given multisets are usually represented as maps in practice, but it might be useful in some cases to serialize them as sequences. Perhaps there could be a block-specific attribute to control the serialized format, but I am not sure if having block-specific attributes is desirable or scalable.

ibokuri commented 1 year ago

Ahh okay, I haven't worked with multisets often so I wasn't aware that they're usually maps. I'll note that down in the original post.

polykernel commented 1 year ago

> I think it is sensible to represent EnumMultisets as maps by default given multisets are usually represented as maps in practice

@ibokuri Sorry, I worded this terribly. By "represented as maps", I actually meant implemented as (or similarly to) maps, rather than represented as maps in serialized form. On second thought, I realize I overgeneralized the statement. I know that in C++ (at least in libstdc++ and libc++), multiset is implemented like map except that the value being stored is the same as the key, but I am definitely not qualified to assess what the usual implementation strategy for multisets is in general.

After some more pondering, I came up with a list comparing the advantages and disadvantages of both seq and map serialization. Please let me know if there are points I missed.

# Seq
+ Preserves semantics: a multiset is semantically a type of unordered collection
- Redundant processing: multiplicity is not stored explicitly in the serialized form,
  so the receiving end must do extra processing to recover it
- Succinctness: the size of the encoding is proportional to the number of values in the multiset

# Map
+ Succinctness: the size of the encoding is proportional to the number of unique values in the multiset
+ Readability: a key-value mapping is more readable than a sequence with unspecified ordering
- Breaks semantics: a multiset is not semantically equivalent to a map, but rather an unordered
  collection with additional information

Based on the comparison, it seems serializing to maps is the better option. Furthermore, it may be worthwhile to support deserializing from maps as well. I will take a shot at implementing this when I have some time.
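
To make the size trade-off in the comparison concrete, here is a small sketch that counts how many elements each encoding would emit for one multiset. It assumes std.enums.EnumMultiset with initEmpty, addAssertSafe, and getCount, as in Zig releases from around the time of this thread. The Color enum and the encodingSizes helper are made up for the example, and omitting zero-count keys from the map form is an assumption rather than something decided in this thread.

```zig
const std = @import("std");

const Color = enum { red, green, blue };
const Colors = std.enums.EnumMultiset(Color);

const Sizes = struct { map_entries: usize, seq_elements: usize };

// Hypothetical helper (not a Getty API): counts what each encoding would emit.
fn encodingSizes(set: Colors) Sizes {
    var sizes = Sizes{ .map_entries = 0, .seq_elements = 0 };
    for (std.enums.values(Color)) |key| {
        const n = set.getCount(key);
        if (n == 0) continue; // assumption: absent keys are omitted
        sizes.map_entries += 1; // map: one entry per unique value
        sizes.seq_elements += n; // seq: one element per occurrence
    }
    return sizes;
}

test "map grows with unique values, seq with total values" {
    var set = Colors.initEmpty();
    set.addAssertSafe(.red, 3);
    set.addAssertSafe(.blue, 1);

    const sizes = encodingSizes(set);
    // Map form, e.g. {"red": 3, "blue": 1}
    try std.testing.expectEqual(@as(usize, 2), sizes.map_entries);
    // Seq form, e.g. ["red", "red", "red", "blue"]
    try std.testing.expectEqual(@as(usize, 4), sizes.seq_elements);
}
```

The gap widens as counts grow: adding a thousand more .red values leaves the map form at two entries while the seq form gains a thousand elements.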

getty-zig / getty

Add support for standard library types #120
