getty-zig / getty

A (de)serialization framework for Zig
https://getty.so
MIT License
183 stars 14 forks source link

Add `isElementAllocated`/`isValueAllocated`/`isPayloadAllocated` functions to SeqAccess/MapAccess/VariantAccess interface #127

Closed polykernel closed 1 year ago

polykernel commented 1 year ago

Problem

There are cases where it is desirable to know whether the last element deserialized by nextElementSeed, the last value deserialized by nextValueSeed or the last variant deserialized by payloadSeed is allocated on the heap. For instance, a data structure may apply some transformation to an element to be inserted such that the original value is unneeded, or a data structure may create defensive copies of inserted values making the original values unreachable.

I observed this use case while working working on (de)serialization support for std.IndexedSet as part of https://github.com/getty-zig/getty/issues/120 which translate values inserted into it to indices, making the values themselves ephemeral.

Proposal

Add the functions isElementAllocated, isValueAllocated, isPayloadAllocated functions to SeqAccess, MapAccess and VariantAccess interface respectively. This allows checking the storage location of a value deserialized from access methods (e.g., nextKey) and makes it possible to conditionally free the value only if it is allocated on the heap. Using these functions permit deserialization without an allocator when no allocation would be performed, following the principle of least privilege. For example,

while (try seq.nextElement(allocator, T)) |elem| {
  defer if (seq.isElementAllocated(T)) {
    free(allocator.?, Deserializer, elem);
  }
  ...
}
while (try map.nextKey(allocator, K)) |k| {
  ...
  const v = try map.nextValue(allocator, V);
  defer if (map.isValueAllocated(V)) {
    free(allocator.?, Deserializer, v);
  }
  ...
}

Alternatives

Leave the status quo as is, the described problem can be remedied by unconditionally freeing the last deserialized element/values. For example,

while (try seq.nextElement(allocator, T)) |elem| {
  defer free(allocator.?, Deserializer, elem);
  ...
}
while (try map.nextKey(allocator, K)) |k| {
  ...
  const v = try map.nextValue(allocator, V);
  defer free(allocator.?, Deserializer, v);
  ...
}

This works because types which are not allocated do not implement a free function however it has the consequence of always requiring an allocator for deserialization even if when no allocation would be performed

Additional Context

No response

ibokuri commented 1 year ago

Can you update the proposal so that it includes adding an isPayloadAllocated to the VariantAccess interface? That way, everything will be consistent across the access interfaces and would give unions the same benefits that you're describing for sequences and maps.

Not to mention, the consistency would allow me to finally define (in a sane manner) Getty's allocation story during deserialization:


  1. Generally speaking, "where applicable" means whenever a deserialized value is not part of the final value returned to the end user. For instance, if a heap-allocated string is visited to produce an integer, the string isn't part of the final, returned integer and so we'd free it even upon success.
polykernel commented 1 year ago

Sorry for the late response, my availability have been sporadic for the last few days. I have updated the proposal to include isPayloadAllocated for the VariantAccess interface. As for the definition of Getty's allocation flow, I think it might be worthwhile to also define the responsibility of the free function clearly. It is my understanding that free is intended only for freeing a value in after it has been completely deserialized, thus if a value is incompletely deserialized (i.e. an error has occurred during its deserialization), custom logic may be needed to free the incomplete value. It turns out that for a vast number of types, the logic for both cases is exactly the same but there are some notably exceptions such as arrays and other fixed-size data structures.

ibokuri commented 1 year ago

Sorry for the late response, my availability have been sporadic for the last few days.

No worries!

I think it might be worthwhile to also define the responsibility of the free function clearly. It is my understanding that free is intended only for freeing a value in after it has been completely deserialized [...]

Yep. I had thought I added it to the doc comment for free but apparently not. I'll make a todo for that. I also need to update that doc comment anyways since the allocation model described in it is outdated at this point...

[...] if a value is incompletely deserialized (i.e. an error has occurred during its deserialization), custom logic may be needed to free the incomplete value. It turns out that for a vast number of types, the logic for both cases is exactly the same but there are some notably exceptions such as arrays and other fixed-size data structures.

Right. I think the visitors that Getty define handle incomplete values properly with custom logic. But custom visitors written by users should know that they should use custom logic as well in those cases, so a doc comment would be nice for them.

ibokuri commented 1 year ago

I'll assign this to you for now since I think you had some work started already. Let me know if you can't or don't want to do it. I don't mind implementing it 👍

polykernel commented 1 year ago

Yeah sure, I just need to patch up my changes a bit and add some documentation.