Open ibokuri opened 11 months ago
After a bit of thought, I've come to the following design for the Lifetime
declaration.
StringLifetime
Static
: The value lives for the lifetime of the program.Stack
: The value lives on the stack.Heap
: The value lives on the heap.Owned
: The value lives on the stack or heap and is managed by some entity.ValueLifetime
Static
: The value lives for the lifetime of the program.Heap
: The value lives on the heap.Owned
: The value lives on the stack or heap and is managed by some entity.Stack
values must be copied.Static
andHeap
values can be copied or used directly.Owned
values must be copied unless the visitor knows that the value's lifetime is safe, in which case it may be used directly.Static
andStack
values never need to be deallocated.Heap
values never have to be deallocated. In the deserializer, Heap
values must be deallocated if the visitor didn't use it.Owned
values never have to be deallocated. In the deserializer, Owned
values can:
Same as "Visitor Methods - Usage".
Same as "Visitor Methods - Deallocation" except that in the visitor, Heap
values must be deallocated upon error or if the value is not returned.
Edit: Static
messes up getty.de.free
, so I'm removing it from the proposal. How often does one return a static string directly for deserialization anyways...
One problem with the current solution for the deserializer portion of the issue is that it only enables visitors to use a single, contiguous slice from visitString
's input. If the visitor wants to use non-contiguous portions of input
as part of its final value, there needs to be a way for the visitor to specify the starting and ending positions of each slice it uses.
To fix this, we can modify the return type of visitString
to become the following:
// The starting and ending positions of a string.
pub const Range = struct {
start: usize,
end: usize,
};
// Indicates to the deserializer which parts of `input` were used directly by
// the visitor.
//
// If a visitor uses the entirety of `input` as part of its final value, then the
// `Single` variant can be used, which doesn't require any extra allocations.
// Otherwise, the visitor will need to allocate a slice for `Multiple`, which the
// deserializer will clean up afterwards.
pub const Used = union(enum) {
Single: Range,
Multiple: []const Range,
};
// The new return type of `visitString`.
//
// If `used` is `null`, then the visitor did not use `input` as part of its final
// return value.
pub fn Return(comptime T: type) type {
return struct {
value: T,
used: ?Used = null,
};
}
For example, suppose a user wants to deserialize "John Doe"
into the type struct { first: []u8, last: []u8 }
, where first
is John
and last
is Doe
. As it is now, they'd be force to either make a copy of both John
and Doe
, or use one of the names directly and copy the other. However, with this new return type, they can assign input[0..4]
to first
and input[5..]
to last
, allocate a slice for the used
field in the return value, populate it appropriately, and they're done!
As a sort of sanity check (and to give me a bit of motivation to work on this), I did some very simple deserialization benchmarking with some JSON data that had lots of strings. I deserialized the input data into a struct 100,000 times with both Getty JSON and std.json
, and as expected Getty JSON was much slower.
const std = @import("std");
const json = @import("json");
const c_ally = std.heap.c_allocator;
const T = []struct {
id: []const u8,
type: []const u8,
name: []const u8,
ppu: f64,
batters: struct {
batter: []struct {
id: []const u8,
type: []const u8,
},
},
topping: []struct {
id: []const u8,
type: []const u8,
},
};
const input =
\\[
\\ {
\\ "id": "0001",
\\ "type": "donut",
\\ "name": "Cake",
\\ "ppu": 0.55,
\\ "batters": {
\\ "batter": [
\\ {
\\ "id": "1001",
\\ "type": "Regular"
\\ },
\\ {
\\ "id": "1002",
\\ "type": "Chocolate"
\\ },
\\ {
\\ "id": "1003",
\\ "type": "Blueberry"
\\ },
\\ {
\\ "id": "1004",
\\ "type": "Devil's Food"
\\ }
\\ ]
\\ },
\\ "topping": [
\\ {
\\ "id": "5001",
\\ "type": "None"
\\ },
\\ {
\\ "id": "5002",
\\ "type": "Glazed"
\\ },
\\ {
\\ "id": "5005",
\\ "type": "Sugar"
\\ },
\\ {
\\ "id": "5007",
\\ "type": "Powdered Sugar"
\\ },
\\ {
\\ "id": "5006",
\\ "type": "Chocolate with Sprinkles"
\\ },
\\ {
\\ "id": "5003",
\\ "type": "Chocolate"
\\ },
\\ {
\\ "id": "5004",
\\ "type": "Maple"
\\ }
\\ ]
\\ },
\\ {
\\ "id": "0002",
\\ "type": "donut",
\\ "name": "Raised",
\\ "ppu": 0.55,
\\ "batters": {
\\ "batter": [
\\ {
\\ "id": "1001",
\\ "type": "Regular"
\\ }
\\ ]
\\ },
\\ "topping": [
\\ {
\\ "id": "5001",
\\ "type": "None"
\\ },
\\ {
\\ "id": "5002",
\\ "type": "Glazed"
\\ },
\\ {
\\ "id": "5005",
\\ "type": "Sugar"
\\ },
\\ {
\\ "id": "5003",
\\ "type": "Chocolate"
\\ },
\\ {
\\ "id": "5004",
\\ "type": "Maple"
\\ }
\\ ]
\\ },
\\ {
\\ "id": "0003",
\\ "type": "donut",
\\ "name": "Old Fashioned",
\\ "ppu": 0.55,
\\ "batters": {
\\ "batter": [
\\ {
\\ "id": "1001",
\\ "type": "Regular"
\\ },
\\ {
\\ "id": "1002",
\\ "type": "Chocolate"
\\ }
\\ ]
\\ },
\\ "topping": [
\\ {
\\ "id": "5001",
\\ "type": "None"
\\ },
\\ {
\\ "id": "5002",
\\ "type": "Glazed"
\\ },
\\ {
\\ "id": "5003",
\\ "type": "Chocolate"
\\ },
\\ {
\\ "id": "5004",
\\ "type": "Maple"
\\ }
\\ ]
\\ }
\\]
;
fn stdJson() !void {
for (0..100_000) |_| {
const output = try std.json.parseFromSlice(T, c_ally, input, .{});
defer output.deinit();
}
}
fn gettyJson() !void {
for (0..100_000) |_| {
const output = try json.fromSlice(c_ally, T, input);
defer json.de.free(c_ally, output, null);
}
}
fn gettyJsonArena() !void {
for (0..100_000) |_| {
var arena = std.heap.ArenaAllocator.init(c_ally);
const arena_ally = arena.allocator();
defer arena.deinit();
_ = try json.fromSlice(arena_ally, T, input);
}
}
pub fn main() !void {
//try gettyJson();
//try gettyJsonArena();
//try stdJson();
}
$ hyperfine --warmup 5 ./getty ./getty-arena ./std
Benchmark 1: ./getty
Time (mean ± σ): 718.4 ms ± 3.7 ms [User: 713.4 ms, System: 3.5 ms]
Range (min … max): 715.6 ms … 727.9 ms 10 runs
Benchmark 2: ./getty-arena
Time (mean ± σ): 628.7 ms ± 1.5 ms [User: 622.5 ms, System: 4.6 ms]
Range (min … max): 626.6 ms … 631.3 ms 10 runs
Benchmark 3: ./std
Time (mean ± σ): 482.7 ms ± 1.5 ms [User: 476.2 ms, System: 4.9 ms]
Range (min … max): 481.2 ms … 486.4 ms 10 runs
Summary
./std ran
1.30 ± 0.01 times faster than ./getty-arena
1.49 ± 0.01 times faster than ./getty
The unnecessary allocations are, surely, the main factor for Getty's slowness. However, I should note that std.json.parseFromSlice
uses an arena allocator implicitly whereas json.fromSlice
does not, and you can see from the results that using an arena does indeed make a difference.
With or without an arena, though, Getty's still much slower. So it's time to get started on this issue!
Removed accepted label for now since, as pointed out by fredi, there are major issues with implementing this kind of thing, which I've unfortunately had the chance to run into in my own branch.
For one thing, the multiple ranges idea was a total non-starter. I think my brain farted when I came up with that. The input for visitString
is always a slice, which is a single allocation. Ya can't just free individual pieces of a single allocation. So now we have no way of letting visitors use only parts of input
.
Another issue is getty.de.free
. How will it be able to know if the strings of a value, especially if they have different lifetimes, should be freed or not?
Before I can implement the lifetime optimizations, I had to do a bit of general allocation work beforehand. The above, merged PR implements that allocation work.
In summary, Getty now uses an arena internally for all allocations. This simplifies visitors and deserialization blocks as they no longer have to worry about freeing values and allows end users to free everything whenever they want. Additionally, the arena is passed to the methods of Deserializer implementations so they're simplified a tad as well.
Big thanks to fredi for discussing all this with me and steering me in the right direction :D
The last half consists of the lifetime work, which consists of two parts:
StringLifetime
and ValueLifetime
types.visitString
to return not only the produced value but also an indicator of whether or not the input string was used directly in the final, returned value.The lifetime types will be more or less the same as what I've already proposed:
For visitString
, StringLifetime
will have the following variants:
Stack
: The value is on the stack and its lifetime is shorter than the deserialization process.Heap
: The value is on the heap and its lifetime is tied to the arena allocator returned to the end user.Managed
: The value is on the stack or heap and its lifetime is managed by an entity that is not the arena allocator returned to the end user.For access methods (e.g., nextKeySeed
), ValueLifetime
will have the following variants:
Heap
: The value is on the heap and its lifetime is tied to the arena allocator returned to the end user.Managed
: The value is on the stack or heap and its lifetime is managed by an entity that is not the arena allocator returned to the end user.visitString
's Return TypeBefore, I proposed returning from visitString
the produced value and a slice indicating what part of the input
parameter was used.
With the arena, the slice is no longer necessary. A simple bool
will suffice, where true
means that input
was used directly.
Performance updates after arena changes (using same benchmarking code):
$ hyperfine --warmup 5 ./getty ./std
Benchmark 1: ./getty
Time (mean ± σ): 688.8 ms ± 1.9 ms [User: 684.7 ms, System: 3.0 ms]
Range (min … max): 685.8 ms … 692.3 ms 10 runs
Benchmark 2: ./std
Time (mean ± σ): 484.0 ms ± 2.7 ms [User: 480.7 ms, System: 2.2 ms]
Range (min … max): 481.3 ms … 489.1 ms 10 runs
Summary
./std ran
1.42 ± 0.01 times faster than ./getty
Slightly slower than getty-arena
was, but faster than the old getty
version. Since there's no lifetimes right now, Getty doesn't free anything at all, including struct keys. Perhaps keeping around all that cruft all the time is slowing things down?
Performance update after some optimizations in getty-json (no peeking, always allocating strings, heap branch first).
Note that std's runtime has increased overall b/c we're now correctly passing in .alloc_always
. Beforehand, it was returning a slice into the scanner's input which, if we hadn't used a static string input, would be completely invalid.
$ hyperfine --warmup 5 ./getty ./std
Benchmark 1: ./getty
Time (mean ± σ): 678.8 ms ± 1.3 ms [User: 675.0 ms, System: 2.9 ms]
Range (min … max): 676.3 ms … 680.7 ms 10 runs
Benchmark 2: ./std
Time (mean ± σ): 573.4 ms ± 2.8 ms [User: 567.5 ms, System: 4.6 ms]
Range (min … max): 569.5 ms … 578.8 ms 10 runs
Summary
./std ran
1.18 ± 0.01 times faster than ./getty
We shaved off around 10ms.
Problem
There is no way for a visitor to know if a pointer value it has received is:
Thus, visitors are forced to play it safe and always make copies, which can result in unnecessary allocations.
Proposal
To fix this, the following things need to be added to Getty:
A way for visitors to know if the pointer value they received from a deserializer is safe to use as part of their return value.
value
parameter invisitString
and the return value of access methods (e.g.,nextKeySeed
,nextElementSeed
).A way for deserializers to know if
visitString
is using the slice as part of the final value, and how much of that slice is being used.Part One: The Visitor
How can visitors know if the pointer value they received from a deserializer is safe to use as part of their return value?
To solve this, we can do the following:
Define the following type:
⚠️ Edit: See this comment for new
Lifetime
design. ⚠️The type will indicate the lifetime and ownership properties of pointer values passed to visitors:
Stack
: The value lives on the stack and its lifetime is shorter than the deserialization process.Heap
: The value lives on the heap and its lifetime is longer than the deserialization process and is independent of any entity.Owned
: The value lives on the stack or heap and its lifetime is managed by some entity.Owned
value's lifetime is safe, they must always copy such values.When should visitors free the pointer values they receive?
Stack
orOwned
values should never be freed by the visitor.Stack
values will be automatically cleaned up by the compiler, obviously.Owned
values will be cleaned up eventually after deserialization is finished by the entity that owns them.Heap
values passed tovisitString
should never be freed by the visitor. This is b/c the value is a Getty value and so the deserializer is responsible for freeing it.Heap
values returned from an access method should be freed by the visitor upon error or if it's not part of the final value. The deserializer will never see these values again, so it's the visitor's responsibility to free them.Add a
lifetime
parameter tovisitString
that specifies theLifetime
ofinput
.Remove the
is*Allocated
methods from access interfaces. WithLifetime
, we don't need them anymore.Modify the successful return type of access methods to be:
With these changes, visitors can do the following:
Part Two: The Deserializer
How does a deserializer know if
visitString
is using the slice as part of the final value, and how much of that slice is being used?Before diving in, there are a few things to keep in mind:
Heap
values.Stack
values are obviously managed automatically by the compiler.Owned
values are managed outside the deserialization process, so functions likedeserializeString
don't need to worry about them.visitString
might not be a string at all, so we shouldn't rely solely onvisitString
's return value. Besides, even if it is a string it'll be very tedious using it in the deserializer to figure out what to free and what to keep.In any case, to solve this, we can do the following:
⚠️ Edit: See this comment for new solution. ⚠️
Change the return type of
visitString
to the following:indices
isnull
, then that meansvisitString
did not useinput
as part of its return value. In which case, the deserializer should freevalue
afterwards.indices
is notnull
, then that meansvisitString
did useinput
as part of its return value.start
andend
specifies the starting and ending indices ininput
at whichvisitString
's return value begins and ends.With this new
indices
field, the deserializer now knows 1) if the visitor is usinginput
directly in its return value, and 2) how much ofinput
is being used.input
is being used, then the deserializer should not freeinput
after callingvisitString
.input
is being used, then the deserializer can usestart
andend
to determine the remaining parts ofinput
that should be freed.