getty-zig / getty

A (de)serialization framework for Zig
https://getty.so
MIT License

Avoiding unnecessary memory allocations in visitors #135

Open ibokuri opened 11 months ago

ibokuri commented 11 months ago

Problem

There is no way for a visitor to know if a pointer value it has received is:

  1. Allocated.
  2. Safe to use as part of the return value.

Thus, visitors are forced to play it safe and always make copies, which can result in unnecessary allocations.

Proposal

To fix this, the following things need to be added to Getty:

  1. A way for visitors to know if the pointer value they received from a deserializer is safe to use as part of their return value.

    • There are two ways for a visitor to receive a pointer value from a deserializer: the value parameter in visitString and the return value of access methods (e.g., nextKeySeed, nextElementSeed).
  2. A way for deserializers to know if visitString is using the slice as part of the final value, and how much of that slice is being used.

Part One: The Visitor

How can visitors know if the pointer value they received from a deserializer is safe to use as part of their return value?


To solve this, we can do the following:

  1. Define the following type:

    ⚠️ Edit: See this comment for new Lifetime design. ⚠️

    pub const Lifetime = enum {
        Stack,
        Heap,
        Owned,
    };
    • The type will indicate the lifetime and ownership properties of pointer values passed to visitors:

      1. Stack: The value lives on the stack and its lifetime is shorter than the deserialization process.
        • The value must be copied by the visitor.
      2. Heap: The value lives on the heap and its lifetime is longer than the deserialization process and is independent of any entity.
        • The value can either be copied or returned directly.
      3. Owned: The value lives on the stack or heap and its lifetime is managed by some entity.
        • The value can either be copied by the visitor or returned directly if the visitor understands and deems the value's lifetime as safe.
        • Since Getty's default visitors won't have enough info to determine whether an Owned value's lifetime is safe, they must always copy such values.
    • When should visitors free the pointer values they receive?

      1. Stack or Owned values should never be freed by the visitor.
        • Stack values will be automatically cleaned up by the compiler, obviously.
        • Owned values will be cleaned up eventually after deserialization is finished by the entity that owns them.
      2. Heap values passed to visitString should never be freed by the visitor. This is because the value is a Getty value, so the deserializer is responsible for freeing it.
      3. Heap values returned from an access method should be freed by the visitor upon error or if it's not part of the final value. The deserializer will never see these values again, so it's the visitor's responsibility to free them.
  2. Add a lifetime parameter to visitString that specifies the Lifetime of input.

  3. Remove the is*Allocated methods from access interfaces. With Lifetime, we don't need them anymore.

  4. Modify the successful return type of access methods to be:

    struct {
        data: @TypeOf(seed).Value, // This may be optional, depending on the access method.
        lifetime: Lifetime,
    }

With these changes, visitors can do the following:

// in visitString...
switch (lifetime) {
  .Stack => // Make a copy of `input`
  .Owned, .Heap => // Make a copy of `input` or return it directly
}

// in visitMap...
while (try map.nextKey(ally, Key)) |key| {
    switch (key.lifetime) {
      .Stack => // Make a copy of `key.data`
      .Owned => // Make a copy of `key.data` or return it directly
      .Heap  => // Make a copy of `key.data` or return it directly & free it as necessary
    }
}

Part Two: The Deserializer

How does a deserializer know if visitString is using the slice as part of the final value, and how much of that slice is being used?


Before diving in, there are a few things to keep in mind:

  1. Access methods are irrelevant for this part. Deserializers never see the values returned from access methods again, so there's no need to worry about them.
  2. This part only applies to Heap values.
    • Stack values are managed automatically by the compiler.
    • Owned values are managed outside the deserialization process, so functions like deserializeString don't need to worry about them.
  3. The return value of visitString might not be a string at all, so we shouldn't rely solely on visitString's return value. Besides, even if it is a string, using it in the deserializer to figure out what to free and what to keep would be very tedious.

In any case, to solve this, we can do the following:

⚠️ Edit: See this comment for new solution. ⚠️

  1. Change the return type of visitString to the following:

    const Return = struct {
        value: Value,
        indices: ?struct { start: usize, end: usize } = null,
    };
    
    fn visitString(
        self: Self,
        ally: ?std.mem.Allocator,
        comptime Deserializer: type,
        input: anytype,
    ) Deserializer.Error!Return
    • If indices is null, then visitString did not use input as part of its return value, and the deserializer should free input afterwards.
    • If indices is not null, then visitString did use input as part of its return value. start and end specify the indices in input at which visitString's return value begins and ends.
  2. With this new indices field, the deserializer now knows 1) if the visitor is using input directly in its return value, and 2) how much of input is being used.

    • If the entirety of input is being used, then the deserializer should not free input after calling visitString.
    • If only a partial slice of input is being used, then the deserializer can use start and end to determine the remaining parts of input that should be freed.
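A deserializer's side of this contract could look roughly like the following sketch. Note this is hypothetical code: `Self`, `Error`, and `readAllocatedString` are placeholders, not real Getty APIs; it only illustrates the null/non-null check on `indices`.

```zig
fn deserializeString(self: *Self, ally: std.mem.Allocator, visitor: anytype) Error!@TypeOf(visitor).Value {
    // A heap-allocated slice owned by the deserializer.
    const input = try self.readAllocatedString(ally);

    const ret = try visitor.visitString(ally, @TypeOf(self), input);

    if (ret.indices == null) {
        // The visitor copied whatever it needed, so the whole slice can go.
        ally.free(input);
    }
    // Otherwise, input[indices.start..indices.end] lives on inside
    // `ret.value`, and the deserializer must not invalidate it.

    return ret.value;
}
```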
ibokuri commented 10 months ago

New Lifetimes

After a bit of thought, I've come to the following design for the Lifetime declaration.

For Visitor Methods: StringLifetime

For Access Methods: ValueLifetime

Visitor Methods

Usage

Deallocation

Access Methods

Usage

Same as "Visitor Methods - Usage".

Deallocation

Same as "Visitor Methods - Deallocation" except that in the visitor, Heap values must be deallocated upon error or if the value is not returned.


Edit: Static messes up getty.de.free, so I'm removing it from the proposal. How often does one return a static string directly for deserialization anyways...

ibokuri commented 10 months ago

New Indices

One problem with the current solution for the deserializer portion of the issue is that it only enables visitors to use a single, contiguous slice from visitString's input. If the visitor wants to use non-contiguous portions of input as part of its final value, there needs to be a way for the visitor to specify the starting and ending positions of each slice it uses.

To fix this, we can modify the return type of visitString to become the following:

// The starting and ending positions of a string.
pub const Range = struct {
    start: usize,
    end: usize,
};

// Indicates to the deserializer which parts of `input` were used directly by
// the visitor.
//
// If a visitor uses the entirety of `input` as part of its final value, then the
// `Single` variant can be used, which doesn't require any extra allocations.
// Otherwise, the visitor will need to allocate a slice for `Multiple`, which the
// deserializer will clean up afterwards.
pub const Used = union(enum) {
    Single: Range,
    Multiple: []const Range,
};

// The new return type of `visitString`.
//
// If `used` is `null`, then the visitor did not use `input` as part of its final
// return value.
pub fn Return(comptime T: type) type {
    return struct {
        value: T,
        used: ?Used = null,
    };
}

For example, suppose a user wants to deserialize "John Doe" into the type struct { first: []u8, last: []u8 }, where first is John and last is Doe. As it is now, they'd be forced to either make a copy of both John and Doe, or use one of the names directly and copy the other. However, with this new return type, they can assign input[0..4] to first and input[5..] to last, allocate a slice for the used field in the return value, populate it appropriately, and they're done!
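Concretely, the "John Doe" case might look something like this in a visitor. This is a hedged sketch: `Self` and `Value` (and its field types) are assumed, while `Range`, `Used`, and `Return` are the types defined above.

```zig
fn visitString(self: Self, ally: ?std.mem.Allocator, comptime De: type, input: anytype) De.Error!Return(Value) {
    _ = self;

    // Use two non-contiguous pieces of `input` directly, with no copies.
    const value = Value{
        .first = input[0..4], // "John"
        .last = input[5..], // "Doe"
    };

    // Tell the deserializer which ranges of `input` are still in use.
    // The deserializer cleans up this slice afterwards.
    const ranges = try ally.?.alloc(Range, 2);
    ranges[0] = .{ .start = 0, .end = 4 };
    ranges[1] = .{ .start = 5, .end = input.len };

    return .{ .value = value, .used = .{ .Multiple = ranges } };
}
```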

ibokuri commented 10 months ago

As a sort of sanity check (and to give me a bit of motivation to work on this), I did some very simple deserialization benchmarking with some JSON data that had lots of strings. I deserialized the input data into a struct 100,000 times with both Getty JSON and std.json, and as expected Getty JSON was much slower.

const std = @import("std");
const json = @import("json");

const c_ally = std.heap.c_allocator;

const T = []struct {
    id: []const u8,
    type: []const u8,
    name: []const u8,
    ppu: f64,
    batters: struct {
        batter: []struct {
            id: []const u8,
            type: []const u8,
        },
    },
    topping: []struct {
        id: []const u8,
        type: []const u8,
    },
};

const input =
    \\[
    \\    {
    \\        "id": "0001",
    \\        "type": "donut",
    \\        "name": "Cake",
    \\        "ppu": 0.55,
    \\        "batters": {
    \\            "batter": [
    \\                {
    \\                    "id": "1001",
    \\                    "type": "Regular"
    \\                },
    \\                {
    \\                    "id": "1002",
    \\                    "type": "Chocolate"
    \\                },
    \\                {
    \\                    "id": "1003",
    \\                    "type": "Blueberry"
    \\                },
    \\                {
    \\                    "id": "1004",
    \\                    "type": "Devil's Food"
    \\                }
    \\            ]
    \\        },
    \\        "topping": [
    \\            {
    \\                "id": "5001",
    \\                "type": "None"
    \\            },
    \\            {
    \\                "id": "5002",
    \\                "type": "Glazed"
    \\            },
    \\            {
    \\                "id": "5005",
    \\                "type": "Sugar"
    \\            },
    \\            {
    \\                "id": "5007",
    \\                "type": "Powdered Sugar"
    \\            },
    \\            {
    \\                "id": "5006",
    \\                "type": "Chocolate with Sprinkles"
    \\            },
    \\            {
    \\                "id": "5003",
    \\                "type": "Chocolate"
    \\            },
    \\            {
    \\                "id": "5004",
    \\                "type": "Maple"
    \\            }
    \\        ]
    \\    },
    \\    {
    \\        "id": "0002",
    \\        "type": "donut",
    \\        "name": "Raised",
    \\        "ppu": 0.55,
    \\        "batters": {
    \\            "batter": [
    \\                {
    \\                    "id": "1001",
    \\                    "type": "Regular"
    \\                }
    \\            ]
    \\        },
    \\        "topping": [
    \\            {
    \\                "id": "5001",
    \\                "type": "None"
    \\            },
    \\            {
    \\                "id": "5002",
    \\                "type": "Glazed"
    \\            },
    \\            {
    \\                "id": "5005",
    \\                "type": "Sugar"
    \\            },
    \\            {
    \\                "id": "5003",
    \\                "type": "Chocolate"
    \\            },
    \\            {
    \\                "id": "5004",
    \\                "type": "Maple"
    \\            }
    \\        ]
    \\    },
    \\    {
    \\        "id": "0003",
    \\        "type": "donut",
    \\        "name": "Old Fashioned",
    \\        "ppu": 0.55,
    \\        "batters": {
    \\            "batter": [
    \\                {
    \\                    "id": "1001",
    \\                    "type": "Regular"
    \\                },
    \\                {
    \\                    "id": "1002",
    \\                    "type": "Chocolate"
    \\                }
    \\            ]
    \\        },
    \\        "topping": [
    \\            {
    \\                "id": "5001",
    \\                "type": "None"
    \\            },
    \\            {
    \\                "id": "5002",
    \\                "type": "Glazed"
    \\            },
    \\            {
    \\                "id": "5003",
    \\                "type": "Chocolate"
    \\            },
    \\            {
    \\                "id": "5004",
    \\                "type": "Maple"
    \\            }
    \\        ]
    \\    }
    \\]
;

fn stdJson() !void {
    for (0..100_000) |_| {
        const output = try std.json.parseFromSlice(T, c_ally, input, .{});
        defer output.deinit();
    }
}

fn gettyJson() !void {
    for (0..100_000) |_| {
        const output = try json.fromSlice(c_ally, T, input);
        defer json.de.free(c_ally, output, null);
    }
}

fn gettyJsonArena() !void {
    for (0..100_000) |_| {
        var arena = std.heap.ArenaAllocator.init(c_ally);
        const arena_ally = arena.allocator();
        defer arena.deinit();

        _ = try json.fromSlice(arena_ally, T, input);
    }
}

pub fn main() !void {
    // Uncomment one call at a time and build the binary under the
    // corresponding name (./getty, ./getty-arena, ./std) for hyperfine.
    //try gettyJson();
    //try gettyJsonArena();
    //try stdJson();
}
$ hyperfine --warmup 5 ./getty ./getty-arena ./std
Benchmark 1: ./getty
  Time (mean ± σ):     718.4 ms ±   3.7 ms    [User: 713.4 ms, System: 3.5 ms]
  Range (min … max):   715.6 ms … 727.9 ms    10 runs

Benchmark 2: ./getty-arena
  Time (mean ± σ):     628.7 ms ±   1.5 ms    [User: 622.5 ms, System: 4.6 ms]
  Range (min … max):   626.6 ms … 631.3 ms    10 runs

Benchmark 3: ./std
  Time (mean ± σ):     482.7 ms ±   1.5 ms    [User: 476.2 ms, System: 4.9 ms]
  Range (min … max):   481.2 ms … 486.4 ms    10 runs

Summary
  ./std ran
    1.30 ± 0.01 times faster than ./getty-arena
    1.49 ± 0.01 times faster than ./getty

The unnecessary allocations are surely the main factor in Getty's slowness. However, I should note that std.json.parseFromSlice implicitly uses an arena allocator whereas json.fromSlice does not, and you can see from the results that using an arena does indeed make a difference.

With or without an arena, though, Getty's still much slower. So it's time to get started on this issue!

ibokuri commented 10 months ago

Removed the accepted label for now since, as pointed out by fredi, there are major issues with implementing this kind of thing, which I've unfortunately run into myself in my own branch.

For one thing, the multiple ranges idea was a total non-starter. I think my brain farted when I came up with that. The input for visitString is always a slice, which is a single allocation. You can't free individual pieces of a single allocation. So now we have no way of letting visitors use only parts of input.

Another issue is getty.de.free. How will it be able to know if the strings of a value, especially if they have different lifetimes, should be freed or not?

ibokuri commented 10 months ago

Before I could implement the lifetime optimizations, I had to do a bit of general allocation work. The above (merged) PR implements that allocation work.

In summary, Getty now uses an arena internally for all allocations. This simplifies visitors and deserialization blocks as they no longer have to worry about freeing values and allows end users to free everything whenever they want. Additionally, the arena is passed to the methods of Deserializer implementations so they're simplified a tad as well.

Big thanks to fredi for discussing all this with me and steering me in the right direction :D

ibokuri commented 10 months ago

The remaining half is the lifetime work, which consists of two parts:

Lifetimes

The lifetime types will be more or less the same as what I've already proposed:

visitString's Return Type

Before, I proposed that visitString return both the produced value and ranges indicating which parts of the input parameter were used.

With the arena, the ranges are no longer necessary. A simple bool will suffice, where true means that input was used directly.
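The return type from the earlier comment might then reduce to something like this (a sketch, not a final API):

```zig
// Sketch of the simplified return type: a `used` bool replaces the old
// `Used` union of ranges.
pub fn Return(comptime T: type) type {
    return struct {
        value: T,

        // `true` means `input` was used directly in `value`. With the
        // internal arena, the deserializer no longer frees anything either
        // way; it just needs to know not to invalidate or reuse `input`.
        used: bool = false,
    };
}
```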

ibokuri commented 10 months ago

Performance updates after arena changes (using same benchmarking code):

$ hyperfine --warmup 5 ./getty ./std
Benchmark 1: ./getty
  Time (mean ± σ):     688.8 ms ±   1.9 ms    [User: 684.7 ms, System: 3.0 ms]
  Range (min … max):   685.8 ms … 692.3 ms    10 runs

Benchmark 2: ./std
  Time (mean ± σ):     484.0 ms ±   2.7 ms    [User: 480.7 ms, System: 2.2 ms]
  Range (min … max):   481.3 ms … 489.1 ms    10 runs

Summary
  ./std ran
    1.42 ± 0.01 times faster than ./getty

Slightly slower than getty-arena was, but faster than the old getty version. Since there are no lifetimes right now, Getty doesn't free anything at all, including struct keys. Perhaps keeping all that cruft around all the time is slowing things down?

ibokuri commented 10 months ago

Performance update after some optimizations in getty-json (no peeking, always allocating strings, heap branch first).

Note that std's runtime has increased overall because we're now correctly passing in .alloc_always. Beforehand, it was returning a slice into the scanner's input which, if we hadn't used a static string input, would have been completely invalid.

$ hyperfine --warmup 5 ./getty ./std
Benchmark 1: ./getty
  Time (mean ± σ):     678.8 ms ±   1.3 ms    [User: 675.0 ms, System: 2.9 ms]
  Range (min … max):   676.3 ms … 680.7 ms    10 runs

Benchmark 2: ./std
  Time (mean ± σ):     573.4 ms ±   2.8 ms    [User: 567.5 ms, System: 4.6 ms]
  Range (min … max):   569.5 ms … 578.8 ms    10 runs

Summary
  ./std ran
    1.18 ± 0.01 times faster than ./getty

We shaved off around 10ms.