getty-zig / getty

A (de)serialization framework for Zig
https://getty.so
MIT License
189 stars 13 forks source link

Struct field without default value gets freed causing segmentation fault, even when "skipped" #105

Closed Namek closed 1 year ago

Namek commented 1 year ago

Description

If there's a struct field that has no default value then during deserialization it gets freed even if the value wasn't inside the JSON file/slice. The skip attribute is ignored in getty's free() function.

main.zig:

const std = @import("std");
const json = @import("json");

const WorldState = struct {
    entities: std.ArrayList(Entity),

    pub fn deinit(self: *WorldState) void {
        self.entities.deinit();
    }
};

const Entity = struct {
    pub const @"getty.sb" = struct {
        pub const attributes = .{
            .Container = .{ .ignore_unknown_fields = true },
            ._cache = .{ .skip = true },
        };
    };
    pub const @"getty.db" = struct {
        pub const attributes = .{
            .Container = .{ .ignore_unknown_fields = true },
            ._cache = .{ .skip = true },
        };
    };

    id: u32 = 0,
    _cache: bool,
};

pub fn main() !void {
    var gpa: std.heap.GeneralPurposeAllocator(.{}) = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();

    const json_text =
        \\{
        \\  
        \\  "entities": [
        \\     { "id": 32 }
        \\  ]
        \\}
    ;
    const world = try json.fromSlice(gpa.allocator(), *WorldState, json_text);
    std.log.debug("entity count: {d}", .{world.entities.items.len});
    world.deinit();
    gpa.allocator().destroy(world);
}
Segmentation fault at address 0xffffffffffffffff
C:\Users\Namek\AppData\Local\zig\p\1220488ea01ebb4cc8d568679d4f22ff3a5cbc1cbcf1a603ae9d9ad59dc7be6290fc\src\de\free.zig:106:49:
 0x7ff6bae36dfd in free__anon_9823 (bug.exe.obj)
                    if (!field.is_comptime) free(allocator, @field(value, field.name));
                                                ^
C:\Users\Namek\AppData\Local\zig\p\1220488ea01ebb4cc8d568679d4f22ff3a5cbc1cbcf1a603ae9d9ad59dc7be6290fc\src\de\free.zig:58:43: 0x7ff6bae28f32 in free__anon_9408 (bug.exe.obj)
                for (value.items) |v| free(allocator, v);
                                          ^
C:\Users\Namek\AppData\Local\zig\p\1220488ea01ebb4cc8d568679d4f22ff3a5cbc1cbcf1a603ae9d9ad59dc7be6290fc\src\de\free.zig:106:49:
 0x7ff6bae0fc59 in free__anon_8183 (bug.exe.obj)
                    if (!field.is_comptime) free(allocator, @field(value, field.name));
                                                ^
C:\Users\Namek\AppData\Local\zig\p\1220488ea01ebb4cc8d568679d4f22ff3a5cbc1cbcf1a603ae9d9ad59dc7be6290fc\src\de\free.zig:32:33: 0x7ff6bade34f5 in free__anon_5239 (bug.exe.obj)
                            free(allocator, value.*);
                                ^
C:\Users\Namek\AppData\Local\zig\p\1220488ea01ebb4cc8d568679d4f22ff3a5cbc1cbcf1a603ae9d9ad59dc7be6290fc\src\de\impls\visitor\pointer.zig:97:30: 0x7ff6bae367c2 in visitMap__anon_9761 (bug.exe.obj)
                errdefer free(a, value);
                             ^
???:?:?: 0xce1dbeffff in ??? (???)
???:?:?: 0xaaaaaaaaaaaaaaa9 in ??? (???)

which I pinned down to the call @field(value, field.name) in this line: if (!field.is_comptime) free(allocator, @field(value, field.name)); of getty's free.zig.

Workaround

The workaround is to define a default value for the field. Even undefined suffices:

_cache: bool = undefined,

The correct behavior and reasoning

Normally, if the field has no default value then we get the MissingField error because it means the field is not optional. We can observe that behaviour with this example (with no segfault):

const std = @import("std");
const json = @import("json");

const WorldState = struct {
    pub const @"getty.sb" = struct {
        pub const attributes = .{
            .Container = .{ .ignore_unknown_fields = true },
            ._cache = .{ .skip = true },
        };
    };
    pub const @"getty.db" = struct {
        pub const attributes = .{
            .Container = .{ .ignore_unknown_fields = true },
            ._cache = .{ .skip = true },
        };
    };

    number: u32 = 0,
    _cache: bool,
};

pub fn main() !void {
    var gpa: std.heap.GeneralPurposeAllocator(.{}) = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();

    const json_text =
        \\{
        \\  "number": 32
        \\}
    ;
    const world = try json.fromSlice(gpa.allocator(), *WorldState, json_text);
    gpa.allocator().destroy(world);
}

However, the skip attribute is still ignored. I don't want to specify a default for the field because I may miss its proper initialization in the rest of the code, besides the deserialization itself.

How to Reproduce the Bug

Just run the code https://gist.github.com/Namek/c597e812d4a72df5e5e22d95edc3bb97

Additional Context

Quickly searching through Zig compiler's issues I've found this: https://github.com/ziglang/zig/issues/14712 but I have not tested whether that changes anything.

In any way, I'd like the skip attribute to be honored during the free procedure.

ibokuri commented 1 year ago

free will be reworked in https://github.com/getty-zig/getty/issues/106, and part of that will include adding support for attributes. So hopefully this can be resolved once that ticket's done.

ibokuri commented 1 year ago

So that everyone's on the same page, the reason there's a segfault in the first code example is because the _cache field isn't initialized anywhere during deserialization, so the struct visitor's visitMap method returns a MissingField error to the pointer visitor's visitMap method, causing it to execute its errdefer code which calls free. free goes through each field of the partially initialized Entity value and attempts to free them, but since _cache is uninitialized, we get a segfault. The cleanup code in the pointer visitor shouldn't be calling free but rather allocator.destroy. Making this change results in MissingField being returned for the first code example.

With that in mind, could you clarify a few things for me?

1)

What is the exact behavior you're expecting to see for the first code example you gave? Specifically, the output/error you're expecting to see after a zig build run. It seems like you want a MissingField error to be returned, is that correct?

2)

I don't want to specify a default for the field because I may miss its proper initialization in the rest of the code, besides the deserialization itself.

I'm not exactly sure what you mean here or what your concern is with setting a default value in this case.

To me, it seems perfectly reasonable to set a default value of false or true to _cache (or turning it into ?bool and setting null). The incoming JSON you're deserializing from clearly doesn't require _cache to be set, so that kind of implies that there's an assumed default value for _cache, no?

Namek commented 1 year ago

1)

Yes.

2)

Example:

const world: WorldState = WorldState{
    .number = 6
};

This will not compile because the _cache field has no default specified. And I want it that way.

The example I gave is the smallest I could find to present the segfault.

ibokuri commented 1 year ago

Thanks!

Can you try out the latest version of Getty JSON? The pointer visitor has been fixed and should now return MissingField for your code example.

As for free, I don't think it makes sense to skip fields anymore. free is intended to be used to free values that are 1) deserialized by Getty, and 2) properly and fully initialized (including skipped fields). Passing it partially created values is always gonna be UB, which is what we were running into here with the pointer visitor. But since the visitor no longer calls free in its cleanup code, we should be good.

I'll leave the issue open for now in case your problem isn't solved with the new changes. But feel free to close it if it did.