Hejsil / mecha

A parser combinator library for Zig
MIT License

Allow Allocation #22

Closed: erooke closed this 3 years ago

erooke commented 3 years ago

Attempts to address #21

Making manyN, manyRange, and many allocate the result of what they parse is not addressed.

data-man commented 3 years ago

Sorry, I don't like this. Personally for me, parsing without allocations is the most important feature of mecha. We need to find a better way.

Hejsil commented 3 years ago

> Sorry, I don't like this. Personally for me, parsing without allocations is the most important feature of mecha. We need to find a better way.

All existing parsing code will still not do any allocation, and it will be pretty straightforward for people who don't want allocation to avoid it (just pass a failing allocator to your parser). Allocators are for the parsers that actually need allocation to work, such as most recursive parsers that construct trees. It will also allow us to have a version of many which actually collects the results instead of just returning the parsed string (which would most likely need to be reparsed). Do you have any specific reason in mind as to why we should not have the option to allocate memory?
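For illustration, the "pass a failing allocator" idea could be sketched like this. This is a hypothetical sketch, not mecha's actual API: `std.testing.FailingAllocator` does exist in Zig's standard library, though its exact interface has varied between Zig versions, and the parser call shown is assumed.

```zig
const std = @import("std");

// Sketch: hand the parser an allocator that rejects every request.
// With 0 allowed allocations, any parser that tries to allocate
// returns an error instead of silently touching the heap.
test "parse without allocation" {
    var failing = std.testing.FailingAllocator.init(std.heap.page_allocator, 0);
    // Hypothetical call; the real mecha signature may differ:
    // const res = try rgb(&failing.allocator, "#aabbcc");
    _ = failing;
}
```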

Hejsil commented 3 years ago

@data-man Wanna elaborate? Otherwise I would like to get this merged.

data-man commented 3 years ago

@Hejsil I'm working on benchmarks. It won't take much time.

data-man commented 3 years ago
const std = @import("std");

usingnamespace @import("mecha.zig");

const Rgb = struct {
    r: u8,
    g: u8,
    b: u8,
};

fn toByte(v: u8) u8 {
    return v * 0x10 + v;
}

fn toByte2(v: [2]u8) u8 {
    return v[0] * 0x10 + v[1];
}

const hex = convert(u8, toInt(u8, 16), asStr(ascii.digit(16)));
const hex1 = map(u8, toByte, hex);
const hex2 = map(u8, toByte2, manyN(2, hex));
const rgb1 = map(Rgb, toStruct(Rgb), manyN(3, hex1));
const rgb2 = map(Rgb, toStruct(Rgb), manyN(3, hex2));
const rgb = combine(.{
    ascii.char('#'),
    oneOf(.{
        rgb2,
        rgb1,
    }),
});

pub fn main() anyerror!void {
    const loops = 100_000_000;

    var n: usize = 0;

    const allocator = std.heap.page_allocator;
    const start = std.time.nanoTimestamp();

    while (n < loops) : (n += 1) {
        const a = rgb("#aabbcc").?.value;
        // const a = (try rgb(allocator, "#aabbcc")).value;
        std.math.doNotOptimizeAway(a);
    }
    const end = std.time.nanoTimestamp();
    const stdout = std.io.getStdOut().writer();

    try stdout.print("{} ns\n", .{(end - start)});
}

$ zig build-exe -O ReleaseFast benchmark.zig

My results: 64 ns on the master branch, 89 ns on this PR.

Hejsil commented 3 years ago

Something is wrong with this benchmark. I played around with it a little, increasing the loops counter. At 10_000_000_000_000_000_000 the program still finishes instantly (0.604s), while at 20_000_000_000_000_000_000 the program does not finish within a time frame I am patient enough to wait for (I waited 1m45.430s). That strongly suggests the loop itself is being optimized away.

Even if the benchmark were correct, different runs give results that differ by ±50 ns. That is very inconsistent.

erooke commented 3 years ago

I don't think this benchmark shows what you want it to. Running it multiple times shows that whatever it is measuring is within the noise I would expect on a machine.

Alloc
-----
1: 51 ns
2: 90 ns
3: 90 ns
4: 91 ns
5: 90 ns

Master
------
1: 71 ns
2: 40 ns
3: 90 ns
4: 90 ns
5: 90 ns

edit: my bad, I didn't see Hejsil's response. They seem to have covered this already :sweat_smile:

Hejsil commented 3 years ago

Update: OK, I got the benchmark to behave.

const std = @import("std");

usingnamespace @import("mecha.zig");

const Rgb = struct {
    r: u8,
    g: u8,
    b: u8,
};

fn toByte(v: u8) u8 {
    return v * 0x10 + v;
}

fn toByte2(v: [2]u8) u8 {
    return v[0] * 0x10 + v[1];
}

const hex = convert(u8, toInt(u8, 16), asStr(ascii.digit(16)));
const hex1 = map(u8, toByte, hex);
const hex2 = map(u8, toByte2, manyN(2, hex));
const rgb1 = map(Rgb, toStruct(Rgb), manyN(3, hex1));
const rgb2 = map(Rgb, toStruct(Rgb), manyN(3, hex2));
const rgb = combine(.{
    ascii.char('#'),
    oneOf(.{
        rgb2,
        rgb1,
    }),
});

pub fn main() !void {
    var loops: usize = 20_000_000_000;
    const loop_ptr: *volatile usize = &loops;

    var n: usize = 0;

    const allocator = std.heap.page_allocator;
    const start = std.time.nanoTimestamp();

    while (n < loop_ptr.*) : (n += 1) {
        const a = rgb("#aabbcc").?.value;
        // const a = (try rgb(allocator, "#aabbcc")).value;
        std.math.doNotOptimizeAway(a);
    }
    const end = std.time.nanoTimestamp();
    const stdout = std.io.getStdOut().writer();

    try stdout.print("{} ns\n", .{(end - start)});
}
Master: 9274738222 ns
Branch: 4647510447 ns

I wouldn't claim that this branch is actually 2x faster than master; rather, I will say that this benchmark is not good enough to give a real idea of mecha's performance.

data-man commented 3 years ago

Some thoughts:

  1. Don't use errors. They have a cost. Return `null` for allocation errors.

  2. `const use_allocs = @hasDecl(root, "mecha_uses_allocators");`

     or

  3. change the inner funcs to

    fn func(params: anytype) ?Res { // or some other optional type
        // check whether params[0] is an allocator
        if (meta.trait.isTuple(@TypeOf(params)) and ...) {
            // code with allocator; use params[1] as input
        } else {
            // code without allocator
        }
    }

    Not sure that it will work, just a PoC. :)
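The `@hasDecl(root, ...)` idea in item 2 follows the usual Zig pattern of probing the root source file for an opt-in declaration. A minimal sketch; the declaration name is taken from the comment above and is hypothetical as far as mecha is concerned:

```zig
const root = @import("root");

// Compile-time feature flag: true only if the user's root file
// declares `pub const mecha_uses_allocators = ...;`. The library
// could then branch on this at comptime to compile out all
// allocating code paths.
const use_allocs = @hasDecl(root, "mecha_uses_allocators");
```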

Hejsil commented 3 years ago
> 1. Don't use errors. They have a cost. Return `null` for allocation errors.

I would very much like you to elaborate and provide some clear evidence that they have a "cost" (I'm not even sure what kind of cost you are talking about, but I'm assuming you mean runtime cost). Both `anyerror!Result(usize)` and `?Result(usize)` have the same size on my system, so the same amount of data is being copied around. Try it for yourself:

pub fn Result(comptime T: type) type {
    return struct { v: T, r: []const u8 };
}

test "" {
    @compileLog(@sizeOf(?Result(usize)));
    @compileLog(@sizeOf(anyerror!Result(usize)));
}
| 32
| 32
./test.zig:6:5: error: found compile log statement
    @compileLog(@sizeOf(?Result(usize)));
    ^
./test.zig:7:5: error: found compile log statement
    @compileLog(@sizeOf(anyerror!Result(usize)));

Looking at the assembly of branching on an error vs. an optional is a little hard to compare, but I made a godbolt link here so you can play around with this too. In my simple example, branching on the error produces less code (but we are at the mercy of the optimizer here). It's hard to say whether there is an actual cost, but in theory `try` vs `orelse return null` should produce virtually identical code.
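As a minimal sketch (not mecha code) of the two failure-signalling styles being compared: both callers below branch on a tag and early-return on failure, which is why the optimizer should lower them to nearly the same code.

```zig
const ParseError = error{Failed};

// Failure signalled via an error union.
fn viaError(ok: bool) ParseError!u32 {
    if (!ok) return error.Failed;
    return 42;
}

// Failure signalled via an optional.
fn viaOptional(ok: bool) ?u32 {
    if (!ok) return null;
    return 42;
}

// Both callers unwrap-or-bail; `catch` and `orelse` play the
// same structural role here.
fn callerCatch(ok: bool) ?u32 {
    return viaError(ok) catch null;
}

fn callerOrelse(ok: bool) ?u32 {
    return viaOptional(ok) orelse null;
}
```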

I'm gonna merge this. If you can provide a clear benchmark of a complicated parser that shows clearly, without question, that this change has degraded the performance of mecha, then I'm willing to consider reverting it. As of now, there is no clear evidence for this.