Allow mecha to allocate memory

Hejsil commented 3 years ago

As seen in #20 it is pretty hard to make a parser that parses an unknown number of items with the current API. The best we can do right not is parse twice, once for validation and once for getting each result one by one in an iterating manner. What we really want here is the ability to allocate memory from mecha. I think we can even allow a custom context to be passed around as well:

Make all parsers take both a []const u8 and a *mem.Allocator.
- This will allow many and manyRange to allocate their result instead of just returning the parsed string.
- Also pass this allocator to the converter of convert so that it can allocate.
All parsers will now return mecha.Error!Result instead of ?Result:
```
const Error = error{ OutOfMemory, ParserFailed, OtherError };
```
- Mecha will catch ParserFailed but will let the other two errors bubble up as mecha cannot handle these.
If the users needs some sort of context struct to store or look up something, this can be done with @fieldParentPtr
```
const Context = struct {
arena: ArenaAllocator,
};
```

fn parser(a: *mem.Allocator, str: []const u8) ?Result { // We cannot go directly from a to Context when Context is not the implementer // allocator. We can take the double parent pointer approach. const area = @fieldParentPtr(ArenaAllocator, "allocator", a); const context = @fieldParentPtr(Context, "arena", arena); }


* `mecha` will not free any memory on its own. It is the users responsibility to use an allocator that can keep track of and free the memory it allocates (like `ArenaAllocator`).

erooke commented 3 years ago

After #20 I started to play around with making mecha take memory allocations. I agree with parse failures being errors and not attempting to get mecha to manage memory. I have a couple of questions:

Should mecha attempt to enforce the use of arena allocator? In my experiments with adding memory allocation it became apparent that it would be basically impossible to clean up the memory by hand as a parse failure halfway through the chain would give you nothing to work with.
What is the point of OtherError in the proposed error set?
Should all parsers, regardless of if they allocate or not, take an allocator?

In my experiments I thought it would nice/interesting if as a user knew mecha was going to attempt to allocate memory. To that end I defined two types of parsers:

/// The type of all parsers that can work with `mecha`
pub fn Parser(comptime T: type) type {
    return fn (void, []const u8) ParseError!Result(T);
}

/// A parser which works with `mecha` and is allowed to allocate
pub fn ParserAlloc(comptime T: type) type {
    return fn (*mem.Allocator, []const u8) ParseAllocError!Result(T);
}

With the two associated error sets

pub const ParseError = error{ParseFailed};
pub const ParseAllocError = ParseError || mem.Allocator.Error;

I was able to get it working with not too much headache over on ~branch which no longer exists~ Not sure if others think that its worth the extra complexity to allow for such behavior however.

Hejsil commented 3 years ago

* Should mecha attempt to enforce the use of arena allocator? In my experiments with adding memory allocation it became apparent that it would be basically impossible to clean up the memory by hand as a parse failure halfway through the chain would give you nothing to work with.

There are many allocation schemes that are able to clean up the entire "pool", such as ArenaAllocator and FixedBufferAllocator. One could even make an allocator that just keeps track of a list of allocations, so that they can all be free at once if desired. I think it is best to just take an *mem.Allocator and let the user pick the best allocator for their needs.

* What is the point of `OtherError` in the proposed error set?

For conversions where the user of mecha wants to return an error which is not OutOfMemory. They could then look somethink up in their Context to figure out more details about their own specific error.

* Should all parsers, regardless of if they allocate or not, take an allocator?

I think the library should always take an allocator to keep the whole library simple. If the user wonna make sure the parser never allocates, they can pass a FailingAllocator and catch |err| switch (err) { error.OutOfMemory => unreachable }. I don't think this is unreasonable.

Hejsil commented 3 years ago

Let's leave this open until we have a version of many that collects the results. I have an API in mind that only has one many function that takes a set of options.

Hejsil / mecha

Allow mecha to allocate memory #21