Hejsil / mecha

A parser combinator library for Zig
MIT License

How to parse bareword literal? #35

Closed by notramo 2 years ago

notramo commented 2 years ago

I want to parse a bareword literal. I tried the following code:

const UpperCase = mecha.utf8.range('A', 'Z');
const LowerCase = mecha.utf8.range('a', 'z');

/// A widget literal starts with an uppercase letter.
/// Any number of uppercase or lowercase letters can follow.
const WidgetLiteral = mecha.combine(.{
  // Starts with uppercase.
  // It's a u21 in the result.
  UpperCase,
  // Then other chars follow.
  // It's a []u8 in the result.
  mecha.many(
    mecha.oneOf(.{
      UpperCase,
      LowerCase,
    }),
    .{ .collect = true },
  ),
});

The problem is that it's not easy to parse this into a single []u8 or []u21, because the output of mecha.combine is a struct. The first, single UpperCase is a u21, but the following chars are a []u8. Is there a clean way to get the whole literal as one slice?

Hejsil commented 2 years ago

I would recommend having a look at mecha.asStr. After the child parser returns a result, asStr will return the input range that was actually parsed. This slice will point into the input and will not be allocated.
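A minimal sketch of that suggestion, assuming the same mecha version and function-style API as the code in the question. The names UpperCase, LowerCase and WidgetLiteral come from the question. Passing .collect = false to mecha.many is an assumption made here so that the repetition does not allocate; it is not something stated in this thread.

```zig
const mecha = @import("mecha");

const UpperCase = mecha.utf8.range('A', 'Z');
const LowerCase = mecha.utf8.range('a', 'z');

/// Wrapping the combined parser in asStr discards the struct result
/// and yields the matched input range instead, so the whole literal
/// comes back as a single string slice.
const WidgetLiteral = mecha.asStr(mecha.combine(.{
    // One uppercase letter first.
    UpperCase,
    // Then any number of uppercase or lowercase letters.
    // Assumption: `.collect = false` skips collecting the individual
    // results, which asStr does not need.
    mecha.many(
        mecha.oneOf(.{ UpperCase, LowerCase }),
        .{ .collect = false },
    ),
}));
```

Because the returned slice points into the input, it stays valid only for as long as the input buffer does.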

notramo commented 2 years ago

Thank you!