ichordev closed this issue 10 months ago.
Not directly, unless you know the delimiter at compile time:
/+dub.sdl:
dependency "pegged" version="~>0.4.9"
+/
import pegged.grammar;
import std.stdio;
mixin(grammar(`
DelimitedString(Delimiter):
    String <~ :"q" :doublequote :Delimiter :eol (!(eol Delimiter) .)* eol :Delimiter :doublequote
`));
void main()
{
    auto parseTree = DelimitedString!(literal!"HELLO")(`q"HELLO
This is a HELLO delimited string
HELLO"`);
    writeln(parseTree);
}
Prints:
DelimitedString[0, 47]["This is a HELLO delimited string\n"]
+-DelimitedString.String[0, 47]["This is a HELLO delimited string\n"]
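For comparison, here is a minimal Phobos-only sketch of roughly what that rule does when the delimiter is a compile-time template argument. `Match` and `matchDelimited` are made-up names for illustration, not part of Pegged. One difference from the grammar: the sketch only stops at a line consisting of the delimiter followed by a double quote, whereas the rule stops at any line starting with the delimiter and then fails if no quote follows.

```d
import std.algorithm.searching : startsWith;
import std.stdio : writeln;
import std.string : indexOf;

/// Hypothetical result type for this sketch (Pegged uses ParseTree instead).
struct Match
{
    bool ok;        // whether a complete delimited string was found
    string content; // the string body, including its final line break
    size_t end;     // offset one past the closing double quote
}

/// Matches q"DELIM<newline>...<newline>DELIM" at the start of `input`,
/// with DELIM fixed at compile time, roughly like DelimitedString(Delimiter).
Match matchDelimited(string delim)(string input)
{
    enum opening = `q"` ~ delim ~ "\n";
    enum closing = "\n" ~ delim ~ `"`;
    if (!input.startsWith(opening))
        return Match(false);
    // Search from the newline that ends the opening line, so that an empty
    // body (closing delimiter on the very next line) is also recognized.
    enum searchFrom = opening.length - 1;
    immutable pos = input[searchFrom .. $].indexOf(closing);
    if (pos < 0)
        return Match(false);
    immutable closeAt = searchFrom + cast(size_t) pos; // '\n' before DELIM
    return Match(true, input[opening.length .. closeAt + 1],
                 closeAt + closing.length);
}

void main()
{
    auto m = matchDelimited!"HELLO"("q\"HELLO\nThis is a HELLO delimited string\nHELLO\"");
    writeln(m.ok, " ", m.end); // true 47, the same end offset as above
    writeln(m.content);
}
```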
If you don't know the delimiter at compile time, it would be possible to use a semantic action to ensure that the Delimiter and MatchingDelimiter are equal, but since delimited strings may contain unmatched double quotes, there would be no way to know when to stop consuming input.
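A small Phobos-only illustration of that problem: a rule that ends the string at the first double quote stops inside the body, while one that first reads the opening identifier and then looks for newline + identifier + quote finds the real end. The function names are made up for this sketch, and both functions assume well-formed input (there is no error handling).

```d
import std.stdio : writeln;
import std.string : indexOf;

// A heredoc-style string whose body contains an unmatched double quote.
enum input = "q\"HELLO\nDouble quotes \" may be unbalanced!\nHELLO\"";

// Naive rule: the body ends at the first double quote after the opening line.
// Assumes well-formed input; no error handling.
size_t naiveEnd(string s)
{
    immutable bodyStart = s.indexOf('\n') + 1;
    return bodyStart + s[bodyStart .. $].indexOf('"') + 1;
}

// Delimiter-aware rule: read the identifier after q", then end the string at
// newline + identifier + double quote. Also assumes well-formed input.
size_t delimitedEnd(string s)
{
    immutable eol = s.indexOf('\n');
    immutable closing = "\n" ~ s[2 .. eol] ~ "\"";
    return eol + s[eol .. $].indexOf(closing) + closing.length;
}

void main()
{
    writeln(input[0 .. naiveEnd(input)]);     // cut off inside the body
    writeln(input[0 .. delimitedEnd(input)]); // the whole literal
}
```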
But I think it is possible to write a parser for delimited strings by hand (or steal it from the DMD lexer) and use it in a grammar the same way I use the predefined parsers doublequote and eol above. See also grammar composition. I have never tried that, though.
It seems there is a Wiki page for that already: https://github.com/PhilippeSigaud/Pegged/wiki/User-Defined-Parsers
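To give an idea of what such a hand-written parser has to do, here is a Phobos-only sketch for identifier-delimited strings, independent of Pegged. It takes a start offset, the way a Pegged parser reads p.end. `Match` and `matchHeredoc` are hypothetical names; only an ASCII letter or underscore is accepted as the first identifier character, and the rest of the identifier is not validated. Offsets are UTF-8 code units (std.string.indexOf), so slicing stays correct for non-ASCII text.

```d
import std.algorithm.searching : startsWith;
import std.ascii : isAlpha;
import std.stdio : writeln;
import std.string : indexOf;

/// Hypothetical result type for this sketch.
struct Match
{
    bool ok;        // whether a complete delimited string was found
    string content; // the string body, including its final line break
    size_t end;     // offset one past the closing double quote
}

/// Matches q"ID<newline>...<newline>ID" at offset `start` (assumed to be
/// <= input.length), where ID is only known at run time.
Match matchHeredoc(string input, size_t start)
{
    if (!input[start .. $].startsWith(`q"`))
        return Match(false);
    immutable idStart = start + 2;
    if (idStart >= input.length ||
        !(isAlpha(input[idStart]) || input[idStart] == '_'))
        return Match(false);
    immutable nl = input[idStart .. $].indexOf('\n');
    if (nl < 0)
        return Match(false);
    immutable eol = idStart + cast(size_t) nl;   // end of the opening line
    immutable closing = "\n" ~ input[idStart .. eol] ~ `"`;
    immutable pos = input[eol .. $].indexOf(closing);
    if (pos < 0)
        return Match(false);
    immutable closeAt = eol + cast(size_t) pos;  // '\n' before the closing ID
    return Match(true, input[eol + 1 .. closeAt + 1], closeAt + closing.length);
}

void main()
{
    enum source = "x = q\"EOS\nhéllo, EOS\nEOS\";";
    auto m = matchHeredoc(source, 4);
    writeln(m.ok, " ", m.end);  // true 26
    writeln(source[m.end .. $]); // the rest of the input after the literal
}
```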
So the answer is YES!
A working example: [EDITED for offset error]
/+dub.sdl:
dependency "pegged" version="~>0.4.9"
+/
import pegged.grammar;
import std.stdio, std.algorithm;
void main()
{
    auto parseTree1 = Strings(`q"(foo(xxx))"`);
    writeln(parseTree1);
    auto parseTree2 = Strings(`q"/foo]/"d`);
    writeln(parseTree2);
    auto parseTree3 = Strings(`q"MESSAGE
The line-break before this message is
consumed, but the one after it is not.
MESSAGE"`);
    writeln(parseTree3);
    auto parseTree4 = Strings(`q"HELLO
This is a HELLO delimited string.
Double quotes " may be unbalanced!
HELLO"`);
    writeln(parseTree4);
}
mixin(grammar(`
Strings:
    String <- ( DelimitedString('(', ')')
              / DelimitedString('[', ']')
              / DelimitedString('{', '}')
              / DelimitedString('<', '>')
              / delimitedString ) StringPostfix?
    DelimitedString(Delimiter, MatchingDelimiter) <~ :"q" :doublequote :Delimiter
        DelimitedCharacters(Delimiter, MatchingDelimiter)*
        :MatchingDelimiter :doublequote
    DelimitedCharacters(Delimiter, MatchingDelimiter) <- Delimiter DelimitedCharacters(Delimiter, MatchingDelimiter)* MatchingDelimiter
        / WysiwygCharacter(MatchingDelimiter) DelimitedCharacters(Delimiter, MatchingDelimiter)
        / WysiwygCharacter(MatchingDelimiter)
    WysiwygCharacter(MatchingDelimiter) <- !MatchingDelimiter .
    StringPostfix <- "c" / "w" / "d"
`));
// Our user defined parser handles run-time delimiters:
@safe ParseTree delimitedString(ParseTree p) pure nothrow
{
    // indexOf returns offsets in UTF-8 code units, which is what slicing needs.
    // countUntil counts decoded code points instead, which misplaces the
    // slices as soon as the input contains non-ASCII characters.
    import std.string : indexOf;
    if (p.end + 3 < p.input.length &&
        p.input[p.end..p.end+2] == `q"`)
    {
        try
        {
            // D identifiers may also start with an underscore.
            if (p.input[p.end+2] != '_' &&
                !or!(charRange!('a','z'),charRange!('A','Z'))(p.input[p.end+2..p.end+3]).successful)
            {   // q"/foo]/"
                const delim = p.input[p.end+2..p.end+3];
                auto end = p.input[p.end+3 .. $].indexOf(delim);
                if (end < 0) goto fail;
                end += p.end+3;
                return ParseTree("delimitedString",
                                 true,
                                 [p.input[p.end+3 .. end]],
                                 p.input,
                                 p.end,
                                 end + 2);
            }
            else
            {   // heredoc
                auto delimEnd = p.input[p.end..$].indexOf('\n');
                if (delimEnd < 3) goto fail;
                delimEnd += p.end;
                const delim = p.input[p.end+2 .. delimEnd];
                // Require a double quote right after the closing identifier, so that
                // a body line that merely starts with the identifier does not end the string.
                auto matchingDelimStart = p.input[delimEnd..$].indexOf('\n' ~ delim ~ '"');
                if (matchingDelimStart < 0) goto fail;
                matchingDelimStart += 1 + delimEnd;
                return ParseTree("delimitedString",
                                 true,
                                 [p.input[delimEnd+1..matchingDelimStart]],
                                 p.input,
                                 p.end,
                                 matchingDelimStart + delim.length + 1);
            }
        } catch (Exception) goto fail;
    }
fail:
    return ParseTree("delimitedString", false, ["delimited string"], p.input, p.end, p.end);
}
@safe ParseTree delimitedString(string input) pure nothrow
{
    return delimitedString(ParseTree("", false, [], input));
}
@safe string delimitedString(GetName g) pure nothrow
{
    return "delimitedString";
}
Prints:
Strings[0, 13]["foo(xxx)"]
+-Strings.String[0, 13]["foo(xxx)"]
+-Strings.DelimitedString!(literal!("("), literal!(")"))[0, 13]["foo(xxx)"]
Strings[0, 10]["foo]", "d"]
+-Strings.String[0, 10]["foo]", "d"]
+-Strings.StringPostfix[9, 10]["d"]
Strings[0, 96]["The line-break before this message is\nconsumed, but the one after it is not.\n"]
+-Strings.String[0, 96]["The line-break before this message is\nconsumed, but the one after it is not.\n"]
Strings[0, 84]["This is a HELLO delimited string.\nDouble quotes \" may be unbalanced!\n"]
+-Strings.String[0, 84]["This is a HELLO delimited string.\nDouble quotes \" may be unbalanced!\n"]
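The non-identifier cases can also be sketched without Pegged, which may help when reading the grammar. Here the recursive DelimitedCharacters rule is replaced by a nesting-depth counter for ( [ { <, and any other single ASCII character acts as a non-nesting delimiter that ends at its next occurrence. This is an illustrative sketch with hypothetical names (`Match`, `finish`, `matchCharDelimited`); identifier delimiters and the optional c/w/d postfix are deliberately left out.

```d
import std.ascii : isAlpha;
import std.stdio : writeln;
import std.string : indexOf;

/// Hypothetical result type for this sketch.
struct Match
{
    bool ok;        // whether a complete delimited string was found
    string content; // text between the delimiters
    size_t end;     // offset one past the closing double quote
}

/// The closing delimiter must be followed directly by a double quote.
Match finish(string input, size_t bodyStart, size_t closeAt)
{
    if (closeAt + 1 >= input.length || input[closeAt + 1] != '"')
        return Match(false);
    return Match(true, input[bodyStart .. closeAt], closeAt + 2);
}

/// Matches q"(...)", q"[...]", q"{...}", q"<...>" (nesting) and q"/.../"-style
/// strings with any other single ASCII delimiter (non-nesting) at `start`.
Match matchCharDelimited(string input, size_t start = 0)
{
    if (input.length < start + 3 || input[start .. start + 2] != `q"`)
        return Match(false);
    immutable char open = input[start + 2];
    if (isAlpha(open) || open == '_' || open >= 0x80)
        return Match(false); // identifier (heredoc) or non-ASCII: not handled
    immutable char close = open == '(' ? ')' : open == '[' ? ']'
                         : open == '{' ? '}' : open == '<' ? '>' : open;
    immutable bodyStart = start + 3;

    if (close == open)
    {
        // Non-nesting: the string ends at the next occurrence of the delimiter.
        immutable pos = input[bodyStart .. $].indexOf(open);
        return pos < 0 ? Match(false)
                       : finish(input, bodyStart, bodyStart + cast(size_t) pos);
    }

    // Nesting: count inner opening delimiters so q"(foo(xxx))" keeps "(xxx)".
    size_t depth = 0;
    foreach (i; bodyStart .. input.length)
    {
        if (input[i] == open)
            ++depth;
        else if (input[i] == close)
        {
            if (depth == 0)
                return finish(input, bodyStart, i);
            --depth;
        }
    }
    return Match(false);
}

void main()
{
    foreach (s; [`q"(foo(xxx))"`, `q"/foo]/"d`, `q"[unbalanced"`])
    {
        auto m = matchCharDelimited(s);
        writeln(s, " -> ", m.ok, " ", m.content, " ", m.end);
    }
}
```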
This is very helpful, thank you! I should be able to use your example there as a starting point for how to read the ParseTree with UDFs. :)
D has delimited strings, which can start and end with an arbitrary identifier:
Is it possible to parse something like this in Pegged? And if so, how?
Keep in mind that the string only ends if the identifier matches the one at the start of the string:
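A small illustration of both points in D itself, using an arbitrarily chosen identifier EOS: the identifier may appear inside the text, and a line that starts with a different identifier followed by a double quote does not end the string. The newline right after the opening EOS is not part of the value, while the one before the closing EOS is.

```d
import std.stdio : write;

// EOS is an arbitrary identifier. The string ends only at a line that starts
// with EOS itself, which must then be followed by a double quote.
enum text = q"EOS
This line mentions EOS in the middle.
END" starts with an identifier and a quote, but END is not EOS.
EOS";

void main()
{
    write(text); // prints the two lines of the body
}
```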