Closed by msangel 3 years ago
This bug is more complicated than it seems.
The problem is this: for Jekyll, the only thing that is invalid in a resource name is a whitespace character; everything else that can appear there is valid. A very simple example:
{% include some.file %}
What is some.file? According to the existing lexer rules, it is an expression. But that's not true: it should be a filename. We don't have a lexeme for a filename, but we could add one. The include would then be parsable, but it would break parsing of every expression that looks like a filename, such as site.pages in {% for item in site.pages %}. So that is not a proper solution.
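The clash can be reproduced outside ANTLR. Below is a minimal sketch in plain Java; the two regular expressions are hypothetical stand-ins for "lookup expression" and "filename lexeme" and are not the project's actual lexer rules. It only shows that both patterns accept the same text, so a lexer that sees nothing but the characters cannot decide between them.

```java
import java.util.regex.Pattern;

final class LexemeAmbiguityDemo {
    // Hypothetical: a dotted lookup such as site.pages.
    static final Pattern LOOKUP =
            Pattern.compile("[A-Za-z_][A-Za-z0-9_]*(\\.[A-Za-z_][A-Za-z0-9_]*)*");
    // Hypothetical: a filename is any run without whitespace or tag delimiters.
    static final Pattern FILENAME = Pattern.compile("[^\\s{}%]+");

    /** True when the text is accepted by both patterns, i.e. the lexeme is ambiguous. */
    static boolean isAmbiguous(String text) {
        return LOOKUP.matcher(text).matches() && FILENAME.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAmbiguous("some.file"));          // true
        System.out.println(isAmbiguous("site.pages"));         // true
        System.out.println(isAmbiguous("dir/some-file.html")); // false: not a lookup
    }
}
```

Only names containing characters that never occur in an expression, such as `/` or `-`, are unambiguous; dotted names like some.file are not.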
We can go further and create a new lexer mode that depends on the lexer flavor:
IncludeResourceStart : 'include' WhitespaceChar+ {isJekyll()}? -> pushMode(IN_INCLUDE_JEKYLL_RESOURCE) ;
...
mode IN_INCLUDE_JEKYLL_RESOURCE;
IncludeEnd : '%}' -> popMode, popMode;
OutStart3 : '{{' -> pushMode(IN_TAG);
ExitIncludeResource : WhitespaceChar+ -> popMode;
IncludeResource: .+?;
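To make the mode idea concrete without ANTLR, here is a hand-written sketch in plain Java. It is an assumption-laden illustration, not the project's lexer: the token labels (INCLUDE, RESOURCE, ID, SYM) and the `tokenize` helper are made up, and it works only on the text between `{%` and `%}`. In the Jekyll flavor, seeing `include` switches to a "resource" mode that reads everything up to the next whitespace as one token; otherwise the default rules apply.

```java
import java.util.ArrayList;
import java.util.List;

final class IncludeModeSketch {
    /** Tokenizes the text between the tag delimiters; jekyll selects the flavor. */
    static List<String> tokenize(String tagBody, boolean jekyll) {
        List<String> tokens = new ArrayList<>();
        boolean resourceMode = false; // stands in for the pushed lexer mode
        int n = tagBody.length();
        int i = 0;
        while (i < n) {
            char c = tagBody.charAt(i);
            if (Character.isWhitespace(c)) { // whitespace only separates tokens
                i++;
                continue;
            }
            int start = i;
            if (resourceMode) {
                // Resource mode: take everything up to the next whitespace, then leave the mode.
                while (i < n && !Character.isWhitespace(tagBody.charAt(i))) i++;
                tokens.add("RESOURCE:" + tagBody.substring(start, i));
                resourceMode = false;
            } else if (Character.isLetter(c) || c == '_') {
                while (i < n && (Character.isLetterOrDigit(tagBody.charAt(i))
                        || tagBody.charAt(i) == '_')) i++;
                String word = tagBody.substring(start, i);
                if (jekyll && word.equals("include")) {
                    tokens.add("INCLUDE");
                    resourceMode = true; // enter resource mode only for the Jekyll flavor
                } else {
                    tokens.add("ID:" + word);
                }
            } else {
                tokens.add("SYM:" + c); // any other single character
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize(" include dir/some-file.html ", true));
        // [INCLUDE, RESOURCE:dir/some-file.html]
        System.out.println(tokenize(" for item in site.pages ", true));
        // [ID:for, ID:item, ID:in, ID:site, SYM:., ID:pages]
    }
}
```

Expressions outside an include are tokenized exactly as before, which is the property a dedicated mode buys over a global filename lexeme.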
Then the input will be tokenized properly, so we can parse it safely:
include_tag
: {isLiquid()}? tagStart liquid=Include expr (With Str)? TagEnd
| {isJekyll()}? tagStart jekyll=Include file_name_or_output (jekyll_include_params)* TagEnd
;
// only valid for Flavor.JEKYLL
file_name_or_output
: output #jekyll_include_output
| filename #jekyll_include_filename
;
// most important part now
filename
: IncludeResource+
;
I believe this would be the most correct solution, but it would dramatically increase the amount of required changes and take a lot of time to code and test. It would also make testing the vocabulary with external tools more complicated because of the new semantic predicate. And it would require a lot of new code: this applies only to Jekyll, and Liquid should behave as before, so additional checks would be needed everywhere.
So if anyone is willing to fix this properly, you know what to do. Meanwhile, I will fix it in a quite simple way: I will allow anything as a filename, but with non-greedy matching, so that it consumes as little as possible (only until the next matching lexeme is reached):
filename
: ( . )+?
;
The next lexemes that close it are (jekyll_include_params)* TagEnd, so once those match, the filename ends.
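The same "take as little as possible until the rest matches" behaviour can be seen with a reluctant quantifier in `java.util.regex`. This is only an analogy in plain Java, not the ANTLR rule itself: the pattern, the `key=value` parameter shape, and the `fileName` helper are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ReluctantFilenameSketch {
    // Hypothetical tail: optional key=value parameters, then the closing delimiter.
    private static final String TAIL = "((?:\\s+\\w+=\\S+)*)\\s*%\\}";
    // (.+?) is reluctant: it stops as soon as TAIL can match the remainder.
    static final Pattern RELUCTANT = Pattern.compile("include\\s+(.+?)" + TAIL);
    // (.+) is greedy: it gives back only as much as TAIL strictly needs.
    static final Pattern GREEDY = Pattern.compile("include\\s+(.+)" + TAIL);

    /** Returns the captured filename, or null when the tag does not match. */
    static String fileName(Pattern pattern, String tag) {
        Matcher m = pattern.matcher(tag);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String tag = "include dir/some-file.html a=1 b=2 %}";
        System.out.println("[" + fileName(RELUCTANT, tag) + "]");
        // [dir/some-file.html]
        System.out.println("[" + fileName(GREEDY, tag) + "]");
        // [dir/some-file.html a=1 b=2 ]
    }
}
```

With the greedy variant the parameters are swallowed into the filename, which is why the minimal match is the one that is wanted here.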
And since this "anything" really can be anything, I will not visit it; I will just read its interval as text:
Interval interval = Interval.of(ctx.filename().start.getStartIndex(), ctx.filename().stop.getStopIndex());
String text = ctx.filename().start.getInputStream().getText(interval);
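One detail worth spelling out: an ANTLR `Interval` is inclusive at both ends, so the stop index points at the last character of the filename, not one past it. A minimal plain-Java sketch of the equivalent operation on a `String` follows; the `textBetween` helper is hypothetical and the indices are computed by hand rather than taken from real tokens.

```java
final class InclusiveIntervalSketch {
    /** Returns input[startIndex..stopIndex], with both ends inclusive. */
    static String textBetween(String input, int startIndex, int stopIndex) {
        // String.substring excludes its end index, hence the + 1.
        return input.substring(startIndex, stopIndex + 1);
    }

    public static void main(String[] args) {
        String input = "{% include dir/some-file.html %}";
        int start = input.indexOf("dir");     // first character of the filename
        int stop = input.indexOf(" %}") - 1;  // last character of the filename
        System.out.println(textBetween(input, start, stop)); // dir/some-file.html
    }
}
```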
Like that: