Problem
Liquid::Tokenizer and Liquid::C::Tokenizer differed in how they tokenized liquid tags. Liquid::C::Tokenizer produced a single token per tag, including newlines, whereas Liquid::Tokenizer just used @source.split("\n"), which omits the newlines and produces a single token per line.
The newline character prevented the Liquid::BlockBody::LiquidTagToken regex from matching, since . does not match a newline unless the regex is multiline. This resulted in syntax errors when using the disable_liquid_c_nodes: true or profile: true parse option.
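The mismatch can be reproduced in plain Ruby. This is a minimal sketch: the pattern below is an illustrative stand-in assumed to resemble Liquid::BlockBody::LiquidTagToken, not a copy of it, and the variable names are made up for the example.

```ruby
# Illustrative pattern only (an assumption, not Liquid's actual constant).
line_pattern = /\A\s*(\w+)\s*(.*?)\z/
# Same pattern with the multiline flag, so `.` also matches "\n".
multiline_pattern = /\A\s*(\w+)\s*(.*?)\z/m

with_newline = "assign x = 1\n"   # token that still carries its newline
without_newline = "assign x = 1"  # token as produced by split("\n")

# `.` stops at "\n", so \z (true end of string) is never reached.
no_match = line_pattern.match(with_newline)
# Without the newline the tag name and markup are captured.
line_match = line_pattern.match(without_newline)
# With /m the match succeeds, but the markup keeps the newline.
multiline_match = multiline_pattern.match(with_newline)

# split("\n") drops the newlines and keeps interior blank lines as "".
lines = "assign x = 1\n\necho x\n".split("\n")

p no_match
p line_match.captures
p multiline_match.captures
p lines
```

The last line shows why blank lines matter for compatibility: split("\n") yields an empty string for each interior blank line, so a per-line tokenizer emits a token for them too.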
For example, parsing the following liquid tag with disable_liquid_c_nodes:
{%- liquid
assign x = 1
assign y = x | plus: 2
echo y
-%}
would result in the following syntax error:
Liquid::SyntaxError: Liquid syntax error: Unknown tag 'assign x = 1
'
Solution
Make Liquid::C::Tokenizer compatible with Liquid::Tokenizer by preserving the blank lines, while using a new token type for them so that the C parsing code can easily ignore them.
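The idea can be sketched in Ruby as follows. This is not the liquid-c implementation (which is written in C); the lambda, the token shape, and the :blank_line / :tag_line type names are all hypothetical and only illustrate how a distinct token type lets one consumer keep blank lines while another skips them.

```ruby
# Hypothetical helper: one token per line of a liquid tag body,
# with blank lines tagged by their own type.
tokenize_liquid_tag = lambda do |body|
  body.split("\n").map do |line|
    type = line.strip.empty? ? :blank_line : :tag_line
    { type: type, text: line }
  end
end

body = "assign x = 1\n\necho x\n"
tokens = tokenize_liquid_tag.call(body)

# A consumer that expects Liquid::Tokenizer-style output sees every line,
# blank ones included.
ruby_view = tokens.map { |t| t[:text] }

# A consumer that only cares about tags skips blank-line tokens by type.
c_view = tokens.reject { |t| t[:type] == :blank_line }.map { |t| t[:text] }

p ruby_view
p c_view
```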
cc @wizardlyhel who pointed out the problem to me