Closed tajmone closed 3 years ago
As for the error:
*1* 155 E : Unterminated block comment. Must end with a line consisting of
at least four slashes and nothing but slashes.
It seems like the compiler considers the CR that precedes LF in the EOL as an additional character, making the closing delimiter invalid.
After just a brief look, I think this is caused by the regex for block comments including a leading newline, since they should only be allowed in the first column. But this does not work on the first line of a file, where no newline precedes the delimiter.
If you start the block comment after an empty line it compiles. I'll figure out a way to handle this case.
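The first-line problem can be reproduced outside the compiler. Below is a minimal sketch using Python's re module, which is an assumption for illustration only: the actual rule lives in the ALAN scanner generator and uses its own syntax. The pattern names and sample strings are hypothetical.

```python
import re

# Hypothetical stand-in for a delimiter rule that requires a leading newline.
NEEDS_NEWLINE = re.compile(r"\n/{4}")
# Variant that also accepts the very start of the input.
START_OR_NEWLINE = re.compile(r"(?:\A|\n)/{4}")

# A source that begins directly with a block comment ...
at_file_start = "////\ncomment text\n////\n"
# ... and the same source preceded by an empty line.
after_blank_line = "\n" + at_file_start


def opens_at_start(pattern, text):
    """Return True if the pattern matches an opening delimiter at offset 0."""
    return pattern.match(text) is not None


print(opens_at_start(NEEDS_NEWLINE, at_file_start))     # no newline before the slashes
print(opens_at_start(NEEDS_NEWLINE, after_blank_line))  # the blank line supplies one
print(opens_at_start(START_OR_NEWLINE, at_file_start))
```

This mirrors the observation above: with the newline-prefixed rule, the comment is only recognised once an empty line precedes it.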
I'll investigate the CRLF problems.
This should be fixed in build 2209.
After just a brief look, I think this is caused by the regex for block comments including a leading newline, since they should only be allowed in the first column. But this does not work on the first line of a file, where no newline precedes the delimiter.
In my ALAN syntaxes I've used the following RegExes for the opening and closing delimiters: ^\/{4}.*$ and ^\/{4,}$, relying on the line-beginning anchor ^ instead of \n.
I'll investigate the CRLF problems.
If the RegEx uses \n to match the line-end, then it might fail with CRLF in some RegEx engines (including PCRE), for \n might match LF only, which would make the preceding CR an extra char that disqualifies the closing delimiter. Using $ (or [$\n]) should be safer.
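The CR effect is easy to check in isolation. The sketch below uses Python's re module, which is an assumption for illustration and not the engine used by the ALAN scanner generator; the pattern names are hypothetical. In this particular engine, $ in multiline mode only matches before LF, so it also treats CR as an ordinary character, and an explicit optional \r is what makes the pattern tolerate CRLF. Other engines and newline settings may behave differently.

```python
import re

# Demands LF immediately after the slashes.
CLOSE_LF_ONLY = re.compile(r"^/{4,}\n", re.MULTILINE)
# Relies on the end-of-line anchor instead of a literal newline.
CLOSE_DOLLAR = re.compile(r"^/{4,}$", re.MULTILINE)
# Explicitly allows one CR before the end of the line.
CLOSE_CR_AWARE = re.compile(r"^/{4,}\r?$", re.MULTILINE)

lf_source = "////\ncomment\n////\n"
crlf_source = "////\r\ncomment\r\n////\r\n"


def count_delims(pattern, text):
    """Count the lines that the pattern accepts as slash-only delimiters."""
    return len(pattern.findall(text))


for name, pattern in [("LF only", CLOSE_LF_ONLY),
                      ("dollar", CLOSE_DOLLAR),
                      ("CR aware", CLOSE_CR_AWARE)]:
    print(name, count_delims(pattern, lf_source), count_delims(pattern, crlf_source))
```

All three patterns find both delimiter lines in the LF sample; only the CR-aware one still finds them in the CRLF sample.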
This is in the scanner generator so not all "standard" regex symbols are supported.
I confirm that now everything is working as expected!
This is in the scanner generator so not all "standard" regex symbols are supported.
I imagined so. No idea how you worked around the lack of a ^ then.
I would have thought that the scanner would strip away the EOL sequence, or at least normalize CRLF to LF, since dragging around the extra CR could potentially break things in various places.
I confirm that now everything is working as expected!
Good! Thanks.
This is in the scanner generator so not all "standard" regex symbols are supported.
I imagined so. No idea how you worked around the lack of a ^ then.
Programming! ;-)
I would have thought that the scanner would strip away the EOL sequence, or at least normalize CRLF to LF, since dragging around the extra CR could potentially break things in various places.
The scanner is not a text processor but a tokenizer, and there are no tokens containing any newline characters, so there are no CRs to be dragged around. All CR and LF characters are effectively stripped from the input. Strings, which may span lines, have them stripped before being returned as tokens to the parser.
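As a rough illustration of that idea, here is a toy tokenizer in Python. It is only a sketch under stated assumptions: the token rule, the function name and the sample input are hypothetical and do not reflect the real ALAN scanner or its token set.

```python
import re

# Hypothetical token rule: a double-quoted string (which may span lines)
# or a run of characters that are neither whitespace nor a quote.
TOKEN = re.compile(r'"[^"]*"|[^\s"]+')


def tokenize(source):
    """Return the lexemes in source, with no CR or LF left in any of them."""
    tokens = []
    for match in TOKEN.finditer(source):
        lexeme = match.group()
        if lexeme.startswith('"'):
            # Strings may span lines, so remove the line breaks inside them.
            lexeme = lexeme.replace("\r", "").replace("\n", "")
        tokens.append(lexeme)
    return tokens


tokens = tokenize('alpha beta\r\n"two \r\nlines" gamma\r\n')
print(tokens)
```

Whitespace between tokens, including CR and LF, is never part of a match, so it simply disappears from the token stream.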
There are technical reasons why the file cannot be read in "text mode" (which would otherwise automatically convert any encountered CRLF to \n), so the scanner sees the CR (which will be matched and removed).
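The difference between the two reading modes can be shown with a short Python sketch. Python is used here only as an assumption for illustration; the ALAN compiler's own file handling is not shown in this thread, and the helper name is hypothetical.

```python
import os
import tempfile


def read_both_ways(raw):
    """Write raw bytes to a temp file, then read it back in binary and text mode."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(raw)
        with open(path, "rb") as f:
            binary = f.read()          # bytes exactly as stored, CR included
        with open(path, "r", encoding="utf-8") as f:
            text = f.read()            # default newline handling maps CRLF to LF
    finally:
        os.remove(path)
    return binary, text


binary, text = read_both_ways(b"////\r\ncomment\r\n////\r\n")
print(repr(binary))
print(repr(text))
```

A reader working on the binary form has to deal with the CR itself, which matches the situation described above.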
@thoni56, I've come across some odd bugs with block comments.
Using ALAN 3.0beta8 build 2207 under Win 10, tested with both CMD and Bash for Windows (same result); source file is UTF-8 BOM, using native CRLF EOL.
Compiled using both alan sample.alan and alan -encoding utf8 sample.alan, same results (the problem is not autodetection of UTF-8 via the BOM).
The problem is due to the fact that the source file has CRLF line endings; if I switch to Unix-style LF it works fine (but only if I add a blank line at the beginning).
So there seem to be two different problems at stake here: incorrect handling of CRLF EOL (even under CMD), and a bug when a source file starts with a block comment.
Below are the actual error reports. They don't really pinpoint the problem, but they might help you gain insight into what goes on behind the scenes...
Error 1
With a source file starting with this block comment:
I get the following compiler errors:
Note that the error at line 1 contains a char, so it seems
Error 2
If I add an empty line at the source start, right before the comment block:
the compiler error changes slightly: