colis-anr / morbig

A static parser for POSIX Shell

Carriage return should not be recognized as newline. #91

Closed Niols closed 5 years ago

Niols commented 5 years ago

The POSIX standard always talks about "newline", which it defines as:

3.243 Newline Character (<newline>)

A character that in the output stream indicates that printing should start at the beginning of the next line. It is the character designated by \n in the C language.

In Morbig, a carriage return (\r) is also accepted as a newline. For instance, consider the following script, saved in an encoding that uses CRLF line terminators:

for x
do
    echo $x
done

Morbig then succeeds (and parses what we would expect: the for loop). However, it probably shouldn't, because the script actually looks like this:

for x\r
do\r
    echo $x\r
done\r

and do\r is not a keyword (the expected one being do). Dash, Bash and Zsh all report a parse error near do or \n.
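The behaviour described above can be checked with a short sketch like the one below. It assumes that `sh` on the test machine is one of the shells mentioned here (or behaves like them), and the temporary file and variable names are purely illustrative. It writes the four-line script with explicit CRLF terminators, counts the lines that end in a carriage return, and asks the shell to parse the file without executing it.

```shell
# Build the CRLF-encoded script from the issue in a temporary file.
# The single quotes keep $x literal; printf expands \r and \n.
tmp=$(mktemp)
printf 'for x\r\ndo\r\n    echo $x\r\ndone\r\n' > "$tmp"

# Count the lines that end in a carriage return.
cr=$(printf '\r')
crlf_lines=$(grep -c "${cr}\$" "$tmp")

# Parse only (-n), do not execute. Assumption: sh is Dash, Bash or similar.
if sh -n "$tmp" 2>/dev/null; then
    parse_result=accepted
else
    parse_result=rejected
fi

echo "lines ending in CR: $crlf_lines"
echo "sh -n: $parse_result"
rm -f "$tmp"
```

If the shell treats \r as an ordinary character, as the shells named above do, the parse step should come back as rejected, which is the result Morbig would need to match.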

This seems a bit stupid to me, but if we plan on being as close to the standard as possible, then we should probably handle this correctly. I haven't looked at it yet, but I think it only requires a change in the prelexer.