colis-anr / morbig

A static parser for POSIX Shell

Should here-document delimitation really be nesting-aware? #99

Closed: Niols closed this issue 5 years ago

Niols commented 5 years ago

About here-documents, I'm wondering: should delimiter detection really depend on nesting? Let's look at the following funny little script:

```shell
cat <<EOF
foo="$bar
EOF
cat <<\EOF
# I love `Lisp
if true; then
   a=`some stuff`
fi
EOF
```

Dash accepts it as two here-documents (which is also what the syntax highlighting shows). Morbig rejects it (luckily) for a rather subtle reason:

Although the second here-document should be taken verbatim because its delimiter is quoted, Morbig interprets it, because it did not recognize the first EOF as the end of the first here-document (because of the unmatched "). Morbig then encounters the backquote in `Lisp and starts parsing a command substitution, which ends at a=` with a syntax error (the if is not finished yet).
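To make the quoting rule concrete, here is a small POSIX sh sketch contrasting the two kinds of delimiter. With an unquoted delimiter, backquotes in the body trigger command substitution; with a backslash-quoted delimiter (as in the script above), the body is passed through literally. The function names are made up for illustration.

```shell
#!/bin/sh
# Contrast an unquoted and a quoted here-document delimiter.

unquoted_demo() {
  # Unquoted delimiter: the body undergoes command substitution,
  # so the backquoted command runs and is replaced by its output.
  cat <<EOF
result: `echo expanded`
EOF
}

quoted_demo() {
  # Quoted delimiter (a backslash, as in the issue): the body is
  # passed through literally, backquotes included.
  cat <<\EOF
result: `echo expanded`
EOF
}

unquoted_demo   # prints: result: expanded
quoted_demo     # prints: result: `echo expanded`
```

This is exactly why the second here-document in the script should never reach the word parser: nothing inside it is subject to expansion.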

We can imagine subtler cases where Morbig would accept what looks like two here-documents as a single one. For instance:

```shell
cat <<EOF
foo="$bar
EOF
cat <<\EOF
foo="$bar
EOF
```
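For reference, this is how a POSIX shell reads that second example: as two here-documents, the first ending at the first EOF line despite its lone ". The sketch wraps the script in a function (the name two_heredocs is made up) and sets bar so that the difference between the expanded and the literal body is visible.

```shell
#!/bin/sh
# The second example from the issue, wrapped in a function so that its
# output can be inspected.
bar=baz

two_heredocs() {
  # First here-document: unquoted delimiter, so $bar is expanded;
  # the lone " is an ordinary character and does not hide EOF.
  cat <<EOF
foo="$bar
EOF
  # Second here-document: quoted delimiter, so the body is literal.
  cat <<\EOF
foo="$bar
EOF
}

two_heredocs
# prints:
# foo="baz
# foo="$bar
```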

So I am starting to wonder whether here-documents should be delimited first, and only then handed to the word parser. In cases like these we would probably fail because of the unterminated " in the first here-document, but that would be a sensible error.
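The delimit-first idea can be sketched in POSIX sh as follows. This is a hypothetical helper, not how Morbig is implemented: read_heredoc_body reads lines until one is exactly the delimiter, without looking at quotes or backquotes at all, and it does not handle the tab stripping of <<-.

```shell
#!/bin/sh
# Hypothetical helper: read_heredoc_body DELIM
# Prints each line of stdin up to (not including) the first line that is
# exactly DELIM. Quotes and backquotes are never interpreted, so a stray
# " or ` in the body cannot hide the delimiter line.
# Returns 1 if the input ends before the delimiter is found.
# (The <<- variant, which strips leading tabs, is not handled.)
read_heredoc_body() {
  delim=$1
  while IFS= read -r line; do
    if [ "$line" = "$delim" ]; then
      return 0
    fi
    printf '%s\n' "$line"
  done
  return 1
}

# The start of the first example from the issue: the body stops at the
# first EOF line, despite the unmatched double quote.
printf '%s\n' 'foo="$bar' 'EOF' 'cat <<\EOF' | read_heredoc_body EOF
# prints: foo="$bar
```

Only once the body has been cut out this way would it be handed to the word parser (or, for a quoted delimiter, kept as-is).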

As always, I am not sure about what the standard actually says. What do you think?

yurug commented 5 years ago

I misinterpreted the standard. Here is the relevant paragraph, AFAIK:

If no part of word is quoted, all lines of the here-document shall be expanded for parameter expansion, command substitution, and arithmetic expansion. In this case, the <backslash> in the input behaves as the <backslash> inside double-quotes (see Double-Quotes). However, the double-quote character ( '"' ) shall not be treated specially within a here-document, except when the double-quote appears within "$()", "``", or "${}".

The words "shall not be treated specially" simply mean that double-quote characters are like any other character inside a here-document. I thought they meant "treated as usual".
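A quick way to see both halves of that rule is the POSIX sh sketch below (quote_demo is a made-up name). On the plain line, the double quotes are copied through as ordinary characters; inside $(), they quote as usual and keep "a  b" together as a single argument to printf.

```shell
#!/bin/sh
# Unquoted delimiter: " is an ordinary character in the body, except
# inside $(), ``, or ${}, where it quotes as usual.
quote_demo() {
  cat <<EOF
plain: "a  b"
subst: $(printf '%s' "a  b")
EOF
}

quote_demo
# prints:
# plain: "a  b"
# subst: a  b
```

Without the inner quotes, printf would receive two arguments and print ab, so the quotes inside $() are clearly still special.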