This pull request reworks the way assignment words are handled in Morbig. It is to be noted that Morbig was correctly promoting parsing tokens from WORD to ASSIGNMENT_WORD and thus yielding reasonable CSTs. What this PR fixes is only the processing of assignment words in word CSTs only.
The most important change is the disappearance of the WordAssignmentWord in word_cst. I think this constructor made very little sense in itself because assignment words were already handled in the CST as CmdPrefix_AssignmentWord constructors carrying a name and a word. There were probably two reasons why this word CST constructor existed:
Because it might be practical for client libraries (eg. static analysers) to already have access to assignment words in other places (eg. as arguments of alias or make). I think this is a bad argument: in the semantics of Shell, those words are not assignment words, it is only the utilities that decide to see them as such, and they should do so in the way that is practical for them. We can offer a helper to do that nicely in word CSTs.
As an intermediary state to store words that could be assignment words before reaching the proper assignment words recognition. I think that makes sense but (a) it clutters the output type with an extra useless constructor, (b) Morbig actually leaves quite a lot of those in its output, and (c) it breaks tilde prefixes recognition that was based on the presence of such constructors.
I think it is better to get rid of it altogether. This PR therefore removes the WordAssignmentWord constructor of word_cst, rewrites Assignment.recognize_assignment to not rely on it, and gets rid of PrelexerState.recognize_assignment as well.
Additionally, I took this opportunity to fix the tilde prefixes recognition mechanism. The main problem was that tilde prefixes recognition changes depending on whether we are in an assignment word, in which case it splits the word on colon characters and recognises tilde prefixes in all the components. This was being overzealous. This recognition is now made in two steps: during prelexing, only standard tilde prefixes (at the beginning of words) are recognised. During parsing, when yielding a CmdPrefix_AssignmentWords, tilde prefixes are recognised again, this time fully.
I updated the expectation files and checked them one by one. I am happy with the result. A lot of them actually have to do with tests on alias and are not very interesting when it comes to parsing assignment words. There are two main interesting test files:
are correctly handled as containing no tilde prefixes.
tests/golden/good/2.9-shell-commands/2.9.1-simple-commands/assignment-words.t/input.sh, funnily enough, explains the subtelty handled in this PR:
CC=gcc make
make CC=cc
ln -s /bin/ls "X=1"
"./X"=1 echo
# On line 1, the word CC=gcc is recognized as a word assignment of gcc
# to CC because CC is a valid name for a variable, and because CC=gcc
# is written just before the command name of the simple command
# make. On line 2, the word CC=cc is not promoted to a word assignment
# because it appears after the command name of a simple command. On
# line 4, since "./X" is not a valid name for a shell variable, the
# word "./X=1" is not promoted to a word assignment and is interpreted
# as the command name of a simple command.
However, Morbig, until now, was precisely recognising CC=cc as an assignment words. Not anymore.
This pull request reworks the way assignment words are handled in Morbig. It is to be noted that Morbig was correctly promoting parsing tokens from WORD to ASSIGNMENT_WORD and thus yielding reasonable CSTs. What this PR fixes is only the processing of assignment words in word CSTs only.
The most important change is the disappearance of the
WordAssignmentWord
inword_cst
. I think this constructor made very little sense in itself because assignment words were already handled in the CST asCmdPrefix_AssignmentWord
constructors carrying a name and a word. There were probably two reasons why this word CST constructor existed:Because it might be practical for client libraries (eg. static analysers) to already have access to assignment words in other places (eg. as arguments of
alias
ormake
). I think this is a bad argument: in the semantics of Shell, those words are not assignment words, it is only the utilities that decide to see them as such, and they should do so in the way that is practical for them. We can offer a helper to do that nicely in word CSTs.As an intermediary state to store words that could be assignment words before reaching the proper assignment words recognition. I think that makes sense but (a) it clutters the output type with an extra useless constructor, (b) Morbig actually leaves quite a lot of those in its output, and (c) it breaks tilde prefixes recognition that was based on the presence of such constructors.
I think it is better to get rid of it altogether. This PR therefore removes the
WordAssignmentWord
constructor ofword_cst
, rewritesAssignment.recognize_assignment
to not rely on it, and gets rid ofPrelexerState.recognize_assignment
as well.Additionally, I took this opportunity to fix the tilde prefixes recognition mechanism. The main problem was that tilde prefixes recognition changes depending on whether we are in an assignment word, in which case it splits the word on colon characters and recognises tilde prefixes in all the components. This was being overzealous. This recognition is now made in two steps: during prelexing, only standard tilde prefixes (at the beginning of words) are recognised. During parsing, when yielding a
CmdPrefix_AssignmentWords
, tilde prefixes are recognised again, this time fully.I updated the expectation files and checked them one by one. I am happy with the result. A lot of them actually have to do with tests on alias and are not very interesting when it comes to parsing assignment words. There are two main interesting test files:
good/2.6-word-expansions/2.6.1-tilde-prefix/login.t
is the one discussed in https://github.com/colis-anr/morbig/pull/164#issuecomment-1539311366 which led to writing https://github.com/colis-anr/morbig/issues/165. After this change, the last lines:are correctly handled as containing no tilde prefixes.
tests/golden/good/2.9-shell-commands/2.9.1-simple-commands/assignment-words.t/input.sh
, funnily enough, explains the subtelty handled in this PR:However, Morbig, until now, was precisely recognising
CC=cc
as an assignment words. Not anymore.@yurug WDYT?