colis-anr / morbig

A static parser for POSIX Shell

Rework handling of assignment words #176

Closed Niols closed 9 months ago

Niols commented 1 year ago

This pull request reworks the way assignment words are handled in Morbig. Note that Morbig was already correctly promoting tokens from WORD to ASSIGNMENT_WORD during parsing and thus yielding reasonable CSTs. What this PR fixes is only the processing of assignment words in word CSTs.

The most important change is the removal of the WordAssignmentWord constructor from word_cst. I think this constructor made very little sense in itself because assignment words were already handled in the CST as CmdPrefix_AssignmentWord constructors carrying a name and a word. There were probably two reasons why this word CST constructor existed:

  1. Because it might be practical for client libraries (e.g. static analysers) to already have access to assignment words in other places (e.g. as arguments of alias or make). I think this is a bad argument: in the semantics of Shell, those words are not assignment words; it is only the utilities that decide to see them as such, and they should do so in the way that is practical for them. We can offer a helper to do that nicely in word CSTs.

  2. As an intermediate state to store words that could be assignment words before reaching the proper assignment word recognition. I think that makes sense but (a) it clutters the output type with an extra useless constructor, (b) Morbig actually leaves quite a lot of those in its output, and (c) it breaks tilde prefix recognition, which was based on the presence of such constructors.

I think it is better to get rid of it altogether. This PR therefore removes the WordAssignmentWord constructor of word_cst, rewrites Assignment.recognize_assignment to not rely on it, and gets rid of PrelexerState.recognize_assignment as well.
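For illustration, a helper of the kind mentioned in point 1 could look roughly like the sketch below. It works on raw strings rather than on Morbig's word_cst, and the names `is_name` and `split_assignment` are hypothetical; they are not part of Morbig's API. It only shows the idea that a utility such as alias can decide for itself whether an argument looks like NAME=VALUE.

```ocaml
(* Hypothetical helper, not Morbig's actual API. It operates on plain
   strings and ignores quoting, which a real version on word CSTs would
   have to take into account. *)

(* A POSIX name: letters, digits and underscores, not starting with a digit. *)
let is_name s =
  let n = String.length s in
  let is_start c =
    c = '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
  in
  let is_rest c = is_start c || (c >= '0' && c <= '9') in
  let rec all_rest i = i >= n || (is_rest s.[i] && all_rest (i + 1)) in
  n > 0 && is_start s.[0] && all_rest 1

(* Split at the first '='; return None if the left-hand side is not a name. *)
let split_assignment word =
  match String.index_opt word '=' with
  | None -> None
  | Some i ->
    let name = String.sub word 0 i in
    let value = String.sub word (i + 1) (String.length word - i - 1) in
    if is_name name then Some (name, value) else None
```

A client analysing `alias ll='ls -l'` would call such a helper on the argument after quote removal and get back the pair of name and value, while an argument like `1a=b` would be rejected.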

Additionally, I took this opportunity to fix the tilde prefix recognition mechanism. The main problem was that tilde prefix recognition depends on whether we are in an assignment word: in that case, the word is split on colon characters and tilde prefixes are recognised in every component. Morbig was being overzealous about this. Recognition now happens in two steps: during prelexing, only standard tilde prefixes (at the beginning of words) are recognised; during parsing, when yielding a CmdPrefix_AssignmentWord, tilde prefixes are recognised again, this time fully.
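The difference between the two steps can be sketched as follows. This is a simplified model on raw strings that ignores quoting; the `component` type and the function names are assumptions made for the example and do not correspond to Morbig's actual code.

```ocaml
(* Simplified model, not Morbig's implementation: quoting is ignored and
   all names below are hypothetical. *)
type component =
  | TildePrefix of string (* e.g. "~" or "~alice" *)
  | Literal of string

(* Step 1 (prelexing): recognise a tilde prefix only at the very beginning
   of the word; it extends up to the first '/' or the end of the word. *)
let standard_tilde_prefix word =
  let n = String.length word in
  if n = 0 then []
  else if word.[0] <> '~' then [ Literal word ]
  else
    let rec find i = if i >= n || word.[i] = '/' then i else find (i + 1) in
    let i = find 1 in
    let prefix = TildePrefix (String.sub word 0 i) in
    if i = n then [ prefix ] else [ prefix; Literal (String.sub word i (n - i)) ]

let rec intersperse sep = function
  | [] -> []
  | [ x ] -> [ x ]
  | x :: rest -> x :: sep :: intersperse sep rest

(* Step 2 (parsing, for the value of a real assignment word): split on ':'
   and recognise a tilde prefix at the start of every component. *)
let assignment_tilde_prefixes value =
  String.split_on_char ':' value
  |> List.map standard_tilde_prefix
  |> intersperse [ Literal ":" ]
  |> List.concat
```

On the value `~/bin:~alice/bin`, the first function only recognises the leading `~`, whereas the second also recognises `~alice` after the colon. A word such as `PATH=~/bin` is left untouched by the first step because it does not start with a tilde.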

I updated the expectation files and checked them one by one. I am happy with the result. Many of the changes are in tests on alias and are not very interesting when it comes to parsing assignment words. There are two main interesting test files:

@yurug WDYT?

Niols commented 1 year ago

CI is failing on the Windows runner but that seems to be for unrelated reasons. Other PRs have it failing as well (see https://github.com/colis-anr/morbig/pull/177).

Niols commented 10 months ago

@yurug Any chance you might look at this?