colis-anr / morbig

A static parser for POSIX Shell

Cleanup and document word_cst? #56

Closed Niols closed 5 years ago

Niols commented 5 years ago

I'm using the word_csts in another project (Morsmall / CoLiS-language) and I'm getting quite confused while using them:

https://github.com/colis-anr/morbig/blob/2faccf3fa2c0e57c5f7f62a6af7e16e41baa038e/src/CST.ml#L322-L335

In particular:

  1. What are the constructs that can appear in the word of a WordDoubleQuoted?

  2. Same question for a WordSingleQuoted. In particular, why not simply have WordSingleQuoted of string? I guess it has to do with 3.

  3. I'm still not convinced by these WordName and WordAssignmentWord constructors that are sub-cases of WordLiteral. Wouldn't it be more practical to have only WordLiteral and helpers like is_name : string -> bool and to_assignment_word : string -> name * word?

  4. What kind of constructs can one find in a WordAssignmentWord that occurs inside a WordDoubleQuoted (because I guess it can happen)? And inside a WordSingleQuoted?

  5. I assume that WordEmpty is the same thing as WordLiteral ""?

  6. Do we still need WordOther?
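
To make point 3 concrete, here is a minimal sketch of what such helpers could look like. It is not Morbig code: the return type of to_assignment_word is simplified to (string * string) option instead of name * word, and the definition of a name follows the POSIX one (letters, digits and underscores, not starting with a digit).

```ocaml
(* Hypothetical helpers, not part of Morbig's API. *)

(* A POSIX name: letters, digits and underscores, not starting with a digit. *)
let is_name (s : string) : bool =
  let is_start c =
    (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c = '_' in
  let is_rest c = is_start c || (c >= '0' && c <= '9') in
  let n = String.length s in
  (* Check every character from index 1 onwards. *)
  let rec all_rest i = i >= n || (is_rest s.[i] && all_rest (i + 1)) in
  n > 0 && is_start s.[0] && all_rest 1

(* Split at the first '='; succeed only if the left-hand side is a name.
   Simplification: the value is returned as a raw string, not as a word. *)
let to_assignment_word (s : string) : (string * string) option =
  match String.index_opt s '=' with
  | None -> None
  | Some i ->
    let name = String.sub s 0 i in
    if is_name name
    then Some (name, String.sub s (i + 1) (String.length s - i - 1))
    else None
```

With these, a consumer would keep a single WordLiteral case and call the helpers only where the distinction matters, for example to_assignment_word "bar=baz" gives Some ("bar", "baz") while to_assignment_word "123" gives None.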

Also, I think we should document all this. I can take care of that, we just need to decide where.

Niols commented 5 years ago

About 5., I actually think we never need WordEmpty. The empty word exists and is the empty list of word components. The "empty word component" does not make much sense imho.

yurug commented 5 years ago
Niols commented 5 years ago

OK for 1., 2., 4. and 6.

About 3.

How does that depend on the parsing context? imho it's more a semantic thing.

If I write echo foo bar=baz 123, Morbig sees two names (echo and foo), one assignment word (bar=baz) and one literal (123). If we were to write an interpreter, we would in fact handle the three words foo, bar=baz and 123 as literals in exactly the same way. Whether echo wants them to be names, literals or assignments would not really be our concern.

Also, sometimes there are words that cannot (and shouldn't) be parsed as assignments. But after expansion, a utility can decide that one or several of its arguments should be treated as assignments. Here is an example:

$ A=B
$ readonly $A=C
$ echo $A
B
$ echo $B
C

$A=C isn't an assignment word. It gets expanded to B=C and passed to the readonly special built-in. This built-in then looks at its first argument, marks B as read-only and gives it the value C, which explains the output of the last two commands. But the decision to treat B=C as an assignment word comes from the built-in and might not hold for other utilities (echo $A=C would just print B=C).

By the way, when playing with examples like that, I noticed that Morbig parses A=B=C as the literal A= followed by the assignment word B=C, which is a bit surprising imho.

A=B is a word. But if I write readonly A=B, readonly will receive A=B as an argument and then decide that it wants it to be "assignment-word-like".

About 5.

I think I get it. Am I right in saying that for any word CST w, w and List.filter ((<>) WordEmpty) w are equivalent (that is, they represent the same word)? But then, why do we need WordEmpty?
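
To spell out the equivalence I have in mind, here is a small sketch on a toy type. The type below is a stand-in with only two constructors, not the real one from CST.ml, and to_string is a hypothetical rendering function. It also assumes that WordEmpty denotes the empty string, which is precisely what I am asking.

```ocaml
(* Toy stand-in for Morbig's word components; NOT the real CST.ml type. *)
type word_component =
  | WordEmpty
  | WordLiteral of string

type word = word_component list

(* Drop the components that, under the assumption above, contribute nothing. *)
let strip_empty (w : word) : word = List.filter ((<>) WordEmpty) w

(* Hypothetical rendering of a word back to the string it denotes. *)
let to_string (w : word) : string =
  String.concat ""
    (List.map (function WordEmpty -> "" | WordLiteral s -> s) w)
```

Under that assumption, to_string w and to_string (strip_empty w) agree for every w, and the empty word is simply the empty list.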