LangProc / langproc-2017-lab

4 stars 8 forks source link

Some underspecified parts in 1-regexes #55

Open VasiliosRallis opened 6 years ago

VasiliosRallis commented 6 years ago
  1. What should happen when the denominator of a fraction is 0?

  2. Can the denominator of a fraction have a - sign after the / sign to indicate that the denominator is negative?

  3. What should happen if we detect an "illegal sequence" (basically if [ ] spans a new line)

m8pple commented 6 years ago

1 - What should happen when the denominator of a fraction is 0?

This is a good edge-case, as it comes down to whether it is banned as a syntactic element, or whether it is given a well-defined semantic interpretation. In the first alternative we could say that a fraction cannot contain an all-zero denominator, which means that a sequence like 4/0 x would have to be interpreted as:

The second alternative would be to come up with a well-defined value for 4/0, which we could quite reasonably do. Two reasonable semantics would be:

My intention was to take the first approach, as it is a really interesting example of banning semantically problematic things at the lexical level, so that they don't have to be given an interpretation. However, I seem to have left out the sentence that stated it.

As a consequence, I'll resolve it simply by saying that no fractions with zero denominators will be encountered, as that means maximum compatibility with what people have already done.

m8pple commented 6 years ago

2 - Can the denominator of a fraction have a - sign after the / sign to indicate that the denominator is negative?

No, or at least not in the language as I'm defining it.

The idea here is that a fraction is a primitive literal or constant in this language, so it will form a single token in any later parsing stage. Given that interpretation it wouldn't make sense to have a sign on the denominator, as the sign would apply to the whole fraction literal.

You may be thinking of things in C, where you might have:

5/-6

However, there is some interesting behavior here, as that is actually four tokens in C: -5 : Number

There is no such thing as a negative literal in C, so it is actually an expression the derivation (parse-tree) of 5/-6 would be:

Division[
    Constant[5] ,
    UnaryMinus[
       Constant[5]
    ]
]

But in this language an entire fraction can be positive or negative, though not the denominator. So something like 4/-6 should not turn into a fraction token, it would be two integer tokens.

m8pple commented 6 years ago

3 - What should happen if we detect an "illegal sequence" (basically if [ ] spans a new line)

This is a natural view of what the lexer is doing, but it is not quite how the lexer works. You generally don't detect bad things happening; instead you find out when nothing good was possible.

So in our language, if there is an opening [ character, we have only potentially got a word token. If a new-line character is encountered before the closing ], then it just wasn't a word, but it may still have other interpretations.

For example, the input:

x [g 4
10 ]

Should be lexed as: 1 - Word : x 2 - Word : g 3 - Number : 4 4 - Number : 10

keerthanenr commented 6 years ago

The examples given for the output in 1-regexes are from least to most. Would it be possible to update these to reflect the new specification?

m8pple commented 6 years ago

Yes - thanks for pointing it out, and sorry again for not noticing this one.

(Embedded examples have long been a problem for courseworks, as they are so difficult to keep in sync with the spec and actual code. I really like the approach of python documentation, as often people embed example code and test-cases in the comments for python classes, and then when you produce the documentation it actually runs them and inserts the real output. Something I aspire to, eventually)