djrieger / mjplusplus

A compiler for the MiniJava language
http://djrieger.github.io/mjplusplus/doc/doxygen/html/
6 stars 1 forks source link

Token positions incorrect in lexer output #1

Closed djrieger closed 9 years ago

djrieger commented 9 years ago

In 4a7c05c703fd507db2206ae6a1b9f38e30eeb2e6 we added a position attribute to the token type (std::pair for line and column). As of yet position output is still buggy. Sometimes line or column is incremented unexpectedly and sometimes they stay unchanged even though they actually need to be incremented.

Sample output:

1,1:    class
4,6:    identifier classic
4,14:   {
4,1:    public
5,7:    int
5,11:   identifier method
5,18:   (
5,19:   int
5,22:   identifier arg
5,26:   )
5,27:   {
5,1:    if
6,4:    (
6,6:    identifier i
6,7:    ==
6,10:   integer literal 5
6,12:   )
6,13:   {
6,1:    identifier i
7,4:    =
7,6:    integer literal 3
7,8:    ;
7,1:    }
8,1:    int
9,4:    identifier res
9,8:    =
9,10:   identifier arg
9,14:   +
9,15:   integer literal 42
9,17:   ;
9,18:   identifier res
9,22:   >>=
9,26:   integer literal 4
9,28:   ;
9,29:   return
9,36:   identifier res
9,40:   ;
9,1:    }
10,2:   }
10,4:   EOF

for this code:

/**
* A classic class
* @author Beate Best */
class classic {
public int method(int arg) {
    if (i == 5) {
        i = 3;
    }
int res = arg+42; res >>= 4; return res;
} }
Nidan commented 9 years ago

The positions printed now are the positions of the first character following the last token (usually whitespace). Updating the position when we transition from a whitespace/comment/... state to a "we found something interesting" state should fix this. Looking at the state table this is whenever we transition away from 0. Implemented that and it looks correct. Please verify.

maxvogel commented 9 years ago

Deine Begründung ist einleuchtend und die Ausgaben sehen korrekt aus. On 01.11.2014 07:09, Nidan wrote:

The positions printed now are the positions of the first character following the last token (usually whitespace). Updating the position when we transition from a whitespace/comment/... state to a "we found something interesting" state should fix this. Looking at the state table this is whenever we transition away from 0. Implemented that and it looks correct. Please verify.

— Reply to this email directly or view it on GitHub https://github.com/djrieger/mjplusplus/issues/1#issuecomment-61359393.

djrieger commented 9 years ago

For tests/blatt2.mj positions are correct except that one tab is counted as two characters, see lines 6, 7 and 8:

    if (i == 5) {
        i = 3;
    }

Positions:

6,2:    if
6,5:    (
6,6:    identifier i
6,8:    ==
6,11:   integer literal 5
6,12:   )
6,14:   {
7,3:    identifier i
7,5:    =
7,7:    integer literal 3
7,8:    ;
8,2:    }
Nidan commented 9 years ago

I disagree, line 6 starts with "if", so 'i' is the second character in that line, the same applies for the other lines. Also, if tabs were counted as two chars "6,2: if" and "7,3: identifier i" should differ by two columns instead of one, since line 7 has one additional tab.

BigPeet commented 9 years ago

I think a "tab" should be counted as white spaces. Line 6 starts with a tab (= 4 white spaces) and "if" should then be at position 6,5. Every normal editor would count it that way.

djrieger commented 9 years ago

+1 for that

Nidan commented 9 years ago

Except those that use some other value (8 spaces per tab is also common). (Personally I also use 4 spaces/tab.) gcc counts a tab as a single character and uses one space per tab when printing lines on the terminal. Whatever we use, it's guaranteed to be wrong for someone with a different terminal/editor configuration.

djrieger commented 9 years ago

That's right but I don't think this will be a big deal since we'll be printing error messages containing the line of source code where some error occurred in the future anyway. Then we can simply point to the position where the error occurred using some visual indicator as gcc does and column numbers won't matter that much.