arithy / packcc

A parser generator for C

How to do good syntax error handling? #76

Closed joagre closed 5 months ago

joagre commented 7 months ago

Hi, I can use special rules to catch common errors and point out which row they occur on. I keep track of the current row and store it in auxil:

_ <- (WS / Comments)*
__ <- (WS / Comments)+
WS <- [ \t\r\n] {
    if ($0[0] == '\n') {
        auxil->row++;
    }
}
Comments <- SingleLineComment / BlockComment
SingleLineComment <- "//" (!EOL .)* EOL?
EOL <- ("\r\n" / "\n" / "\r") { auxil->row++; }
BlockComment <- "/*" (BlockCommentContent / EOL)* "*/"
BlockCommentContent <- (!("*/" / EOL) .)

I can then use a special rule to catch a common error, e.g.

Block <- e:Expr { $$ = CN(BLOCK, 1, e); } ( _ CommaSeparator _ e:Expr { AC($$, e); })*
CommaSeparator <- ("," / ";") {
    if (strcmp($0, ";") == 0) {
        fprintf(stderr, "%d: Use ',' to separate expressions in blocks\n", auxil->row);
    }
}

But with unexpected syntax errors everything breaks down, and I cannot point out which row the error occurred on.

As a workaround I added the following:

    static int ROW = 1;

    static int satie_getchar(satie_auxil_t* _auxil) {
        int c = getchar();
        if (c == '\n') {
            ROW++;
        }
        return c;
    }

    static void satie_error(satie_auxil_t* auxil) {
        panic("Syntax error near line %d", ROW);
    }

It works and I have re-invented awk-like error handling. :-) It's crude though.

Ideally I would like to point out syntax errors very precisely with both row and column info.

I haven't been able to figure out how to do that. Any hints?

Cheers /Joakim

arithy commented 6 months ago

The TinyC example might help you count rows and columns precisely. It uses a customized PCC_GETCHAR() macro with the text reader function system__read_source_file(). That function records line break positions, in bytes, by calling append_line_head_() while fetching bytes from the input text. The parsing positions in the input text can be obtained from the predefined variables $0s and $0e (see README.md). The row and column numbers are computed in compute_line_and_column_() from the line break positions and the parsing position. If you don't need to support multibyte characters, the code below

count_characters_(obj->source.text.p, obj->source.line.p[i - 1], pos) + 1

can be simplified to

pos - obj->source.line.p[i - 1] + 1

If multibyte characters are not a concern, the input text needn't be kept in memory as it is in the example. As for error reporting, the example does it in system__handle_syntax_error().
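The bookkeeping described above can be sketched roughly as follows. This is not the TinyC example's code: the type line_table_t and the functions line_table_add_head(), line_table_init(), line_table_feed() and line_table_locate() are hypothetical names made up for illustration. The sketch assumes single-byte characters, that only '\n' ends a line, and that the offset passed to line_table_locate() is a byte offset from the start of the input (the kind of position $0s is described as giving).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: byte offsets at which each line starts. */
typedef struct {
    size_t *head;  /* head[i] is the offset of the first byte of line i + 1 */
    size_t count;  /* number of line heads recorded so far */
    size_t cap;    /* allocated capacity of head[] */
    size_t pos;    /* number of bytes fed so far */
} line_table_t;

/* Append one line-head offset, growing the array when it is full. */
static void line_table_add_head(line_table_t *lt, size_t offset) {
    if (lt->count == lt->cap) {
        size_t cap = lt->cap ? lt->cap * 2 : 16;
        size_t *p = realloc(lt->head, cap * sizeof *p);
        if (p == NULL) {
            perror("realloc");
            exit(EXIT_FAILURE);
        }
        lt->head = p;
        lt->cap = cap;
    }
    lt->head[lt->count++] = offset;
}

static void line_table_init(line_table_t *lt) {
    lt->head = NULL;
    lt->count = 0;
    lt->cap = 0;
    lt->pos = 0;
    line_table_add_head(lt, 0); /* line 1 starts at offset 0 */
}

/* Call once for every byte handed to the parser, e.g. from the
 * function behind a customized PCC_GETCHAR(). Returns c unchanged. */
static int line_table_feed(line_table_t *lt, int c) {
    if (c != EOF) {
        lt->pos++;
        if (c == '\n') {
            line_table_add_head(lt, lt->pos); /* next line starts after '\n' */
        }
    }
    return c;
}

/* Convert a byte offset into a 1-based line and column (column in bytes).
 * Binary search for the last line head that is <= offset. */
static void line_table_locate(const line_table_t *lt, size_t offset,
                              size_t *line, size_t *col) {
    size_t lo = 0, hi = lt->count;
    assert(lt->count > 0); /* line_table_init() must have been called */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (lt->head[mid] <= offset) {
            lo = mid;
        } else {
            hi = mid;
        }
    }
    *line = lo + 1;
    *col = offset - lt->head[lo] + 1;
}
```

Because the line heads are recorded as bytes are read rather than inside rule actions, the table is already up to date when a syntax error is reported. The column computation is the single-byte form, offset minus line head plus one; multibyte input would need a character count over the line instead, which is why the example keeps the input text in memory.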

arithy commented 5 months ago

@joagre, I'm wondering whether my answer was what you wanted. If not, let me know. I'll close this issue in a week if there is no reply. Feel free to reopen it whenever you need.