arithy / packcc

A parser generator for C

Actions that run before the end of parsing. #38

Closed DriNeo closed 3 years ago

DriNeo commented 3 years ago

I want to count lines and columns in order to display better error messages. But when a syntax error occurs, the line/column count doesn't happen because actions don't run. The older "peg/leg" tool has an expression predicate that runs at parse time; I haven't found a similar solution in packcc.

I could create many rules that try to handle every possible error, so that parsing always succeeds. But that is tedious, and it would be sad not to use the error actions " ~{} ".

My question is probably very noob because counting line/col is pretty common, so if you know the regular packcc solution, feel free to answer.

arithy commented 3 years ago

I have enabled execution of actions in predicates. Can you check if the new behavior is what you expect, using the latest commit?

DriNeo commented 3 years ago

I compiled master just now, but I don't understand how to make actions work in predicates. Here is my test.

%source {
    #include <stdio.h>
    int column = 1;
    int line = 1;
    int countAtlastEol = 0;
}

main <- 
    _ string _

string <- 
    '"' 
        ~{printf("missing opening quote at %i:%i\n", line, column);} 
    (!'"' !EOL .)* '"'

_ <- 
    ([ \t]? EOL?)* 
        {column = $0e - countAtlastEol;}

EOL <- 
    ( '\n'
    /'\r\n' 
    / '\r') 
        {line += 1; column = 1; countAtlastEol = $0e;}

%%

int main(void)
{
    pcc_context_t* ctx = pcc_create(NULL);
    while(pcc_parse(ctx, NULL));
    pcc_destroy(ctx);
    return 0;
}

The test input is

     hello"

In this case the actions don't run, except for the error action. Sorry for my lack of knowledge.

arithy commented 3 years ago

> My question is probably very noob because counting line/col is pretty common, so if you know the regular packcc solution, feel free to answer.

Sorry, I didn't answer this question. Please forget the predicate functionality for this issue.

As for line and column number counting, your example is not suitable because actions may not be executed in the order you expect. It is better to count them by scanning the original input text from the beginning up to the indicated position. To support UTF-8 multibyte characters, you have to take care with character counting, because the number of characters is not always equal to the number of bytes.
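A minimal sketch of that idea, assuming the whole input is already held in a buffer. The helper name `locate` is hypothetical and not part of packcc. It walks the bytes before the given offset, treats `\n`, `\r\n`, and `\r` as line breaks, and advances the column only on bytes that start a UTF-8 character (it assumes well-formed UTF-8 and does not validate it).

```c
#include <stddef.h>

/* Hypothetical helper (not part of packcc): scan buf[0..pos) and report the
 * 1-based line and column of byte offset pos. */
static void locate(const char *buf, size_t pos, size_t *line, size_t *column) {
    size_t ln = 1, col = 1, i;
    for (i = 0; i < pos; i++) {
        const unsigned char c = (unsigned char)buf[i];
        if (c == '\r') {
            /* treat "\r\n" as a single line break when both bytes precede pos */
            if (i + 1 < pos && buf[i + 1] == '\n') i++;
            ln++;
            col = 1;
        }
        else if (c == '\n') {
            ln++;
            col = 1;
        }
        else if ((c & 0xC0) != 0x80) {
            /* not a UTF-8 continuation byte (10xxxxxx), so a new character starts here */
            col++;
        }
    }
    if (line) *line = ln;
    if (column) *column = col;
}
```

Rescanning from the start costs time proportional to the offset on every call, which is usually acceptable when it only happens while reporting an error.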

Is this the answer you're looking for? Feel free to ask more questions if not.

DriNeo commented 3 years ago

You gave me interesting information. But I don't know where the character counting should take place if it is not possible in the parser actions. This is my main question.

arithy commented 3 years ago

I have found a bug regarding $0s and $1s. It is fixed in the latest commit.

Below is example code that displays an error with a line number and a column number.

%source {
#include <stdio.h>
#include <stdlib.h>

#define PCC_GETCHAR(auxil) MySystem_readCharacter(auxil)

typedef struct MySystem_tag {
    FILE *input;
    struct {
        char *buf;
        size_t max;
        size_t len;
    } text;
    struct {
        size_t *buf;
        size_t max;
        size_t len;
    } line;
} MySystem;

static void MySystem_appendCharacter(MySystem *obj, char c) {
    if (obj->text.max <= obj->text.len) {
        size_t m = (obj->text.max > 0) ? obj->text.max * 2 : 256;
        if (m == 0) m = obj->text.len; /* in case of multiplication overflow */
        char *const p = (char *)realloc(obj->text.buf, m);
        if (p == NULL) {
            fprintf(stderr, "out of memory\n");
            exit(1);
        }
        obj->text.buf = p;
        obj->text.max = m;
    }
    obj->text.buf[obj->text.len++] = c;
}

static void MySystem_appendLineHead(MySystem *obj, size_t h) {
    if (obj->line.max <= obj->line.len) {
        size_t m = (obj->line.max > 0) ? obj->line.max * 2 : 256;
        if (m == 0) m = obj->line.len; /* in case of multiplication overflow */
        /* m counts elements, so the byte size must be scaled by the element size */
        size_t *const p = (size_t *)realloc(obj->line.buf, m * sizeof(size_t));
        if (p == NULL) {
            fprintf(stderr, "out of memory\n");
            exit(1);
        }
        obj->line.buf = p;
        obj->line.max = m;
    }
    obj->line.buf[obj->line.len++] = h;
}

static void MySystem_initialize(MySystem *obj) {
    obj->input = stdin;
    obj->text.buf = NULL;
    obj->text.max = 0;
    obj->text.len = 0;
    obj->line.buf = NULL;
    obj->line.max = 0;
    obj->line.len = 0;
    MySystem_appendLineHead(obj, 0);
}

static void MySystem_finalize(MySystem *obj) {
    free(obj->text.buf);
    free(obj->line.buf);
}

static int MySystem_readCharacter(MySystem *obj) {
    const int c = fgetc(obj->input);
    if (c != EOF) {
        MySystem_appendCharacter(obj, (char)c);
        if (c == '\r') {
            MySystem_appendLineHead(obj, obj->text.len);
        }
        else if (c == '\n') {
            if (obj->text.len >= 2 && obj->text.buf[obj->text.len - 2] == '\r') {
                obj->line.buf[obj->line.len - 1] = obj->text.len;
            }
            else {
                MySystem_appendLineHead(obj, obj->text.len);
            }
        }
    }
    return c;
}

static size_t countCharacters(const char *buf, size_t start, size_t end) {
    /* TODO: UTF-8 multibyte character support */
    return end - start;
}

static void MySystem_computeLineAndColumn(MySystem *obj, size_t pos, size_t *line, size_t *column) {
    size_t i;
    for (i = 1; i < obj->line.len; i++) {
        if (pos < obj->line.buf[i]) break;
    }
    if (line) *line = i;
    if (column) *column = countCharacters(obj->text.buf, obj->line.buf[i - 1], pos) + 1;
}
}

main
   <- _ string _

string
   <- '"'
   ~{
        size_t line, column;
        MySystem_computeLineAndColumn(auxil, $0s, &line, &column);
        printf("missing opening quote at %zu:%zu\n", line, column);
    }
      (!'"' !EOL .)*
      '"'
   ~{
        size_t line, column;
        MySystem_computeLineAndColumn(auxil, $0e, &line, &column);
        printf("missing closing quote at %zu:%zu\n", line, column);
    }
    {
        size_t line, column;
        MySystem_computeLineAndColumn(auxil, $0s, &line, &column);
        printf("OK: string %s at %zu:%zu\n", $0, line, column);
    }

_ <-
    ([ \t] / EOL)*

EOL <-
    ( '\n'
    /'\r\n'
    / '\r')

%%

int main(void)
{
    MySystem aux;
    MySystem_initialize(&aux);
    pcc_context_t *ctx = pcc_create(&aux);
    while (pcc_parse(ctx, NULL));
    pcc_destroy(ctx);
    MySystem_finalize(&aux);
    return 0;
}
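
The piece that makes the example above work is the `PCC_GETCHAR` hook: every byte the parser consumes passes through it, so the line table is built regardless of which actions run. As a hedged sketch, the same hook can read from an in-memory string instead of `stdin`, which makes the bookkeeping easier to exercise in a test. The type `StringSource` and its function are hypothetical names, not part of packcc, and the sketch assumes the hook should return `EOF` at end of input, as `fgetc` does in the example above.

```c
#include <stddef.h>
#include <stdio.h> /* EOF */

/* Hypothetical auxil type: input comes from a string instead of a FILE. */
typedef struct StringSource_tag {
    const char *text; /* NUL-terminated input */
    size_t pos;       /* offset of the next byte to hand to the parser */
} StringSource;

/* Return the next byte as a non-negative int, or EOF at end of input. */
static int StringSource_readCharacter(StringSource *src) {
    const unsigned char c = (unsigned char)src->text[src->pos];
    if (c == '\0') return EOF; /* stay at the terminator so repeated calls keep returning EOF */
    src->pos++;
    return c;
}

/* In the grammar's %source block this would be wired up the same way as above:
 *   #define PCC_GETCHAR(auxil) StringSource_readCharacter(auxil)
 * with a StringSource pointer passed to pcc_create(). */
```

Line-head recording like that in `MySystem_readCharacter` could be added inside this function in the same way.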

DriNeo commented 3 years ago

Thank you for the kind help. I missed the "PCC_GETCHAR" solution, my bad. I hope my issue hasn't taken too much of your time!

arithy commented 3 years ago

You’re welcome. It was good for me since the bug could be found and fixed.

arithy commented 3 years ago

@DriNeo, FYI: I have added a more practical example in examples/ast-tinyc. I hope it will be helpful to you.