Njord0 / Arobase

Arobase is a simple programming language with a C-like syntax.
GNU General Public License v3.0

Proposal: Represent tokens and AST nodes via algebraic data types #1

Status: Closed (hirrolot closed 2 years ago)

hirrolot commented 2 years ago

Hi! I came from your Reddit comment on Arobase.

Currently, composite data types such as Token_t, Expression_t, and Statement_t are represented either as plain C tagged unions or as a collection of values and pointers, some of which can be invalid in a particular state. I suggest changing the implementation to use algebraic data types, which would give you convenient manipulation through pattern matching as well as type safety. The library I have in mind is called Datatype99 (I am the author).

For example, we could represent tokens like this:

[arobase/includes/tokens.h]

#include <datatype99.h>

datatype(
    TokenKind,
    (Tok_Int, int64_t),
    (Tok_Float, double),
    (Tok_Plus),
    (Tok_Minus),
    // etc.
);

typedef struct Token {
    TokenKind kind;
    unsigned long int lineno;
    struct Token *next;
} Token_t;

This way, int64_t is available only if the token kind is Tok_Int, and cannot exist in any other state (type safety). The same holds for all other variants.

We can match on the token kind as follows:

// Needs <inttypes.h> for PRId64 and <stdio.h> for printf.
TokenKind token = Tok_Int(42);
match(token) {
    of(Tok_Int, x) printf("Got %" PRId64 ".\n", *x);  // x points to the int64_t payload
    of(Tok_Plus) printf("Just +.\n");
    // etc.
}
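For readers who have not used Datatype99, the two snippets above correspond roughly to the hand-written plain C below. This is only a sketch for comparison: every name in it (PlainTokenKind, plain_tok_int, plain_describe, and so on) is made up for illustration and is neither Arobase code nor necessarily what Datatype99 generates.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative hand-written counterpart of the TokenKind sketch. */
typedef enum { TAG_INT, TAG_FLOAT, TAG_PLUS, TAG_MINUS } PlainTokenTag;

typedef struct {
    PlainTokenTag tag;
    union {
        int64_t int_value;  /* meaningful only when tag == TAG_INT */
        double float_value; /* meaningful only when tag == TAG_FLOAT */
    } data;
} PlainTokenKind;

/* Constructors keep the tag and the payload in sync. */
static PlainTokenKind plain_tok_int(int64_t value) {
    PlainTokenKind t = {.tag = TAG_INT, .data = {.int_value = value}};
    return t;
}

static PlainTokenKind plain_tok_plus(void) {
    PlainTokenKind t = {.tag = TAG_PLUS};
    return t;
}

/* The payload is only guarded by a runtime check written by hand. */
static int64_t plain_get_int(const PlainTokenKind *t) {
    assert(t->tag == TAG_INT);
    return t->data.int_value;
}

/* Manual "pattern match": formats a description into buf and returns
   the value reported by snprintf. */
static int plain_describe(PlainTokenKind t, char *buf, size_t size) {
    switch (t.tag) {
    case TAG_INT:
        return snprintf(buf, size, "Got %" PRId64 ".", t.data.int_value);
    case TAG_FLOAT:
        return snprintf(buf, size, "Got %g.", t.data.float_value);
    case TAG_PLUS:
        return snprintf(buf, size, "Just +.");
    case TAG_MINUS:
        return snprintf(buf, size, "Just -.");
    }
    return -1; /* not reached for valid tags */
}
```

In the plain version nothing stops a caller from reading data.int_value under the wrong tag, whereas with match and of the payload is bound only inside the branch for its own variant.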

Likewise, expression variants might be something like (Expr_Int, int64_t) (the int64_t int_value field in Expression_t), (Expr_FuncCall, FuncName, struct args *), etc. With the current definition of Expression_t and Statement_t, it is quite hard to understand which fields are used under which conditions.
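To make the "which fields are used when" problem concrete, here is a minimal sketch of a flat expression struct. It is a hypothetical layout invented for illustration (FlatExpr, flat_int, and the other names are assumptions), not the actual Expression_t definition from Arobase.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flat layout: every field exists for every kind of expression. */
typedef enum { FLAT_INT, FLAT_FUNC_CALL } FlatExprKind;

typedef struct FlatExpr {
    FlatExprKind kind;
    int64_t int_value;     /* only meaningful for FLAT_INT */
    const char *func_name; /* only meaningful for FLAT_FUNC_CALL */
    struct FlatExpr *args; /* only meaningful for FLAT_FUNC_CALL */
} FlatExpr;

static FlatExpr flat_int(int64_t value) {
    /* Fields not named here are zero-initialised, so the pointers are NULL. */
    FlatExpr e = {.kind = FLAT_INT, .int_value = value};
    return e;
}

/* The type does not say which fields are valid; the reader has to know,
   and the only protection is a hand-written runtime check like this one. */
static int64_t flat_int_value(const FlatExpr *e) {
    assert(e->kind == FLAT_INT);
    return e->int_value;
}

/* Returns NULL when the expression is not a function call. */
static const char *flat_func_name(const FlatExpr *e) {
    return e->kind == FLAT_FUNC_CALL ? e->func_name : NULL;
}
```

With variants such as (Expr_Int, int64_t) and (Expr_FuncCall, FuncName, struct args *), the same information moves into the type definition itself, so a field that is unused for a given kind simply does not exist there.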

The Statement_t structure can be refactored like so:

datatype(
    Statement,
    (Stmt_Decl, struct decl_),
    (Stmt_If, struct statement_ *),
    // etc.
);

I guess it is clear now. More usage examples can be found here, particularly examples/ast.c and examples/token.c.

During the refactoring, I can help with anything -- feel free to ask me any time.

Njord0 commented 2 years ago

Hi Hirrolot!

First of all, thank you for your time! I've started looking at Datatype99, and it seems to be a good solution for replacing the current implementation of tokens and other data types!

I've started the rewrite here

hirrolot commented 2 years ago

Awesome! Just FYI: you can add Metalang99 & Datatype99 as Git submodules so that they can be updated easily:

$ cd arobase/includes
$ git submodule add https://github.com/Hirrolot/metalang99
$ git submodule add https://github.com/Hirrolot/datatype99

Then add arobase/includes/datatype99/ and arobase/includes/metalang99/ to the compiler's include directories (for example, with -I on GCC or Clang) and it should work fine.