aMOPel / tree-sitter-nim

tree-sitter parser for the nim programming language
MIT License

Use nim compiler to parse strange syntax #14

Open YesDrX opened 2 years ago

YesDrX commented 2 years ago

Can we use the Nim compiler as a library (at runtime rather than compile time) to parse edge-case statements? The example below contains an if expression. In theory, we could build a shared library that exports some C functions like

bool isIfExpr(char* src, size_t src_size);

To run the following example, you first need to install the compiler package:

nimble install compiler

Example

import compiler / [ast, idents, parser, options]
import strutils
import strformat

var
    identCache = newIdentCache()
    configRef = newConfigRef()
    code = """
var tmp = if 1 > 2 : "1 is bigger than 2" else : "1 is not bigger than 2"
"""

proc echoTree(tree : PNode, indent_level : int = 0) : string =
    if tree != nil:    
        if tree.kind == nkIfExpr:
            echo fmt"""
            ====================================================
                    WHAT? WE DETECTED AN IF_EXPRESSION
            ====================================================
            """

        result = result & " ".repeat(4 * indent_level) & tree.kind.`$` & fmt" : ({tree.info.line}:{tree.info.col}) "

        case tree.kind
        of nkCharLit .. nkUInt64Lit:
            result = result & fmt" : {tree.intVal}"&"\n"
        of nkFloatLit .. nkFloat128Lit:
            result = result & fmt" : {tree.floatVal}"&"\n"
        of nkStrLit .. nkTripleStrLit:
            result = result & fmt" : {tree.strVal}"&"\n"
        of nkSym:
            result = result & "\n"
        of nkIdent:
            result = result & fmt": {tree.ident.s}"&"\n"
        else:
            result = result & "\n"
            for son in tree.sons:
                result = result & son.echoTree(indent_level + 1)

echo  code.parseString(identCache, configRef).echoTree

Output

            ====================================================
                    WHAT? WE DETECTED AN IF_EXPRESSION
            ====================================================

nkStmtList : (1:0) 
    nkVarSection : (1:0) 
        nkIdentDefs : (1:4) 
            nkIdent : (1:4) : tmp
            nkEmpty : (1:8) 
            nkIfExpr : (1:10) 
                nkElifExpr : (1:13) 
                    nkInfix : (1:15) 
                        nkIdent : (1:15) : >
                        nkIntLit : (1:13)  : 1
                        nkIntLit : (1:17)  : 2
                    nkStmtList : (1:21) 
                        nkStrLit : (1:21)  : 1 is bigger than 2
                nkElseExpr : (1:42) 
                    nkStmtList : (1:49) 
                        nkStrLit : (1:49)  : 1 is not bigger than 2

aMOPel commented 2 years ago

Interesting idea.

I don't know how to make that work, though. Have you read the tree-sitter docs on creating parsers?

The src/parser.c is completely generated from the grammar.js file (using the tree-sitter CLI). I don't know of any interface for inserting things into parser.c at runtime, but there is src/scanner.cc, which offers more fine-grained control over parsing than the DSL in grammar.js.

Theoretically, you could import the Nim compiler library as C or C++ code in src/scanner.cc. However, the scanner (and probably the parser) works character by character, and I don't know how that plays with the Nim compiler library.

To give an example, triplestr_lit is currently handled in scanner.cc, or at least its content and closing quotes are.

https://github.com/aMOPel/tree-sitter-nim/blob/main/grammar.js#L1183

It works like this: in grammar.js, we match a triplestr_lit if we find """ followed by _multi_string_content rules and a _multi_string_end rule. Those are implemented in src/scanner.cc here:

https://github.com/aMOPel/tree-sitter-nim/blob/main/src/scanner.cc#L147

The API works character by character. You can use lexer->lookahead to look at the next char, advance(lexer) to match the next char and move one char forward, and skip(lexer) to move one char forward without matching it. (There is also mark_end.)
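
To make the character-by-character style concrete, here is a minimal sketch of scanning triple-quoted string content. It is not the code from src/scanner.cc: MockLexer is a hypothetical stand-in for tree-sitter's TSLexer so the example compiles without the tree-sitter headers, and scan_triple_string_content is an illustrative name. It also simplifies Nim's rule by stopping at the first run of three quotes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical stand-in for TSLexer. The real struct exposes `lookahead`
// plus function pointers such as advance(lexer, skip), mark_end(lexer)
// and eof(lexer); only the parts needed here are mimicked.
struct MockLexer {
    std::string input;
    std::size_t pos = 0;        // index of the current lookahead character
    std::size_t token_end = 0;  // position recorded by the last mark_end()
    int32_t lookahead = 0;

    explicit MockLexer(std::string s) : input(std::move(s)) { load(); }
    void load() {
        lookahead = pos < input.size()
                        ? static_cast<unsigned char>(input[pos])
                        : 0;
    }
    bool eof() const { return pos >= input.size(); }
    void advance() { if (!eof()) { ++pos; load(); } }
    void mark_end() { token_end = pos; }
};

// Consume string content up to, but not including, the closing """.
// Returns true if a closing delimiter was found. Simplification: a run
// of more than three quotes is cut at the first three.
bool scan_triple_string_content(MockLexer &lexer) {
    int quotes = 0;
    lexer.mark_end();  // content is empty so far
    while (!lexer.eof()) {
        if (lexer.lookahead == '"') {
            ++quotes;
            lexer.advance();
            // token_end still points just before the quote run
            if (quotes == 3) return true;
        } else {
            quotes = 0;
            lexer.advance();
            lexer.mark_end();  // content now extends through this char
        }
    }
    return false;  // unterminated string
}
```

The point of the sketch is that the scanner never sees the whole source at once: it decides what to do from one lookahead character at a time, and mark_end records where the token stops.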

That is pretty much the whole API. So I don't really know how to make this work with the Nim compiler lib, but frankly I have never used it, so maybe you have an idea.
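
One possible bridge, purely as a sketch: buffer the upcoming characters into a string and hand that buffer to a whole-string function such as the proposed isIfExpr. Calling mark_end before advancing means the later advance calls act only as lookahead and do not grow the token. MockLexer, BufferPredicate, and peek_line_matches are hypothetical names, and the sketch assumes ASCII input and that one line is enough context, which may not hold for multi-line if expressions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical stand-in for TSLexer (see the tree-sitter docs for the
// real struct); only lookahead, advance, mark_end and eof are mimicked.
struct MockLexer {
    std::string input;
    std::size_t pos = 0;        // index of the current lookahead character
    std::size_t token_end = 0;  // position recorded by the last mark_end()
    int32_t lookahead = 0;

    explicit MockLexer(std::string s) : input(std::move(s)) { load(); }
    void load() {
        lookahead = pos < input.size()
                        ? static_cast<unsigned char>(input[pos])
                        : 0;
    }
    bool eof() const { return pos >= input.size(); }
    void advance() { if (!eof()) { ++pos; load(); } }
    void mark_end() { token_end = pos; }
};

// Assumed shape of a whole-buffer check, modelled on the proposed
//   bool isIfExpr(char* src, size_t src_size);
// Any function with this signature can be plugged in.
using BufferPredicate = bool (*)(const char *src, std::size_t src_size);

// Peek ahead to the end of the current line without extending the token,
// then ask the whole-buffer predicate about the collected text.
bool peek_line_matches(MockLexer &lexer, BufferPredicate pred) {
    lexer.mark_end();  // freeze the token end before looking ahead
    std::string line;
    while (!lexer.eof() && lexer.lookahead != '\n') {
        // ASCII-only assumption: the real lookahead is a code point
        line.push_back(static_cast<char>(lexer.lookahead));
        lexer.advance();
    }
    return pred(line.data(), line.size());
}
```

Whether this is practical depends on how expensive the whole-buffer call is, since the scanner may be invoked very often during parsing.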

I would be curious about the size of the parser if you were to import the Nim compiler.