alex / rply

An attempt to port David Beazley's PLY to RPython, and give it a cooler API.
BSD 3-Clause "New" or "Revised" License

How would I separate statements? #111

Open ghost opened 2 years ago

ghost commented 2 years ago

I'm making a markup language called bml that compiles to HTML.

But I'm having trouble separating statements.

This is its syntax:

Statements:
    in HTML: <!DOCTYPE HTML>
    in BML:  { "DOCTYPE" "HTML" };

Declarations (called "definitions" in the code below):
    in HTML: <h1>Hello, World!</h1>
    in BML:  "h1" { "Hello, World!" };

And here are the lexer and parser classes:

Lexer:

from rply import LexerGenerator

class BMLLexer():
    def __init__(self):
        self.__lexer = LexerGenerator()

    def __add_tokens(self):
        # Statement definitions
        self.__lexer.add('OPEN_STATEMENT', r'\{')
        self.__lexer.add('CLOSE_STATEMENT', r'\}')
        self.__lexer.add('STATEMENT_END', r'\;')
        # Ignore whitespace
        self.__lexer.ignore(r'\s+')
        # Anything
        self.__lexer.add('STRING', r'["][\w\s]+["]')

    def build(self):
        self.__add_tokens()
        return self.__lexer.build()

Parser:

import re
from bmls.language.parser.definitions import BMLDefinition, BMLStatement
from rply import ParserGenerator

class BMLParser():
    def __init__(self):
        self.pg = ParserGenerator(
            # A list of all token names accepted by the parser.
            [
                'OPEN_STATEMENT',
                'CLOSE_STATEMENT',
                'STATEMENT_END',
                'STRING'
            ]
        )

    def parse(self):
        @self.pg.production('expression : OPEN_STATEMENT STRING STRING CLOSE_STATEMENT STATEMENT_END')
        def statement(p):
            name = ""
            definition = ""

            if p[1].gettokentype() == "STRING" and p[2].gettokentype() == "STRING":
                name = self.__removeQuotes(p[1].getstr())
                definition = self.__removeQuotes(p[2].getstr())
                print("Statement: (" + name + " , " + definition + ")")

            return BMLStatement(name, definition)

        @self.pg.production('expression : STRING OPEN_STATEMENT STRING CLOSE_STATEMENT STATEMENT_END')
        def definition(p):
            name = ""
            definition = ""

            if p[0].gettokentype() == "STRING" and p[2].gettokentype() == "STRING":
                name = self.__removeQuotes(p[0].getstr())
                definition = self.__removeQuotes(p[2].getstr())

                print("Definition: (" + name + " , " + definition + ")")

            return BMLDefinition(name, definition)

        @self.pg.error
        def error_handle(token):
            raise SyntaxError("Error on ( Token, type \"" + token.gettokentype() + "\" , Value \"" + token.getstr() + "\")")

    def build(self):
        return self.pg.build()

    def __removeQuotes(self, tok):
        return re.sub(r'^"|"$', '', tok)

Okay, everything seems fine, right? Well, I'm running into an issue. Maybe I'm just a beginner who doesn't understand this yet, but who cares...

Here's where I call them. SimpleBML is a small wrapper I use to run the lexer and parser:

from bmls.language.lexer import BMLLexer
from bmls.language.parser import BMLParser
from bmls.language.parser.definitions import BMLDefinition

class SimpleBML():
    def __init__(self):
        pass

    def parse(self, content):
        lexer = BMLLexer().build()
        tokens = lexer.lex(content)

        pg = BMLParser()
        pg.parse()
        parser = pg.build()
        tkk = parser.parse(tokens)

And this is where I call SimpleBML:

from bmls.language.simple import SimpleBML
import rply

class BMLInterpreter:
    def run(self):
        self.lexer = SimpleBML()

        while True:
            inf = input("Interpreter > ")
            tokens = self.lexer.parse(inf)
            if tokens is not None:
                for token in tokens:
                    print(token)

BMLInterpreter().run()

Anyway, TL;DR: I can't write multiple statements on one line.

A single statement works:

PS C:\Users\*****\OneDrive\Documents\langs\bml> & C:/Python39/python.exe c:/Users/*****/OneDrive/Documents/langs/bml/interp.py
Interpreter > { "DOCTYPE" "HTML" };
Statement: (DOCTYPE , HTML)
Interpreter >

A single definition works too:

PS C:\Users\*****\OneDrive\Documents\langs\bml> & C:/Python39/python.exe c:/Users/*****/OneDrive/Documents/langs/bml/interp.py
Interpreter > "h1" { "Hello" };
Definition: (h1 , Hello)
Interpreter >

But I can't combine them on one line:

Interpreter > "h1" { "Hello" }; { "h1" "Hello" };
Definition: (h1 , Hello)
Traceback (most recent call last):
  File "c:\Users\*****\OneDrive\Documents\langs\bml\interp.py", line 16, in <module>
    BMLInterpreter().run()
  File "c:\Users\*****\OneDrive\Documents\langs\bml\interp.py", line 10, in run
    tokens = self.lexer.parse(inf)
  File "c:\Users\*****\OneDrive\Documents\langs\bml\bmls\language\simple\simple.py", line 16, in parse
    tkk = parser.parse(tokens)
  File "C:\Python39\lib\site-packages\rply\parser.py", line 60, in parse
    self.error_handler(lookahead)
  File "c:\Users\*****\OneDrive\Documents\langs\bml\bmls\language\parser\parser.py", line 45, in error_handle       
    raise SyntaxError("Error on ( Token, type \"" + token.gettokentype() + "\" , Value \"" + token.getstr() + "\")")
SyntaxError: Error on ( Token, type "OPEN_STATEMENT" , Value "{")

nobodxbodon commented 2 years ago

It seems the current production rules only handle a single statement. Have you tried adding production rules for multiple statements?

nobodxbodon commented 2 years ago

Here's a quick demo that supports multiple expressions, for example: "hi";"there";"buddy";

ghost commented 2 years ago

I guess it works now, but I'm getting a weird error:

AttributeError: 'str' object has no attribute 'getstr'

nobodxbodon commented 2 years ago

@AcaiBerii That happens to me from time to time. getstr() is a Token method that returns the token's matched string. If the value is already a string, there's no need to call getstr(). Have you tried removing it?

ghost commented 2 years ago

Yes, I've somehow got it to work now. Next I'll test it with multiple statements; it seems to be half-working.

ghost commented 2 years ago

It works! Thank you!

ghost commented 2 years ago

For anyone else, here is my new code:

Lexer:

from rply import LexerGenerator

class BMLLexer():
    def __init__(self):
        self.__lexer = LexerGenerator()

    def __add_tokens(self):
        # Statement definitions
        self.__lexer.add('OPEN_STATEMENT', r'\{')
        self.__lexer.add('CLOSE_STATEMENT', r'\}')
        # Ending things
        self.__lexer.add('ENDING_CHAR', r'\;')
        # Basic things
        self.__lexer.add('STRING', r'[`][a-zA-Z0-9\!\@\#\$\%\^\&\*\(\)\{\}\[\]\;\:\'\\/\.\,\<\>\?\-\=\+\_]+[`]')
        # Ignore whitespace
        self.__lexer.ignore(r'\s+')

    def build(self):
        self.__add_tokens()
        return self.__lexer.build()

Parser:

import re
from bmls.language.parser.definitions import BMLDefinition, BMLStatement
from rply import ParserGenerator

class BMLParser():
    def __init__(self):
        self.pg = ParserGenerator(
            # A list of all token names accepted by the parser.
            [
                'OPEN_STATEMENT',
                'CLOSE_STATEMENT',
                'ENDING_CHAR',
                'STRING',
            ]
        )

    def parse(self):
        @self.pg.production("main : expr")
        @self.pg.production("main : main expr")
        def main(p):
            return p

        @self.pg.production('expr : OPEN_STATEMENT STRING STRING CLOSE_STATEMENT ENDING_CHAR')
        def statement(p):
            # p[1] and p[2] are STRING tokens, so take their text before stripping backticks
            name = self.__removeFirstLast(p[1].getstr(), '`', '`')
            definition = self.__removeFirstLast(p[2].getstr(), '`', '`')

            print("<" + name + " " + definition + ">")

            return BMLStatement(name, definition)

        @self.pg.production('expr : STRING OPEN_STATEMENT expr CLOSE_STATEMENT ENDING_CHAR')
        def definition(p):
            name = ""
            definition = ""

            name = self.__removeFirstLast(p[0].getstr(), '`', '`')
            definition = self.__removeFirstLast(p[2], '`', '`')

            print("<" + name + ">" + definition + "</" + name + ">")

            return BMLDefinition(name, definition)

        @self.pg.production('expr : STRING ENDING_CHAR')
        def string_expr(p):
            if p[0].gettokentype() == "STRING":
                return self.__removeFirstLast(p[0].getstr(), '`', '`')

        @self.pg.error
        def error_handle(token):
            raise SyntaxError("Error on Token (\"" + token.gettokentype() + "\" , \"" + token.getstr() + "\")")

    def build(self):
        return self.pg.build()

    def __removeFirstLast(self, tok, char, endchar):
        if (tok.startswith(char) and tok.endswith(endchar)):
            return re.sub(r'^' + char + r'|' + endchar + r'$', '', tok)
        else:
            return tok

ghost commented 2 years ago

Sorry for nagging you again, but I've updated my code and something weird is going on.

mode > full
full > `h1` { { `hey` `h1` }; };
c:\Users\*****\OneDrive\Documents\langs\bml\bmls\language\parser\parser.py:50: ParserGeneratorWarning: 1 shift/reduce conflict
  return self.pg.build()
['<h1><hey h1></h1>']

full > `h1` { { `hey` `h1` }; }; `h33` { `hello` };
[['<h1><hey h1></h1>'], '<h33>hello</h33>'] (<< weird grouping by statement into list)
full >

The parser seems to be grouping the results into nested lists, one level per statement. Here is my code:

import re
from bmls.language.parser.definitions import BMLDefinition, BMLStatement
from rply import ParserGenerator, Token

class BMLParser():
    def __init__(self):
        self.pg = ParserGenerator(
            # A list of all token names accepted by the parser.
            [
                'OPEN_STATEMENT',
                'CLOSE_STATEMENT',
                'ENDING_CHAR',
                'STRING',
            ]
        )

    def parse(self):
        @self.pg.production("main : expr")
        @self.pg.production("main : main expr")
        def main(p):
            return p

        @self.pg.production('expr : OPEN_STATEMENT STRING STRING CLOSE_STATEMENT ENDING_CHAR')
        def statement(p):
            name = self.__removeFirstLast(self.__toSTRING(p[1]), '`', '`')
            definition1 = self.__removeFirstLast(self.__toSTRING(p[2]), '`', '`')

            comp = BMLStatement(name, definition1)
            return self.__toHTML(comp)

        @self.pg.production('expr : STRING OPEN_STATEMENT expr CLOSE_STATEMENT ENDING_CHAR')
        def definition(p):
            name = self.__removeFirstLast(self.__toSTRING(p[0]), '`', '`')
            definition1 = self.__removeFirstLast(self.__toSTRING(p[2]), '`', '`')

            comp = BMLDefinition(name, definition1)
            return self.__toHTML(comp)

        @self.pg.production('expr : STRING')
        def string_expr(p):
            if p[0].gettokentype() == "STRING":
                return self.__removeFirstLast(self.__toSTRING(p[0]), '`', '`')

        @self.pg.error
        def error_handle(token):
            raise SyntaxError("Error on Token (\"" + token.gettokentype() + "\" , \"" + token.getstr() + "\")")

    def build(self):
        return self.pg.build()

    def __removeFirstLast(self, tok, char, endchar):
        if isinstance(tok, str):
            if tok.startswith(char) and tok.endswith(endchar):
                return re.sub(r'^' + char + r'|' + endchar + r'$', '', tok)
            else:
                return tok
        else:
            return tok

    def __toHTML(self, tok):
        if isinstance(tok, BMLDefinition):
            return "<" + tok.left + ">" + tok.right + "</" + tok.left + ">"
        elif isinstance(tok, BMLStatement):
            return "<" + tok.left + " " + tok.right + ">"
        elif isinstance(tok, Token):
            return tok.getstr()
        else:
            return tok

    def __toSTRING(self, tok):
        if isinstance(tok, Token):
            return tok.getstr()
        else:
            return tok

Am I dumb?

nobodxbodon commented 2 years ago

@AcaiBerii it's because of this part:

        @self.pg.production("main : expr")
        @self.pg.production("main : main expr")
        def main(p):
            return p

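p is a list containing the values of the symbols on the right-hand side of the production rule. In the second rule, main : main expr, the first symbol is main, whose value is already a list (whatever the previous call returned), so each additional expression wraps the result in one more level of nesting. Below is one way to "flatten" the list: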

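def main(p):
    if len(p) == 1:
        # main : expr -> p is [expr], a new one-element list
        return p
    else:
        # main : main expr -> append to the existing list instead of nesting it
        p[0].append(p[1])
        return p[0]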
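To see where the nesting comes from, here is a plain-Python simulation (no rply) of the reductions the parser performs for main : expr and main : main expr on three expressions. The helper names and values are made up for illustration; main_flat mirrors the flattened function above:

```python
def main_naive(p):
    # Original version: return the right-hand-side list as-is.
    return p


def main_flat(p):
    # Flattened version: append later expressions to the first list.
    if len(p) == 1:
        return p
    p[0].append(p[1])
    return p[0]


def reduce_all(action, exprs):
    # Mimic the parser: reduce "main : expr" for the first expression,
    # then "main : main expr" once for each following expression.
    value = action([exprs[0]])
    for expr in exprs[1:]:
        value = action([value, expr])
    return value


exprs = ["<h1>a</h1>", "<h2>b</h2>", "<h3>c</h3>"]
print(reduce_all(main_naive, exprs))
# [[['<h1>a</h1>'], '<h2>b</h2>'], '<h3>c</h3>']
print(reduce_all(main_flat, exprs))
# ['<h1>a</h1>', '<h2>b</h2>', '<h3>c</h3>']
```

Each naive reduction puts the previous list inside a new one, which is the pattern in the [['<h1><hey h1></h1>'], '<h33>hello</h33>'] output above.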

ghost commented 2 years ago

Thank you! Just one more question (I know you're going "oh god" in your head by this point):

How would I allow any number of exprs inside a definition, like this:

`h1` { `h2` { `hello` }; `h2` { `world!` }; };

nobodxbodon commented 2 years ago

@AcaiBerii I can't run your code above because some classes are missing. Off the top of my head: main can already hold multiple expressions, so have you tried changing this rule:

expr : STRING OPEN_STATEMENT expr CLOSE_STATEMENT ENDING_CHAR

to

expr : STRING OPEN_STATEMENT main CLOSE_STATEMENT ENDING_CHAR

You may need to change the definition method accordingly.

ghost commented 2 years ago

Just git clone https://github.com/AcaiBerii/bml and run interp.py.

ghost commented 2 years ago

Tomorrow I'll credit you in bml, because you've been a huge help in its development. Thank you!