dabeaz / sly

Sly Lex Yacc

May resume SLY work in September 2021 #76

Open dabeaz opened 3 years ago

dabeaz commented 3 years ago

Just a quick note that I may resume some work on SLY in the coming month. Open to anything that improves it. Some things I'm thinking about

jpsnyder commented 2 years ago

One QoL feature I would like to see:

I find it tedious to have to manually define the tokens set at the beginning of the Lexer class, and it can be confusing since the ignore_* rules don't have to be added to the set to take effect. Perhaps the Lexer class could assume that every field or wrapped function whose name is all uppercase (and doesn't start with a _) counts as a token, and fill the tokens field in automatically (if the user didn't explicitly define the set).

Also, as an alternative use case, it would be nice if the tokens field could be a list instead. That way we could define the order in which the lexing rules are applied with the list, so we wouldn't have to worry so much about the order in which the tokens are declared in the class.

Thanks for your hard work on this!
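The automatic-collection idea above could be sketched roughly as follows. This is not SLY code, just an illustration of the proposed rule; the helper name collect_tokens and the MyLexer class are made up for the example:

```python
# Hypothetical sketch of the proposed behavior: gather every all-uppercase
# class attribute (skipping _-prefixed names; ignore_* names fail the
# uppercase check anyway) into a token set, so `tokens` would not need
# to be written out by hand.

def collect_tokens(cls):
    """Return the set of attribute names that would count as tokens."""
    return {
        name for name in vars(cls)
        if name.isupper() and not name.startswith('_')
    }

class MyLexer:
    # Token definitions as plain string attributes, as in SLY.
    NUMBER = r'\d+'
    PLUS = r'\+'
    ignore_ws = r'\s+'   # an ignore_* rule stays out of the set

print(sorted(collect_tokens(MyLexer)))   # ['NUMBER', 'PLUS']
```

SLY's Lexer already uses a metaclass to inspect the class body, so a rule like this could plausibly be applied at class-creation time.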

hadware commented 2 years ago

> Also, as an alternative use case. It would be nice if the tokens field can be a list instead. That way we can define the proper order of applying the lexing rules with the list so we don't have to worry so much about the order the tokens are declared in the class.

I definitely agree with that part!

jpsnyder commented 2 years ago

Another feature I would like to see, based on an issue I ran into, is a way to generate more than one token during error handling in the lexer. When the language you are trying to lex is sufficiently complex, you can end up needing some hacks in the error() handling function.

It would be nice if we could yield multiple tokens from the error() function when we need to resort to doing some more manual regex pattern matching to get the next few tokens before getting back on track.

```python
def error(self, t):
    # On error, yield the raw whitespace-separated arguments as tokens
    # until we see a newline.
    text, _, _ = t.value.partition("\n")
    for arg in text.split(" "):
        new_token = YaccSymbol()
        new_token.type = "ARGUMENT"
        new_token.value = arg
        yield new_token
```
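Until something like this exists in SLY itself, one generic workaround is to wrap the token stream in a small buffer that error-handling code can push extra tokens into. The TokenBuffer class below is purely illustrative, not part of SLY's API:

```python
from collections import deque

class TokenBuffer:
    """Wrap a token iterator and allow extra tokens to be injected.

    A generic workaround sketch: error-handling code pushes replacement
    tokens onto `pending`, and the consumer drains those before resuming
    the underlying stream.
    """
    def __init__(self, stream):
        self.stream = iter(stream)
        self.pending = deque()

    def push(self, token):
        # Queue a token to be yielded before the underlying stream.
        self.pending.append(token)

    def __iter__(self):
        return self

    def __next__(self):
        if self.pending:
            return self.pending.popleft()
        return next(self.stream)

buf = TokenBuffer(['A', 'B'])
buf.push('X')            # injected token comes out first
print(list(buf))         # ['X', 'A', 'B']
```

A parser reading from such a wrapper would see the injected ARGUMENT tokens interleaved transparently with the lexer's normal output.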

BlakeCThompson commented 1 year ago

@dabeaz Something I think would be nice would be the ability to change the start attribute after having initialized the parser, and have the grammar updated appropriately.

For example, I have a test class that tests my grammar. If I want to test small subsets of the grammar, I have to manually change the start attribute in my parser every time I want to test a smaller subset. If I could change the start attribute at the time parse() is called, it would make automating tests a lot easier.
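One way to approximate this today, since `start` is a class attribute that SLY's metaclass reads when the parser class is created, is to generate a fresh subclass per start symbol. The make_parser helper and Base class below are illustrative only; with a real sly.Parser base, SLY would rebuild the parse tables for each subclass:

```python
# Stand-in for a real sly.Parser subclass; only the `start` attribute
# matters for this sketch.
class Base:
    start = 'statement'

def make_parser(base, start_symbol):
    """Create a subclass of `base` that differs only in its start symbol.

    With a genuine SLY parser as `base`, the ParserMeta metaclass would
    rebuild the grammar tables for the new start symbol at class creation.
    """
    return type(f'{base.__name__}_{start_symbol}', (base,),
                {'start': start_symbol})

ExprParser = make_parser(Base, 'expr')
print(ExprParser.start)   # expr
```

Each test case can then instantiate the subclass it needs, at the cost of building tables once per start symbol rather than switching at parse() time.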

BlakeCThompson commented 1 year ago

There are probably several ways to do this, but I've made a PR with one possible solution: https://github.com/dabeaz/sly/pull/101