dabeaz / sly

Sly Lex Yacc

[Proposal] Add an alternative, more explicit way of using the library. #95

Open MinekPo1 opened 2 years ago

MinekPo1 commented 2 years ago

Motivation

  1. Readability

The abstract metaprogramming features require the reader to be acquainted with the library. While acceptable in most environments, in some this is a downside.

  2. Third-party tools

While this is somewhat an extension of readability, I feel it is also important to mention. In some environments, third-party tools (such as linters and type checkers) can be required, and long comments disabling parts of those tools are not only frowned upon but can also prevent the tools from checking the code the developer wrote.

  3. Personal preference

For some, more explicit aliases can simply be preferable, like how some people prefer tabs over spaces (haha).

Potential solution

sly.explicit

This new optional sub-module would include explicit aliases for the existing metaprogramming syntax.

Replaces:

class SomeLexer(Lexer):
    ...
    ABC_TOKEN = r'[abc]'
    ABC_TOKEN['a'] = 'A_TOKEN'

With:

class SomeLexer(Lexer):
    ...
    ABC_TOKEN = TokenType(r'[abc]')
    ABC_TOKEN['a'] = TokenType(name='A_TOKEN')
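As a rough sketch of what the proposed `TokenType` wrapper might look like (the class name and constructor signature come from the proposal above; the internals here are purely illustrative assumptions, not SLY's actual `TokenStr` machinery):

```python
# Illustrative sketch only: a minimal stand-in for the proposed TokenType.
# SLY's real implementation relies on a metaclass and TokenStr; this just
# demonstrates the surface syntax the explicit interface would offer.
class TokenType:
    def __init__(self, pattern=None, name=None):
        self.pattern = pattern   # regex matched by the lexer
        self.name = name         # explicit token name, when remapping
        self.remap = {}          # matched value -> remapped token name

    def __setitem__(self, value, other):
        # ABC_TOKEN['a'] = TokenType(name='A_TOKEN') records a remapping
        self.remap[value] = other.name

ABC_TOKEN = TokenType(r'[abc]')
ABC_TOKEN['a'] = TokenType(name='A_TOKEN')
```

The point of the wrapper is that both the pattern and the remapping are ordinary attribute assignments, visible to a type checker without any metaclass magic.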

Aliases for _.

Replaces:

class SomeLexer(Lexer):
    ...
    @_(r'\d')
    def NUMBER(self,t):
        ...
...
class SomeParser(Parser):
    ...
    @_("A_TOKEN LETTER")
    def rule(self, p):
        ...

With:

class SomeLexer(Lexer):
    ...
    @add_action(r'\d')
    def NUMBER(self,t):
        ...
...
class SomeParser(Parser):
    ...
    @add_rule("A_TOKEN ABC_TOKEN")
    def rule(self, p):
        ...

The main difference between the two is whether the positional argument is of type Token or YaccProduction.

Extensive type annotations for interface functions.

This would apply not only to the new explicit interface but also to the existing one, improving self-documentation.

Aside from the members of sly.explicit, this would also include:

Final thoughts

I understand that metaprogramming is in this library's spirit, and that the problems I laid out in the motivation section are known and considered low (or even lower) priority. I do think, however, that this proposed alternative interface would allow those for whom these problems matter to solve them.

Potentially, the note outlined in Contributing.md could be reformatted to allow changes within this submodule.

I look forward to any suggestions and hopefully the go-ahead for me to implement this.

MinekPo1 commented 2 years ago

Also, an afterthought:

Possibly

SOME_TOKEN[r'pattern'] = TokenType(name='OTHER_TOKEN')

could be simplified to

SOME_TOKEN[TokenType(r'pattern',name='OTHER_TOKEN')]

This would, however, require modifying the underlying meta-syntax. If implemented with a slice, it would also enable this syntax:

SOME_TOKEN[r'pattern':'OTHER_TOKEN']

Since TokenStr does not define a __getitem__ method, this will not cause any collisions.
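A hedged sketch of what a slice-aware `__getitem__` might look like (the `ExplicitToken` class and its `remap` dict are hypothetical stand-ins for whatever SLY's `TokenStr` actually stores; only the indexing syntax is from the proposal):

```python
# Hypothetical container illustrating the proposed slice syntax;
# not SLY's actual TokenStr implementation.
class ExplicitToken:
    def __init__(self, pattern):
        self.pattern = pattern
        self.remap = {}   # matched value -> remapped token name

    def __getitem__(self, key):
        if isinstance(key, slice):
            # TOKEN[r'pattern':'OTHER_TOKEN'] adds a remapping
            self.remap[key.start] = key.stop
        else:
            # TOKEN['KEYWORD'] drops a remapping, without needing `del`
            self.remap.pop(key, None)
        return self

tok = ExplicitToken(r'[a-z]+')
tok[r'if':'IF']   # slice form: add a remapping
tok['if']         # plain-index form: remove it again
```

Slices are never produced by ordinary string indexing in SLY class bodies today, which is why the two forms can coexist without ambiguity.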

Edit:

Implementing __getitem__ could also allow the user to drop the del keyword from

del SOME_TOKEN['KEYWORD']

jpsnyder commented 2 years ago

I have no opinion on the proposed solution, but I do think having an option to do things more explicitly, should the need arise, would be nice. I think an explicit API could also help with other things, like inheriting classes or improving customization.

Although I imagine that would be a huge undertaking, so I wouldn't hold my breath.

MinekPo1 commented 2 years ago

Although I imagine that would be a huge undertaking, so I wouldn't hold my breath.

I would guess that, with the implementation I have in mind, it could be done in under 30 lines plus small changes to existing code.

The hard part of those changes (figuring out the types) I have done in #96.

I'll try publishing an example implementation as a gist later today.

dabeaz commented 2 years ago

I've been thinking about this. Bottom line: I don't want to provide an alternate API on SLY. The whole point of the project was to create a DSL for specifying parsers using sneaky metaprogramming features. I acknowledge that this sort of thing isn't for everyone. However, there are numerous other Python parsing tools that can solve the same problem as SLY using a variety of different APIs.

This said, I HAVE been thinking about a refactoring of SLY that more cleanly isolates the LALR(1) parsing engine from the top-level user interface. I might also break the parsing engine out into its own library that could be shared between SLY and PLY. So, perhaps one could (eventually) code something on top of that.

I'm also not opposed to someone taking SLY, modifying it to have a different interface, and releasing it as a different package. People did this kind of thing with the PLY project and it doesn't bother me at all. I'd just ask that you send me a link so that I could tell people about it on the SLY README file.

MinekPo1 commented 2 years ago

Thinking about it, TokenType is not really necessary, as TokenStr could just be used (granted, the name kwarg is not supported).

Maybe the decorators could be exposed, allowing them to be used directly instead of _, replacing the proposed aliases?

If this is not something you would find acceptable, I might create a secondary module with the aliases I described.

Also, what do you think about the TOKEN["pattern":"name"] and TOKEN["keyword"] syntaxes?