Forcing a fixed (but smaller) regex prior to m.match()/m.group() at a certain state?

Maybe I bit off more than I can chew, but I'm trying to develop a BIND9 configuration parser using Sly (and formerly Ply).

The basic problem is how Sly/Ply assigns token types when several identifier-like (ID) tokens overlap, and whether, given that constraint in Sly's (or Ply's) design, I should collapse all of my variable fields into a single ID type.

BIND9 configuration is a weird assortment of C-style/Python-style comments, include statements, alias dictionaries, and nested LBRACE/RBRACE blocks, with newlines ignored and SEMICOLON used as the statement terminator. I got all of that working except for one thing: the ID type discriminator (via multi-token regex).

    include named-options.conf;
    server example.com;

My first attempt to specialize that generic ID token was to split it into multiple ID-type tokens, defining the alias name for SERVER and the filespec for INCLUDE with:

    # '+' rather than '*': a token pattern must not match the empty string
    t_SERVER_NAME = r'[A-Za-z0-9_\-\.]+'
    t_FILESPEC = r'([/\\:\._\-0-9A-Za-z]+)(?=[ \t]*;)'

I ran into the classic problem where, in a certain state, the lexer tags the "ID" with the wrong token type.
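The misclassification can be reproduced with the standard `re` module alone. This is only a sketch: it assumes the lexer joins its token patterns into one alternation of named groups and reads `m.lastgroup` after `m.match()`, which is roughly how I understand Sly and Ply to work internally. The names `MASTER` and `classify` are made up for illustration.

```python
import re

# Hypothetical master pattern: alternatives are tried left to right,
# so the first pattern that matches at the current position wins.
MASTER = re.compile(
    r'(?P<SERVER_NAME>[A-Za-z0-9_\-\.]+)'
    r'|(?P<FILESPEC>[/\\:\._\-0-9A-Za-z]+(?=[ \t]*;))'
)

def classify(text):
    """Return (token_type, lexeme) for the token at the start of text."""
    m = MASTER.match(text)
    return (m.lastgroup, m.group()) if m else None

# Both operands come back as SERVER_NAME, because a plain filename
# also satisfies the server-name character class.
print(classify('named-options.conf;'))
print(classify('example.com;'))
```

Swapping the order of the two alternatives only moves the problem: `example.com;` would then be tagged FILESPEC.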

After much reading of Google Groups, StackOverflow, and GitHub forums/issues, I've concluded that any attempt to discriminate between identifiers (variable, alias name, fully qualified domain name) at the token level is futile, because their regexes overlap and regex alone cannot tell them apart.

Then I thought: why not forcibly pre-assign, at initialization time, a smaller regex for that particular state (heck, for most states)?
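To make the idea concrete, here is a minimal hand-rolled sketch of what "one smaller regex per state" could look like, using only the standard `re` module. It is not Sly or Ply code; the `OPERAND` table, `KEYWORD`, `SKIP`, and `tokenize` are hypothetical names, and it only handles the two statement forms shown earlier.

```python
import re

# Hypothetical per-state patterns: after a keyword, only one operand
# pattern is tried instead of the generic multi-token alternation.
OPERAND = {
    'include': ('FILESPEC', re.compile(r'[/\\:._\-0-9A-Za-z]+(?=[ \t]*;)')),
    'server': ('SERVER_NAME', re.compile(r'[A-Za-z0-9_\-.]+')),
}
KEYWORD = re.compile(r'[a-z]+')
SKIP = re.compile(r'[ \t\r\n]*')  # always matches, possibly empty

def tokenize(text):
    """Yield (type, value) pairs, choosing the operand regex by keyword."""
    pos = SKIP.match(text).end()
    while pos < len(text):
        kw = KEYWORD.match(text, pos)
        if kw is None or kw.group() not in OPERAND:
            raise SyntaxError(f'unknown statement at offset {pos}')
        yield (kw.group().upper(), kw.group())
        pos = SKIP.match(text, kw.end()).end()
        # The keyword just seen selects the single regex to apply next.
        tok_type, pattern = OPERAND[kw.group()]
        m = pattern.match(text, pos)
        if m is None:
            raise SyntaxError(f'bad {tok_type} at offset {pos}')
        yield (tok_type, m.group())
        pos = SKIP.match(text, m.end()).end()
        if not text.startswith(';', pos):
            raise SyntaxError(f'expected ";" at offset {pos}')
        yield ('SEMICOLON', ';')
        pos = SKIP.match(text, pos + 1).end()

for tok in tokenize('include named-options.conf;\nserver example.com;'):
    print(tok)
```

Sly's documentation describes lexer states (`begin`, `push_state`, `pop_state`), which may be the native way to get the same effect; I have not verified that they fit this exact case.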

At any rate, I see three choices ahead of me:

1. Is there a way to pre-select a single (smaller) token regex upon entering the next state, instead of using the more general multi-token ID identification regex?
2. Or is validating the naming convention (using just t_ID) best done inside the state-specific parser function (i.e., p_clause_server and p_clause_include) rather than at the token level (i.e., t_FILESPEC and t_SERVER_NAME)?
3. Or did I overlook another tip?
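For comparison, the second choice (a single generic ID token, validated per grammar rule) might look roughly like the sketch below. It is plain Python rather than real Sly/Ply grammar code: `clause_server` and `clause_include` are stand-ins for the bodies of p_clause_server and p_clause_include, and `check` is a hypothetical helper.

```python
import re

# Hypothetical per-context validators, applied after the lexer has
# already produced a generic ID token.
SERVER_NAME_RE = re.compile(r'[A-Za-z0-9_\-.]+')
FILESPEC_RE = re.compile(r'[/\\:._\-0-9A-Za-z]+')

def check(kind, pattern, value):
    """Return value unchanged, or raise if it breaks the naming rule."""
    if pattern.fullmatch(value) is None:
        raise ValueError(f'invalid {kind}: {value!r}')
    return value

def clause_server(id_value):
    # Stand-in for the body of p_clause_server.
    return ('server', check('server name', SERVER_NAME_RE, id_value))

def clause_include(id_value):
    # Stand-in for the body of p_clause_include.
    return ('include', check('filespec', FILESPEC_RE, id_value))

print(clause_server('example.com'))
print(clause_include('/etc/bind/named-options.conf'))
```

The trade-off is that the generic ID pattern has to be loose enough to accept every operand form, with the stricter rules enforced only once the grammar knows which clause it is in.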

If I can nail this, the NGINX configuration file format will soon follow, and I can post the result in its entirety here on GitHub for other security researchers to use.
