API for Tokens Emitting in custom token patterns

bd82 commented 7 years ago

Originated from https://github.com/SAP/chevrotain/issues/373 and https://github.com/SAP/chevrotain/issues/414

bd82 commented 7 years ago

This should possibly be limited to "virtual" Tokens such as Indent/Outdent, so that the Lexer continues to handle all the position information.

Qix- commented 7 years ago

*poke*

This would be lovely to have. Doing some semantic indentation parsing and haven't a clue as to where to start. Looked up if Python had been implemented as a parser and found this.

bd82 commented 7 years ago

Hello @Qix- .

There is an example for dealing with Python-like indentation here. It is not the prettiest, but it works...

This issue is more about cosmetic API changes that would make that example prettier and potentially support more scenarios, so it is unfortunately not a high priority.

I'm currently more focused on POCing a scannerless ECMAScript parser and then investigating its performance versus leading hand-built ECMAScript parsers such as Acorn/Esprima, and, in the longer term, on supporting Parser Generator/Combinator type APIs.

Another alternative is to use a non-Chevrotain lexer (hand-built or generated), either by creating Chevrotain Tokens directly or by converting its output to Chevrotain Tokens.

On Creating Tokens

I'm also considering that this issue may be completely redundant if the Chevrotain Lexer were refactored to output one token at a time (calling .next() many times) instead of tokenizing the whole input in one go. The user logic handling the indentation would then not require any special treatment/APIs from Chevrotain, as it could be implemented completely separately from the regular lexing.

let nextToken
const tokens = []
// myLexer.next() is the hypothetical one-token-at-a-time API described above
while ((nextToken = myLexer.next())) {
  if (nextToken.tokenType === Whitespace.tokenType && ...) {
    // indentation handling
    // tokens.push(Indent/OutDent)
  }

  tokens.push(nextToken)
}
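To make the idea above more concrete, here is a minimal, self-contained sketch of indentation handling done as a separate pass over a one-token-at-a-time stream. Nothing in it is Chevrotain API: `simpleLexer`, `withIndentation`, the token shape (`type`, `image`, `startOffset`) and the `virtual` flag are all hypothetical names chosen for illustration, and the tiny lexer only stands in for whatever a `.next()`-style lexer would produce. The virtual Indent/Outdent tokens carry an empty image and borrow their offset from a neighbouring real token, so position tracking stays with the lexer.

```javascript
// Hypothetical stand-in for a lexer that yields one token at a time.
// It only distinguishes newlines, other whitespace, and "words".
function* simpleLexer(text) {
  const pattern = /(\n)|([^\S\n]+)|(\S+)/y; // sticky: always matches at lastIndex
  let match;
  while ((match = pattern.exec(text)) !== null) {
    const type = match[1] ? "Newline" : match[2] ? "Whitespace" : "Word";
    yield { type, image: match[0], startOffset: match.index };
  }
}

// Virtual tokens have no text of their own; they reuse a nearby offset.
function virtualToken(type, startOffset) {
  return { type, image: "", startOffset, virtual: true };
}

// Wraps any token iterator and injects Indent/Outdent tokens,
// using the usual stack of open indentation widths.
function* withIndentation(tokens) {
  const indentStack = [0];
  const top = () => indentStack[indentStack.length - 1];
  let atLineStart = true;
  let lineIndent = 0;
  let lastOffset = 0;

  for (const token of tokens) {
    lastOffset = token.startOffset + token.image.length;
    if (token.type === "Newline") {
      atLineStart = true;
      lineIndent = 0; // blank or whitespace-only lines never change the level
    } else if (atLineStart && token.type === "Whitespace") {
      lineIndent = token.image.length; // remember leading width, decide later
    } else if (atLineStart) {
      // First significant token on the line: compare with the open levels.
      atLineStart = false;
      if (lineIndent > top()) {
        indentStack.push(lineIndent);
        yield virtualToken("Indent", token.startOffset);
      } else {
        while (lineIndent < top()) {
          indentStack.pop();
          yield virtualToken("Outdent", token.startOffset);
        }
        if (lineIndent !== top()) {
          throw new Error(`Inconsistent outdent at offset ${token.startOffset}`);
        }
      }
    }
    yield token;
  }
  // Close any blocks still open at the end of the input.
  while (indentStack.length > 1) {
    indentStack.pop();
    yield virtualToken("Outdent", lastOffset);
  }
}

const source = "if x\n  y\n  z\nw";
const summary = [...withIndentation(simpleLexer(source))]
  .filter((t) => t.type !== "Whitespace" && t.type !== "Newline")
  .map((t) => (t.type === "Word" ? t.image : t.type));
console.log(summary); // [ 'if', 'x', 'Indent', 'y', 'z', 'Outdent', 'w' ]
```

Because `withIndentation` only consumes an iterator, it would not need any special hooks in the lexer itself, which is the point made above. The sketch counts every whitespace character as width 1 (so tabs and spaces are not distinguished) and treats an outdent to a level that was never opened as an error; a real grammar may want different rules for both.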

API for Tokens Emitting in custom token patterns #415