jpsnyder closed this issue 2 years ago.
The lexer in SLY is structured as a generator. One way to implement this is to write another generator function that filters/rewrites the token stream.
class MyCustomToken:
    ...

def as_custom_tokens(tokens):
    for tok in tokens:
        yield MyCustomToken(...)
You'd wrap the original token stream with as_custom_tokens(). For example:
parser.parse(as_custom_tokens(lexer.tokenize(text)))
It's probably not the only way to do it, but this general approach can be useful for modifying any aspect of the input token stream.
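As a minimal self-contained sketch of this wrapping approach (using a stand-in namedtuple rather than SLY's actual Token class, and hypothetical token types), one wrapper can filter the stream and another can rewrite it:

```python
from collections import namedtuple

# Stand-in token type for illustration; SLY yields its own Token objects.
Token = namedtuple('Token', ['type', 'value'])

def skip_comments(tokens):
    """Filter wrapper: drop COMMENT tokens, pass everything else through."""
    for tok in tokens:
        if tok.type != 'COMMENT':
            yield tok

def upcase_names(tokens):
    """Rewrite wrapper: replace NAME tokens with upper-cased copies."""
    for tok in tokens:
        if tok.type == 'NAME':
            tok = tok._replace(value=tok.value.upper())
        yield tok

stream = [Token('NAME', 'x'), Token('COMMENT', '# hi'), Token('NUMBER', 42)]
result = list(upcase_names(skip_comments(iter(stream))))
# result == [Token('NAME', 'X'), Token('NUMBER', 42)]
```

Because each wrapper is itself a generator, they compose freely and the tokens are still produced lazily, one at a time.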
I'm aware of this technique. (My lexers already include heavily customized tokenize() routines.)
But I didn't think of wrapping the tokenizer outside of the call as well. Brainfart.
I was trying to avoid duplicate code that would need to be injected into all of the lexers in my project, so this seems reasonable to do since there is only a single call to .parse().
For anyone interested, I ended up writing something like this which will give the column instead of the index when you print it out in error messages, etc.
from dataclasses import dataclass
from typing import Any

def _as_custom_tokens(text, tokenizer):
    """
    Customizes tokens coming from tokenizer to include column indexing.
    """
    @dataclass
    class _Token:
        type: str
        value: Any
        lineno: int
        index: int

        def __repr__(self):
            return f'Token(type={self.type!r}, value={self.value!r}, lineno={self.lineno}, column={self.column})'

        @property
        def column(self) -> int:
            """Determines the 1-based column of this token within its line."""
            # +1 skips past the newline itself, so every line starts at column 1
            # (and yields 0 when no newline precedes the token).
            line_start = text.rfind('\n', 0, self.index) + 1
            return (self.index - line_start) + 1

    for token in tokenizer:
        yield _Token(token.type, token.value, token.lineno, token.index)
...
lexer = get_lexer(language)
parser = get_parser(language)
tokens = _as_custom_tokens(code, lexer.tokenize(code))
root_node = parser.parse(tokens)
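To sanity-check the column computation in isolation, here is a standalone sketch of the same rfind() arithmetic (using rfind(...) + 1 so that each line starts at column 1):

```python
def find_column(text, index):
    """Return the 1-based column of the character at `index` within `text`."""
    # rfind returns -1 when no newline precedes index, so +1 gives 0 there
    # and otherwise gives the position just past the last newline.
    line_start = text.rfind('\n', 0, index) + 1
    return (index - line_start) + 1

text = "ab\ncd"
print(find_column(text, 0))  # 'a' -> column 1
print(find_column(text, 3))  # 'c' -> column 1 (first char of the second line)
print(find_column(text, 4))  # 'd' -> column 2
```

Since the column is derived from the token's index and the source text alone, computing it lazily in a property costs nothing for tokens whose column is never asked for.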
Thanks!
I'm trying to figure out an easy way to add a column_index property attribute to the Token object, but trying to patch this in is proving to be very complex. Since the Lexer class keeps an attribute of self.text in its instance, I thought it might be easy to attach the text to any generated Token object and then add a property column_index to be able to access the column index on-demand. By using a property, we can avoid pre-calculating this on all tokens, but it will be available when we need it (such as error reporting). Please let me know your thoughts on this.
If you don't want to include such a feature, perhaps we can add a way to provide our own alternative implementation of Token, and a way to inject the token building step so we can do whatever we want?
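For illustration, the idea described above might look something like the following sketch (ColumnToken and column_index are hypothetical names, not part of SLY's actual API):

```python
class ColumnToken:
    """Hypothetical token that keeps a reference to the source text so the
    column can be computed lazily, only when it is actually needed."""

    def __init__(self, type, value, lineno, index, text):
        self.type = type
        self.value = value
        self.lineno = lineno
        self.index = index
        self._text = text  # reference to the full source text, not a copy

    @property
    def column_index(self):
        """Compute the 1-based column on demand (e.g. for error reporting)."""
        line_start = self._text.rfind('\n', 0, self.index) + 1
        return (self.index - line_start) + 1

tok = ColumnToken('NAME', 'cd', lineno=2, index=3, text="ab\ncd")
print(tok.column_index)  # 1
```

Holding a reference to the (already-in-memory) source string adds essentially no overhead per token, while the property defers the rfind() scan until the column is actually requested.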