Issue (open), opened by gemerden 7 years ago.
@gemerden Thanks. This is an easy change that makes a lot of sense. Just out of curiosity, what custom TOKENS would you need? BTW, slightly related, here is an example of a tokenizer that uses custom tokens (and uses a trie/Aho-Corasick automaton for token recognition): https://github.com/nexB/license-expression/blob/f3421c1a1f409249ba86a16b7b46c2e987f6ab35/src/license_expression/__init__.py#L409
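For readers unfamiliar with the idea, here is a minimal Python sketch of trie-based token recognition. It is not the license-expression code linked above: it uses a plain prefix tree with longest-match lookup rather than a full Aho-Corasick automaton (which adds failure links for scanning many patterns at once). The token table and the string kinds ('AND', 'OR', ...) are illustrative assumptions.

```python
# Illustrative sketch only (not license-expression's implementation).
# Assumed token table: operator tokens, including two-character forms so
# that longest match matters.
TOKENS = {
    '&': 'AND', '&&': 'AND',
    '|': 'OR', '||': 'OR',
    '!': 'NOT', '(': 'LPAR', ')': 'RPAR',
}


def build_trie(tokens):
    """Build a nested-dict prefix tree; the key None marks a complete token."""
    root = {}
    for text, kind in tokens.items():
        node = root
        for ch in text:
            node = node.setdefault(ch, {})
        node[None] = kind
    return root


def tokenize(expr, trie):
    """Yield (kind, text, start) tuples, preferring the longest token match.

    Runs of characters that do not start a token become SYMBOLs;
    whitespace ends a symbol and is otherwise skipped.
    """
    pos, length = 0, len(expr)
    sym_start = None  # start index of the symbol currently being collected
    while pos < length:
        # Walk the trie as far as the input allows, remembering where the
        # last complete token ended (this is the longest match).
        node, end, kind = trie, None, None
        i = pos
        while i < length and expr[i] in node:
            node = node[expr[i]]
            i += 1
            if None in node:
                end, kind = i, node[None]
        if end is not None or expr[pos].isspace():
            # A token or whitespace terminates any pending symbol.
            if sym_start is not None:
                yield 'SYMBOL', expr[sym_start:pos], sym_start
                sym_start = None
            if end is not None:
                yield kind, expr[pos:end], pos
                pos = end
            else:
                pos += 1
        else:
            if sym_start is None:
                sym_start = pos
            pos += 1
    if sym_start is not None:
        yield 'SYMBOL', expr[sym_start:], sym_start
```

Longest match is what makes '&&' win over '&'. This simple scanner is only suitable for operator-like tokens: a word token such as 'and' would also match inside a symbol like 'candy', which a real tokenizer has to guard against.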
@pombredanne: I only use '|', '&', '!', '(' and ')', and I use e.g. '*' for something else (as a wildcard). I needed to change more in tokenize(); roughly, everything that is not a token I accept as a symbol, but I need to do some more testing. Currently it looks like this:
from boolean import BooleanAlgebra
from boolean.boolean import (
    PARSE_UNKNOWN_TOKEN, ParseError, TOKEN_AND, TOKEN_LPAR, TOKEN_NOT,
    TOKEN_OR, TOKEN_RPAR, TOKEN_SYMBOL,
)
# WildSymbol, SET_OR, SET_AND and SET_NOT are project-specific classes,
# not part of boolean.py (not shown here).


class KeyParser(BooleanAlgebra):
    # '*' is deliberately absent so it can be used as a wildcard in symbols.
    DEFAULT_TOKENS = {
        '&': TOKEN_AND,
        '|': TOKEN_OR,
        '!': TOKEN_NOT,
        '(': TOKEN_LPAR,
        ')': TOKEN_RPAR,
    }

    def __init__(self, TOKENS=None, *args, **kwargs):
        super(KeyParser, self).__init__(Symbol_class=WildSymbol,
                                        OR_class=SET_OR,
                                        AND_class=SET_AND,
                                        NOT_class=SET_NOT,
                                        *args, **kwargs)
        self.TOKENS = TOKENS or self.DEFAULT_TOKENS

    def tokenize(self, expr):
        if not isinstance(expr, str):
            raise TypeError('expr must be string but it is %s.' % type(expr))
        TOKENS = self.TOKENS
        length = len(expr)
        position = 0
        while position < length:
            tok = expr[position]
            sym = tok not in TOKENS
            if sym:
                # Everything up to the next token character is one symbol.
                position += 1
                while position < length:
                    char = expr[position]
                    if char not in TOKENS:
                        position += 1
                        tok += char
                    else:
                        break
                position -= 1  # back to the last character of the symbol
            try:
                yield TOKENS[tok], tok, position
            except KeyError:
                if sym:
                    yield TOKEN_SYMBOL, tok, position
                else:
                    raise ParseError(token_string=tok, position=position,
                                     error_code=PARSE_UNKNOWN_TOKEN)
            position += 1
Because of sym = tok not in TOKENS, any character that is not a token becomes part of a symbol, which leaves room for additional (different) syntax inside symbols, such as the '*' wildcard. When I am happy with my project, I'll make the repo public and share the link here.
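To see which tokens this rule produces, here is a standalone Python re-creation of the scanning loop in KeyParser.tokenize(). It is an illustrative sketch, not the code above: the TOKEN_* constants are replaced by plain strings (an assumption, so it runs without boolean.py or WildSymbol), and it reports the start position of each symbol, whereas the loop above reports the position of the symbol's last character.

```python
# Sketch: same "anything that is not a token is a symbol" rule as above.
# String kinds stand in for boolean.py's TOKEN_* constants (assumption).
TOKENS = {'&': 'AND', '|': 'OR', '!': 'NOT', '(': 'LPAR', ')': 'RPAR'}


def scan(expr, tokens=TOKENS):
    """Yield (kind, text, start); each run of non-token characters,
    including '*' and spaces, becomes one SYMBOL."""
    position, length = 0, len(expr)
    while position < length:
        char = expr[position]
        if char in tokens:
            yield tokens[char], char, position
            position += 1
        else:
            start = position
            while position < length and expr[position] not in tokens:
                position += 1
            yield 'SYMBOL', expr[start:position], start
```

Spaces are not tokens, so they end up inside symbols: scan('a | b') yields the symbols 'a ' and ' b'. The loop above behaves the same way, so the symbol class presumably has to strip whitespace; the thread does not say how WildSymbol handles this.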
@gemerden OK, also check out this other, simpler tokenizer: https://github.com/nexB/license-expression/blob/master/src/license_expression/__init__.py#L1127
Thanks, the code above passes all my tests, so for now I am OK.
OK, your call. You can send a PR or close this issue, as you like.
In the tokenize() method of BooleanAlgebra, would it be possible to make the tokens configurable, so that a subclass can change just the tokens without overriding the whole method? E.g.:
Or perhaps define the current tokens outside the method and make them the default TOKENS instead of None above.
That would make it less likely that a subclass becomes outdated when tokenize() changes in future versions.
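A minimal Python sketch of the proposal: the token table lives in a class attribute defined outside tokenize(), so a subclass replaces only the table and inherits the scanning logic unchanged. All names here (BaseAlgebra, KeyAlgebra, the TOKEN_* values and the default table) are hypothetical stand-ins, not boolean.py's actual API.

```python
# Hypothetical names throughout; this is not boolean.py's code.
TOKEN_AND, TOKEN_OR, TOKEN_NOT, TOKEN_LPAR, TOKEN_RPAR, TOKEN_SYMBOL = range(1, 7)


class BaseAlgebra:
    # Illustrative default table, defined once outside tokenize().
    # It includes '*' as AND, which is what the subclass below removes.
    TOKENS = {'*': TOKEN_AND, '&': TOKEN_AND, '+': TOKEN_OR, '|': TOKEN_OR,
              '~': TOKEN_NOT, '!': TOKEN_NOT, '(': TOKEN_LPAR, ')': TOKEN_RPAR}

    def tokenize(self, expr):
        tokens = self.TOKENS  # attribute lookup, so a subclass table wins
        position, length = 0, len(expr)
        while position < length:
            char = expr[position]
            if char.isspace():
                position += 1
            elif char in tokens:
                yield tokens[char], char, position
                position += 1
            else:
                # A symbol runs until whitespace or the next token character.
                start = position
                while (position < length and not expr[position].isspace()
                       and expr[position] not in tokens):
                    position += 1
                yield TOKEN_SYMBOL, expr[start:position], start


class KeyAlgebra(BaseAlgebra):
    # Only the table changes: '*' is no longer an operator, so it can
    # appear inside symbols as a wildcard.
    TOKENS = {'&': TOKEN_AND, '|': TOKEN_OR, '!': TOKEN_NOT,
              '(': TOKEN_LPAR, ')': TOKEN_RPAR}
```

With this layout, any later fix to the scanning loop in the base class reaches KeyAlgebra automatically, which is the point of the request.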
Cheers, Lars