ryszard-swierczynski closed this issue 7 months ago
About custom delimiters in keywords: it makes sense.
But you should know that rules won't work. The tokenizer doesn't support rules for token sequences because it is just a simple parser. For example, a rule like "after a colon there may only be @ or a keyword" won't work.
Currently you need to write a method for your complex keyword, like:
colonTokenKey := TokenKey(1) // ":"
atTokenKey := TokenKey(2)    // "@"
tokenizer := New()
// ..
stream := tokenizer.ParseString(str)
for stream.IsValid() {
	keyword := []string{}
	if stream.CurrentToken().Is(TokenKeyword) {
		keyword = append(keyword, stream.CurrentToken().ValueString())
		if stream.IsNextSequence(colonTokenKey, atTokenKey, TokenKeyword) {
			// move past ":" and "@" onto the trailing keyword
			stream.GoNext()
			stream.GoNext()
			stream.GoNext()
			keyword = append(keyword, ":", "@", stream.CurrentToken().ValueString())
		} else if stream.IsNextSequence(colonTokenKey, TokenKeyword) {
			// move past ":" onto the trailing keyword
			stream.GoNext()
			stream.GoNext()
			keyword = append(keyword, ":", stream.CurrentToken().ValueString())
		}
		// ...
	}
	stream.GoNext() // advance, otherwise the loop never ends
}
(code not tested, it is just an example)
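For readers who want to try the idea without the library, here is a minimal, self-contained sketch of the same "join a known token sequence by hand" approach. The `Kind`, `Token`, `matches` and `joinKeywords` names are hypothetical and exist only for this illustration; they are not part of the tokenizer's API.

```go
package main

import (
	"fmt"
	"strings"
)

// Kind is a hypothetical token category used only in this sketch.
type Kind int

const (
	Keyword Kind = iota
	Colon
	At
	Other
)

// Token is a hypothetical, simplified token: a kind plus its text.
type Token struct {
	Kind  Kind
	Value string
}

// matches reports whether toks[i:] starts with the given kinds, in order.
func matches(toks []Token, i int, kinds ...Kind) bool {
	if i+len(kinds) > len(toks) {
		return false
	}
	for j, k := range kinds {
		if toks[i+j].Kind != k {
			return false
		}
	}
	return true
}

// joinKeywords merges "keyword : @ keyword" and "keyword : keyword" runs
// into single strings; every other token passes through unchanged.
func joinKeywords(toks []Token) []string {
	var out []string
	for i := 0; i < len(toks); {
		n := 1 // number of tokens consumed in this step
		switch {
		case matches(toks, i, Keyword, Colon, At, Keyword):
			n = 4
		case matches(toks, i, Keyword, Colon, Keyword):
			n = 3
		}
		var sb strings.Builder
		for _, t := range toks[i : i+n] {
			sb.WriteString(t.Value)
		}
		out = append(out, sb.String())
		i += n
	}
	return out
}

func main() {
	toks := []Token{
		{Keyword, "name"}, {Colon, ":"}, {At, "@"}, {Keyword, "value"},
		{Other, ","},
		{Keyword, "name"}, {Colon, ":"}, {Keyword, "other"},
	}
	fmt.Println(joinKeywords(toks))
}
```

The longer pattern is checked first so that "name:@value" is not cut short by the shorter "keyword : keyword" rule.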
I am sorry, I think there is a little misunderstanding. I would like to avoid additional logic to determine whether something is a "@" or "-". In the example I have written, I wanted to use ":" as a normal separator that is tokenized, and I have done this:
parser := tokenizer.New()
parser.DefineTokens(TColon, []string{":"})
The remaining characters, like "@" and "-", I want to treat as part of a keyword, with no special meaning. So I wrote a simple, ugly piece of code to verify my idea. It looks like this:
if unicode.IsLetter(r) ||
(p.t.flags&fAllowKeywordUnderscore != 0 && p.curr == '_') ||
(p.t.flags&fAllowNumberInKeyword != 0 && start != -1 && isNumberByte(p.curr)) ||
r == '@' || r == '-'
It seems to be working correctly, as I am receiving tokens:
{ TokenKeyword : name }
{ TColon : : }
{ TokenKeyword : some-value }
{ TokenKeyword : name }
{ TColon : : }
{ TokenKeyword : @value }
as intended. The "-" and "@" behave in the same way as the underscore character, which is what I need.
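To make the effect of that change easier to see in isolation, here is a small standalone sketch of a scanner whose keyword character set can be extended. It is not the library's code: `isKeywordRune` and `scanKeywords` are hypothetical names, and the `extra` parameter is an assumption standing in for whatever configuration the tokenizer exposes.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// isKeywordRune reports whether r may appear in a keyword. Letters and '_'
// are always allowed, digits are allowed after the first character, and any
// rune listed in extra (for example "@-") is treated like a letter.
func isKeywordRune(r rune, first bool, extra string) bool {
	if unicode.IsLetter(r) || r == '_' {
		return true
	}
	if !first && unicode.IsDigit(r) {
		return true
	}
	return strings.ContainsRune(extra, r)
}

// scanKeywords splits input into keywords and single-character tokens.
// Whitespace separates tokens and is dropped.
func scanKeywords(input, extra string) []string {
	var tokens []string
	var current []rune
	flush := func() {
		if len(current) > 0 {
			tokens = append(tokens, string(current))
			current = current[:0]
		}
	}
	for _, r := range input {
		switch {
		case isKeywordRune(r, len(current) == 0, extra):
			current = append(current, r)
		case unicode.IsSpace(r):
			flush()
		default:
			flush()
			tokens = append(tokens, string(r))
		}
	}
	flush()
	return tokens
}

func main() {
	fmt.Println(scanKeywords("name:some-value", "@-"))
	fmt.Println(scanKeywords("name:@value", "@-"))
	fmt.Println(scanKeywords("name:some-value", ""))
}
```

With "@-" passed as `extra`, the colon is still emitted as its own token, while "-" and "@" are absorbed into the neighbouring keyword. With an empty `extra`, "-" becomes a separate token again.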
I added AllowKeywordSymbols. Please try it and give me feedback.
No feedback
A keyword token may contain only letters and (if configured) underscores or numbers. What about other special characters that may occur in some strings, for example:
"name:some-value"
or even "name:@value"
Assuming that those special characters are not a part of this specific grammar, it may be nice to add a way to automatically join them into Keyword. Maybe there is a way to do that right now, but I haven't found a solution. What I can suggest is to change:
or something like that. Currently, as a temporary check, I have added something like that, and I am receiving keywords like "some-value" and "@value".