Closed OTP-Maintainer closed 3 years ago
jlouis
said:
You can probably handle this by simply recognizing [A-Z0-9]+ and then handle it in a function:
{code:erlang}
[A-Z0-9]+ : recognize_length(TokenLine, TokenChars)
recognize_length(Line, Chs) when length(Chs) >= 10 andalso length(Chs) =< 11 ->
{token, {product_id, Line, Chs}};
recognize_length(Line, Chs) when length(Chs) == 7 ->
{token, {product_id, Line, Chs}};
recognize_length(_, _) -> {error, blargh}.
{code}
I'm not really sure why you want to extend leex with extensions to its regex engines. It isn't that common in lexers to provide these kinds of extensions, based on my knowledge of them over the years.
sunaku
said:
Thanks for suggesting a workaround. However, in my humble opinion, the workaround feels laborious in comparison to just specifying numerical repetition quantifiers. In addition, I feel it may become more complicated when having to workaround more complex regular expressions that nest numerical quantifiers among greedy ones, like this:
{noformat}
(([[:alnum:]]{10,11}\.){3,5})+
{noformat}
Since you considered numerical repetition quantifiers as an "extension" to leex's regex engine, I looked into [AWK's regex specification](http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html#tag_09_03_06) again more closely and found that numerical repetition quantifiers (also known as "BREs Matching Multiple Characters") are part of AWK's general (or _basic_) regular expression syntax (and not the extended syntax as I had previously assumed --- I was wrong about that; sorry for the confusion).
As a result, this is all the more reason for leex to support numerical repetition quantifiers (or "BREs Matching Multiple Characters") because they're part of AWK's basic regular expressions.
Thanks for your consideration.
jlouis
said:
I'm calling "isn't really needed". Usually, you would just tokenize your product_id blindly and then handle the verification at a later stage, either in the parser or even later in the stages. It is also this reason there has been no pressing need for such an extension. The {m} and {m,n} repetitions are fairly easy to implement by unrolling the DFA construction the right number of times, but I'm sure it would be that useful.
sunaku
said:
Alright, what about the fact that this "extension" seems to have been already implemented [1] but partially commented-out [2] in the codebase?
Would you be open to accepting a patch / pull request if I restored this "extension" to proper working order?
Thanks for your consideration.
[1] https://github.com/erlang/otp/blob/maint/lib/parsetools/src/leex.erl#L138
[2] https://github.com/erlang/otp/blob/maint/lib/parsetools/src/leex.erl#L692-L700
kenneth
said:
Changed type from Bug to New feature since we don't think this is a bug. The leex documentation does not mention anything about support for repetition quantifiers.
rvirding
said:
A major problem with this is the \{M\} pair is already in use for macro expansion. How would this be handled in that case? Doing a \ \{ \ \} would not work as this is needed now to quote the \{ and \} .
Also it would break consistency if some control characters apply with \ and some don't.
So I am against this suggestion.
kenneth
said:
As Robert Virding is the original developer of leex we follow his advice and reject this suggestion.
Ther reasons are then: 1) Would cause backward compatibility problems,
2) Can't see that it is important enough to motivate 1.
kenneth
said:
See my last comment before this one.
Original reporter:
sunaku
Affected version:OTP-18.1
Component:parsetools
Migrated from: https://bugs.erlang.org/browse/ERL-31