Closed: sjoelund closed this issue 5 years ago.
I'm definitely not a regex expert, and Automa's regular expressions are less extensive than other regex engines such as Julia's built-in one. But I'm not sure why you want to use `\\` (representing the literal `\`) twice in a character class/set, given that such sets say "one of these options" (or "none of these" in the case of a negated character class).
Well, ideally you would use a single escaped backslash, just as in Julia's and most other regex implementations, but that does not work at all:

```julia
julia> re"[\\]"
ERROR: LoadError: ArgumentError: invalid escape sequence \]
Stacktrace:
```
A class like `[xx]` usually works the same as `[x]`, so doubling the backslash should at least not hurt. If the code were fixed to handle the escape properly with a single backslash, I think that would resolve the other issues as well.
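As a quick sanity check of the "`[xx]` is the same as `[x]`" argument, here is a sketch using Julia's built-in PCRE-based `Regex`, not Automa, so it only illustrates the intended semantics. The helper `accepts` is a hypothetical name introduced for this check. Listing the backslash twice does not change which characters a class accepts, and the same holds for the negated class, which should reject only `\` (and not `]`).

```julia
# Collect the ASCII characters that a single-character pattern accepts.
# This uses Julia's built-in Regex (PCRE), not Automa; `accepts` is hypothetical.
accepts(re::Regex) = Set(c for c in Char(0):Char(127) if occursin(re, string(c)))

# In r"..." literals the backslashes reach PCRE unchanged, so r"[\\]" is a class
# holding one backslash and r"[\\\\]" simply lists the backslash twice.
same_plain   = accepts(r"^[\\]$")  == accepts(r"^[\\\\]$")
same_negated = accepts(r"^[^\\]$") == accepts(r"^[^\\\\]$")

# The negated class should exclude only the backslash itself.
excluded = setdiff(Set(Char(0):Char(127)), accepts(r"^[^\\\\]$"))

println(same_plain, " ", same_negated)   # true true
println(excluded)
```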
One and two escaped backslashes produce the same result:

```julia
julia> re"\\\\"
Automa.RegExp.RE(:char, ['\\'], DataStructures.DefaultDict{Symbol,Array{Symbol,1},typeof(Automa.RegExp.gen_empty_names)}(), nothing)

julia> re"\\"
Automa.RegExp.RE(:char, ['\\'], DataStructures.DefaultDict{Symbol,Array{Symbol,1},typeof(Automa.RegExp.gen_empty_names)}(), nothing)
```
Escaping a hexadecimal code also only works with a double backslash; with a single one, `[\x5c]` yields `0x5d` (`]`) instead of `0x5c` (`\`):

```julia
julia> re"[\x5c]"
Automa.RegExp.RE(:class, Any[0x5d:0x5d], DataStructures.DefaultDict{Symbol,Array{Symbol,1},typeof(Automa.RegExp.gen_empty_names)}(), nothing)

julia> re"[\\x5c]"
Automa.RegExp.RE(:class, Any[0x5c:0x5c], DataStructures.DefaultDict{Symbol,Array{Symbol,1},typeof(Automa.RegExp.gen_empty_names)}(), nothing)
```
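Part of what Automa sees is decided by Julia's parser before the macro runs. A non-standard string literal such as `re"..."` hands its macro the raw text, and Base defines `raw"..."` as an identity string macro, so `raw"..."` shows exactly the text `re"..."` would receive. The exception is backslashes right before the closing quote: a run of 2n of them becomes n. This plain-Julia sketch (it does not call Automa, and says nothing about how Automa's parser then treats the text) suggests that `re"\\\\"` and `re"\\"` already reach the macro as `\\` and `\`, while the bracketed inputs arrive exactly as typed.

```julia
# raw"..." is Base's identity string macro, so it returns exactly the text
# that any other string macro such as re"..." would be handed.
# Backslashes are only collapsed when they sit right before the closing quote.
examples = [raw"\\\\", raw"\\", raw"[\\x5c]", raw"[\x5c]"]

for src in examples
    # repr shows the string with Julia escaping; length counts its characters
    println(rpad(repr(src), 12), length(src), " characters")
end
```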
Yeah, Automa.jl's regular expression parser should handle backslash characters more carefully. I think it has many bugs and needs an overhaul. I will take a look soon.
Thanks. Looks pretty good from what I remember when looking at the code last :)
`re"[x]"` gives `Automa.RegExp.RE(:class, Any[0x78:0x78], ...`, so you would assume that `re"[^\\\\]"` would give `Automa.RegExp.RE(:class, Any[0x5c:0x5c], ...`, whereas it actually gives `Automa.RegExp.RE(:class, Any[0x5c:0x5c, 0x5d:0x5d], ...` (`]` seems to be included as well).

For the actual expression I'm trying to parse,

```julia
re"'([^\\\\']|([\\\\].))+'"
```

the expression that seems to work is:

```julia
re"'([^\\x5c']|(\\x5c.))+'"
```
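For comparison, here is a sketch of what that pattern is meant to match, written with Julia's built-in `r"..."` (PCRE) rather than Automa, so it says nothing about Automa's parser. `QUOTED` is a hypothetical name; the pattern is the one above with the redundant `[\\\\]` class simplified to `\\`, and anchored with `^`/`$` so the whole input must match: a single-quoted string whose body mixes characters other than `'` and `\` with backslash escapes.

```julia
# Intended semantics of the quoted-string pattern, using Julia's built-in Regex
# (PCRE), not Automa. QUOTED is a hypothetical name for this illustration.
#   [^\\']  any character except a backslash or a single quote
#   \\.     a backslash followed by any character (an escape such as \')
const QUOTED = r"^'([^\\']|\\.)+'$"

println(occursin(QUOTED, raw"'abc'"))     # true
println(occursin(QUOTED, raw"'it\'s'"))   # true: \' is consumed by \\.
println(occursin(QUOTED, raw"'ab'c'"))    # false: an unescaped ' ends the string early
```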