gfredericks closed this 4 years ago.
I think the answer to `\N` is:

```clojure
(Character/codePointOf "WHITE SMILING FACE")         ;; => 9786 (U+263A)
(Character/codePointOf "some nonexistent nonsense")  ;; throws IllegalArgumentException
```

...which only exists in JDK 9+. If you need it to work below that, there's `CharacterName/getCodePoint`, but that appears to be a package-private class.
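A minimal sketch of how a `\N{...}` name could be resolved with that API, assuming JDK 9+. `name->code-point` is a hypothetical helper, not something from test.chuck. It turns the exception for an unknown name into `nil`:

```clojure
(defn name->code-point
  "Hypothetical helper: resolves a Unicode character name (as used in a
  regex \\N{...} escape) to its code point, or nil if the name is unknown.
  Requires JDK 9+, where Character/codePointOf was added."
  [^String char-name]
  (try
    (Character/codePointOf char-name)
    ;; codePointOf signals an invalid name with IllegalArgumentException
    (catch IllegalArgumentException _ nil)))

(name->code-point "WHITE SMILING FACE")        ;; => 9786
(name->code-point "some nonexistent nonsense") ;; => nil
```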
I don't think the `\N` construct is a valid regex pre-JDK 9. My goal with this functionality is to correctly parse and interpret things according to `re-pattern`'s behavior, i.e., relative to the JVM you're running on.
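One way to stay relative to the running JVM is to ask its regex engine directly instead of hard-coding a version. A sketch, with `named-char-escape-supported?` as a hypothetical name; it relies only on `re-pattern` delegating to `java.util.regex.Pattern/compile`, which throws `PatternSyntaxException` for escapes it doesn't know:

```clojure
(defn named-char-escape-supported?
  "Hypothetical feature probe: true if the running JVM's regex engine
  accepts a \\N{...} escape, false if it rejects it as a syntax error
  (as pre-JDK 9 engines do)."
  []
  (try
    (re-pattern "\\N{WHITE SMILING FACE}")
    true
    (catch java.util.regex.PatternSyntaxException _ false)))

;; Only exercise \N{...} handling where the host JVM supports it.
(when (named-char-escape-supported?)
  (re-matches (re-pattern "\\N{WHITE SMILING FACE}") "\u263A"))
;; => "☺" on JDK 9+, nil on older JVMs
```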
There are already one or two features that vary with the JVM, for things that differ between Java 7 and 8; I just did all of this work before 9.
We probably don't need to support 7 anymore (I don't think Clojure does?), so some of that variability can be removed.
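If those switches are keyed on the JVM's major version rather than probed feature by feature, something like the following could serve. `jvm-major-version` is a hypothetical helper; it assumes `java.specification.version` reads `"1.7"`/`"1.8"` before JDK 9 and `"9"`, `"11"`, ... from JDK 9 on:

```clojure
(require '[clojure.string :as str])

(defn jvm-major-version
  "Hypothetical helper: the running JVM's major version as a long."
  []
  (let [[a b] (str/split (System/getProperty "java.specification.version") #"\.")]
    ;; old scheme "1.8" -> 8, new scheme "11" -> 11
    (Long/parseLong (if (= a "1") b a))))

;; e.g. gate JDK 9+ regex features such as \N{...} and \X
(>= (jvm-major-version) 9)
```

Dropping Java 7 would then just mean deleting the branches that test for versions below 8.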
And yes, `Character/codePointOf` looks like exactly what we'd need; thanks for looking that up.

(I'm planning on digging into this in early July if nobody else gets to it first.)
Just pushed fixes for both of these. `\X` is parsed but unsupported, `\c\Q0` does the correct (insane) thing, and `\N{...}` is fully supported. Additionally, large code points are now supported in `\x` and `\u` literals.
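For reference, this is what the host JVM does with large code points in those literals, which is the behavior parsing has to agree with (`\x{...}` needs JDK 7+). `grinning-face` is only an illustrative var:

```clojure
;; U+1F600 GRINNING FACE lies outside the Basic Multilingual Plane, so a
;; Java (and therefore Clojure) string holds it as a surrogate pair.
(def grinning-face (String. (Character/toChars 0x1F600)))

;; \x{...} takes a whole code point, not just two hex digits.
(re-matches (re-pattern "\\x{1F600}") grinning-face)      ;; => "😀"

;; A \u high surrogate followed by a \u low surrogate is read as one code point.
(re-matches (re-pattern "\\uD83D\\uDE00") grinning-face)  ;; => "😀"
```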
`\X` and `\N{WHITE SMILING FACE}`; `\X` can probably be parsed-but-not-supported (unless the definition turns out to be super easy to implement), and the other one might be an easy lookup on the `Character` class or something. We'll see.
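Why `\X` is unlikely to be "super easy": on JDK 9+ it matches one Unicode extended grapheme cluster, not one char or one code point, so a generator would need the full segmentation rules. A small illustration of the JVM's behavior:

```clojure
;; "e" followed by U+0301 COMBINING ACUTE ACCENT is two chars but one
;; extended grapheme cluster, so a single \X consumes both.
(re-matches (re-pattern "\\X") "e\u0301")       ;; => "e\u0301"

;; Two independent letters are two clusters, so one \X can't match both.
(re-matches (re-pattern "\\X") "ab")            ;; => nil

;; Splitting into clusters: ("e\u0301" "a")
(re-seq (re-pattern "\\X") "e\u0301a")
```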