Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

utf8-bracket support #11271

Closed p5pRT closed 2 years ago

p5pRT commented 13 years ago

Migrated from rt.perl.org#89032 (status was 'open')

Searchable as RT89032$

p5pRT commented 13 years ago

From perl-diddler@tlinx.org

Created by perl-diddler@tlinx.org

I was trying to quote a block of code. Thing is, to do that, you have to choose a delimiter that's not in the code. I wanted to use a "paired" operator like some sort of bracket -- but it seems that perl ignores "left & right" *anything*, unless it is one of 4 "bracket types": round, angle, square & curly (according to the perlop manpage). One problem though, *real* angle brackets U+2329 (〈) and U+2330 (〉) don't work.

It's a shame actually, that someone confused less-than and greater-than with angle brackets (*cough*), since real angle brackets are not likely to be confused with any perl operator and are unlikely to be included in most code.

It wouldn't be too hard, I wouldn't think, to pair up "Right & Left" "THINGS" from Unicode, given their symmetric naming. Would it be reasonable to ask that paired operators/symbols be allowed to be used in a paired manner in Perl?

At the very least, the manpage (and any other references to the mathematical operators) should be fixed, since if someplace is going to claim angle-brackets work, then real angle brackets should be supported, no? ;-)
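For reference, a minimal sketch of the behaviour described above: per perlop, only the four ASCII bracketing pairs are matched (and nest) as quote delimiters, while any other delimiter character must close with itself. The variable names are illustrative only.

```perl
use strict;
use warnings;

# The four bracketing delimiters documented in perlop pair up and nest.
my @paired = (
    q(round (nested) parens),
    q[square [nested] brackets],
    q{curly {nested} braces},
    q<angle <nested> brackets>,
);

# Any other delimiter is not paired: the same character ends the string.
my $same = q!closed by the same character!;

print "$_\n" for @paired, $same;
```

The nested brackets survive inside each string because perl counts matching pairs of the chosen bracket type before it ends the literal.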

Perl Info

```
Flags:
    category=core
    severity=low

This perlbug was built using Perl 5.10.0 - Fri Jul 30 00:12:10 UTC 2010
It is being executed now by Perl 5.10.0 - Thu Sep 16 16:14:28 UTC 2010.

Site configuration information for perl 5.10.0:

Configured by abuild at Thu Sep 16 16:14:28 UTC 2010.

Summary of my perl5 (revision 5 version 10 subversion 0) configuration:
  Platform:
    osname=linux, osvers=2.6.31, archname=x86_64-linux-thread-multi
    uname='linux build35 2.6.31 #1 smp 2010-01-06 16:07:25 +0100 x86_64 x86_64 x86_64 gnulinux '
    config_args='-ds -e -Dprefix=/usr -Dvendorprefix=/usr -Dinstallusrbinperl -Dusethreads -Di_db -Di_dbm -Di_ndbm -Di_gdbm -Duseshrplib=true -DEBUGGING=both -Doptimize=-fmessage-length=0 -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -Wall -pipe -Accflags=-DPERL_USE_SAFE_PUTENV'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DPERL_USE_SAFE_PUTENV -DDEBUGGING -fno-strict-aliasing -pipe -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-fmessage-length=0 -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -Wall -pipe -g',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DPERL_USE_SAFE_PUTENV -DDEBUGGING -fno-strict-aliasing -pipe'
    ccversion='', gccversion='4.4.1 [gcc-4_4-branch revision 150839]', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib64'
    libpth=/lib64 /usr/lib64 /usr/local/lib64
    libs=-lm -ldl -lcrypt -lpthread
    perllibs=-lm -ldl -lcrypt -lpthread
    libc=/lib64/libc-2.10.1.so, so=so, useshrplib=true, libperl=libperl.so
    gnulibc_version='2.10.1'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E -Wl,-rpath,/usr/lib/perl5/5.10.0/x86_64-linux-thread-multi/CORE'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib64'

Locally applied patches:

@INC for perl 5.10.0:
    /usr/local/lib/perl/5.8
    /usr/lib/perl5/5.10.0/x86_64-linux-thread-multi
    /usr/lib/perl5/5.10.0
    /usr/lib/perl5/site_perl/5.10.0/x86_64-linux-thread-multi
    /usr/lib/perl5/site_perl/5.10.0
    /usr/lib/perl5/vendor_perl/5.10.0/x86_64-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.10.0
    /usr/lib/perl5/vendor_perl
    .

Environment for perl 5.10.0:
    HOME=/home/law
    LANG=en_US.UTF-8
    LANGUAGE (unset)
    LC_CTYPE=en_US.UTF-8
    LD_LIBRARY_PATH=/usr/lib64/mpi/gcc/openmpi/lib64
    LOGDIR (unset)
    PATH=.:/sbin:/usr/local/sbin:/usr/lib64/mpi/gcc/openmpi/bin:/home/law/bin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/usr/games:/opt/kde3/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin:/usr/lib/qt3/bin:/usr/sbin
    PERL5LIB=/usr/local/lib/perl/5.8
    PERL_BADLANG (unset)
    SHELL=/bin/bash
```

p5pRT commented 13 years ago

From @iabyn

On Wed, Apr 20, 2011 at 10:02:45PM -0700, Linda Walsh wrote:

> I was trying to quote a block of code. Thing is, to do that, you have to choose a delimiter that's not in the code. I wanted to use a "paired" operator like some sort of bracket -- but it seems that perl ignores "left & right" *anything*, unless it is one of 4 "bracket types": round, angle, square & curly (according to the perlop manpage). One problem though, *real* angle brackets U+2329 (〈) and U+2330 (〉) don't work.

I think you meant *U+232A* for the right bracket. But having said that, I agree it still doesn't work under blead:

    $ cat /tmp/p
    #!/usr/bin/perl
    binmode(STDOUT,':utf8');
    print "use utf8; \$x = q\x{2329}abc\x{232A}; print qq{x=[\$x]\\n};\n"

    $ ./perl /tmp/p > /tmp/pp

    $ cat /tmp/pp
    use utf8; $x = q〈abc〉; print qq{x=[$x]\n};

    $ ./perl /tmp/pp
    Can't find string terminator "�" anywhere before EOF at /tmp/pp line 1.

-- The Enterprise successfully ferries an alien VIP from one place to another without serious incident.   -- Things That Never Happen in "Star Trek" #7

p5pRT commented 13 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 13 years ago

From @khwilliamson

On 04/20/2011 11:02 PM, Linda Walsh (via RT) wrote:

> It wouldn't be too hard, I wouldn't think, to pair up "Right & Left" "THINGS" from Unicode, given their symmetric naming. Would it be reasonable to ask that paired operators/symbols be allowed to be used in a paired manner in Perl?

If we were to do this, the criteria should probably be members of the classes Open and Close Punctuation plus the existing GREATER and LESS THAN signs. Here are the 72 opening ones:

    0028 # '(' LEFT PARENTHESIS
    005B # '[' LEFT SQUARE BRACKET
    007B # '{' LEFT CURLY BRACKET
    0F3A # '༺' TIBETAN MARK GUG RTAGS GYON
    0F3C # '༼' TIBETAN MARK ANG KHANG GYON
    169B # '᚛' OGHAM FEATHER MARK
    201A # '‚' SINGLE LOW-9 QUOTATION MARK
    201E # '„' DOUBLE LOW-9 QUOTATION MARK
    2045 # '⁅' LEFT SQUARE BRACKET WITH QUILL
    207D # '⁽' SUPERSCRIPT LEFT PARENTHESIS
    208D # '₍' SUBSCRIPT LEFT PARENTHESIS
    2329 # '〈' LEFT-POINTING ANGLE BRACKET
    2768 # '❨' MEDIUM LEFT PARENTHESIS ORNAMENT
    276A # '❪' MEDIUM FLATTENED LEFT PARENTHESIS ORNAMENT
    276C # '❬' MEDIUM LEFT-POINTING ANGLE BRACKET ORNAMENT
    276E # '❮' HEAVY LEFT-POINTING ANGLE QUOTATION MARK ORNAMENT
    2770 # '❰' HEAVY LEFT-POINTING ANGLE BRACKET ORNAMENT
    2772 # '❲' LIGHT LEFT TORTOISE SHELL BRACKET ORNAMENT
    2774 # '❴' MEDIUM LEFT CURLY BRACKET ORNAMENT
    27C5 # '⟅' LEFT S-SHAPED BAG DELIMITER
    27E6 # '⟦' MATHEMATICAL LEFT WHITE SQUARE BRACKET
    27E8 # '⟨' MATHEMATICAL LEFT ANGLE BRACKET
    27EA # '⟪' MATHEMATICAL LEFT DOUBLE ANGLE BRACKET
    27EC # '⟬' MATHEMATICAL LEFT WHITE TORTOISE SHELL BRACKET
    27EE # '⟮' MATHEMATICAL LEFT FLATTENED PARENTHESIS
    2983 # '⦃' LEFT WHITE CURLY BRACKET
    2985 # '⦅' LEFT WHITE PARENTHESIS
    2987 # '⦇' Z NOTATION LEFT IMAGE BRACKET
    2989 # '⦉' Z NOTATION LEFT BINDING BRACKET
    298B # '⦋' LEFT SQUARE BRACKET WITH UNDERBAR
    298D # '⦍' LEFT SQUARE BRACKET WITH TICK IN TOP CORNER
    298F # '⦏' LEFT SQUARE BRACKET WITH TICK IN BOTTOM CORNER
    2991 # '⦑' LEFT ANGLE BRACKET WITH DOT
    2993 # '⦓' LEFT ARC LESS-THAN BRACKET
    2995 # '⦕' DOUBLE LEFT ARC GREATER-THAN BRACKET
    2997 # '⦗' LEFT BLACK TORTOISE SHELL BRACKET
    29D8 # '⧘' LEFT WIGGLY FENCE
    29DA # '⧚' LEFT DOUBLE WIGGLY FENCE
    29FC # '⧼' LEFT-POINTING CURVED ANGLE BRACKET
    2E22 # '⸢' TOP LEFT HALF BRACKET
    2E24 # '⸤' BOTTOM LEFT HALF BRACKET
    2E26 # '⸦' LEFT SIDEWAYS U BRACKET
    2E28 # '⸨' LEFT DOUBLE PARENTHESIS
    3008 # '〈' LEFT ANGLE BRACKET
    300A # '《' LEFT DOUBLE ANGLE BRACKET
    300C # '「' LEFT CORNER BRACKET
    300E # '『' LEFT WHITE CORNER BRACKET
    3010 # '【' LEFT BLACK LENTICULAR BRACKET
    3014 # '〔' LEFT TORTOISE SHELL BRACKET
    3016 # '〖' LEFT WHITE LENTICULAR BRACKET
    3018 # '〘' LEFT WHITE TORTOISE SHELL BRACKET
    301A # '〚' LEFT WHITE SQUARE BRACKET
    301D # '〝' REVERSED DOUBLE PRIME QUOTATION MARK
    FD3E # '﴾' ORNATE LEFT PARENTHESIS
    FE17 # '︗' PRESENTATION FORM FOR VERTICAL LEFT WHITE LENTICULAR BRACKET
    FE35 # '︵' PRESENTATION FORM FOR VERTICAL LEFT PARENTHESIS
    FE37 # '︷' PRESENTATION FORM FOR VERTICAL LEFT CURLY BRACKET
    FE39 # '︹' PRESENTATION FORM FOR VERTICAL LEFT TORTOISE SHELL BRACKET
    FE3B # '︻' PRESENTATION FORM FOR VERTICAL LEFT BLACK LENTICULAR BRACKET
    FE3D # '︽' PRESENTATION FORM FOR VERTICAL LEFT DOUBLE ANGLE BRACKET
    FE3F # '︿' PRESENTATION FORM FOR VERTICAL LEFT ANGLE BRACKET
    FE41 # '﹁' PRESENTATION FORM FOR VERTICAL LEFT CORNER BRACKET
    FE43 # '﹃' PRESENTATION FORM FOR VERTICAL LEFT WHITE CORNER BRACKET
    FE47 # '﹇' PRESENTATION FORM FOR VERTICAL LEFT SQUARE BRACKET
    FE59 # '﹙' SMALL LEFT PARENTHESIS
    FE5B # '﹛' SMALL LEFT CURLY BRACKET
    FE5D # '﹝' SMALL LEFT TORTOISE SHELL BRACKET
    FF08 # '（' FULLWIDTH LEFT PARENTHESIS
    FF3B # '［' FULLWIDTH LEFT SQUARE BRACKET
    FF5B # '｛' FULLWIDTH LEFT CURLY BRACKET
    FF5F # '｟' FULLWIDTH LEFT WHITE PARENTHESIS
    FF62 # '｢' HALFWIDTH LEFT CORNER BRACKET
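A list like this can be regenerated from whatever Unicode tables a given perl ships with. The following is only a sketch, assuming the Basic Multilingual Plane is enough and using a hypothetical helper name `open_punctuation`; the count it prints depends on the Unicode version bundled with the perl that runs it, so it need not be 72.

```perl
use strict;
use warnings;
use charnames ();

# Return the BMP code points whose General_Category is Ps (Open_Punctuation).
# The loop skips the surrogate range and stops before U+FFFE/U+FFFF.
sub open_punctuation {
    my @found;
    for my $cp (0x20 .. 0xFFFD) {
        next if $cp >= 0xD800 && $cp <= 0xDFFF;
        push @found, $cp if chr($cp) =~ /\p{Ps}/;
    }
    return @found;
}

my @open = open_punctuation();
printf "%d opening characters in this perl's Unicode tables\n", scalar @open;
printf "%04X %s\n", $_, charnames::viacode($_) // '(unnamed)' for @open[0 .. 4];
```

Note that `<` is a math symbol (Sm), not Ps, which is why the proposal has to add the LESS-THAN and GREATER-THAN signs separately.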

p5pRT commented 13 years ago

From @khwilliamson

On 04/21/2011 08:32 PM, Karl Williamson wrote:

> If we were to do this, the criteria should probably be members of the classes Open and Close Punctuation plus the existing GREATER and LESS THAN signs. Here are the 72 opening ones: [...]

But perhaps I should have included the initial and final quotes, of which there are 12 pairs:

    00AB # '«' LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
    2018 # '‘' LEFT SINGLE QUOTATION MARK
    201B # '‛' SINGLE HIGH-REVERSED-9 QUOTATION MARK
    201C # '“' LEFT DOUBLE QUOTATION MARK
    201F # '‟' DOUBLE HIGH-REVERSED-9 QUOTATION MARK
    2039 # '‹' SINGLE LEFT-POINTING ANGLE QUOTATION MARK
    2E02 # '⸂' LEFT SUBSTITUTION BRACKET
    2E04 # '⸄' LEFT DOTTED SUBSTITUTION BRACKET
    2E09 # '⸉' LEFT TRANSPOSITION BRACKET
    2E0C # '⸌' LEFT RAISED OMISSION BRACKET
    2E1C # '⸜' LEFT LOW PARAPHRASE BRACKET
    2E20 # '⸠' LEFT VERTICAL BAR WITH QUILL

Note that some of the first set have the name QUOTATION, but aren't considered to be quotes.
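The distinction between the two sets can be checked per character. This is a sketch using `Unicode::UCD::charinfo`; the helper name `general_category` is made up for illustration. It reports the General_Category of a few code points mentioned in this thread, for example U+00AB (Initial Punctuation) next to U+201A, which has QUOTATION in its name but sits in the Open Punctuation list above.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charinfo);

# General_Category of a code point, e.g. 'Ps', 'Pe', 'Pi', 'Pf' or 'Sm'.
# Returns nothing for code points that charinfo() does not know.
sub general_category {
    my ($cp) = @_;
    my $info = charinfo($cp) or return;
    return $info->{category};
}

for my $cp (0x00AB, 0x2018, 0x201A, 0x276E, 0x2329, 0x003C) {
    printf "U+%04X %-2s %s\n", $cp, general_category($cp), charinfo($cp)->{name};
}
```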

p5pRT commented 13 years ago

From tchrist@perl.com

> If we were to do this, the criteria should probably be members of the classes Open and Close Punctuation plus the existing GREATER and LESS THAN signs. Here are the 72 opening ones:

Except that the pair she had suggested, LEFT- and RIGHT-POINTING ANGLE QUOTATION MARK, are Initial and Final Punctuation, not Open and Close Punctuation. There are a dozen of the Pi kind:

    « 00AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
    ‘ 2018 LEFT SINGLE QUOTATION MARK
    ‛ 201B SINGLE HIGH-REVERSED-9 QUOTATION MARK
    “ 201C LEFT DOUBLE QUOTATION MARK
    ‟ 201F DOUBLE HIGH-REVERSED-9 QUOTATION MARK
    ‹ 2039 SINGLE LEFT-POINTING ANGLE QUOTATION MARK
    ⸂ 2E02 LEFT SUBSTITUTION BRACKET
    ⸄ 2E04 LEFT DOTTED SUBSTITUTION BRACKET
    ⸉ 2E09 LEFT TRANSPOSITION BRACKET
    ⸌ 2E0C LEFT RAISED OMISSION BRACKET
    ⸜ 2E1C LEFT LOW PARAPHRASE BRACKET
    ⸠ 2E20 LEFT VERTICAL BAR WITH QUILL

Of those, these four are *not* Bidi Mirrored:

    ‘ 2018 LEFT SINGLE QUOTATION MARK
    ‛ 201B SINGLE HIGH-REVERSED-9 QUOTATION MARK
    “ 201C LEFT DOUBLE QUOTATION MARK
    ‟ 201F DOUBLE HIGH-REVERSED-9 QUOTATION MARK
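The Bidi Mirrored flag can be read from the same database record. A minimal sketch, assuming the `mirrored` field that `Unicode::UCD::charinfo` returns ('Y' or 'N'); `is_bidi_mirrored` is a hypothetical helper name.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charinfo);

# True if the Unicode Character Database marks the code point Bidi_Mirrored.
sub is_bidi_mirrored {
    my ($cp) = @_;
    my $info = charinfo($cp) or return 0;
    return $info->{mirrored} eq 'Y' ? 1 : 0;
}

for my $cp (0x00AB, 0x2039, 0x2018, 0x201C, 0x003C) {
    printf "U+%04X mirrored=%s\n", $cp, is_bidi_mirrored($cp) ? 'yes' : 'no';
}
```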

I do agree that of the BidiM Symbols, probably only "<" and ">" should count -- because they already have. I guess you might make some argument to add the things with the same UCA1 values as those:

    ﹤ FE64 SMALL LESS-THAN SIGN
    ﹥ FE65 SMALL GREATER-THAN SIGN
    ＜ FF1C FULLWIDTH LESS-THAN SIGN
    ＞ FF1E FULLWIDTH GREATER-THAN SIGN

But I dunno. Here are the only BidiM full/halfwidth code points:

    （ FF08 GC=Ps FULLWIDTH LEFT PARENTHESIS
    ） FF09 GC=Pe FULLWIDTH RIGHT PARENTHESIS
    ＜ FF1C GC=Sm FULLWIDTH LESS-THAN SIGN
    ＞ FF1E GC=Sm FULLWIDTH GREATER-THAN SIGN
    ［ FF3B GC=Ps FULLWIDTH LEFT SQUARE BRACKET
    ］ FF3D GC=Pe FULLWIDTH RIGHT SQUARE BRACKET
    ｛ FF5B GC=Ps FULLWIDTH LEFT CURLY BRACKET
    ｝ FF5D GC=Pe FULLWIDTH RIGHT CURLY BRACKET
    ｟ FF5F GC=Ps FULLWIDTH LEFT WHITE PARENTHESIS
    ｠ FF60 GC=Pe FULLWIDTH RIGHT WHITE PARENTHESIS
    ｢ FF62 GC=Ps HALFWIDTH LEFT CORNER BRACKET
    ｣ FF63 GC=Pe HALFWIDTH RIGHT CORNER BRACKET

I don't know whether you really want to include the verticals:

    ⸠ 2E20 GC=Pi LEFT VERTICAL BAR WITH QUILL
    ︗ FE17 GC=Ps PRESENTATION FORM FOR VERTICAL LEFT WHITE LENTICULAR BRACKET
    ︵ FE35 GC=Ps PRESENTATION FORM FOR VERTICAL LEFT PARENTHESIS
    ︷ FE37 GC=Ps PRESENTATION FORM FOR VERTICAL LEFT CURLY BRACKET
    ︹ FE39 GC=Ps PRESENTATION FORM FOR VERTICAL LEFT TORTOISE SHELL BRACKET
    ︻ FE3B GC=Ps PRESENTATION FORM FOR VERTICAL LEFT BLACK LENTICULAR BRACKET
    ︽ FE3D GC=Ps PRESENTATION FORM FOR VERTICAL LEFT DOUBLE ANGLE BRACKET
    ︿ FE3F GC=Ps PRESENTATION FORM FOR VERTICAL LEFT ANGLE BRACKET
    ﹁ FE41 GC=Ps PRESENTATION FORM FOR VERTICAL LEFT CORNER BRACKET
    ﹃ FE43 GC=Ps PRESENTATION FORM FOR VERTICAL LEFT WHITE CORNER BRACKET
    ﹇ FE47 GC=Ps PRESENTATION FORM FOR VERTICAL LEFT SQUARE BRACKET

--tom

p5pRT commented 13 years ago

From tchrist@perl.com

> But perhaps I should have included the initial and final quotes, of which there are 12 pairs: [...]
>
> Note that some of the first set have the name QUOTATION, but aren't considered to be quotes.

I get only two:

    % unichars -c '\pP' '\P{QMark}' 'NAME =~ /QUOT/'
    ❮ 276E GC=Ps HEAVY LEFT-POINTING ANGLE QUOTATION MARK ORNAMENT
    ❯ 276F GC=Pe HEAVY RIGHT-POINTING ANGLE QUOTATION MARK ORNAMENT

And they aren't from the Pi/Pf set.
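For readers without the unichars tool, roughly the same query can be written in plain Perl. This is only an approximation of the command above, not its implementation: it assumes the BMP is sufficient, and `quot_named_non_quotes` is a hypothetical name. The result depends on the Unicode version of the perl that runs it.

```perl
use strict;
use warnings;
use charnames ();

# Punctuation (\pP) that lacks the Quotation_Mark property (\P{QMark})
# but whose character name contains QUOT. BMP only, surrogates skipped.
sub quot_named_non_quotes {
    my @found;
    for my $cp (0x20 .. 0xFFFD) {
        next if $cp >= 0xD800 && $cp <= 0xDFFF;
        my $char = chr $cp;
        next unless $char =~ /\pP/ && $char =~ /\P{QMark}/;
        my $name = charnames::viacode($cp);
        push @found, $cp if defined $name && $name =~ /QUOT/;
    }
    return @found;
}

printf "%04X %s\n", $_, charnames::viacode($_) for quot_named_non_quotes();
```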

--tom

p5pRT commented 13 years ago

From perl-diddler@tlinx.org

tchrist1 via RT wrote:

>> If we were to do this, the criteria should probably be members of the classes Open and Close Punctuation plus the existing GREATER and LESS THAN signs. Here are the 72 opening ones:
>
> Except that the pair she had suggested, LEFT- and RIGHT-POINTING ANGLE QUOTATION MARK, are Initial and Final Punctuation, not Open and Close Punctuation. There are a dozen of the Pi kind:

By "she", do you mean me?

I do like the double angle brackets that are called quotation marks, but my original note was not about U+00AB, but about U+2329 & U+232A (correction caught by Dave Mitchell) -- they are called left & right angle brackets, not quotes.

Honestly, when I submitted this, I thought the easiest thing to do would be to check if something had "LEFT" or "RIGHT" in its textual description, then use, um, something like Perl, ( :^) ) to look up a textual description with the opposite word substituted in and, if found, use it as the complementary character -- basically doing this for 2 characters that have a RIGHT & LEFT. Should also obviate the need for any enumerated table.

Is there something wrong in that 'simple' approach? It would seem to be the most flexible... (?)
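The name-substitution idea can be sketched in a few lines with the core `charnames` module. This is an illustration of the proposal, not anything perl does internally; `mirror_by_name` is a hypothetical helper, and swapping every LEFT/RIGHT word at once is an assumption about how multi-word names should be handled.

```perl
use strict;
use warnings;
use charnames ();

# Swap the words LEFT and RIGHT in a character's name and look up the
# resulting name. Returns the partner code point, or nothing if none exists.
sub mirror_by_name {
    my ($cp) = @_;
    my $name = charnames::viacode($cp);
    return unless defined $name && $name =~ /\b(?:LEFT|RIGHT)\b/;
    (my $other = $name) =~ s/\b(LEFT|RIGHT)\b/$1 eq 'LEFT' ? 'RIGHT' : 'LEFT'/ge;
    return charnames::vianame($other);
}

for my $cp (0x0028, 0x2329, 0x22A3) {
    my $partner = mirror_by_name($cp);
    printf "U+%04X %s -> %s\n", $cp, charnames::viacode($cp),
        defined $partner ? sprintf('U+%04X', $partner) : 'none';
}
```

The last code point, U+22A3 LEFT TACK, also finds a partner (RIGHT TACK) although it is a math symbol rather than a bracket, so a purely name-based rule matches more than delimiters.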

p5pRT commented 13 years ago

From @tux

On Fri, 22 Apr 2011 00:30:56 -0700, Linda Walsh <perl-diddler@tlinx.org> wrote:

> Honestly, when I submitted this, I thought the easiest thing to do would be to check if something had "LEFT" or "RIGHT" in its textual description,

That would give you more than you ask for​:

LEFT has ± 328 entries, and RIGHT has ± 331. The LEFT list has been included at the end.

The list only gets interesting when the "LEFT" code point has a matching "RIGHT" version. That list is included below. However, I think that Karl and Tom are right that the *property* of the code point has to be taken into account for "matching tokens".
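One way to combine the two ideas, shown purely as a sketch: keep the name-based lookup, but accept a pair only when the opener is Open or Initial Punctuation and the partner is Close or Final Punctuation. The helper name `closing_delimiter` and the exact choice of categories are assumptions for illustration, not a decision taken in this thread.

```perl
use strict;
use warnings;
use charnames ();

# Name-based partner lookup, accepted only when the pair is Ps/Pi -> Pe/Pf.
sub closing_delimiter {
    my ($cp) = @_;
    return unless chr($cp) =~ /[\p{Ps}\p{Pi}]/;
    my $name = charnames::viacode($cp);
    return unless defined $name && $name =~ /\bLEFT\b/;
    (my $other = $name) =~ s/\bLEFT\b/RIGHT/g;
    my $partner = charnames::vianame($other);
    return unless defined $partner && chr($partner) =~ /[\p{Pe}\p{Pf}]/;
    return $partner;
}

for my $cp (0x0028, 0x00AB, 0x2329, 0x22A3) {
    my $close = closing_delimiter($cp);
    printf "U+%04X -> %s\n", $cp,
        defined $close ? sprintf('U+%04X', $close) : 'not a delimiter pair';
}
```

With the property filter in place, symbols such as LEFT TACK no longer qualify, while brackets and angle quotation marks still do.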

    000028 ( LEFT PARENTHESIS
    000029 ) RIGHT PARENTHESIS
    00005b [ LEFT SQUARE BRACKET
    00005d ] RIGHT SQUARE BRACKET
    00007b { LEFT CURLY BRACKET
    00007d } RIGHT CURLY BRACKET
    0000ab « AQML_IDX LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
    0000bb » AQMR_IDX RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
    0002bf ʿ MODIFIER LETTER LEFT HALF RING
    0002be ʾ MODIFIER LETTER RIGHT HALF RING
    0002c2 ˂ MODIFIER LETTER LEFT ARROWHEAD
    0002c3 ˃ MODIFIER LETTER RIGHT ARROWHEAD
    0002d3 ˓ MODIFIER LETTER CENTRED LEFT HALF RING
    0002d2 ˒ MODIFIER LETTER CENTRED RIGHT HALF RING
    0002f1 ˱ MODIFIER LETTER LOW LEFT ARROWHEAD
    0002f2 ˲ MODIFIER LETTER LOW RIGHT ARROWHEAD
    000318 ̘ COMBINING LEFT TACK BELOW
    000319 ̙ COMBINING RIGHT TACK BELOW
    00031c ̜ COMBINING LEFT HALF RING BELOW
    000339 ̹ COMBINING RIGHT HALF RING BELOW
    000351 ͑ COMBINING LEFT HALF RING ABOVE
    000357 ͗ COMBINING RIGHT HALF RING ABOVE
    000354 ͔ COMBINING LEFT ARROWHEAD BELOW
    000355 ͕ COMBINING RIGHT ARROWHEAD BELOW
    000706 ܆ SYRIAC COLON SKEWED LEFT
    000707 ܇ SYRIAC COLON SKEWED RIGHT
    000fd6 ࿖ LEFT-FACING SVASTI SIGN
    000fd5 ࿕ RIGHT-FACING SVASTI SIGN
    000fd8 ࿘ LEFT-FACING SVASTI SIGN WITH DOTS
    000fd7 ࿗ RIGHT-FACING SVASTI SIGN WITH DOTS
    001dfe ᷾ COMBINING LEFT ARROWHEAD ABOVE
    000350 ͐ COMBINING RIGHT ARROWHEAD ABOVE
    002018 ‘ LEFT SINGLE QUOTATION MARK
    002019 ’ RIGHT SINGLE QUOTATION MARK
    00201c “ LEFT DOUBLE QUOTATION MARK
    00201d ” RIGHT DOUBLE QUOTATION MARK
    002039 ‹ SINGLE LEFT-POINTING ANGLE QUOTATION MARK
    00203a › SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
    002045 ⁅ LEFT SQUARE BRACKET WITH QUILL
    002046 ⁆ RIGHT SQUARE BRACKET WITH QUILL
    00207d ⁽ SUPERSCRIPT LEFT PARENTHESIS
    00207e ⁾ SUPERSCRIPT RIGHT PARENTHESIS
    00208d ₍ SUBSCRIPT LEFT PARENTHESIS
    00208e ₎ SUBSCRIPT RIGHT PARENTHESIS
    0020d0 ⃐ COMBINING LEFT HARPOON ABOVE
    0020d1 ⃑ COMBINING RIGHT HARPOON ABOVE
    0020d6 ⃖ COMBINING LEFT ARROW ABOVE
    0020d7 ⃗ COMBINING RIGHT ARROW ABOVE
    0020ee ⃮ COMBINING LEFT ARROW BELOW
    0020ef ⃯ COMBINING RIGHT ARROW BELOW
    0022a3 ⊣ LEFT TACK
    0022a2 ⊢ RIGHT TACK
    0022c9 ⋉ LEFT NORMAL FACTOR SEMIDIRECT PRODUCT
    0022ca ⋊ RIGHT NORMAL FACTOR SEMIDIRECT PRODUCT
    0022cb ⋋ LEFT SEMIDIRECT PRODUCT
    0022cc ⋌ RIGHT SEMIDIRECT PRODUCT
    002308 ⌈ LEFT CEILING
    002309 ⌉ RIGHT CEILING
    00230a ⌊ LEFT FLOOR
    00230b ⌋ RIGHT FLOOR
    00230d ⌍ BOTTOM LEFT CROP
    00230c ⌌ BOTTOM RIGHT CROP
    00230f ⌏ TOP LEFT CROP
    00230e ⌎ TOP RIGHT CROP
    00231c ⌜ TOP LEFT CORNER
    00231d ⌝ TOP RIGHT CORNER
    00231e ⌞ BOTTOM LEFT CORNER
    00231f ⌟ BOTTOM RIGHT CORNER
    002329 〈 LEFT-POINTING ANGLE BRACKET
    00232a 〉 RIGHT-POINTING ANGLE BRACKET
    00232b ⌫ ERASE TO THE LEFT
    002326 ⌦ ERASE TO THE RIGHT
    00239b ⎛ LEFT PARENTHESIS UPPER HOOK
    00239e ⎞ RIGHT PARENTHESIS UPPER HOOK
    00239c ⎜ LEFT PARENTHESIS EXTENSION
    00239f ⎟ RIGHT PARENTHESIS EXTENSION
    00239d ⎝ LEFT PARENTHESIS LOWER HOOK
    0023a0 ⎠ RIGHT PARENTHESIS LOWER HOOK
    0023a1 ⎡ LEFT SQUARE BRACKET UPPER CORNER
    0023a4 ⎤ RIGHT SQUARE BRACKET UPPER CORNER
    0023a2 ⎢ LEFT SQUARE BRACKET EXTENSION
    0023a5 ⎥ RIGHT SQUARE BRACKET EXTENSION
    0023a3 ⎣ LEFT SQUARE BRACKET LOWER CORNER
    0023a6 ⎦ RIGHT SQUARE BRACKET LOWER CORNER
    0023a7 ⎧ LEFT CURLY BRACKET UPPER HOOK
    0023ab ⎫ RIGHT CURLY BRACKET UPPER HOOK
    0023a8 ⎨ LEFT CURLY BRACKET MIDDLE PIECE
    0023ac ⎬ RIGHT CURLY BRACKET MIDDLE PIECE
    0023a9 ⎩ LEFT CURLY BRACKET LOWER HOOK
    0023ad ⎭ RIGHT CURLY BRACKET LOWER HOOK
    0023b8 ⎸ LEFT VERTICAL BOX LINE
    0023b9 ⎹ RIGHT VERTICAL BOX LINE
    0023cb ⏋ DENTISTRY SYMBOL LIGHT VERTICAL AND TOP LEFT
    0023be ⎾ DENTISTRY SYMBOL LIGHT VERTICAL AND TOP RIGHT
    0023cc ⏌ DENTISTRY SYMBOL LIGHT VERTICAL AND BOTTOM LEFT
    0023bf ⎿ DENTISTRY SYMBOL LIGHT VERTICAL AND BOTTOM RIGHT
    002510 ┐ BOX DRAWINGS LIGHT DOWN AND LEFT
    00250c ┌ BOX DRAWINGS LIGHT DOWN AND RIGHT
    002511 ┑ BOX DRAWINGS DOWN LIGHT AND LEFT HEAVY
    00250d ┍ BOX DRAWINGS DOWN LIGHT AND RIGHT HEAVY
    002512 ┒ BOX DRAWINGS DOWN HEAVY AND LEFT LIGHT
    00250e ┎ BOX DRAWINGS DOWN HEAVY AND RIGHT LIGHT
    002513 ┓ BOX DRAWINGS HEAVY DOWN AND LEFT
    00250f ┏ BOX DRAWINGS HEAVY DOWN AND RIGHT
    002518 ┘ BOX DRAWINGS LIGHT UP AND LEFT
    002514 └ BOX DRAWINGS LIGHT UP AND RIGHT
    002519 ┙ BOX DRAWINGS UP LIGHT AND LEFT HEAVY
    002515 ┕ BOX DRAWINGS UP LIGHT AND RIGHT HEAVY
    00251a ┚ BOX DRAWINGS UP HEAVY AND LEFT LIGHT
    002516 ┖ BOX DRAWINGS UP HEAVY AND RIGHT LIGHT
    00251b ┛ BOX DRAWINGS HEAVY UP AND LEFT
    002517 ┗ BOX DRAWINGS HEAVY UP AND RIGHT
    002524 ┤ BOX DRAWINGS LIGHT VERTICAL AND LEFT
    00251c ├ BOX DRAWINGS LIGHT VERTICAL AND RIGHT
    002525 ┥ BOX DRAWINGS VERTICAL LIGHT AND LEFT HEAVY
    00251d ┝ BOX DRAWINGS VERTICAL LIGHT AND RIGHT HEAVY
    002526 ┦ BOX DRAWINGS UP HEAVY AND LEFT DOWN LIGHT
    00251e ┞ BOX DRAWINGS UP HEAVY AND RIGHT DOWN LIGHT
    002527 ┧ BOX DRAWINGS DOWN HEAVY AND LEFT UP LIGHT
    00251f ┟ BOX DRAWINGS DOWN HEAVY AND RIGHT UP LIGHT
    002528 ┨ BOX DRAWINGS VERTICAL HEAVY AND LEFT LIGHT
    002520 ┠ BOX DRAWINGS VERTICAL HEAVY AND RIGHT LIGHT
    002529 ┩ BOX DRAWINGS DOWN LIGHT AND LEFT UP HEAVY
    002521 ┡ BOX DRAWINGS DOWN LIGHT AND RIGHT UP HEAVY
    00252a ┪ BOX DRAWINGS UP LIGHT AND LEFT DOWN HEAVY
    002522 ┢ BOX DRAWINGS UP LIGHT AND RIGHT DOWN HEAVY
    00252b ┫ BOX DRAWINGS HEAVY VERTICAL AND LEFT
    002523 ┣ BOX DRAWINGS HEAVY VERTICAL AND RIGHT
    002555 ╕ BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
    002552 ╒ BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
    002556 ╖ BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
    002553 ╓ BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
    002557 ╗ BOX DRAWINGS DOUBLE DOWN AND LEFT
    002554 ╔ BOX DRAWINGS DOUBLE DOWN AND RIGHT
    00255b ╛ BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
    002558 ╘ BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
    00255c ╜ BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
    002559 ╙ BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
    00255d ╝ BOX DRAWINGS DOUBLE UP AND LEFT
    00255a ╚ BOX DRAWINGS DOUBLE UP AND RIGHT
    002561 ╡ BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
    00255e ╞ BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
    002562 ╢ BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
    00255f ╟ BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
    002563 ╣ BOX DRAWINGS DOUBLE VERTICAL AND LEFT
    002560 ╠ BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
    00256e ╮ BOX DRAWINGS LIGHT ARC DOWN AND LEFT
    00256d ╭ BOX DRAWINGS LIGHT ARC DOWN AND RIGHT
    00256f ╯ BOX DRAWINGS LIGHT ARC UP AND LEFT
    002570 ╰ BOX DRAWINGS LIGHT ARC UP AND RIGHT
    002574 ╴ BOX DRAWINGS LIGHT LEFT
    002576 ╶ BOX DRAWINGS LIGHT RIGHT
    002578 ╸ BOX DRAWINGS HEAVY LEFT
    00257a ╺ BOX DRAWINGS HEAVY RIGHT
    00258c ▌ LEFT HALF BLOCK
    002590 ▐ RIGHT HALF BLOCK
    00258f ▏ LEFT ONE EIGHTH BLOCK
    002595 ▕ RIGHT ONE EIGHTH BLOCK
    002596 ▖ QUADRANT LOWER LEFT
    002597 ▗ QUADRANT LOWER RIGHT
    002598 ▘ QUADRANT UPPER LEFT
    00259d ▝ QUADRANT UPPER RIGHT
    002599 ▙ QUADRANT UPPER LEFT AND LOWER LEFT AND LOWER RIGHT
    00259f ▟ QUADRANT UPPER RIGHT AND LOWER LEFT AND LOWER RIGHT
    0025c0 ◀ BLACK LEFT-POINTING TRIANGLE
    0025b6 ▶ BLACK RIGHT-POINTING TRIANGLE
    0025c1 ◁ WHITE LEFT-POINTING TRIANGLE
    0025b7 ▷ WHITE RIGHT-POINTING TRIANGLE
    0025c2 ◂ BLACK LEFT-POINTING SMALL TRIANGLE
    0025b8 ▸ BLACK RIGHT-POINTING SMALL TRIANGLE
    0025c3 ◃ WHITE LEFT-POINTING SMALL TRIANGLE
    0025b9 ▹ WHITE RIGHT-POINTING SMALL TRIANGLE
    0025c4 ◄ BLACK LEFT-POINTING POINTER
    0025ba ► BLACK RIGHT-POINTING POINTER
    0025c5 ◅ WHITE LEFT-POINTING POINTER
    0025bb ▻ WHITE RIGHT-POINTING POINTER
    0025d0 ◐ CIRCLE WITH LEFT HALF BLACK
    0025d1 ◑ CIRCLE WITH RIGHT HALF BLACK
    0025d6 ◖ LEFT HALF BLACK CIRCLE
    0025d7 ◗ RIGHT HALF BLACK CIRCLE
    0025dc ◜ UPPER LEFT QUADRANT CIRCULAR ARC
    0025dd ◝ UPPER RIGHT QUADRANT CIRCULAR ARC
    0025df ◟ LOWER LEFT QUADRANT CIRCULAR ARC
    0025de ◞ LOWER RIGHT QUADRANT CIRCULAR ARC
    0025e3 ◣ BLACK LOWER LEFT TRIANGLE
    0025e2 ◢ BLACK LOWER RIGHT TRIANGLE
    0025e4 ◤ BLACK UPPER LEFT TRIANGLE
    0025e5 ◥ BLACK UPPER RIGHT TRIANGLE
    0025e7 ◧ SQUARE WITH LEFT HALF BLACK
    0025e8 ◨ SQUARE WITH RIGHT HALF BLACK
    0025e9 ◩ SQUARE WITH UPPER LEFT DIAGONAL HALF BLACK
    002b14 ⬔ SQUARE WITH UPPER RIGHT DIAGONAL HALF BLACK
    0025ed ◭ UP-POINTING TRIANGLE WITH LEFT HALF BLACK
    0025ee ◮ UP-POINTING TRIANGLE WITH RIGHT HALF BLACK
    0025f0 ◰ WHITE SQUARE WITH UPPER LEFT QUADRANT
    0025f3 ◳ WHITE SQUARE WITH UPPER RIGHT QUADRANT
    0025f1 ◱ WHITE SQUARE WITH LOWER LEFT QUADRANT
    0025f2 ◲ WHITE SQUARE WITH LOWER RIGHT QUADRANT
    0025f4 ◴ WHITE CIRCLE WITH UPPER LEFT QUADRANT
    0025f7 ◷ WHITE CIRCLE WITH UPPER RIGHT QUADRANT
    0025f5 ◵ WHITE CIRCLE WITH LOWER LEFT QUADRANT
    0025f6 ◶ WHITE CIRCLE WITH LOWER RIGHT QUADRANT
    0025f8 ◸ UPPER LEFT TRIANGLE
    0025f9 ◹ UPPER RIGHT TRIANGLE
    0025fa ◺ LOWER LEFT TRIANGLE
    0025ff ◿ LOWER RIGHT TRIANGLE
    00261a ☚ BLACK LEFT POINTING INDEX
    00261b ☛ BLACK RIGHT POINTING INDEX
    00261c ☜ WHITE LEFT POINTING INDEX
    00261e ☞ WHITE RIGHT POINTING INDEX
    00269f ⚟ THREE LINES CONVERGING LEFT
    00269e ⚞ THREE LINES CONVERGING RIGHT
    002768 ❨ MEDIUM LEFT PARENTHESIS ORNAMENT
    002769 ❩ MEDIUM RIGHT PARENTHESIS ORNAMENT
    00276a ❪ MEDIUM FLATTENED LEFT PARENTHESIS ORNAMENT
    00276b ❫ MEDIUM FLATTENED RIGHT PARENTHESIS ORNAMENT
    00276c ❬ MEDIUM LEFT-POINTING ANGLE BRACKET ORNAMENT
    00276d ❭ MEDIUM RIGHT-POINTING ANGLE BRACKET ORNAMENT
    00276e ❮ HEAVY LEFT-POINTING ANGLE QUOTATION MARK ORNAMENT
    00276f ❯ HEAVY RIGHT-POINTING ANGLE QUOTATION MARK ORNAMENT
    002770 ❰ HEAVY LEFT-POINTING ANGLE BRACKET ORNAMENT
    002771 ❱ HEAVY RIGHT-POINTING ANGLE BRACKET ORNAMENT
    002772 ❲ LIGHT LEFT TORTOISE SHELL BRACKET ORNAMENT
    002773 ❳ LIGHT RIGHT TORTOISE SHELL BRACKET ORNAMENT
    002774 ❴ MEDIUM LEFT CURLY BRACKET ORNAMENT
    002775 ❵ MEDIUM RIGHT CURLY BRACKET ORNAMENT
    0027aa ➪ LEFT-SHADED WHITE RIGHTWARDS ARROW
    0027a9 ➩ RIGHT-SHADED WHITE RIGHTWARDS ARROW
    0027c5 ⟅ LEFT S-SHAPED BAG DELIMITER
    0027c6 ⟆ RIGHT S-SHAPED BAG DELIMITER
    0027d5 ⟕ LEFT OUTER JOIN
    0027d6 ⟖ RIGHT OUTER JOIN
    0027de ⟞ LONG LEFT TACK
    0027dd ⟝ LONG RIGHT TACK
    0027e6 ⟦ MATHEMATICAL LEFT WHITE SQUARE BRACKET
    0027e7 ⟧ MATHEMATICAL RIGHT WHITE SQUARE BRACKET
    0027e8 ⟨ MATHEMATICAL LEFT ANGLE BRACKET
    0027e9 ⟩ MATHEMATICAL RIGHT ANGLE BRACKET
    0027ea ⟪ MATHEMATICAL LEFT DOUBLE ANGLE BRACKET
    0027eb ⟫ MATHEMATICAL RIGHT DOUBLE ANGLE BRACKET
    0027ec ⟬ MATHEMATICAL LEFT WHITE TORTOISE SHELL BRACKET
    0027ed ⟭ MATHEMATICAL RIGHT WHITE TORTOISE SHELL BRACKET
    0027ee ⟮ MATHEMATICAL LEFT FLATTENED PARENTHESIS
    0027ef ⟯ MATHEMATICAL RIGHT FLATTENED PARENTHESIS
    00294c ⥌ UP BARB RIGHT DOWN BARB LEFT HARPOON
    00294f ⥏ UP BARB RIGHT DOWN BARB RIGHT HARPOON
    00294d ⥍ UP BARB LEFT DOWN BARB RIGHT HARPOON
    00294f ⥏ UP BARB RIGHT DOWN BARB RIGHT HARPOON
    002951 ⥑ UP BARB LEFT DOWN BARB LEFT HARPOON
    00294c ⥌ UP BARB RIGHT DOWN BARB LEFT HARPOON
    002958 ⥘ UPWARDS HARPOON WITH BARB LEFT TO BAR
    002954 ⥔ UPWARDS HARPOON WITH BARB RIGHT TO BAR
    002959 ⥙ DOWNWARDS HARPOON WITH BARB LEFT TO BAR
    002955 ⥕ DOWNWARDS HARPOON WITH BARB RIGHT TO BAR
    002960 ⥠ UPWARDS HARPOON WITH BARB LEFT FROM BAR
    00295c ⥜ UPWARDS HARPOON WITH BARB RIGHT FROM BAR
    002961 ⥡ DOWNWARDS HARPOON WITH BARB LEFT FROM BAR
    00295d ⥝ DOWNWARDS HARPOON WITH BARB RIGHT FROM BAR
    00297c ⥼ LEFT FISH TAIL
    00297d ⥽ RIGHT FISH TAIL
    002983 ⦃ LEFT WHITE CURLY BRACKET
    002984 ⦄ RIGHT WHITE CURLY BRACKET
    002985 ⦅ LEFT WHITE PARENTHESIS
    002986 ⦆ RIGHT WHITE PARENTHESIS
    002987 ⦇ Z NOTATION LEFT IMAGE BRACKET
    002988 ⦈ Z NOTATION RIGHT IMAGE BRACKET
    002989 ⦉ Z NOTATION LEFT BINDING BRACKET
    00298a ⦊ Z NOTATION RIGHT BINDING BRACKET
    00298b ⦋ LEFT SQUARE BRACKET WITH UNDERBAR
    00298c ⦌ RIGHT SQUARE BRACKET WITH UNDERBAR
    00298d ⦍ LEFT SQUARE BRACKET WITH TICK IN TOP CORNER
    002990 ⦐ RIGHT SQUARE BRACKET WITH TICK IN TOP CORNER
    00298f ⦏ LEFT SQUARE BRACKET WITH TICK IN BOTTOM CORNER
    00298e ⦎ RIGHT SQUARE BRACKET WITH TICK IN BOTTOM CORNER
    002991 ⦑ LEFT ANGLE BRACKET WITH DOT
    002992 ⦒ RIGHT ANGLE BRACKET WITH DOT
    002997 ⦗ LEFT BLACK TORTOISE SHELL BRACKET
    002998 ⦘ RIGHT BLACK TORTOISE SHELL BRACKET
    0029a9 ⦩ MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING UP AND LEFT
    0029a8 ⦨ MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING UP AND RIGHT
    0029ab ⦫ MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING DOWN AND LEFT
    0029aa ⦪ MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING DOWN AND RIGHT
    0029ad ⦭ MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING LEFT AND UP
    0029ac ⦬ MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING RIGHT
AND UP 0029af ⦯ MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING LEFT AND DOWN 0029ae ⦮ MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING RIGHT AND DOWN 0029b4 ⦴ EMPTY SET WITH LEFT ARROW ABOVE 0029b3 ⦳ EMPTY SET WITH RIGHT ARROW ABOVE 0029d1 ⧑ BOWTIE WITH LEFT HALF BLACK 0029d2 ⧒ BOWTIE WITH RIGHT HALF BLACK 0029d4 ⧔ TIMES WITH LEFT HALF BLACK 0029d5 ⧕ TIMES WITH RIGHT HALF BLACK 0029d8 ⧘ LEFT WIGGLY FENCE 0029d9 ⧙ RIGHT WIGGLY FENCE 0029da ⧚ LEFT DOUBLE WIGGLY FENCE 0029db ⧛ RIGHT DOUBLE WIGGLY FENCE 0029e8 ⧨ DOWN-POINTING TRIANGLE WITH LEFT HALF BLACK 0029e9 ⧩ DOWN-POINTING TRIANGLE WITH RIGHT HALF BLACK 0029fc ⧼ LEFT-POINTING CURVED ANGLE BRACKET 0029fd ⧽ RIGHT-POINTING CURVED ANGLE BRACKET 002a2d ⨭ PLUS SIGN IN LEFT HALF CIRCLE 002a2e ⨮ PLUS SIGN IN RIGHT HALF CIRCLE 002a34 ⨴ MULTIPLICATION SIGN IN LEFT HALF CIRCLE 002a35 ⨵ MULTIPLICATION SIGN IN RIGHT HALF CIRCLE 002acd ⫍ SQUARE LEFT OPEN BOX OPERATOR 002ace ⫎ SQUARE RIGHT OPEN BOX OPERATOR 002ae5 ⫥ DOUBLE VERTICAL BAR DOUBLE LEFT TURNSTILE 0022ab ⊫ DOUBLE VERTICAL BAR DOUBLE RIGHT TURNSTILE 002b15 ⬕ SQUARE WITH LOWER LEFT DIAGONAL HALF BLACK 0025ea ◪ SQUARE WITH LOWER RIGHT DIAGONAL HALF BLACK 002b16 ⬖ DIAMOND WITH LEFT HALF BLACK 002b17 ⬗ DIAMOND WITH RIGHT HALF BLACK 002b30 ⬰ LEFT ARROW WITH SMALL CIRCLE 0021f4 ⇴ RIGHT ARROW WITH SMALL CIRCLE 002b32 ⬲ LEFT ARROW WITH CIRCLED PLUS 0027f4 ⟴ RIGHT ARROW WITH CIRCLED PLUS 002b3f ⬿ WAVE ARROW POINTING DIRECTLY LEFT 002933 ⤳ WAVE ARROW POINTING DIRECTLY RIGHT 002e02 ⸂ LEFT SUBSTITUTION BRACKET 002e03 ⸃ RIGHT SUBSTITUTION BRACKET 002e04 ⸄ LEFT DOTTED SUBSTITUTION BRACKET 002e05 ⸅ RIGHT DOTTED SUBSTITUTION BRACKET 002e09 ⸉ LEFT TRANSPOSITION BRACKET 002e0a ⸊ RIGHT TRANSPOSITION BRACKET 002e0c ⸌ LEFT RAISED OMISSION BRACKET 002e0d ⸍ RIGHT RAISED OMISSION BRACKET 002e1c ⸜ LEFT LOW PARAPHRASE BRACKET 002e1d ⸝ RIGHT LOW PARAPHRASE BRACKET 002e20 ⸠ LEFT VERTICAL BAR WITH QUILL 002e21 ⸡ RIGHT VERTICAL BAR WITH QUILL 002e22 ⸢ TOP LEFT HALF BRACKET 002e23 ⸣ TOP 
RIGHT HALF BRACKET 002e24 ⸤ BOTTOM LEFT HALF BRACKET 002e25 ⸥ BOTTOM RIGHT HALF BRACKET 002e26 ⸦ LEFT SIDEWAYS U BRACKET 002e27 ⸧ RIGHT SIDEWAYS U BRACKET 002e28 ⸨ LEFT DOUBLE PARENTHESIS 002e29 ⸩ RIGHT DOUBLE PARENTHESIS 002ff8 ⿸ IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM UPPER LEFT 002ff9 ⿹ IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM UPPER RIGHT 003008 〈 LEFT ANGLE BRACKET 003009 〉 RIGHT ANGLE BRACKET 00300a 《 LEFT DOUBLE ANGLE BRACKET 00300b 》 RIGHT DOUBLE ANGLE BRACKET 00300c 「 LEFT CORNER BRACKET 00300d 」 RIGHT CORNER BRACKET 00300e 『 LEFT WHITE CORNER BRACKET 00300f 』 RIGHT WHITE CORNER BRACKET 003010 【 LEFT BLACK LENTICULAR BRACKET 003011 】 RIGHT BLACK LENTICULAR BRACKET 003014 〔 LEFT TORTOISE SHELL BRACKET 003015 〕 RIGHT TORTOISE SHELL BRACKET 003016 〖 LEFT WHITE LENTICULAR BRACKET 003017 〗 RIGHT WHITE LENTICULAR BRACKET 003018 〘 LEFT WHITE TORTOISE SHELL BRACKET 003019 〙 RIGHT WHITE TORTOISE SHELL BRACKET 00301a 〚 LEFT WHITE SQUARE BRACKET 00301b 〛 RIGHT WHITE SQUARE BRACKET 0032a7 ㊧ CIRCLED IDEOGRAPH LEFT 0032a8 ㊨ CIRCLED IDEOGRAPH RIGHT 00a9c1 ꧁ JAVANESE LEFT RERENGGAN 00a9c2 ꧂ JAVANESE RIGHT RERENGGAN 00fd3e ﴾ ORNATE LEFT PARENTHESIS 00fd3f ﴿ ORNATE RIGHT PARENTHESIS 00fe17 ︗ PRESENTATION FORM FOR VERTICAL LEFT WHITE LENTICULAR BRACKET 00fe18 ︘ PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET 00fe20 ︠ COMBINING LIGATURE LEFT HALF 00fe21 ︡ COMBINING LIGATURE RIGHT HALF 00fe22 ︢ COMBINING DOUBLE TILDE LEFT HALF 00fe23 ︣ COMBINING DOUBLE TILDE RIGHT HALF 00fe24 ︤ COMBINING MACRON LEFT HALF 00fe25 ︥ COMBINING MACRON RIGHT HALF 00fe35 ︵ PRESENTATION FORM FOR VERTICAL LEFT PARENTHESIS 00fe36 ︶ PRESENTATION FORM FOR VERTICAL RIGHT PARENTHESIS 00fe37 ︷ PRESENTATION FORM FOR VERTICAL LEFT CURLY BRACKET 00fe38 ︸ PRESENTATION FORM FOR VERTICAL RIGHT CURLY BRACKET 00fe39 ︹ PRESENTATION FORM FOR VERTICAL LEFT TORTOISE SHELL BRACKET 00fe3a ︺ PRESENTATION FORM FOR VERTICAL RIGHT TORTOISE SHELL BRACKET 00fe3b ︻ PRESENTATION FORM FOR VERTICAL LEFT 
BLACK LENTICULAR BRACKET 00fe3c ︼ PRESENTATION FORM FOR VERTICAL RIGHT BLACK LENTICULAR BRACKET 00fe3d ︽ PRESENTATION FORM FOR VERTICAL LEFT DOUBLE ANGLE BRACKET 00fe3e ︾ PRESENTATION FORM FOR VERTICAL RIGHT DOUBLE ANGLE BRACKET 00fe3f ︿ PRESENTATION FORM FOR VERTICAL LEFT ANGLE BRACKET 00fe40 ﹀ PRESENTATION FORM FOR VERTICAL RIGHT ANGLE BRACKET 00fe41 ﹁ PRESENTATION FORM FOR VERTICAL LEFT CORNER BRACKET 00fe42 ﹂ PRESENTATION FORM FOR VERTICAL RIGHT CORNER BRACKET 00fe43 ﹃ PRESENTATION FORM FOR VERTICAL LEFT WHITE CORNER BRACKET 00fe44 ﹄ PRESENTATION FORM FOR VERTICAL RIGHT WHITE CORNER BRACKET 00fe47 ﹇ PRESENTATION FORM FOR VERTICAL LEFT SQUARE BRACKET 00fe48 ﹈ PRESENTATION FORM FOR VERTICAL RIGHT SQUARE BRACKET 00fe59 ﹙ SMALL LEFT PARENTHESIS 00fe5a ﹚ SMALL RIGHT PARENTHESIS 00fe5b ﹛ SMALL LEFT CURLY BRACKET 00fe5c ﹜ SMALL RIGHT CURLY BRACKET 00fe5d ﹝ SMALL LEFT TORTOISE SHELL BRACKET 00fe5e ﹞ SMALL RIGHT TORTOISE SHELL BRACKET 00ff08 ( FULLWIDTH LEFT PARENTHESIS 00ff09 ) FULLWIDTH RIGHT PARENTHESIS 00ff3b [ FULLWIDTH LEFT SQUARE BRACKET 00ff3d ] FULLWIDTH RIGHT SQUARE BRACKET 00ff5b { FULLWIDTH LEFT CURLY BRACKET 00ff5d } FULLWIDTH RIGHT CURLY BRACKET 00ff5f ⦅ FULLWIDTH LEFT WHITE PARENTHESIS 00ff60 ⦆ FULLWIDTH RIGHT WHITE PARENTHESIS 00ff62 「 HALFWIDTH LEFT CORNER BRACKET 00ff63 」 HALFWIDTH RIGHT CORNER BRACKET 01d106 � MUSICAL SYMBOL LEFT REPEAT SIGN 01d107 � MUSICAL SYMBOL RIGHT REPEAT SIGN 01d14a � MUSICAL SYMBOL TRIANGLE NOTEHEAD LEFT WHITE 01d14c � MUSICAL SYMBOL TRIANGLE NOTEHEAD RIGHT WHITE 01d14b � MUSICAL SYMBOL TRIANGLE NOTEHEAD LEFT BLACK 01d14d � MUSICAL SYMBOL TRIANGLE NOTEHEAD RIGHT BLACK 0e0028 � TAG LEFT PARENTHESIS 0e0029 � TAG RIGHT PARENTHESIS 0e005b � TAG LEFT SQUARE BRACKET 0e005d � TAG RIGHT SQUARE BRACKET 0e007b � TAG LEFT CURLY BRACKET 0e007d � TAG RIGHT CURLY BRACKET

then use, um, something like Perl, ( :^) ) to look up a textual description with the opposite word substituted in and, if found, use it as the complementary character -- basically doing this for 2 characters that have a RIGHT & LEFT. Should also obviate the need for any enumerated table.

Is there something wrong in that 'simple' approach? It would seem to be the most flexible... (?)
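A minimal Perl sketch of the name-swap lookup proposed above, assuming the core `charnames` module. `closing_for` is a hypothetical helper name used only for illustration, not anything in perl itself. It follows the naive idea literally: take the character's Unicode name, swap the first LEFT for RIGHT, and see whether a character with that name exists.

```perl
use strict;
use warnings;
use charnames ':full';   # provides charnames::viacode and charnames::vianame

# Hypothetical helper: given an opening code point, return the code point
# whose Unicode name is identical except that LEFT is replaced by RIGHT.
# Returns undef when the name has no LEFT or no such counterpart exists.
sub closing_for {
    my ($open) = @_;
    my $name = charnames::viacode($open);
    return unless defined $name;
    (my $mirror = $name) =~ s/\bLEFT\b/RIGHT/ or return;
    return charnames::vianame($mirror);
}

for my $open (0x28, 0x5B, 0x2329) {
    my $close = closing_for($open);
    printf "U+%04X => %s\n", $open,
        defined $close ? sprintf('U+%04X', $close) : 'none';
}
```

Only the first LEFT in a name is swapped, so names that mention both directions are not handled specially here; that is a limitation of this sketch.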

000028 ( LEFT PARENTHESIS 00005b [ LEFT SQUARE BRACKET 00007b { LEFT CURLY BRACKET 0000ab « AQML_IDX LEFT-POINTING DOUBLE ANGLE QUOTATION MARK 00019d Ɲ LATIN CAPITAL LETTER N WITH LEFT HOOK 000272 ɲ LATIN SMALL LETTER N WITH LEFT HOOK 0002bf ʿ MODIFIER LETTER LEFT HALF RING 0002c2 ˂ MODIFIER LETTER LEFT ARROWHEAD 0002d3 ˓ MODIFIER LETTER CENTRED LEFT HALF RING 0002f1 ˱ MODIFIER LETTER LOW LEFT ARROWHEAD 0002ff ˿ MODIFIER LETTER LOW LEFT ARROW 000318 ̘ COMBINING LEFT TACK BELOW 00031a ̚ COMBINING LEFT ANGLE ABOVE 00031c ̜ COMBINING LEFT HALF RING BELOW 000349 ͉ COMBINING LEFT ANGLE BELOW 00034d ͍ COMBINING LEFT RIGHT ARROW BELOW 000351 ͑ COMBINING LEFT HALF RING ABOVE 000354 ͔ COMBINING LEFT ARROWHEAD BELOW 000559 ՙ ARMENIAN MODIFIER LETTER LEFT HALF RING 000706 ܆ SYRIAC COLON SKEWED LEFT 000708 ܈ SYRIAC SUPRALINEAR COLON SKEWED LEFT 000fd6 ࿖ LEFT-FACING SVASTI SIGN 000fd8 ࿘ LEFT-FACING SVASTI SIGN WITH DOTS 001b78 ᭸ BALINESE MUSICAL SYMBOL LEFT-HAND OPEN PANG 001b79 ᭹ BALINESE MUSICAL SYMBOL LEFT-HAND OPEN PUNG 001b7a ᭺ BALINESE MUSICAL SYMBOL LEFT-HAND CLOSED PLAK 001b7b ᭻ BALINESE MUSICAL SYMBOL LEFT-HAND CLOSED PLUK 001b7c ᭼ BALINESE MUSICAL SYMBOL LEFT-HAND OPEN PING 001dae ᶮ MODIFIER LETTER SMALL N WITH LEFT HOOK 001dfe ᷾ COMBINING LEFT ARROWHEAD ABOVE 00200e ‎ LEFT-TO-RIGHT MARK 00200f ‏ RIGHT-TO-LEFT MARK 002018 ‘ LEFT SINGLE QUOTATION MARK 00201c “ LEFT DOUBLE QUOTATION MARK 00202a ‪ LEFT-TO-RIGHT EMBEDDING 00202b ‫ RIGHT-TO-LEFT EMBEDDING 00202d ‭ LEFT-TO-RIGHT OVERRIDE 00202e ‮ RIGHT-TO-LEFT OVERRIDE 002039 ‹ SINGLE LEFT-POINTING ANGLE QUOTATION MARK 002045 ⁅ LEFT SQUARE BRACKET WITH QUILL 00207d ⁽ SUPERSCRIPT LEFT PARENTHESIS 00208d ₍ SUBSCRIPT LEFT PARENTHESIS 0020d0 ⃐ COMBINING LEFT HARPOON ABOVE 0020d6 ⃖ COMBINING LEFT ARROW ABOVE 0020e1 ⃡ COMBINING LEFT RIGHT ARROW ABOVE 0020ee ⃮ COMBINING LEFT ARROW BELOW 002194 ↔ LEFT RIGHT ARROW 0021ad ↭ LEFT RIGHT WAVE ARROW 0021ae ↮ LEFT RIGHT ARROW WITH STROKE 0021ce ⇎ LEFT RIGHT DOUBLE ARROW WITH STROKE 0021d4 
⇔ LEFT RIGHT DOUBLE ARROW 0021f9 ⇹ LEFT RIGHT ARROW WITH VERTICAL STROKE 0021fc ⇼ LEFT RIGHT ARROW WITH DOUBLE VERTICAL STROKE 0021ff ⇿ LEFT RIGHT OPEN-HEADED ARROW 0022a3 ⊣ LEFT TACK 0022c9 ⋉ LEFT NORMAL FACTOR SEMIDIRECT PRODUCT 0022cb ⋋ LEFT SEMIDIRECT PRODUCT 002308 ⌈ LEFT CEILING 00230a ⌊ LEFT FLOOR 00230d ⌍ BOTTOM LEFT CROP 00230f ⌏ TOP LEFT CROP 00231c ⌜ TOP LEFT CORNER 00231e ⌞ BOTTOM LEFT CORNER 002329 〈 LEFT-POINTING ANGLE BRACKET 00232b ⌫ ERASE TO THE LEFT 002367 ⍧ APL FUNCTIONAL SYMBOL LEFT SHOE STILE 00239b ⎛ LEFT PARENTHESIS UPPER HOOK 00239c ⎜ LEFT PARENTHESIS EXTENSION 00239d ⎝ LEFT PARENTHESIS LOWER HOOK 0023a1 ⎡ LEFT SQUARE BRACKET UPPER CORNER 0023a2 ⎢ LEFT SQUARE BRACKET EXTENSION 0023a3 ⎣ LEFT SQUARE BRACKET LOWER CORNER 0023a7 ⎧ LEFT CURLY BRACKET UPPER HOOK 0023a8 ⎨ LEFT CURLY BRACKET MIDDLE PIECE 0023a9 ⎩ LEFT CURLY BRACKET LOWER HOOK 0023b0 ⎰ UPPER LEFT OR LOWER RIGHT CURLY BRACKET SECTION 0023b1 ⎱ UPPER RIGHT OR LOWER LEFT CURLY BRACKET SECTION 0023b8 ⎸ LEFT VERTICAL BOX LINE 0023cb ⏋ DENTISTRY SYMBOL LIGHT VERTICAL AND TOP LEFT 0023cc ⏌ DENTISTRY SYMBOL LIGHT VERTICAL AND BOTTOM LEFT 002510 ┐ BOX DRAWINGS LIGHT DOWN AND LEFT 002511 ┑ BOX DRAWINGS DOWN LIGHT AND LEFT HEAVY 002512 ┒ BOX DRAWINGS DOWN HEAVY AND LEFT LIGHT 002513 ┓ BOX DRAWINGS HEAVY DOWN AND LEFT 002518 ┘ BOX DRAWINGS LIGHT UP AND LEFT 002519 ┙ BOX DRAWINGS UP LIGHT AND LEFT HEAVY 00251a ┚ BOX DRAWINGS UP HEAVY AND LEFT LIGHT 00251b ┛ BOX DRAWINGS HEAVY UP AND LEFT 002524 ┤ BOX DRAWINGS LIGHT VERTICAL AND LEFT 002525 ┥ BOX DRAWINGS VERTICAL LIGHT AND LEFT HEAVY 002526 ┦ BOX DRAWINGS UP HEAVY AND LEFT DOWN LIGHT 002527 ┧ BOX DRAWINGS DOWN HEAVY AND LEFT UP LIGHT 002528 ┨ BOX DRAWINGS VERTICAL HEAVY AND LEFT LIGHT 002529 ┩ BOX DRAWINGS DOWN LIGHT AND LEFT UP HEAVY 00252a ┪ BOX DRAWINGS UP LIGHT AND LEFT DOWN HEAVY 00252b ┫ BOX DRAWINGS HEAVY VERTICAL AND LEFT 00252d ┭ BOX DRAWINGS LEFT HEAVY AND RIGHT DOWN LIGHT 00252e ┮ BOX DRAWINGS RIGHT HEAVY AND LEFT DOWN LIGHT 002531 ┱ 
BOX DRAWINGS RIGHT LIGHT AND LEFT DOWN HEAVY 002532 ┲ BOX DRAWINGS LEFT LIGHT AND RIGHT DOWN HEAVY 002535 ┵ BOX DRAWINGS LEFT HEAVY AND RIGHT UP LIGHT 002536 ┶ BOX DRAWINGS RIGHT HEAVY AND LEFT UP LIGHT 002539 ┹ BOX DRAWINGS RIGHT LIGHT AND LEFT UP HEAVY 00253a ┺ BOX DRAWINGS LEFT LIGHT AND RIGHT UP HEAVY 00253d ┽ BOX DRAWINGS LEFT HEAVY AND RIGHT VERTICAL LIGHT 00253e ┾ BOX DRAWINGS RIGHT HEAVY AND LEFT VERTICAL LIGHT 002543 ╃ BOX DRAWINGS LEFT UP HEAVY AND RIGHT DOWN LIGHT 002544 ╄ BOX DRAWINGS RIGHT UP HEAVY AND LEFT DOWN LIGHT 002545 ╅ BOX DRAWINGS LEFT DOWN HEAVY AND RIGHT UP LIGHT 002546 ╆ BOX DRAWINGS RIGHT DOWN HEAVY AND LEFT UP LIGHT 002549 ╉ BOX DRAWINGS RIGHT LIGHT AND LEFT VERTICAL HEAVY 00254a ╊ BOX DRAWINGS LEFT LIGHT AND RIGHT VERTICAL HEAVY 002555 ╕ BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE 002556 ╖ BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE 002557 ╗ BOX DRAWINGS DOUBLE DOWN AND LEFT 00255b ╛ BOX DRAWINGS UP SINGLE AND LEFT DOUBLE 00255c ╜ BOX DRAWINGS UP DOUBLE AND LEFT SINGLE 00255d ╝ BOX DRAWINGS DOUBLE UP AND LEFT 002561 ╡ BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE 002562 ╢ BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE 002563 ╣ BOX DRAWINGS DOUBLE VERTICAL AND LEFT 00256e ╮ BOX DRAWINGS LIGHT ARC DOWN AND LEFT 00256f ╯ BOX DRAWINGS LIGHT ARC UP AND LEFT 002571 ╱ BOX DRAWINGS LIGHT DIAGONAL UPPER RIGHT TO LOWER LEFT 002572 ╲ BOX DRAWINGS LIGHT DIAGONAL UPPER LEFT TO LOWER RIGHT 002574 ╴ BOX DRAWINGS LIGHT LEFT 002578 ╸ BOX DRAWINGS HEAVY LEFT 00257c ╼ BOX DRAWINGS LIGHT LEFT AND HEAVY RIGHT 00257e ╾ BOX DRAWINGS HEAVY LEFT AND LIGHT RIGHT 002589 ▉ LEFT SEVEN EIGHTHS BLOCK 00258a ▊ LEFT THREE QUARTERS BLOCK 00258b ▋ LEFT FIVE EIGHTHS BLOCK 00258c ▌ LEFT HALF BLOCK 00258d ▍ LEFT THREE EIGHTHS BLOCK 00258e ▎ LEFT ONE QUARTER BLOCK 00258f ▏ LEFT ONE EIGHTH BLOCK 002596 ▖ QUADRANT LOWER LEFT 002598 ▘ QUADRANT UPPER LEFT 002599 ▙ QUADRANT UPPER LEFT AND LOWER LEFT AND LOWER RIGHT 00259a ▚ QUADRANT UPPER LEFT AND LOWER RIGHT 00259b ▛ QUADRANT UPPER LEFT 
AND UPPER RIGHT AND LOWER LEFT 00259c ▜ QUADRANT UPPER LEFT AND UPPER RIGHT AND LOWER RIGHT 00259e ▞ QUADRANT UPPER RIGHT AND LOWER LEFT 00259f ▟ QUADRANT UPPER RIGHT AND LOWER LEFT AND LOWER RIGHT 0025a7 ▧ SQUARE WITH UPPER LEFT TO LOWER RIGHT FILL 0025a8 ▨ SQUARE WITH UPPER RIGHT TO LOWER LEFT FILL 0025c0 ◀ BLACK LEFT-POINTING TRIANGLE 0025c1 ◁ WHITE LEFT-POINTING TRIANGLE 0025c2 ◂ BLACK LEFT-POINTING SMALL TRIANGLE 0025c3 ◃ WHITE LEFT-POINTING SMALL TRIANGLE 0025c4 ◄ BLACK LEFT-POINTING POINTER 0025c5 ◅ WHITE LEFT-POINTING POINTER 0025d0 ◐ CIRCLE WITH LEFT HALF BLACK 0025d5 ◕ CIRCLE WITH ALL BUT UPPER LEFT QUADRANT BLACK 0025d6 ◖ LEFT HALF BLACK CIRCLE 0025dc ◜ UPPER LEFT QUADRANT CIRCULAR ARC 0025df ◟ LOWER LEFT QUADRANT CIRCULAR ARC 0025e3 ◣ BLACK LOWER LEFT TRIANGLE 0025e4 ◤ BLACK UPPER LEFT TRIANGLE 0025e7 ◧ SQUARE WITH LEFT HALF BLACK 0025e9 ◩ SQUARE WITH UPPER LEFT DIAGONAL HALF BLACK 0025ed ◭ UP-POINTING TRIANGLE WITH LEFT HALF BLACK 0025f0 ◰ WHITE SQUARE WITH UPPER LEFT QUADRANT 0025f1 ◱ WHITE SQUARE WITH LOWER LEFT QUADRANT 0025f4 ◴ WHITE CIRCLE WITH UPPER LEFT QUADRANT 0025f5 ◵ WHITE CIRCLE WITH LOWER LEFT QUADRANT 0025f8 ◸ UPPER LEFT TRIANGLE 0025fa ◺ LOWER LEFT TRIANGLE 00261a ☚ BLACK LEFT POINTING INDEX 00261c ☜ WHITE LEFT POINTING INDEX 00269f ⚟ THREE LINES CONVERGING LEFT 0026d5 ⛕ ALTERNATE ONE-WAY LEFT WAY TRAFFIC 0026d6 ⛖ BLACK TWO-WAY LEFT WAY TRAFFIC 0026d7 ⛗ WHITE TWO-WAY LEFT WAY TRAFFIC 0026d8 ⛘ BLACK LEFT LANE MERGE 0026d9 ⛙ WHITE LEFT LANE MERGE 0026dc ⛜ LEFT CLOSED ENTRY 0026e0 ⛠ RESTRICTED LEFT ENTRY-1 0026e1 ⛡ RESTRICTED LEFT ENTRY-2 002768 ❨ MEDIUM LEFT PARENTHESIS ORNAMENT 00276a ❪ MEDIUM FLATTENED LEFT PARENTHESIS ORNAMENT 00276c ❬ MEDIUM LEFT-POINTING ANGLE BRACKET ORNAMENT 00276e ❮ HEAVY LEFT-POINTING ANGLE QUOTATION MARK ORNAMENT 002770 ❰ HEAVY LEFT-POINTING ANGLE BRACKET ORNAMENT 002772 ❲ LIGHT LEFT TORTOISE SHELL BRACKET ORNAMENT 002774 ❴ MEDIUM LEFT CURLY BRACKET ORNAMENT 0027aa ➪ LEFT-SHADED WHITE RIGHTWARDS ARROW 0027c5 ⟅ 
LEFT S-SHAPED BAG DELIMITER 0027d4 ⟔ UPPER LEFT CORNER WITH DOT 0027d5 ⟕ LEFT OUTER JOIN 0027da ⟚ LEFT AND RIGHT DOUBLE TURNSTILE 0027db ⟛ LEFT AND RIGHT TACK 0027dc ⟜ LEFT MULTIMAP 0027de ⟞ LONG LEFT TACK 0027e6 ⟦ MATHEMATICAL LEFT WHITE SQUARE BRACKET 0027e8 ⟨ MATHEMATICAL LEFT ANGLE BRACKET 0027ea ⟪ MATHEMATICAL LEFT DOUBLE ANGLE BRACKET 0027ec ⟬ MATHEMATICAL LEFT WHITE TORTOISE SHELL BRACKET 0027ee ⟮ MATHEMATICAL LEFT FLATTENED PARENTHESIS 0027f7 ⟷ LONG LEFT RIGHT ARROW 0027fa ⟺ LONG LEFT RIGHT DOUBLE ARROW 002904 ⤄ LEFT RIGHT DOUBLE ARROW WITH VERTICAL STROKE 002939 ⤹ LEFT-SIDE ARC ANTICLOCKWISE ARROW 00293f ⤿ LOWER LEFT SEMICIRCULAR ANTICLOCKWISE ARROW 002948 ⥈ LEFT RIGHT ARROW THROUGH SMALL CIRCLE 00294a ⥊ LEFT BARB UP RIGHT BARB DOWN HARPOON 00294b ⥋ LEFT BARB DOWN RIGHT BARB UP HARPOON 00294c ⥌ UP BARB RIGHT DOWN BARB LEFT HARPOON 00294d ⥍ UP BARB LEFT DOWN BARB RIGHT HARPOON 00294e ⥎ LEFT BARB UP RIGHT BARB UP HARPOON 002950 ⥐ LEFT BARB DOWN RIGHT BARB DOWN HARPOON 002951 ⥑ UP BARB LEFT DOWN BARB LEFT HARPOON 002958 ⥘ UPWARDS HARPOON WITH BARB LEFT TO BAR 002959 ⥙ DOWNWARDS HARPOON WITH BARB LEFT TO BAR 002960 ⥠ UPWARDS HARPOON WITH BARB LEFT FROM BAR 002961 ⥡ DOWNWARDS HARPOON WITH BARB LEFT FROM BAR 002963 ⥣ UPWARDS HARPOON WITH BARB LEFT BESIDE UPWARDS HARPOON WITH BARB RIGHT 002965 ⥥ DOWNWARDS HARPOON WITH BARB LEFT BESIDE DOWNWARDS HARPOON WITH BARB RIGHT 00296e ⥮ UPWARDS HARPOON WITH BARB LEFT BESIDE DOWNWARDS HARPOON WITH BARB RIGHT 00296f ⥯ DOWNWARDS HARPOON WITH BARB LEFT BESIDE UPWARDS HARPOON WITH BARB RIGHT 00297c ⥼ LEFT FISH TAIL 002983 ⦃ LEFT WHITE CURLY BRACKET 002985 ⦅ LEFT WHITE PARENTHESIS 002987 ⦇ Z NOTATION LEFT IMAGE BRACKET 002989 ⦉ Z NOTATION LEFT BINDING BRACKET 00298b ⦋ LEFT SQUARE BRACKET WITH UNDERBAR 00298d ⦍ LEFT SQUARE BRACKET WITH TICK IN TOP CORNER 00298f ⦏ LEFT SQUARE BRACKET WITH TICK IN BOTTOM CORNER 002991 ⦑ LEFT ANGLE BRACKET WITH DOT 002993 ⦓ LEFT ARC LESS-THAN BRACKET 002995 ⦕ DOUBLE LEFT ARC GREATER-THAN BRACKET 
002997 ⦗ LEFT BLACK TORTOISE SHELL BRACKET 00299b ⦛ MEASURED ANGLE OPENING LEFT 0029a0 ⦠ SPHERICAL ANGLE OPENING LEFT 0029a9 ⦩ MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING UP AND LEFT 0029ab ⦫ MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING DOWN AND LEFT 0029ad ⦭ MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING LEFT AND UP 0029af ⦯ MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING LEFT AND DOWN 0029b4 ⦴ EMPTY SET WITH LEFT ARROW ABOVE 0029ce ⧎ RIGHT TRIANGLE ABOVE LEFT TRIANGLE 0029cf ⧏ LEFT TRIANGLE BESIDE VERTICAL BAR 0029d1 ⧑ BOWTIE WITH LEFT HALF BLACK 0029d4 ⧔ TIMES WITH LEFT HALF BLACK 0029d8 ⧘ LEFT WIGGLY FENCE 0029da ⧚ LEFT DOUBLE WIGGLY FENCE 0029e8 ⧨ DOWN-POINTING TRIANGLE WITH LEFT HALF BLACK 0029fc ⧼ LEFT-POINTING CURVED ANGLE BRACKET 002a1e ⨞ LARGE LEFT TRIANGLE OPERATOR 002a2d ⨭ PLUS SIGN IN LEFT HALF CIRCLE 002a34 ⨴ MULTIPLICATION SIGN IN LEFT HALF CIRCLE 002a84 ⪄ GREATER-THAN OR SLANTED EQUAL TO WITH DOT ABOVE LEFT 002acd ⫍ SQUARE LEFT OPEN BOX OPERATOR 002ade ⫞ SHORT LEFT TACK 002ae3 ⫣ DOUBLE VERTICAL BAR LEFT TURNSTILE 002ae4 ⫤ VERTICAL BAR DOUBLE LEFT TURNSTILE 002ae5 ⫥ DOUBLE VERTICAL BAR DOUBLE LEFT TURNSTILE 002ae6 ⫦ LONG DASH FROM LEFT MEMBER OF DOUBLE VERTICAL 002b04 ⬄ LEFT RIGHT WHITE ARROW 002b0c ⬌ LEFT RIGHT BLACK ARROW 002b15 ⬕ SQUARE WITH LOWER LEFT DIAGONAL HALF BLACK 002b16 ⬖ DIAMOND WITH LEFT HALF BLACK 002b30 ⬰ LEFT ARROW WITH SMALL CIRCLE 002b32 ⬲ LEFT ARROW WITH CIRCLED PLUS 002b3f ⬿ WAVE ARROW POINTING DIRECTLY LEFT 002e02 ⸂ LEFT SUBSTITUTION BRACKET 002e04 ⸄ LEFT DOTTED SUBSTITUTION BRACKET 002e09 ⸉ LEFT TRANSPOSITION BRACKET 002e0c ⸌ LEFT RAISED OMISSION BRACKET 002e1c ⸜ LEFT LOW PARAPHRASE BRACKET 002e20 ⸠ LEFT VERTICAL BAR WITH QUILL 002e22 ⸢ TOP LEFT HALF BRACKET 002e24 ⸤ BOTTOM LEFT HALF BRACKET 002e26 ⸦ LEFT SIDEWAYS U BRACKET 002e28 ⸨ LEFT DOUBLE PARENTHESIS 002ff0 ⿰ IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RIGHT 002ff2 ⿲ IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO MIDDLE AND RIGHT 002ff7 ⿷ 
IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM LEFT 002ff8 ⿸ IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM UPPER LEFT 002ffa ⿺ IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM LOWER LEFT 003008 〈 LEFT ANGLE BRACKET 00300a 《 LEFT DOUBLE ANGLE BRACKET 00300c 「 LEFT CORNER BRACKET 00300e 『 LEFT WHITE CORNER BRACKET 003010 【 LEFT BLACK LENTICULAR BRACKET 003014 〔 LEFT TORTOISE SHELL BRACKET 003016 〖 LEFT WHITE LENTICULAR BRACKET 003018 〘 LEFT WHITE TORTOISE SHELL BRACKET 00301a 〚 LEFT WHITE SQUARE BRACKET 0032a7 ㊧ CIRCLED IDEOGRAPH LEFT 00a70d ꜍ MODIFIER LETTER EXTRA-HIGH DOTTED LEFT-STEM TONE BAR 00a70e ꜎ MODIFIER LETTER HIGH DOTTED LEFT-STEM TONE BAR 00a70f ꜏ MODIFIER LETTER MID DOTTED LEFT-STEM TONE BAR 00a710 ꜐ MODIFIER LETTER LOW DOTTED LEFT-STEM TONE BAR 00a711 ꜑ MODIFIER LETTER EXTRA-LOW DOTTED LEFT-STEM TONE BAR 00a712 ꜒ MODIFIER LETTER EXTRA-HIGH LEFT-STEM TONE BAR 00a713 ꜓ MODIFIER LETTER HIGH LEFT-STEM TONE BAR 00a714 ꜔ MODIFIER LETTER MID LEFT-STEM TONE BAR 00a715 ꜕ MODIFIER LETTER LOW LEFT-STEM TONE BAR 00a716 ꜖ MODIFIER LETTER EXTRA-LOW LEFT-STEM TONE BAR 00a9c1 ꧁ JAVANESE LEFT RERENGGAN 00fd3e ﴾ ORNATE LEFT PARENTHESIS 00fe17 ︗ PRESENTATION FORM FOR VERTICAL LEFT WHITE LENTICULAR BRACKET 00fe20 ︠ COMBINING LIGATURE LEFT HALF 00fe22 ︢ COMBINING DOUBLE TILDE LEFT HALF 00fe24 ︤ COMBINING MACRON LEFT HALF 00fe35 ︵ PRESENTATION FORM FOR VERTICAL LEFT PARENTHESIS 00fe37 ︷ PRESENTATION FORM FOR VERTICAL LEFT CURLY BRACKET 00fe39 ︹ PRESENTATION FORM FOR VERTICAL LEFT TORTOISE SHELL BRACKET 00fe3b ︻ PRESENTATION FORM FOR VERTICAL LEFT BLACK LENTICULAR BRACKET 00fe3d ︽ PRESENTATION FORM FOR VERTICAL LEFT DOUBLE ANGLE BRACKET 00fe3f ︿ PRESENTATION FORM FOR VERTICAL LEFT ANGLE BRACKET 00fe41 ﹁ PRESENTATION FORM FOR VERTICAL LEFT CORNER BRACKET 00fe43 ﹃ PRESENTATION FORM FOR VERTICAL LEFT WHITE CORNER BRACKET 00fe47 ﹇ PRESENTATION FORM FOR VERTICAL LEFT SQUARE BRACKET 00fe59 ﹙ SMALL LEFT PARENTHESIS 00fe5b ﹛ SMALL LEFT CURLY BRACKET 00fe5d ﹝ SMALL LEFT TORTOISE 
SHELL BRACKET 00ff08 ( FULLWIDTH LEFT PARENTHESIS 00ff3b [ FULLWIDTH LEFT SQUARE BRACKET 00ff5b { FULLWIDTH LEFT CURLY BRACKET 00ff5f ⦅ FULLWIDTH LEFT WHITE PARENTHESIS 00ff62 「 HALFWIDTH LEFT CORNER BRACKET 01d106 � MUSICAL SYMBOL LEFT REPEAT SIGN 01d14a � MUSICAL SYMBOL TRIANGLE NOTEHEAD LEFT WHITE 01d14b � MUSICAL SYMBOL TRIANGLE NOTEHEAD LEFT BLACK 0e0028 � TAG LEFT PARENTHESIS 0e005b � TAG LEFT SQUARE BRACKET 0e007b � TAG LEFT CURLY BRACKET

-- H.Merijn Brand http://tux.nl Perl Monger http://amsterdam.pm.org/ using 5.00307 through 5.12 and porting perl5.13.x on HP-UX 10.20, 11.00, 11.11, 11.23 and 11.31, OpenSuSE 10.1, 11.0 .. 11.3 and AIX 5.2 and 5.3. http://mirrors.develooper.com/hpux/ http://www.test-smoke.org/ http://qa.perl.org http://www.goldmark.org/jeff/stupid-disclaimers/

p5pRT commented 13 years ago

From jwkrahn@shaw.ca

Linda Walsh wrote:

# New Ticket Created by Linda Walsh # Please include the string: [perl #89032] # in the subject line of all future correspondence about this issue. # <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=89032>

This is a bug report for perl from perl-diddler@tlinx.org, generated with the help of perlbug 1.36 running under perl 5.10.0.

----------------------------------------------------------------- [Please enter your report here]

I was trying to quote a block of code. Thing is, to do that, you have to choose a delimiter that's not in the code. I wanted to use a "paired" operator like some sort of bracket -- but it seems that perl ignores "left & right" *anything*, unless it is a one of 4 "bracket types": round, angle, square & curly (according to the perlop manpage). One problem though, *real* angle brackets U+2329 (〈) and U+2330 (〉) don't work.

Have you thought about using a here-document to quote your code:

    my $code = <<CODE_BLOCK;
    # your code here
    my \$var = 30;
    CODE_BLOCK

And if you use single quotes it won't be interpolated:

    my $code = <<'CODE_BLOCK';
    # your code here
    my $var = 30;
    CODE_BLOCK

John -- Any intelligent fool can make things bigger and more complex... It takes a touch of genius - and a lot of courage to move in the opposite direction. -- Albert Einstein

p5pRT commented 13 years ago

From @abigail

On Wed, Apr 20, 2011 at 10:02:45PM -0700, Linda Walsh wrote:

I was trying to quote a block of code. Thing is, to do that, you have to choose a delimiter that's not in the code. I wanted to use a "paired" operator like some sort of bracket -- but it seems that perl ignores "left & right" *anything*, unless it is a one of 4 "bracket types": round, angle, square & curly (according to the perlop manpage). One problem though, *real* angle brackets U+2329 (〈) and U+2330 (〉) don't work.

If in your block of code, your '{}', '[]' or '()' are balanced, you can use them as delimiters.

And if your code uses "real angle brackets", it fails to work.
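To illustrate the point about balanced brackets, here is a small sketch (the variable names are made up for the example). Inside `q{...}` nested braces are fine as long as they pair up; only an unbalanced brace needs a backslash.

```perl
use strict;
use warnings;

# Balanced inner braces need no escaping inside q{...}.
my $balanced = q{sub greet { my ($name) = @_; return "hi, $name" }};

# An unbalanced brace has to be backslash-escaped; the backslash is dropped.
my $unbalanced = q{a stray \} in the text};

print "$balanced\n$unbalanced\n";
```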

Abigail

p5pRT commented 13 years ago

From @khwilliamson

Body was too large for import. Click here for the attachment in RT

p5pRT commented 13 years ago

From tchrist@perl.com

The last part of the list above displayed on my system written right to left, until I added a second LEFT-TO-RIGHT OVERRIDE character to get it back on track. That shows some of the perils of blindly using the words of the names to decide if the delimiter is part of a pair or not.

The list doesn't include "LEFT BAGGAGE", for some reason, where LEFT probably should have been "UNCLAIMED". Not a lot of thought has gone into the Unicode names, and so they can be ambiguous.

That's putting it mildly. Another area where I find things have been left too much to chance is in the primary collation strengths, where there is no rhyme nor reason about whether something counts as the same letter or not.

The list probably should be a subset of those characters that have a mirrored glyph. These are:

Is there an easier way to pull those out of BidiMirroring.txt than doing it by hand?

--tom

p5pRT commented 13 years ago

From @abigail

On Fri, Apr 22, 2011 at 09:16:47AM -0600, Karl Williamson wrote:

The list probably should be a subset of those characters that have a mirrored glyph. These are:

    0028 0029 # '(' => ')'; LEFT PARENTHESIS => RIGHT PARENTHESIS
    0029 0028 # ')' => '('; RIGHT PARENTHESIS => LEFT PARENTHESIS
    003C 003E # '<' => '>'; LESS-THAN SIGN => GREATER-THAN SIGN
    003E 003C # '>' => '<'; GREATER-THAN SIGN => LESS-THAN SIGN
    005B 005D # '[' => ']'; LEFT SQUARE BRACKET => RIGHT SQUARE BRACKET
    005D 005B # ']' => '['; RIGHT SQUARE BRACKET => LEFT SQUARE BRACKET
    007B 007D # '{' => '}'; LEFT CURLY BRACKET => RIGHT CURLY BRACKET
    007D 007B # '}' => '{'; RIGHT CURLY BRACKET => LEFT CURLY BRACKET

No 'd' => 'b' or 'p' => 'q' ? ;-)

And then there are '|' => '|', '!' => '!', and other symmetric glyphs - but we already can use them as "mirrored" delimiters. They don't nest though.
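A short sketch of what "they don't nest" means in practice (variable names are illustrative only). With a same-character delimiter there is no opening/closing distinction, so an inner occurrence has to be escaped.

```perl
use strict;
use warnings;

my $piped  = q|pipes can delimit a string|;
my $banged = q!so can exclamation marks!;

# No open/close distinction here, so an inner delimiter cannot nest
# and must be backslash-escaped instead.
my $escaped = q|left \| right|;

print "$piped\n$banged\n$escaped\n";
```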

(I'd pick d/b and p/q over any of the non-ASCII mirrored glyphs; my terminal/font show most of them fine (as long as I stay away from MacOS); they're just too hard to enter)

I sometimes wish that Perl would do delimiters as POD does. So one could write:

  say qq<<< a > b >>>;  # Print "a > b"

Abigail

p5pRT commented 13 years ago

From perl-diddler@tlinx.org

karl williamson via RT wrote​:

0e005b � TAG LEFT SQUARE BRACKET 0e007b � TAG LEFT CURLY BRACKET


The last part of the list above displayed on my system written right to left, until I added a second LEFT-TO-RIGHT OVERRIDE character to get it

  Something broken with your system?

  They don't change RtL semantics anywhere I used them.

Where are you seeing this behavior?

The list doesn't include "LEFT BAGGAGE", for some reason, where LEFT probably should have been "UNCLAIMED". Not a lot of thought has gone into the Unicode names, and so they can be ambiguous.

  Unless there is a "RIGHT BAGGAGE" to match it up with, I wouldn't worry.

p5pRT commented 13 years ago

From @khwilliamson

On 04/22/2011 10:10 AM, Tom Christiansen wrote:

The list probably should be a subset of those characters that have a mirrored glyph. These are:

Is there an easier way to pull those out of BidiMirroring.txt than doing it by hand?

--tom

It is somewhat easier to use lib/unicore/To/Bmg.pl. The list I used was from a version of that file that had been compiled with mktables -annotate.
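For anyone without a perl source tree at hand, the pairs can also be read straight out of BidiMirroring.txt. The following is only a sketch: it assumes the file's `code; mirror # comment` line layout, parses a small inline sample rather than the real file, and `parse_bidi_mirroring` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: parse BidiMirroring.txt-style text into a hash of
# code point => mirrored code point. Comment and blank lines are skipped.
sub parse_bidi_mirroring {
    my ($text) = @_;
    my %mirror;
    for my $line (split /\n/, $text) {
        $line =~ s/#.*//;    # drop the trailing comment, if any
        next unless $line =~ /^\s*([0-9A-Fa-f]{4,6})\s*;\s*([0-9A-Fa-f]{4,6})\s*$/;
        $mirror{ hex $1 } = hex $2;
    }
    return \%mirror;
}

# Inline sample in the file's format; the real file would be read from disk.
my $sample = <<'END';
# BidiMirroring sample
0028; 0029 # LEFT PARENTHESIS
0029; 0028 # RIGHT PARENTHESIS
2329; 232A # LEFT-POINTING ANGLE BRACKET
END

my $pairs = parse_bidi_mirroring($sample);
printf "U+%04X => U+%04X\n", $_, $pairs->{$_}
    for sort { $a <=> $b } keys %$pairs;
```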

p5pRT commented 13 years ago

From @khwilliamson

On 04/22/2011 10:37 AM, Linda Walsh wrote:

karl williamson via RT wrote:

0e005b � TAG LEFT SQUARE BRACKET 0e007b � TAG LEFT CURLY BRACKET ----

The last part of the list above displayed on my system written right to left, until I added a second LEFT-TO-RIGHT OVERRIDE character to get it

Something broken with your system?

They don't change RtL semantics anywhere I used them.

Where are you seeing this behavior?

On the email I received from H. Merijn Brand, the RIGHT-TO-LEFT OVERRIDE character in it caused the remainder of the email to be displayed mirrored. That seems to me to be the correct behavior, and so I don't think my system is broken.

The list doesn't include "LEFT BAGGAGE", for some reason, where LEFT probably should have been "UNCLAIMED". Not a lot of thought has gone into the Unicode names, and so they can be ambiguous.

Unless there is a "RIGHT BAGGAGE" to match it up with, I wouldn't worry.

p5pRT commented 13 years ago

From perl-diddler@tlinx.org

karl williamson via RT wrote:

On 04/22/2011 10:37 AM, Linda Walsh wrote:

karl williamson via RT wrote:

0e005b � TAG LEFT SQUARE BRACKET 0e007b � TAG LEFT CURLY BRACKET ----

The last part of the list above displayed on my system written right to left, until I added a second LEFT-TO-RIGHT OVERRIDE character to get it

On the email I received from H. Merijn Brand, the RIGHT-TO-LEFT OVERRIDE character in it caused the remainder of the email to be displayed mirrored. That seems to me to be the correct behavior, and so I don't think my system is broken.


  What email program do you use?

  FF doesn't display that behavior ...

  Oh, you mean after U+200E/U+200F

I thought you meant inherent in the characters for the 2nd part of the list.

  I'd rule out those characters because they contain both the 'RIGHT+LEFT' keywords in the description. So a 'dumb' algorithm looking at it wouldn't know if it was meant to be a right or a left side of a pair.
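The rule-out step could be sketched roughly like this in Perl. `side_of` is a hypothetical helper, and the word-boundary matching is an assumption about how the names would be scanned.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a Unicode character name by which
# direction keyword it mentions. Names mentioning both are 'ambiguous'
# and would be excluded from pairing.
sub side_of {
    my ($name) = @_;
    my $left  = $name =~ /\bLEFT\b/;
    my $right = $name =~ /\bRIGHT\b/;
    return 'ambiguous' if $left && $right;
    return 'left'      if $left;
    return 'right'     if $right;
    return 'none';
}

for my $name ('LEFT SQUARE BRACKET', 'RIGHT-TO-LEFT OVERRIDE', 'LEFT RIGHT ARROW') {
    printf "%-24s %s\n", $name, side_of($name);
}
```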

  A lot of these objections are trivial details that would be worked out in coding it up.

  What I gave was a general concept -- not a tested algorithm. Be reasonable. If you need me to design the whole algorithm, it's not something I'm going to do off the top of my head.

p5pRT commented 13 years ago

From @obra

On Wed 20.Apr'11 at 22:02:45 -0700, Linda Walsh wrote:

I was trying to quote a block of code. Thing is, to do that, you have to choose a delimiter that's not in the code. I wanted to use a "paired" operator like some sort of bracket -- but it seems that perl ignores "left & right" *anything*, unless it is a one of 4 "bracket types": round, angle, square & curly (according to the perlop manpage). One problem though, *real* angle brackets U+2329 (〈) and U+2330 (〉) don't work.

I'd be curious to know if the Perl 6 community can offer us any useful advice here. I believe that @Larry went for such a solution. How happy are they having done it?

-Jesse

p5pRT commented 13 years ago

From vadim.konovalov@alcatel-lucent.com

From​: Jesse Vincent On Wed 20.Apr'11 at 22​:02​:45 -0700\, Linda Walsh wrote​:

I was trying to quote a block of code. Thing is\, to do that\, you have to choose a delimiter that's not in the code. I wanted to use a "paired" operator like some sort of bracket -- but it seems that perl ignores "left & right" *anything*\, unless it is a one of 4 "bracket types"​: round\, angle\, square & curly (according to the perlop manpage).
One problem though\, *real* angle brackets U+2329 (〈) and U+2330 (〉) don't work.

I'd be curious to know if the Perl 6 community can offer us any useful advice here. I believe that @​Larry went for such a solution.
How happy are they having done it?

STD.pm and STD.pm6 (http​://cpansearch.perl.org/src/SOREAR/STD-20101111/lib/STD.pm and https://github.com/perl6/std/blob/master/STD.pm6)

have our %open2close = ( "\x{0028}" => "\x{0029}"\, "\x{003C}" => "\x{003E}"\, "\x{005B}" => "\x{005D}"\, "\x{007B}" => "\x{007D}"\, "\x{00AB}" => "\x{00BB}"\, "\x{0F3A}" => "\x{0F3B}"\, "\x{0F3C}" => "\x{0F3D}"\, "\x{169B}" => "\x{169C}"\, "\x{2018}" => "\x{2019}"\, "\x{201A}" => "\x{2019}"\, "\x{201B}" => "\x{2019}"\, "\x{201C}" => "\x{201D}"\, "\x{201E}" => "\x{201D}"\, "\x{201F}" => "\x{201D}"\, "\x{2039}" => "\x{203A}"\, "\x{2045}" => "\x{2046}"\, "\x{207D}" => "\x{207E}"\, "\x{208D}" => "\x{208E}"\, "\x{2208}" => "\x{220B}"\, "\x{2209}" => "\x{220C}"\, "\x{220A}" => "\x{220D}"\, "\x{2215}" => "\x{29F5}"\, "\x{223C}" => "\x{223D}"\, "\x{2243}" => "\x{22CD}"\, "\x{2252}" => "\x{2253}"\, "\x{2254}" => "\x{2255}"\, "\x{2264}" => "\x{2265}"\, "\x{2266}" => "\x{2267}"\, "\x{2268}" => "\x{2269}"\, "\x{226A}" => "\x{226B}"\, "\x{226E}" => "\x{226F}"\, "\x{2270}" => "\x{2271}"\, "\x{2272}" => "\x{2273}"\, "\x{2274}" => "\x{2275}"\, "\x{2276}" => "\x{2277}"\, "\x{2278}" => "\x{2279}"\, "\x{227A}" => "\x{227B}"\, "\x{227C}" => "\x{227D}"\, "\x{227E}" => "\x{227F}"\, "\x{2280}" => "\x{2281}"\, "\x{2282}" => "\x{2283}"\, "\x{2284}" => "\x{2285}"\, "\x{2286}" => "\x{2287}"\, "\x{2288}" => "\x{2289}"\, "\x{228A}" => "\x{228B}"\, "\x{228F}" => "\x{2290}"\, "\x{2291}" => "\x{2292}"\, "\x{2298}" => "\x{29B8}"\, "\x{22A2}" => "\x{22A3}"\, "\x{22A6}" => "\x{2ADE}"\, "\x{22A8}" => "\x{2AE4}"\, "\x{22A9}" => "\x{2AE3}"\, "\x{22AB}" => "\x{2AE5}"\, "\x{22B0}" => "\x{22B1}"\, "\x{22B2}" => "\x{22B3}"\, "\x{22B4}" => "\x{22B5}"\, "\x{22B6}" => "\x{22B7}"\, "\x{22C9}" => "\x{22CA}"\, "\x{22CB}" => "\x{22CC}"\, "\x{22D0}" => "\x{22D1}"\, "\x{22D6}" => "\x{22D7}"\, "\x{22D8}" => "\x{22D9}"\, "\x{22DA}" => "\x{22DB}"\, "\x{22DC}" => "\x{22DD}"\, "\x{22DE}" => "\x{22DF}"\, "\x{22E0}" => "\x{22E1}"\, "\x{22E2}" => "\x{22E3}"\, "\x{22E4}" => "\x{22E5}"\, "\x{22E6}" => "\x{22E7}"\, "\x{22E8}" => "\x{22E9}"\, "\x{22EA}" => "\x{22EB}"\, "\x{22EC}" => "\x{22ED}"\, "\x{22F0}" => "\x{22F1}"\, 
"\x{22F2}" => "\x{22FA}"\, "\x{22F3}" => "\x{22FB}"\, "\x{22F4}" => "\x{22FC}"\, "\x{22F6}" => "\x{22FD}"\, "\x{22F7}" => "\x{22FE}"\, "\x{2308}" => "\x{2309}"\, "\x{230A}" => "\x{230B}"\, "\x{2329}" => "\x{232A}"\, "\x{23B4}" => "\x{23B5}"\, "\x{2768}" => "\x{2769}"\, "\x{276A}" => "\x{276B}"\, "\x{276C}" => "\x{276D}"\, "\x{276E}" => "\x{276F}"\, "\x{2770}" => "\x{2771}"\, "\x{2772}" => "\x{2773}"\, "\x{2774}" => "\x{2775}"\, "\x{27C3}" => "\x{27C4}"\, "\x{27C5}" => "\x{27C6}"\, "\x{27D5}" => "\x{27D6}"\, "\x{27DD}" => "\x{27DE}"\, "\x{27E2}" => "\x{27E3}"\, "\x{27E4}" => "\x{27E5}"\, "\x{27E6}" => "\x{27E7}"\, "\x{27E8}" => "\x{27E9}"\, "\x{27EA}" => "\x{27EB}"\, "\x{2983}" => "\x{2984}"\, "\x{2985}" => "\x{2986}"\, "\x{2987}" => "\x{2988}"\, "\x{2989}" => "\x{298A}"\, "\x{298B}" => "\x{298C}"\, "\x{298D}" => "\x{298E}"\, "\x{298F}" => "\x{2990}"\, "\x{2991}" => "\x{2992}"\, "\x{2993}" => "\x{2994}"\, "\x{2995}" => "\x{2996}"\, "\x{2997}" => "\x{2998}"\, "\x{29C0}" => "\x{29C1}"\, "\x{29C4}" => "\x{29C5}"\, "\x{29CF}" => "\x{29D0}"\, "\x{29D1}" => "\x{29D2}"\, "\x{29D4}" => "\x{29D5}"\, "\x{29D8}" => "\x{29D9}"\, "\x{29DA}" => "\x{29DB}"\, "\x{29F8}" => "\x{29F9}"\, "\x{29FC}" => "\x{29FD}"\, "\x{2A2B}" => "\x{2A2C}"\, "\x{2A2D}" => "\x{2A2E}"\, "\x{2A34}" => "\x{2A35}"\, "\x{2A3C}" => "\x{2A3D}"\, "\x{2A64}" => "\x{2A65}"\, "\x{2A79}" => "\x{2A7A}"\, "\x{2A7D}" => "\x{2A7E}"\, "\x{2A7F}" => "\x{2A80}"\, "\x{2A81}" => "\x{2A82}"\, "\x{2A83}" => "\x{2A84}"\, "\x{2A8B}" => "\x{2A8C}"\, "\x{2A91}" => "\x{2A92}"\, "\x{2A93}" => "\x{2A94}"\, "\x{2A95}" => "\x{2A96}"\, "\x{2A97}" => "\x{2A98}"\, "\x{2A99}" => "\x{2A9A}"\, "\x{2A9B}" => "\x{2A9C}"\, "\x{2AA1}" => "\x{2AA2}"\, "\x{2AA6}" => "\x{2AA7}"\, "\x{2AA8}" => "\x{2AA9}"\, "\x{2AAA}" => "\x{2AAB}"\, "\x{2AAC}" => "\x{2AAD}"\, "\x{2AAF}" => "\x{2AB0}"\, "\x{2AB3}" => "\x{2AB4}"\, "\x{2ABB}" => "\x{2ABC}"\, "\x{2ABD}" => "\x{2ABE}"\, "\x{2ABF}" => "\x{2AC0}"\, "\x{2AC1}" => "\x{2AC2}"\, "\x{2AC3}" => "\x{2AC4}"\, 
"\x{2AC5}" => "\x{2AC6}"\, "\x{2ACD}" => "\x{2ACE}"\, "\x{2ACF}" => "\x{2AD0}"\, "\x{2AD1}" => "\x{2AD2}"\, "\x{2AD3}" => "\x{2AD4}"\, "\x{2AD5}" => "\x{2AD6}"\, "\x{2AEC}" => "\x{2AED}"\, "\x{2AF7}" => "\x{2AF8}"\, "\x{2AF9}" => "\x{2AFA}"\, "\x{2E02}" => "\x{2E03}"\, "\x{2E04}" => "\x{2E05}"\, "\x{2E09}" => "\x{2E0A}"\, "\x{2E0C}" => "\x{2E0D}"\, "\x{2E1C}" => "\x{2E1D}"\, "\x{2E20}" => "\x{2E21}"\, "\x{3008}" => "\x{3009}"\, "\x{300A}" => "\x{300B}"\, "\x{300C}" => "\x{300D}"\, "\x{300E}" => "\x{300F}"\, "\x{3010}" => "\x{3011}"\, "\x{3014}" => "\x{3015}"\, "\x{3016}" => "\x{3017}"\, "\x{3018}" => "\x{3019}"\, "\x{301A}" => "\x{301B}"\, "\x{301D}" => "\x{301E}"\, "\x{FD3E}" => "\x{FD3F}"\, "\x{FE17}" => "\x{FE18}"\, "\x{FE35}" => "\x{FE36}"\, "\x{FE37}" => "\x{FE38}"\, "\x{FE39}" => "\x{FE3A}"\, "\x{FE3B}" => "\x{FE3C}"\, "\x{FE3D}" => "\x{FE3E}"\, "\x{FE3F}" => "\x{FE40}"\, "\x{FE41}" => "\x{FE42}"\, "\x{FE43}" => "\x{FE44}"\, "\x{FE47}" => "\x{FE48}"\, "\x{FE59}" => "\x{FE5A}"\, "\x{FE5B}" => "\x{FE5C}"\, "\x{FE5D}" => "\x{FE5E}"\, "\x{FF08}" => "\x{FF09}"\, "\x{FF1C}" => "\x{FF1E}"\, "\x{FF3B}" => "\x{FF3D}"\, "\x{FF5B}" => "\x{FF5D}"\, "\x{FF5F}" => "\x{FF60}"\, "\x{FF62}" => "\x{FF63}"\, );

This list is useful, but I think an even more complete list has been established in this thread.

I see "\x{2329}" => "\x{232A}"\, rather than \x{2330}\, though.
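The code points in question can be checked against the character names that ship with the local perl. A minimal sketch, assuming nothing beyond the core `charnames` module:

```perl
use strict;
use warnings;
use charnames ();

# Print the Unicode name of each code point discussed above.
for my $cp (0x2329, 0x232A, 0x2330) {
    my $name = charnames::viacode($cp);
    printf "U+%04X %s\n", $cp, defined $name ? $name : '(no name known)';
}
```

If the names come back as expected, U+2329 and U+232A are the left- and right-pointing angle brackets, and U+2330 is an unrelated symbol.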

Regards\, Vadim.

p5pRT commented 12 years ago

From @cpansprout

On Aug 2\, 2011\, at 10​:15 PM\, Brian Fraser wrote​:

Moving on, PL_multi_(close|open) are chars, not char*, so any attempt to implement RT#89032 would have to turn those to something more sensible (an SV, perhaps? though a simple three-element struct would make do just fine). Should I change it?

Are we sure it’s even a good idea to allow Unicode paired delimiters? I know we already allow for Unicode identifiers\, but it has proven to be problematic\, simply because Unicode is a moving target. Every Unicode upgrade changes Perl syntax just slightly. If we allow Unicode paired brackets\, that will just aggravate the problem.

Also\, it would not be backward-compatible\, as these currently work​:

$ perl -Mutf8 -le 'print q «foo«'
foo

perlop states that it is only the four ASCII brackets that are treated specially. That implies that my example works. Since it’s documented\, we can’t easily change it without a deprecation cycle\, can we?

p5pRT commented 12 years ago

From perl-diddler@tlinx.org

Father Chrysostomos via RT wrote​:

On Aug 2\, 2011\, at 10​:15 PM\, Brian Fraser wrote​:

Moving on, PL_multi_(close|open) are chars, not char*, so any attempt to implement RT#89032 would have to turn those to something more sensible (an SV, perhaps? though a simple three-element struct would make do just fine). Should I change it?

Are we sure it’s even a good idea to allow Unicode paired delimiters? I know we already allow for Unicode identifiers\, but it has proven to be problematic\, simply because Unicode is a moving target. Every Unicode upgrade changes Perl syntax just slightly. If we allow Unicode paired brackets\, that will just aggravate the problem.

Also\, it would not be backward-compatible\, as these currently work​:

$ perl -Mutf8 -le 'print q «foo«'
foo

perlop states that it is only the four ASCII brackets that are treated specially. That implies that my example works. Since it’s documented\, we can’t easily change it without a deprecation cycle\, can we?


use unicode_brackets;

p5pRT commented 12 years ago

From @Hugmeir

On 8/7/11, Father Chrysostomos <sprout@cpan.org> wrote:

Are we sure it’s even a good idea to allow Unicode paired delimiters? I know we already allow for Unicode identifiers\, but it has proven to be problematic\, simply because Unicode is a moving target. Every Unicode upgrade changes Perl syntax just slightly. If we allow Unicode paired brackets\, that will just aggravate the problem.

Also\, it would not be backward-compatible\, as these currently work​:

$ perl -Mutf8 -le 'print q «foo«'
foo

perlop states that it is only the four ASCII brackets that are treated specially. That implies that my example works. Since it’s documented\, we can’t easily change it without a deprecation cycle\, can we?

I think this is a valid concern, but I don't think the decision should be dictated by the implementation. If adding paired UTF-8 delimiters isn't a good course of action, then don't add them, but the tokenizer's (in)ability to handle them should be beside the point.

p5pRT commented 12 years ago

From @nwc10

On Sun\, Aug 07\, 2011 at 03​:05​:38PM -0700\, Linda Walsh wrote​:

Father Chrysostomos via RT wrote​:

On Aug 2\, 2011\, at 10​:15 PM\, Brian Fraser wrote​:

Moving on, PL_multi_(close|open) are chars, not char*, so any attempt to implement RT#89032 would have to turn those to something more sensible (an SV, perhaps? though a simple three-element struct would make do just fine). Should I change it?

Are we sure it's even a good idea to allow Unicode paired delimiters? I know we already allow for Unicode identifiers\, but it has proven to be problematic\, simply because Unicode is a moving target. Every Unicode upgrade changes Perl syntax just slightly. If we allow Unicode paired brackets\, that will just aggravate the problem.

perlop states that it is only the four ASCII brackets that are treated specially. That implies that my example works. Since it's documented\, we can't easily change it without a deprecation cycle\, can we? ----

use unicode_brackets;

Fails to address the valid concern that Unicode is a moving target - what's not a paired delimiter this version might become one next version. And suddenly your program changes meaning underneath you.

Nicholas Clark

p5pRT commented 12 years ago

From @Hugmeir

On Fri, Aug 12, 2011 at 6:46 AM, Nicholas Clark <nick@ccl4.org> wrote:

use unicode_brackets;

Fails to address the valid concern that Unicode is a moving target - what's not a paired delimiter this version might become one next version. And suddenly your program changes meaning underneath you.

And in any case\, I think that\, if you want to change the syntax\, you should be doing it explicitly\, ala​:

use charnames qw( :full );
use paired_delimiters "\N{LEFT-POINTING DOUBLE ANGLE QUOTATION MARK}" => "\N{RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK}";

or

use unicode_brackets Unicode => v6;

or somesuch. Though all of this would need a new API -- Might be less wrong to just wait for (someone|Zefram) to add pluggable operators.

p5pRT commented 12 years ago

From zefram@fysh.org

Brian Fraser wrote​:

use paired_delimiters "\N{LEFT-POINTING DOUBLE ANGLE QUOTATION MARK}" => "\N{RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK}";

Ah\, that's reasonably nice.

or somesuch. Though all of this would need a new API -- Might be less wrong to just wait for (someone|Zefram) to add pluggable operators.

I've been thinking about how to handle delimiters in plugged-in syntax. I think it's a no-brainer that syntax plugins ought to have some help in handling Perl's standard forms of delimitation. Syntax plugins shouldn't have to handle delimiter pairing themselves\, they should be able to ask the core to process that part of the syntax. I imagine an API function that will skip whitespace\, read the next character (the opening delimiter)\, and then return the codepoint that will be the closing delimiter. (There's already a function\, commonly used via Devel​::Declare\, that scans an entire delimited string\, but as we've learned with "@​{[]}" this pre-scanning doesn't get along nicely with nesting syntactic constructs.)

So syntax plugins aren't a solution here\, they're another source of demand for a solution. I think the core should have some support for delimiter pairing\, but that could take the form of another hook that lets modules plug in arbitrarily-complicated delimiter behaviour.

-zefram
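The kind of helper described above (skip whitespace, read the opening delimiter, report the code point that will close it) might look something like this in pure Perl. It is a sketch only: `closing_delimiter` is a hypothetical name, not an existing core or Devel::Declare function, and the optional second argument for caller-supplied pairs is an assumption borrowed from the explicit-pairs suggestion earlier in the thread.

```perl
use strict;
use warnings;

# The four ASCII pairs that perlop documents as bracketing delimiters.
my %CLOSER = ( '(' => ')', '[' => ']', '{' => '}', '<' => '>' );

# Hypothetical helper: skip leading whitespace, read the opening delimiter,
# and return the code point of the character expected to close it.
# Any character without a registered partner closes itself.
sub closing_delimiter {
    my ($source, $extra_pairs) = @_;
    my %pairs = ( %CLOSER, %{ $extra_pairs || {} } );
    my ($open) = $source =~ /\A\s*(\S)/
        or return;    # nothing but whitespace: no delimiter to report
    return ord( exists $pairs{$open} ? $pairs{$open} : $open );
}

printf "U+%04X\n", closing_delimiter('  {foo}');
printf "U+%04X\n", closing_delimiter("\x{AB}foo\x{AB}");
printf "U+%04X\n", closing_delimiter("\x{AB}foo\x{BB}", { "\x{AB}" => "\x{BB}" });
```

Without extra pairs, U+00AB closes itself, which is the documented behaviour that the backward-compatibility objection relies on; pairing it with U+00BB only happens when the caller asks for it.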

p5pRT commented 12 years ago

From perl-diddler@tlinx.org

Nicholas Clark via RT wrote​:

use unicode_brackets;

Fails to address the valid concern that Unicode is a moving target - what's not a paired delimiter this version might become one next version. And suddenly your program changes meaning underneath you.


  That's not exactly true. It shouldn't happen.

  Characters can't be changed once they are created. They can be 'obviated'\, but never deleted. They just move on to a replacement char in a new code area with the new meaning.

  That 'not deleting anything that has been published'\, rule was required to move forward for concerns exactly like this.

  New delimiters may come\, but they'll come out of what are now\, invalid ranges or unused planes.

p5pRT commented 12 years ago

From @nwc10

On Fri\, Aug 12\, 2011 at 07​:50​:18AM -0700\, Linda Walsh wrote​:

Nicholas Clark via RT wrote​:

use unicode_brackets;

Fails to address the valid concern that Unicode is a moving target - what's not a paired delimiter this version might become one next version. And suddenly your program changes meaning underneath you. --- That's not exactly true. It shouldn't happen.

Characters can't be changed once they are created. They can be 'obviated', but never deleted. They just move on to a replacement char in a new code area with the new meaning.

Character properties can change. U-00B5 was Greek once. It isn't now. My impression (as an outsider) is that such changes are rare. But they have happened\, so clearly they are not impossible.

So the concern is that if we drive parsing using Unicode properties\, then there will be corner cases where the parsing changes based on which version of Unicode the parser is based on. And *that* is going to be surprising to anyone caught by it.

Nicholas Clark

p5pRT commented 12 years ago

From perl-diddler@tlinx.org

Nicholas Clark via RT wrote:

Character properties can change. U-00B5 was Greek once.


  Could you find a better example? As it is, it still is. It's been the same since the earliest online reference for 2.0, dating back to 1994. I don't think Unicode has changed it since its inception in '91.

  Unless you can come up with a firmer example, I'm only willing to consider this (sorry to use the term) "FUD", as it goes against the Unicode mission statement. I quote from the book Fonts and Encodings (O'Reilly), p. 61:

  Unwritten principle #11​: permanent stability.

"We have taken the liberty of adding an eleventh principle to the official Unicode principles, one that is important and laden with consequences: [next two lines italicized in text for emphasis]

  *** as soon as a character has been added to the encoding, that
  *** character cannot be removed or altered.

The idea is that a document encoded in Unicode today should not become unusable a few years hence, as is often the case with word-processing software documents (such as those produced with MS Word, not to name any names). Unlike the ten official principles, this one is so scrupulously respected that Unicode has come to contain a large number of characters whose use is deprecated by Unicode itself. Even more shocking is that the name of the character 0xD0C5 contains an obvious typo (FHTORA instead of FTHORA) [still true in unicode 6.0, BTW]; rather than correcting it, the Consortium has decided to let it stand and to insert a little note" [... acknowledging it ].


  I'm probably as much an outsider as you (my largest claim to knowledge about fonts is the book I just mentioned - great reference on the topic\, though a bit dated).

  I don't see any evidence to support the type of changes you express concern about. Obviously, the universe could end tomorrow, and permanence would be severely truncated, but they appear to be more stable than perl -- and perl is pretty stable (compared to BASH, where the maintainer cares nothing for previous version compat in multiple areas...it's becoming a nightmare -- I keep holding up perl as an example to follow, but my protests fall on deaf ears...)...

It isn't now. My impression (as an outsider) is that such changes are rare. But they have happened\, so clearly they are not impossible.


  They may have happened\, but the cited example is not one of those cases.

So the concern is that if we drive parsing using Unicode properties\, then there will be corner cases where the parsing changes based on which version of Unicode the parser is based on. And *that* is going to be surprising to anyone caught by it.


  Hey -- we can always blame it on them!.. ;-)

p5pRT commented 12 years ago

From @nwc10

On Sat\, Aug 13\, 2011 at 02​:02​:32AM -0700\, Linda Walsh wrote​:

Nicholas Clark via RT wrote:

Character properties can change. U-00B5 was Greek once. --- Could you find a better example? As it is still is. Its been the same since the earliest online reference for 2.0 dating back to 1994. I don't think unicode has changed it since its inception in 91.

  $ ~/Sandpit/583/bin/perl -le '$_ = chr 0xB5; utf8::upgrade $_; print /\p{isGreek}/ ? "Greek!" : "not :-("'
  Greek!
  $ ~/Sandpit/584/bin/perl -le '$_ = chr 0xB5; utf8::upgrade $_; print /\p{isGreek}/ ? "Greek!" : "not :-("'
  not :-(

Unless you can come up with a more firm example, I'm only willing to consider this (sorry) to to use the term, but "FUD", as it goes against the the Unicode mission statement. I quote from the book, Fonts and Encodings (O'Reilly), p61:

Unwritten principle #11: permanent stability.

Not FUD. See above. I don't know *why* Perl's implementation changed\, but it did.

Nicholas Clark

p5pRT commented 12 years ago

From @nwc10

On Sat\, Aug 13\, 2011 at 10​:10​:49AM +0100\, Nicholas Clark wrote​:

  $ ~/Sandpit/583/bin/perl -le '$_ = chr 0xB5; utf8::upgrade $_; print /\p{isGreek}/ ? "Greek!" : "not :-("'
  Greek!
  $ ~/Sandpit/584/bin/perl -le '$_ = chr 0xB5; utf8::upgrade $_; print /\p{isGreek}/ ? "Greek!" : "not :-("'
  not :-(

Not FUD. See above. I don't know *why* Perl's implementation changed\, but it did.

Because the Unicode consortium changed it between 4.0.0 and 4.0.1​:

  $ head perl-5.8.[34]/lib/unicore/version
  ==> perl-5.8.3/lib/unicore/version <==
  4.0.0

  ==> perl-5.8.4/lib/unicore/version <==
  4.0.1

  $ grep 00B5 perl-5.8.[34]/lib/unicore/Scripts.txt
  perl-5.8.3/lib/unicore/Scripts.txt:00B5 ; GREEK # L& MICRO SIGN
  perl-5.8.4/lib/unicore/Scripts.txt:00B5 ; Common # L& MICRO SIGN

Nicholas Clark
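The same question can be asked of whichever perl is at hand. This is a minimal sketch using the core `Unicode::UCD` module; the printed values depend on the Unicode version bundled with the running perl, so they are not shown here.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charscript);

my $micro = chr 0xB5;
utf8::upgrade($micro);    # same preparation as in the one-liners above

# Report which Unicode version this perl ships and how it classifies U+00B5.
printf "perl %vd ships Unicode %s\n", $^V, Unicode::UCD::UnicodeVersion();
printf "script of U+00B5: %s\n", charscript(0xB5);
printf "matches \\p{Script=Greek}: %s\n",
    $micro =~ /\p{Script=Greek}/ ? 'yes' : 'no';
```

On any perl built against Unicode 4.0.1 or later, the Scripts.txt change shown above suggests the script should come back as Common.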

p5pRT commented 12 years ago

From tchrist@perl.com

Nicholas Clark <nick@ccl4.org> wrote on Sat, 13 Aug 2011 09:08:43 BST:

Character properties can change.

Yes, Nick, that's certainly correct. This is part of the first stability guarantee in

  http​://unicode.org/policies/stability_policy.html

which reads​:

  Identity Stability

  Applicable Version​: Unicode 1.1+

-> Once a character is encoded\, its properties may still be changed\,   but *not* in such a way as to change the fundamental identity of the   character.

  The Consortium will endeavor to keep the values of the other   properties as stable as possible\, but some circumstances may arise   that require changing them. Particularly in the situation where the   Unicode Standard first encodes less well-documented characters and   scripts\, the exact character properties and behavior initially may   not be well known.

  As more experience is gathered in implementing the characters\,   adjustments in the properties may become necessary. Examples of such   properties include\, but are not limited to\, the following​:

  * General_Category
  * Case mappings
  * Bidirectional properties
  * Compatibility decomposition tags (such as <font> or <compat>)
  * Representative glyphs

  However, character properties will *not* be changed in a way that would affect character identity. For example, the representative glyph for U+0041 “A” cannot be changed to “B”; the General_Category for U+0041 “A” cannot be changed to Ll (lowercase letter); and the decomposition mapping for U+00C1 (Á) cannot be changed to <U+0042, U+0301> (B, ´).

  Property Stability

  Applicable Version​: Unicode 5.2+

  Normative and informative properties\, once defined in the Unicode   Character Database\, will never be removed.

  This stability guarantee does not apply to Contributory properties   (such as "Other_Alphabetic") nor to Provisional properties. For a list   of which properties are Normative or Informative\, see UAX #44\, Unicode   Character Database.

  In prior versions of the Unicode Standard\, the only non-provisional   property that has ever been withdrawn from the standard was the   informative property Special_Case_Condition\, which was removed as of   Unicode 5.1.

  This policy does not preclude the deprecation of a Unicode character   property. Such deprecation would not remove the property; it would only   indicate a strong recommendation not to use it.

Beyond that\, there are stability guarantees for property values\, which I do not list here in full but which one should probably read. For us one place where these constraints particularly matter is the stability guarantees for IDS/IDC/XIDS/XIDC\, which hold for Unicode 3.0.1 and above​:

  * Once a character is ID_Continue, it must continue to be so in all future versions.
  * If a character is ID_Start then it must also be ID_Continue.
  * Once a character is ID_Start, it must continue to be so in all future versions.
  * Once a character is XID_Continue, it must continue to be so in all future versions.
  * If a character is XID_Start then it must also be XID_Continue.
  * Once a character is XID_Start, it must continue to be so in all future versions.

Notice those are all phrased in the positive, not the negative. I think that we can therefore guarantee that something will not *lose* IDC due to a new release, but that some may gain it. New, previously undefined code points can be added which have IDC, and as far as I can see, even a code point that is already defined but which lacks IDC can (permanently) gain it at some future release.
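One of these guarantees, that every ID_Start character is also ID_Continue, is easy to spot-check against the Unicode tables bundled with a given perl. The sketch below only scans the BMP below the surrogate range, which is an arbitrary limit chosen to keep the loop short and warning-free; it says nothing about stability across versions, only about the current tables.

```perl
use strict;
use warnings;

# Collect code points that are ID_Start but not ID_Continue.
# The stability policy quoted above says there should be none.
my @violations;
for my $cp (0 .. 0xD7FF) {
    my $ch = chr $cp;
    push @violations, $cp
        if $ch =~ /\p{ID_Start}/ && $ch !~ /\p{ID_Continue}/;
}

printf "ID_Start without ID_Continue: %d\n", scalar @violations;
```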

That said\, and presumably apart from that\, other mistakes do happen. Sometimes they are fixed by errata and corrigenda.

  http​://www.unicode.org/errata/index.html   http​://unicode.org/versions/corrigenda.html

For example http​://www.unicode.org/versions/corrigendum8.html corrects the Bidi class of U+070F from AN to AL. Similarly\, http​://www.unicode.org/versions/corrigendum6.html had several fixes to Bidi mirroring. And things like these fixed bugs​:

  http​://unicode.org/versions/Unicode5.1.0/erratafixed.html   http​://unicode.org/versions/Unicode5.2.0/erratafixed.html

So properties can change.

There do exist oddities in the mismatch between what is and what is not in IDC. For example\, here is a Letter they forgot to add to IDC​:

  % unichars --nopage -gs '\pL' '\P{IDC}'
  ⸯ U+2E2F GC=Lm SC=Common VERTICAL TILDE

That's one that lacks IDC and which I think should by virtue of its being a letter deserve to gain it. I think Karl mentioned this to them and they flustered a bit\, perhaps even a little disingenuously but who knows.

Here are non-Word chars that are nonetheless in IDC​:

  % unichars --nopage -gs '\W' '\p{IDC}'
  · U+00B7 GC=Po SC=Common MIDDLE DOT
  · U+0387 GC=Po SC=Common GREEK ANO TELEIA
  ፩ U+1369 GC=No SC=Ethiopic ETHIOPIC DIGIT ONE
  ፪ U+136A GC=No SC=Ethiopic ETHIOPIC DIGIT TWO
  ፫ U+136B GC=No SC=Ethiopic ETHIOPIC DIGIT THREE
  ፬ U+136C GC=No SC=Ethiopic ETHIOPIC DIGIT FOUR
  ፭ U+136D GC=No SC=Ethiopic ETHIOPIC DIGIT FIVE
  ፮ U+136E GC=No SC=Ethiopic ETHIOPIC DIGIT SIX
  ፯ U+136F GC=No SC=Ethiopic ETHIOPIC DIGIT SEVEN
  ፰ U+1370 GC=No SC=Ethiopic ETHIOPIC DIGIT EIGHT
  ፱ U+1371 GC=No SC=Ethiopic ETHIOPIC DIGIT NINE
  ᧚ U+19DA GC=No SC=New_Tai_Lue NEW TAI LUE THAM DIGIT ONE
  ℘ U+2118 GC=Sm SC=Common SCRIPT CAPITAL P
  ℮ U+212E GC=So SC=Common ESTIMATED SYMBOL
  ゛ U+309B GC=Sk SC=Common KATAKANA-HIRAGANA VOICED SOUND MARK
  ゜ U+309C GC=Sk SC=Common KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK

I don't believe there are stability guarantees about the Word class\, which is a derived property. Derived properties are particularly pointed out as subject to change.

Of those, the two Kana become a space plus a combining char under compatibility decomposition, and U+0387 GREEK ANO TELEIA becomes U+00B7 MIDDLE DOT, but the rest are invariant under NFKD.

  % unichars -gs --nopage 'NFKD =~ /\N{MIDDLE DOT}/'
  · U+00B7 GC=Po SC=Common MIDDLE DOT
  Ŀ U+013F GC=Lu SC=Latin LATIN CAPITAL LETTER L WITH MIDDLE DOT
  ŀ U+0140 GC=Ll SC=Latin LATIN SMALL LETTER L WITH MIDDLE DOT
  · U+0387 GC=Po SC=Common GREEK ANO TELEIA

Here are those that lose IDC under NFKD​:

  % unichars -gs --nopage '\p{IDC}' 'NFKD =~ /\P{IDC}/'   ‭ ͺ U+037A GC=Lm SC=Greek GREEK YPOGEGRAMMENI   ‭ ゛ U+309B GC=Sk SC=Common KATAKANA-HIRAGANA VOICED SOUND MARK   ‭ ゜ U+309C GC=Sk SC=Common KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK   ‭ ﱞ U+FC5E GC=Lo SC=Arabic ARABIC LIGATURE SHADDA WITH DAMMATAN ISOLATED FORM   ‭ ﱟ U+FC5F GC=Lo SC=Arabic ARABIC LIGATURE SHADDA WITH KASRATAN ISOLATED FORM   ‭ ﱠ U+FC60 GC=Lo SC=Arabic ARABIC LIGATURE SHADDA WITH FATHA ISOLATED FORM   ‭ ﱡ U+FC61 GC=Lo SC=Arabic ARABIC LIGATURE SHADDA WITH DAMMA ISOLATED FORM   ‭ ﱢ U+FC62 GC=Lo SC=Arabic ARABIC LIGATURE SHADDA WITH KASRA ISOLATED FORM   ‭ ﱣ U+FC63 GC=Lo SC=Arabic ARABIC LIGATURE SHADDA WITH SUPERSCRIPT ALEF ISOLATED FORM   ‭ ﷺ U+FDFA GC=Lo SC=Arabic ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM   ‭ ﷻ U+FDFB GC=Lo SC=Arabic ARABIC LIGATURE JALLAJALALOUHOU   ‭ ﹰ U+FE70 GC=Lo SC=Arabic ARABIC FATHATAN ISOLATED FORM   ‭ ﹲ U+FE72 GC=Lo SC=Arabic ARABIC DAMMATAN ISOLATED FORM   ‭ ﹴ U+FE74 GC=Lo SC=Arabic ARABIC KASRATAN ISOLATED FORM   ‭ ﹶ U+FE76 GC=Lo SC=Arabic ARABIC FATHA ISOLATED FORM   ‭ ﹸ U+FE78 GC=Lo SC=Arabic ARABIC DAMMA ISOLATED FORM   ‭ ﹺ U+FE7A GC=Lo SC=Arabic ARABIC KASRA ISOLATED FORM   ‭ ﹼ U+FE7C GC=Lo SC=Arabic ARABIC SHADDA ISOLATED FORM   ‭ ﹾ U+FE7E GC=Lo SC=Arabic ARABIC SUKUN ISOLATED FORM

  (Hm\, looks like LRO isn't working for some of the Arabic above. Weird.)

The first three all lose IDC upon NFKD because they become a space plus a combining mark, and the space is not IDC.
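The decompositions mentioned above can be reproduced with the core `Unicode::Normalize` module. A small sketch, with a hypothetical `codepoints` helper added purely for display:

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFKD);

# Hypothetical display helper: render a string as a list of U+XXXX values.
sub codepoints {
    my ($str) = @_;
    return join ' ', map { sprintf('U+%04X', ord $_) } split //, $str;
}

# For each sample, show its NFKD form and whether every resulting
# character is still ID_Continue.
for my $cp (0x0387, 0x037A, 0x309B) {
    my $nfkd = NFKD(chr $cp);
    printf "U+%04X -> %s (all ID_Continue after NFKD: %s)\n",
        $cp, codepoints($nfkd), $nfkd =~ /\A\p{IDC}+\z/ ? 'yes' : 'no';
}
```

U+0387 is included as the contrasting case: it maps to a single character that keeps IDC, whereas the other two pick up a leading space.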

BTW\, I now see that Python NFKC's its identifiers. I previously thought it NFC'd them\, but that was incorrect.

--tom

p5pRT commented 12 years ago

From tchrist@perl.com

Linda Walsh <perl-diddler@tlinx.org> wrote on Sat, 13 Aug 2011 02:02:32 PDT:

Nicholas Clark via RT wrote:

Character properties can change. U-00B5 was Greek once.

Could you find a better example? As it is still is. Its been the same since the earliest online reference for 2.0 dating back to 1994. I don't think unicode has changed it since its inception in 91.

Um\, I don't see any property related to B5 being Greek​:

  % uniprops -a b5
  U+00B5 ‹µ› \N{MICRO SIGN}
      \w \pL \p{LC} \p{L_} \p{L&} \p{Ll}
      All Any Alnum Alpha Alphabetic Assigned InLatin1 Cased Cased_Letter LC Changes_When_Casefolded CWCF Changes_When_Casemapped CWCM
         Changes_When_NFKC_Casefolded CWKCF Changes_When_Titlecased CWT Changes_When_Uppercased CWU Common Zyyy Ll L Gr_Base Grapheme_Base
         Graph GrBase ID_Continue IDC ID_Start IDS Letter L_ Latin_1 Latin_1_Supplement Lowercase_Letter Lower Lowercase Print Word
         XID_Continue XIDC XID_Start XIDS X_POSIX_Alnum X_POSIX_Alpha X_POSIX_Graph X_POSIX_Lower X_POSIX_Print X_POSIX_Word
      Age=1.1 Bidi_Class=L Bidi_Class=Left_To_Right BC=L Block=Latin_1 Block=Latin_1_Supplement BLK=Latin1 Canonical_Combining_Class=0
         Canonical_Combining_Class=Not_Reordered CCC=NR Canonical_Combining_Class=NR Script=Common Decomposition_Type=Com
         Decomposition_Type=Compat DT=Com Decomposition_Type=Non_Canon Decomposition_Type=Non_Canonical DT=NonCanon East_Asian_Width=Neutral
         Grapheme_Cluster_Break=Other GCB=XX Grapheme_Cluster_Break=XX Hangul_Syllable_Type=NA Hangul_Syllable_Type=Not_Applicable HST=NA
         Joining_Group=No_Joining_Group JG=NoJoiningGroup Joining_Type=Non_Joining JT=U Joining_Type=U Line_Break=AL Line_Break=Alphabetic
         LB=AL Numeric_Type=None NT=None Numeric_Value=NaN NV=NaN Present_In=1.1 IN=1.1 Present_In=2.0 IN=2.0 Present_In=2.1 IN=2.1
         Present_In=3.0 IN=3.0 Present_In=3.1 IN=3.1 Present_In=3.2 IN=3.2 Present_In=4.0 IN=4.0 Present_In=4.1 IN=4.1 Present_In=5.0 IN=5.0
         Present_In=5.1 IN=5.1 Present_In=5.2 IN=5.2 Present_In=6.0 IN=6.0 SC=Zyyy Script=Zyyy Sentence_Break=LO Sentence_Break=Lower SB=LO
         Word_Break=ALetter WB=LE Word_Break=LE _X_Begin

See? The word Greek does not appear in the uniprops output.

That said\, its tc/uc casemap and its casefold are both Greek.

--tom

p5pRT commented 12 years ago

From perl-diddler@tlinx.org

Tom Christiansen wrote​:

Linda Walsh <perl-diddler@tlinx.org> wrote on Sat, 13 Aug 2011 02:02:32 PDT:

Nicholas Clark via RT wrote:

Character properties can change. U-00B5 was Greek once.

Could you find a better example? As it is still is. Its been the same since the earliest online reference for 2.0 dating back to 1994. I don't think unicode has changed it since its inception in 91.

Um\, I don't see any property related to B5 being Greek​:

% uniprops -a b5
U+00B5 ‹µ› \N{MICRO SIGN}
    \w \pL \p{LC} \p{L_} \p{L&} \p{Ll}
    All Any Alnum Alpha Alphabetic Assigned InLatin1 Cased Cased_Letter LC Changes_When_Casefolded CWCF Changes_When_Casemapped CWCM
       Changes_When_NFKC_Casefolded CWKCF Changes_When_Titlecased CWT Changes_When_Uppercased CWU Common Zyyy Ll L Gr_Base Grapheme_Base
       Graph GrBase ID_Continue IDC ID_Start IDS Letter L_ Latin_1 Latin_1_Supplement Lowercase_Letter Lower Lowercase Print Word
       XID_Continue XIDC XID_Start XIDS X_POSIX_Alnum X_POSIX_Alpha X_POSIX_Graph X_POSIX_Lower X_POSIX_Print X_POSIX_Word
    Age=1.1 Bidi_Class=L Bidi_Class=Left_To_Right BC=L Block=Latin_1 Block=Latin_1_Supplement BLK=Latin1 Canonical_Combining_Class=0
       Canonical_Combining_Class=Not_Reordered CCC=NR Canonical_Combining_Class=NR Script=Common Decomposition_Type=Com
       Decomposition_Type=Compat DT=Com Decomposition_Type=Non_Canon Decomposition_Type=Non_Canonical DT=NonCanon East_Asian_Width=Neutral
       Grapheme_Cluster_Break=Other GCB=XX Grapheme_Cluster_Break=XX Hangul_Syllable_Type=NA Hangul_Syllable_Type=Not_Applicable HST=NA
       Joining_Group=No_Joining_Group JG=NoJoiningGroup Joining_Type=Non_Joining JT=U Joining_Type=U Line_Break=AL Line_Break=Alphabetic
       LB=AL Numeric_Type=None NT=None Numeric_Value=NaN NV=NaN Present_In=1.1 IN=1.1 Present_In=2.0 IN=2.0 Present_In=2.1 IN=2.1
       Present_In=3.0 IN=3.0 Present_In=3.1 IN=3.1 Present_In=3.2 IN=3.2 Present_In=4.0 IN=4.0 Present_In=4.1 IN=4.1 Present_In=5.0 IN=5.0
       Present_In=5.1 IN=5.1 Present_In=5.2 IN=5.2 Present_In=6.0 IN=6.0 SC=Zyyy Script=Zyyy Sentence_Break=LO Sentence_Break=Lower SB=LO
       Word_Break=ALetter WB=LE Word_Break=LE _X_Begin

See? The word Greek does not appear in the uniprops output.

That said\, its tc/uc casemap and its casefold are both Greek.

--tom

I do see... I am looking at the published Unicode docs on the Unicode website at http://www.unicode.org/versions/. I went to the latest code charts (and got other info from archival code charts). All of them referred to the character as "00B5 Micro Sign - µ - greek small letter mu".

  I think others have derived properties, and those may have changed as it was recognized that the real 'greek letter' was up in the 'greek range', and that this symbol 'looked' like it (as it says in its description) but was removed from its property list, perhaps as a correction so the letters would be adjacent.

  I'm guessing, but it would be as if the sign for the dollar changed over the years to something else -- the original character would probably remain as a currency symbol, but instead of 'US', it might say 'obsolete' or 'deprecated'.

  In this situation its country of origin has changed (as it would for a new $), but it remains an alphabetic class char, as I'm guessing $ would remain of the same class.

  As far as a lexical parser goes, I don't think this type of change would introduce problems over time. From what I read, they may create a new replacement and deprecate the old -- so it loses 'Greek', but it's still a letter, and is still in unicode.

  I wouldn't be surprised if the 'usage' of a symbol changed, or its national identity, as those shift. But changing its properties -- class of char, and its name -- I see as going against the basic principles.

  But a program has to have use utf8 in it to use any of the properties in the source itself, no? Two options: either one accepts that if unicode redefines the '$' sigil to a new shape, and the old one is no longer used, then one must update program source to reflect such usage; or, as I think others have mentioned, a version number akin to perl's version number needs to be introduced.

  I see unicode as changing less than perl. So I guess I'm not as concerned about this issue. I think it more likely someone's program won't run due to changes in perl\, than changes in unicode.

 

on the Unicode web site, and you are looking at a list of properties that someone has

p5pRT commented 12 years ago

From @khwilliamson

On 08/13/2011 03​:02 AM\, Linda Walsh wrote​:

Even more shocking is that the name of the character 0x1D0C5 contains an obvious typo (FHTORA instead of FTHORA) [still true in Unicode 6.0, BTW]; rather than correcting it, the Consortium has decided to let it stand and to insert a little note [... acknowledging it ].

All names once created are never changed. But in cases like this Unicode has created an alias with the correct spelling that is to be preferred over the original. Perl will return the corrected value in 5.14.

    $ blead -Mcharnames=:full -E 'say charnames::viacode(0x1D0C5)'
    BYZANTINE MUSICAL SYMBOL FTHORA SKLIRON CHROMA VASIS

p5pRT commented 12 years ago

From @cpansprout

On Fri Aug 12 07​:50​:53 2011\, LAWalsh wrote​:

New delimiters may come, but they'll come out of what are now invalid ranges or unused planes.

How do you define ‘unused’? How many people know what version of Unicode their perl and their editor (actually, their fonts) are using, or whether they are even the same? What if I use a new character as a delimiter, not realising that perl considers it to be ‘unused’, and the next upgrade will break my script?
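One way to see what a given perl believes, sketched with the core `Unicode::UCD` module. The helper name `is_unassigned` and the second probe value are illustrative assumptions, and the answer for that probe depends on which Unicode version the running perl ships with.

```perl
use v5.14;
use warnings;
use Unicode::UCD ();

# Unicode version implemented by this perl's character database.
my $unicode_version = Unicode::UCD::UnicodeVersion();

# True if this perl treats the code point as unassigned (General_Category=Cn).
sub is_unassigned {
    my ($code_point) = @_;
    return chr($code_point) =~ /\p{Unassigned}/ ? 1 : 0;
}

say "perl $^V implements Unicode $unicode_version";
for my $code_point (0x2329, 0x2E42) {    # 0x2E42 is an arbitrary probe value
    printf "U+%04X is %s\n", $code_point,
        is_unassigned($code_point) ? "unassigned here" : "assigned";
}
```

An editor or font built against a different Unicode version can disagree with this result, which is the mismatch being asked about.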

p5pRT commented 12 years ago

From tchrist@perl.com

"Father Chrysostomos via RT" <perlbug-followup@perl.org> wrote on Sun, 04 Sep 2011 12:15:50 PDT:

How do you define ‘unused’? How many people know what version of Unicode their perl and their editor (actually, their fonts) are using, or whether they are even the same? What if I use a new character as a delimiter, not realising that perl considers it to be ‘unused’, and the next upgrade will break my script?

I don't understand what you're so worried about. The Unicode Pattern_Syntax character property has strong stability guarantees.
Why not use just the open/left thingies in BidiMirroring that are pattern syntax as openers and the corresponding mirrored bit for closers? Point this at BidiMirroring.txt...

    #!/usr/bin/env perl
    use v5.14;
    use strict;
    use warnings;
    use charnames ();
    my $pairs = 0;
    while (<>) {
        next if /^\s*#/;
        my($left, $right) = /\b(\p{Ahex}{4,})\b/g;
        next unless $left && $right;
        for ($left, $right) { $_ = hex }
        next unless chr($left)  =~ /(?=\p{patsyn})[\p{Ps}\p{Pi}]/;
        next unless chr($right) =~ /(?=\p{patsyn})[\p{Pe}\p{Pf}]/;
        $pairs++;
        for ($left, $right) {
            printf "%s %04X %s\n", chr, $_, charnames::viacode($_);
        }
        print "\n";
    }
    print "$pairs pairs.\n";

...and out pop 53 apparently eligible pairs.

What's so wrong with that?

--tom

  ( 0028 LEFT PARENTHESIS   ) 0029 RIGHT PARENTHESIS

  [ 005B LEFT SQUARE BRACKET   ] 005D RIGHT SQUARE BRACKET

  { 007B LEFT CURLY BRACKET   } 007D RIGHT CURLY BRACKET

  « 00AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK   » 00BB RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK

  ‹ 2039 SINGLE LEFT-POINTING ANGLE QUOTATION MARK   › 203A SINGLE RIGHT-POINTING ANGLE QUOTATION MARK

  ⁅ 2045 LEFT SQUARE BRACKET WITH QUILL   ⁆ 2046 RIGHT SQUARE BRACKET WITH QUILL

  〈 2329 LEFT-POINTING ANGLE BRACKET   〉 232A RIGHT-POINTING ANGLE BRACKET

  ❨ 2768 MEDIUM LEFT PARENTHESIS ORNAMENT   ❩ 2769 MEDIUM RIGHT PARENTHESIS ORNAMENT

  ❪ 276A MEDIUM FLATTENED LEFT PARENTHESIS ORNAMENT   ❫ 276B MEDIUM FLATTENED RIGHT PARENTHESIS ORNAMENT

  ❬ 276C MEDIUM LEFT-POINTING ANGLE BRACKET ORNAMENT   ❭ 276D MEDIUM RIGHT-POINTING ANGLE BRACKET ORNAMENT

  ❮ 276E HEAVY LEFT-POINTING ANGLE QUOTATION MARK ORNAMENT   ❯ 276F HEAVY RIGHT-POINTING ANGLE QUOTATION MARK ORNAMENT

  ❰ 2770 HEAVY LEFT-POINTING ANGLE BRACKET ORNAMENT   ❱ 2771 HEAVY RIGHT-POINTING ANGLE BRACKET ORNAMENT

  ❲ 2772 LIGHT LEFT TORTOISE SHELL BRACKET ORNAMENT   ❳ 2773 LIGHT RIGHT TORTOISE SHELL BRACKET ORNAMENT

  ❴ 2774 MEDIUM LEFT CURLY BRACKET ORNAMENT   ❵ 2775 MEDIUM RIGHT CURLY BRACKET ORNAMENT

  ⟅ 27C5 LEFT S-SHAPED BAG DELIMITER   ⟆ 27C6 RIGHT S-SHAPED BAG DELIMITER

  ⟦ 27E6 MATHEMATICAL LEFT WHITE SQUARE BRACKET   ⟧ 27E7 MATHEMATICAL RIGHT WHITE SQUARE BRACKET

  ⟨ 27E8 MATHEMATICAL LEFT ANGLE BRACKET   ⟩ 27E9 MATHEMATICAL RIGHT ANGLE BRACKET

  ⟪ 27EA MATHEMATICAL LEFT DOUBLE ANGLE BRACKET   ⟫ 27EB MATHEMATICAL RIGHT DOUBLE ANGLE BRACKET

  ⟬ 27EC MATHEMATICAL LEFT WHITE TORTOISE SHELL BRACKET   ⟭ 27ED MATHEMATICAL RIGHT WHITE TORTOISE SHELL BRACKET

  ⟮ 27EE MATHEMATICAL LEFT FLATTENED PARENTHESIS   ⟯ 27EF MATHEMATICAL RIGHT FLATTENED PARENTHESIS

  ⦃ 2983 LEFT WHITE CURLY BRACKET   ⦄ 2984 RIGHT WHITE CURLY BRACKET

  ⦅ 2985 LEFT WHITE PARENTHESIS   ⦆ 2986 RIGHT WHITE PARENTHESIS

  ⦇ 2987 Z NOTATION LEFT IMAGE BRACKET   ⦈ 2988 Z NOTATION RIGHT IMAGE BRACKET

  ⦉ 2989 Z NOTATION LEFT BINDING BRACKET   ⦊ 298A Z NOTATION RIGHT BINDING BRACKET

  ⦋ 298B LEFT SQUARE BRACKET WITH UNDERBAR   ⦌ 298C RIGHT SQUARE BRACKET WITH UNDERBAR

  ⦍ 298D LEFT SQUARE BRACKET WITH TICK IN TOP CORNER   ⦐ 2990 RIGHT SQUARE BRACKET WITH TICK IN TOP CORNER

  ⦏ 298F LEFT SQUARE BRACKET WITH TICK IN BOTTOM CORNER   ⦎ 298E RIGHT SQUARE BRACKET WITH TICK IN BOTTOM CORNER

  ⦑ 2991 LEFT ANGLE BRACKET WITH DOT   ⦒ 2992 RIGHT ANGLE BRACKET WITH DOT

  ⦓ 2993 LEFT ARC LESS-THAN BRACKET   ⦔ 2994 RIGHT ARC GREATER-THAN BRACKET

  ⦕ 2995 DOUBLE LEFT ARC GREATER-THAN BRACKET   ⦖ 2996 DOUBLE RIGHT ARC LESS-THAN BRACKET

  ⦗ 2997 LEFT BLACK TORTOISE SHELL BRACKET   ⦘ 2998 RIGHT BLACK TORTOISE SHELL BRACKET

  ⧘ 29D8 LEFT WIGGLY FENCE   ⧙ 29D9 RIGHT WIGGLY FENCE

  ⧚ 29DA LEFT DOUBLE WIGGLY FENCE   ⧛ 29DB RIGHT DOUBLE WIGGLY FENCE

  ⧼ 29FC LEFT-POINTING CURVED ANGLE BRACKET   ⧽ 29FD RIGHT-POINTING CURVED ANGLE BRACKET

  ⸂ 2E02 LEFT SUBSTITUTION BRACKET   ⸃ 2E03 RIGHT SUBSTITUTION BRACKET

  ⸄ 2E04 LEFT DOTTED SUBSTITUTION BRACKET   ⸅ 2E05 RIGHT DOTTED SUBSTITUTION BRACKET

  ⸉ 2E09 LEFT TRANSPOSITION BRACKET   ⸊ 2E0A RIGHT TRANSPOSITION BRACKET

  ⸌ 2E0C LEFT RAISED OMISSION BRACKET   ⸍ 2E0D RIGHT RAISED OMISSION BRACKET

  ⸜ 2E1C LEFT LOW PARAPHRASE BRACKET   ⸝ 2E1D RIGHT LOW PARAPHRASE BRACKET

  ⸠ 2E20 LEFT VERTICAL BAR WITH QUILL   ⸡ 2E21 RIGHT VERTICAL BAR WITH QUILL

  ⸢ 2E22 TOP LEFT HALF BRACKET   ⸣ 2E23 TOP RIGHT HALF BRACKET

  ⸤ 2E24 BOTTOM LEFT HALF BRACKET   ⸥ 2E25 BOTTOM RIGHT HALF BRACKET

  ⸦ 2E26 LEFT SIDEWAYS U BRACKET   ⸧ 2E27 RIGHT SIDEWAYS U BRACKET

  ⸨ 2E28 LEFT DOUBLE PARENTHESIS   ⸩ 2E29 RIGHT DOUBLE PARENTHESIS

  〈 3008 LEFT ANGLE BRACKET   〉 3009 RIGHT ANGLE BRACKET

  《 300A LEFT DOUBLE ANGLE BRACKET   》 300B RIGHT DOUBLE ANGLE BRACKET

  「 300C LEFT CORNER BRACKET   」 300D RIGHT CORNER BRACKET

  『 300E LEFT WHITE CORNER BRACKET   』 300F RIGHT WHITE CORNER BRACKET

  【 3010 LEFT BLACK LENTICULAR BRACKET   】 3011 RIGHT BLACK LENTICULAR BRACKET

  〔 3014 LEFT TORTOISE SHELL BRACKET   〕 3015 RIGHT TORTOISE SHELL BRACKET

  〖 3016 LEFT WHITE LENTICULAR BRACKET   〗 3017 RIGHT WHITE LENTICULAR BRACKET

  〘 3018 LEFT WHITE TORTOISE SHELL BRACKET   〙 3019 RIGHT WHITE TORTOISE SHELL BRACKET

  〚 301A LEFT WHITE SQUARE BRACKET   〛 301B RIGHT WHITE SQUARE BRACKET

  53 pairs.
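The category test used to produce the list above can be pulled out into a small predicate. This is only a sketch: the name `eligible_pair` is hypothetical, and it checks the properties of each character but not whether the two actually mirror each other, which still has to come from BidiMirroring.txt.

```perl
use v5.14;
use warnings;

# True if the opener is Pattern_Syntax with category Ps or Pi and the
# closer is Pattern_Syntax with category Pe or Pf.
sub eligible_pair {
    my ($open, $close) = @_;
    return 0 unless chr($open)  =~ /(?=\p{Pattern_Syntax})[\p{Ps}\p{Pi}]/;
    return 0 unless chr($close) =~ /(?=\p{Pattern_Syntax})[\p{Pe}\p{Pf}]/;
    return 1;
}

for my $pair ([0x0028, 0x0029], [0x00AB, 0x00BB], [0x207D, 0x207E], [0x003C, 0x003E]) {
    my ($open, $close) = @$pair;
    printf "U+%04X U+%04X %s\n", $open, $close,
        eligible_pair($open, $close) ? "eligible" : "rejected";
}
```

The superscript parentheses are rejected because they are not Pattern_Syntax, and the less-than/greater-than signs because they are symbols (Sm) rather than punctuation.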

p5pRT commented 12 years ago

From perl-diddler@tlinx.org

tchrist1 via RT wrote:

"Father Chrysostomos via RT" <perlbug-followup@perl.org> wrote on Sun, 04 Sep 2011 12:15:50 PDT:

How do you define ‘unused’?

I don't understand what you're so worried about. The Unicode Pattern_Syntax character property has strong stability guarantees.
Why not use just the open/left thingies in BidiMirroring that are pattern syntax as openers and the corresponding mirrored bit for closers? Point this at BidiMirroring.txt...

=== This would just be too simple, or too much like...um...'right'...can't do that; we have to be concerned about all the F.U.D... I mean, the moon might fall tomorrow, and where would perl compatibility be then?! (Not to understate the value I place on perl's compat, after being burned by bash 4.1, which voids compat w/3.0-4.0 in certain features. But hey, it's more POSIX compliant!...um, which POSIX? Why, the new 2008 POSIX that's incompat w/the original POSIX... OH!...that POSIX... grumble!)

p5pRT commented 12 years ago

From @rjbs

* Tom Christiansen <tchrist@perl.com> [2011-09-04T16:19:18]

I don't understand what you're so worried about. The Unicode Pattern_Syntax character property has strong stability guarantees.
Why not use just the open/left thingies in BidiMirroring that are pattern syntax as openers and the corresponding mirrored bit for closers? Point this at BidiMirroring.txt...

[…]

...and out pop 53 apparently eligible pairs.

What's so wrong with that?

Not much that I see, although we'll need to be sure to keep supporting

    003C < LESS-THAN SIGN
    003E > GREATER-THAN SIGN

:)

-- rjbs

p5pRT commented 12 years ago

From tchrist@perl.com

Ricardo Signes <perl.p5p@rjbs.manxome.org> wrote on Sun, 04 Sep 2011 18:26:26 EDT:

* Tom Christiansen <tchrist@perl.com> [2011-09-04T16:19:18]

I don't understand what you're so worried about. The Unicode Pattern_Syntax character property has strong stability guarantees.
Why not use just the open/left thingies in BidiMirroring that are pattern syntax as openers and the corresponding mirrored bit for closers? Point this at BidiMirroring.txt...

[…]

...and out pop 53 apparently eligible pairs.

What's so wrong with that?

Not much that I see, although we'll need to be sure to keep supporting

    003C < LESS-THAN SIGN
    003E > GREATER-THAN SIGN

:)

Oh right: I forgot all the \pS "symbols, signs, and sigils" glyphs.
If you include paired symbols, you get going on 300 pairs instead of just 53. Lots more.

However...

One potential concern regarding the Symbols is that unlike the Ps/Pe and Pi/Pf matched Punctuation pairs, the Symbol pairs have no inherent left/start/initial member and then a complementary right/end/final member.

That means that if you hit a \p{Bidi_Mirrored} character that's a \pS, then the reflection might be the "wrong" one, in some senses. In particular, some of these get listed twice, once showing that the reflection of less-than is greater-than, and then again for the other way around:

    ( 0028 Ps LEFT PARENTHESIS
    ) 0029 Pe RIGHT PARENTHESIS

    < 003C Sm LESS-THAN SIGN
    > 003E Sm GREATER-THAN SIGN

    > 003E Sm GREATER-THAN SIGN
    < 003C Sm LESS-THAN SIGN

    [ 005B Ps LEFT SQUARE BRACKET
    ] 005D Pe RIGHT SQUARE BRACKET

    { 007B Ps LEFT CURLY BRACKET
    } 007D Pe RIGHT CURLY BRACKET

    « 00AB Pi LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
    » 00BB Pf RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK

    ‹ 2039 Pi SINGLE LEFT-POINTING ANGLE QUOTATION MARK
    › 203A Pf SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
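One possible policy for the double listing, sketched here as an assumption rather than anything Unicode prescribes: keep only the first orientation of each pair in file order, which in practice means the lower code point becomes the opener. The function name `first_orientation_only` is hypothetical.

```perl
use v5.14;
use warnings;

# Keep a pair only if neither member has already appeared in an earlier pair.
sub first_orientation_only {
    my @pairs = @_;
    my (%seen, @kept);
    for my $pair (@pairs) {
        my ($left, $right) = @$pair;
        next if $seen{$left} || $seen{$right};
        $seen{$_}++ for $left, $right;
        push @kept, $pair;
    }
    return @kept;
}

my @kept = first_orientation_only(
    [0x28, 0x29], [0x3C, 0x3E], [0x3E, 0x3C], [0x5B, 0x5D],
);
printf "U+%04X U+%04X\n", @$_ for @kept;
```

Here the reversed entry (U+003E, U+003C) is dropped, so only the less-than-first orientation survives.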

If we just accepted any Symbol and its mirror, that would seem to say to accept both \<example> and >example\<, which we do not currently do. Of course, we think of those as Punctuation not Symbols, but Unicode doesn't.

Probably we should not allow >examples\< for fear of breaking code that uses >examples>, but we should just go ahead and allow the rest.
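For reference, a short sketch of how existing perls already treat the two ASCII characters as quote delimiters, which is the behaviour that would have to keep working. The variable names are illustrative.

```perl
use v5.14;
use warnings;

my $nested    = q<a<b>c>;       # '<' opens and '>' closes, with nesting tracked
my $same_char = q>examples>;    # '>' is not an opener, so it also ends the string

say $nested;
say $same_char;
```

If `>` were ever taken as an opener whose closer is `<`, the second statement would no longer parse the same way.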

Here, this code is better:

* It includes the category in the printout.
* It makes sure both brackets are of the corresponding "type".
* It only looks at code points in the Common script.
* It flags things already seen with a "+".
* It flags with "!" a starting bracket with the word right or vice versa.
* It lists but flags with a "*" code points that aren't Pattern_Syntax.
* It flags with a "~" anything labelled "best fit" in the mirror file.
* It shows delimited samples of what those brackets would/might look like.
* Any warning flags put the sample in uppercase.

Output included as data.

--tom

    #!/usr/bin/env perl
    use v5.14;
    use strict;
    use warnings;
    use charnames ();

    use Unicode::UCD qw(charinfo);

    sub cat(_)  { charinfo( shift() )->{category} }
    sub name(_) { charnames::viacode( shift() ) }

    $SIG{__WARN__} = sub { die "panic: @_" unless $^S };

    my %seen = ();  # \pS will appear twice, once as left and once as right
    my $pairs = 0;

    unshift @ARGV, "BidiMirroring.txt" if @ARGV == 0 && -t;
    while (<>) {
        next if /^\s*#/;

        my($left, $right) = /\b(\p{AHex}{4,})\b/g;
        next unless $left && $right;
        for ($left, $right) { $_ = hex }

        next unless chr($left) =~ /\p{common}/ && chr($right) =~ /\p{common}/;
    ##  next unless chr($left) =~ /\p{bidim}/  && chr($right) =~ /\p{bidim}/;
    ##  next unless chr($left) =~ /\p{patsyn}/ && chr($right) =~ /\p{patsyn}/;

        next unless chr($left) =~ /\pS/    && chr($right) =~ /\pS/
                 || chr($left) =~ /\p{Ps}/ && chr($right) =~ /\p{Pe}/
                 || chr($left) =~ /\p{Pi}/ && chr($right) =~ /\p{Pf}/
                 ;

        $pairs++;

        my $lwarn = "";
        $lwarn .= "+" if $seen{$left};
        $lwarn .= "~" if /BEST FIT/;
        $lwarn .= "!" if name($left) =~ /\bRIGHT\b/;
        $lwarn .= "*" if chr($left) !~ /\p{patsyn}/;

        my $rwarn = "";
        $rwarn .= "+" if $seen{$right};
        $rwarn .= "~" if /BEST FIT/;
        $rwarn .= "!" if name($right) =~ /\bLEFT\b/;
        $rwarn .= "*" if chr($right) !~ /\p{patsyn}/;

        my $lpad = " " x chr($left)  !~ /[\p{EA=W}\p{EA=F}]/;
        my $rpad = " " x chr($right) !~ /[\p{EA=W}\p{EA=F}]/;

        my($leg, $reg) = qw(delimited example);

        $leg = uc($leg) if $lwarn;
        $reg = uc($reg) if $rwarn;

        printf "%c%s %s%c\n", $left, $leg, $reg, $right;
        printf "%-4s %s %04X %2s %s\n", $lwarn, chr($left) . $lpad, $left, cat($left), name($left);
        printf "%-4s %s %04X %2s %s\n", $rwarn, chr($right). $rpad, $right, cat($right), name($right);

        for ($left, $right) { $seen{$_}++ }

        print "\n";
    }

    print "$pairs pairs.\n";

    __END__

(delimited example)   ( 0028 Ps LEFT PARENTHESIS   ) 0029 Pe RIGHT PARENTHESIS

    <delimited example>   < 003C Sm LESS-THAN SIGN   > 003E Sm GREATER-THAN SIGN

    >DELIMITED EXAMPLE< + > 003E Sm GREATER-THAN SIGN + < 003C Sm LESS-THAN SIGN

[delimited example]   [ 005B Ps LEFT SQUARE BRACKET   ] 005D Pe RIGHT SQUARE BRACKET

{delimited example}   { 007B Ps LEFT CURLY BRACKET   } 007D Pe RIGHT CURLY BRACKET

«delimited example»   « 00AB Pi LEFT-POINTING DOUBLE ANGLE QUOTATION MARK   » 00BB Pf RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK

‹delimited example›   ‹ 2039 Pi SINGLE LEFT-POINTING ANGLE QUOTATION MARK   › 203A Pf SINGLE RIGHT-POINTING ANGLE QUOTATION MARK

⁅delimited example⁆   ⁅ 2045 Ps LEFT SQUARE BRACKET WITH QUILL   ⁆ 2046 Pe RIGHT SQUARE BRACKET WITH QUILL

⁽DELIMITED EXAMPLE⁾ * ⁽ 207D Ps SUPERSCRIPT LEFT PARENTHESIS * ⁾ 207E Pe SUPERSCRIPT RIGHT PARENTHESIS

₍DELIMITED EXAMPLE₎ * ₍ 208D Ps SUBSCRIPT LEFT PARENTHESIS * ₎ 208E Pe SUBSCRIPT RIGHT PARENTHESIS

∈delimited example∋   ∈ 2208 Sm ELEMENT OF   ∋ 220B Sm CONTAINS AS MEMBER

∉delimited example∌   ∉ 2209 Sm NOT AN ELEMENT OF   ∌ 220C Sm DOES NOT CONTAIN AS MEMBER

∊delimited example∍   ∊ 220A Sm SMALL ELEMENT OF   ∍ 220D Sm SMALL CONTAINS AS MEMBER

∋DELIMITED EXAMPLE∈ + ∋ 220B Sm CONTAINS AS MEMBER + ∈ 2208 Sm ELEMENT OF

∌DELIMITED EXAMPLE∉ + ∌ 220C Sm DOES NOT CONTAIN AS MEMBER + ∉ 2209 Sm NOT AN ELEMENT OF

∍DELIMITED EXAMPLE∊ + ∍ 220D Sm SMALL CONTAINS AS MEMBER + ∊ 220A Sm SMALL ELEMENT OF

∕delimited example⧵   ∕ 2215 Sm DIVISION SLASH   ⧵ 29F5 Sm REVERSE SOLIDUS OPERATOR

∼delimited example∽   ∼ 223C Sm TILDE OPERATOR   ∽ 223D Sm REVERSED TILDE

∽DELIMITED EXAMPLE∼ + ∽ 223D Sm REVERSED TILDE + ∼ 223C Sm TILDE OPERATOR

≃delimited example⋍   ≃ 2243 Sm ASYMPTOTICALLY EQUAL TO   ⋍ 22CD Sm REVERSED TILDE EQUALS

≒delimited example≓   ≒ 2252 Sm APPROXIMATELY EQUAL TO OR THE IMAGE OF   ≓ 2253 Sm IMAGE OF OR APPROXIMATELY EQUAL TO

≓DELIMITED EXAMPLE≒ + ≓ 2253 Sm IMAGE OF OR APPROXIMATELY EQUAL TO + ≒ 2252 Sm APPROXIMATELY EQUAL TO OR THE IMAGE OF

≔delimited example≕   ≔ 2254 Sm COLON EQUALS   ≕ 2255 Sm EQUALS COLON

≕DELIMITED EXAMPLE≔ + ≕ 2255 Sm EQUALS COLON + ≔ 2254 Sm COLON EQUALS

≤delimited example≥   ≤ 2264 Sm LESS-THAN OR EQUAL TO   ≥ 2265 Sm GREATER-THAN OR EQUAL TO

≥DELIMITED EXAMPLE≤ + ≥ 2265 Sm GREATER-THAN OR EQUAL TO + ≤ 2264 Sm LESS-THAN OR EQUAL TO

≦delimited example≧   ≦ 2266 Sm LESS-THAN OVER EQUAL TO   ≧ 2267 Sm GREATER-THAN OVER EQUAL TO

≧DELIMITED EXAMPLE≦ + ≧ 2267 Sm GREATER-THAN OVER EQUAL TO + ≦ 2266 Sm LESS-THAN OVER EQUAL TO

≨DELIMITED EXAMPLE≩ ~ ≨ 2268 Sm LESS-THAN BUT NOT EQUAL TO ~ ≩ 2269 Sm GREATER-THAN BUT NOT EQUAL TO

≩DELIMITED EXAMPLE≨ +~ ≩ 2269 Sm GREATER-THAN BUT NOT EQUAL TO +~ ≨ 2268 Sm LESS-THAN BUT NOT EQUAL TO

≪delimited example≫   ≪ 226A Sm MUCH LESS-THAN   ≫ 226B Sm MUCH GREATER-THAN

≫DELIMITED EXAMPLE≪ + ≫ 226B Sm MUCH GREATER-THAN + ≪ 226A Sm MUCH LESS-THAN

≮DELIMITED EXAMPLE≯ ~ ≮ 226E Sm NOT LESS-THAN ~ ≯ 226F Sm NOT GREATER-THAN

≯DELIMITED EXAMPLE≮ +~ ≯ 226F Sm NOT GREATER-THAN +~ ≮ 226E Sm NOT LESS-THAN

≰DELIMITED EXAMPLE≱ ~ ≰ 2270 Sm NEITHER LESS-THAN NOR EQUAL TO ~ ≱ 2271 Sm NEITHER GREATER-THAN NOR EQUAL TO

≱DELIMITED EXAMPLE≰ +~ ≱ 2271 Sm NEITHER GREATER-THAN NOR EQUAL TO +~ ≰ 2270 Sm NEITHER LESS-THAN NOR EQUAL TO

≲DELIMITED EXAMPLE≳ ~ ≲ 2272 Sm LESS-THAN OR EQUIVALENT TO ~ ≳ 2273 Sm GREATER-THAN OR EQUIVALENT TO

≳DELIMITED EXAMPLE≲ +~ ≳ 2273 Sm GREATER-THAN OR EQUIVALENT TO +~ ≲ 2272 Sm LESS-THAN OR EQUIVALENT TO

≴DELIMITED EXAMPLE≵ ~ ≴ 2274 Sm NEITHER LESS-THAN NOR EQUIVALENT TO ~ ≵ 2275 Sm NEITHER GREATER-THAN NOR EQUIVALENT TO

≵DELIMITED EXAMPLE≴ +~ ≵ 2275 Sm NEITHER GREATER-THAN NOR EQUIVALENT TO +~ ≴ 2274 Sm NEITHER LESS-THAN NOR EQUIVALENT TO

≶delimited example≷   ≶ 2276 Sm LESS-THAN OR GREATER-THAN   ≷ 2277 Sm GREATER-THAN OR LESS-THAN

≷DELIMITED EXAMPLE≶ + ≷ 2277 Sm GREATER-THAN OR LESS-THAN + ≶ 2276 Sm LESS-THAN OR GREATER-THAN

≸DELIMITED EXAMPLE≹ ~ ≸ 2278 Sm NEITHER LESS-THAN NOR GREATER-THAN ~ ≹ 2279 Sm NEITHER GREATER-THAN NOR LESS-THAN

≹DELIMITED EXAMPLE≸ +~ ≹ 2279 Sm NEITHER GREATER-THAN NOR LESS-THAN +~ ≸ 2278 Sm NEITHER LESS-THAN NOR GREATER-THAN

≺delimited example≻   ≺ 227A Sm PRECEDES   ≻ 227B Sm SUCCEEDS

≻DELIMITED EXAMPLE≺ + ≻ 227B Sm SUCCEEDS + ≺ 227A Sm PRECEDES

≼delimited example≽   ≼ 227C Sm PRECEDES OR EQUAL TO   ≽ 227D Sm SUCCEEDS OR EQUAL TO

≽DELIMITED EXAMPLE≼ + ≽ 227D Sm SUCCEEDS OR EQUAL TO + ≼ 227C Sm PRECEDES OR EQUAL TO

≾DELIMITED EXAMPLE≿ ~ ≾ 227E Sm PRECEDES OR EQUIVALENT TO ~ ≿ 227F Sm SUCCEEDS OR EQUIVALENT TO

≿DELIMITED EXAMPLE≾ +~ ≿ 227F Sm SUCCEEDS OR EQUIVALENT TO +~ ≾ 227E Sm PRECEDES OR EQUIVALENT TO

⊀DELIMITED EXAMPLE⊁ ~ ⊀ 2280 Sm DOES NOT PRECEDE ~ ⊁ 2281 Sm DOES NOT SUCCEED

⊁DELIMITED EXAMPLE⊀ +~ ⊁ 2281 Sm DOES NOT SUCCEED +~ ⊀ 2280 Sm DOES NOT PRECEDE

⊂delimited example⊃   ⊂ 2282 Sm SUBSET OF   ⊃ 2283 Sm SUPERSET OF

⊃DELIMITED EXAMPLE⊂ + ⊃ 2283 Sm SUPERSET OF + ⊂ 2282 Sm SUBSET OF

⊄DELIMITED EXAMPLE⊅ ~ ⊄ 2284 Sm NOT A SUBSET OF ~ ⊅ 2285 Sm NOT A SUPERSET OF

⊅DELIMITED EXAMPLE⊄ +~ ⊅ 2285 Sm NOT A SUPERSET OF +~ ⊄ 2284 Sm NOT A SUBSET OF

⊆delimited example⊇   ⊆ 2286 Sm SUBSET OF OR EQUAL TO   ⊇ 2287 Sm SUPERSET OF OR EQUAL TO

⊇DELIMITED EXAMPLE⊆ + ⊇ 2287 Sm SUPERSET OF OR EQUAL TO + ⊆ 2286 Sm SUBSET OF OR EQUAL TO

⊈DELIMITED EXAMPLE⊉ ~ ⊈ 2288 Sm NEITHER A SUBSET OF NOR EQUAL TO ~ ⊉ 2289 Sm NEITHER A SUPERSET OF NOR EQUAL TO

⊉DELIMITED EXAMPLE⊈ +~ ⊉ 2289 Sm NEITHER A SUPERSET OF NOR EQUAL TO +~ ⊈ 2288 Sm NEITHER A SUBSET OF NOR EQUAL TO

⊊DELIMITED EXAMPLE⊋ ~ ⊊ 228A Sm SUBSET OF WITH NOT EQUAL TO ~ ⊋ 228B Sm SUPERSET OF WITH NOT EQUAL TO

⊋DELIMITED EXAMPLE⊊ +~ ⊋ 228B Sm SUPERSET OF WITH NOT EQUAL TO +~ ⊊ 228A Sm SUBSET OF WITH NOT EQUAL TO

⊏delimited example⊐   ⊏ 228F Sm SQUARE IMAGE OF   ⊐ 2290 Sm SQUARE ORIGINAL OF

⊐DELIMITED EXAMPLE⊏ + ⊐ 2290 Sm SQUARE ORIGINAL OF + ⊏ 228F Sm SQUARE IMAGE OF

⊑delimited example⊒   ⊑ 2291 Sm SQUARE IMAGE OF OR EQUAL TO   ⊒ 2292 Sm SQUARE ORIGINAL OF OR EQUAL TO

⊒DELIMITED EXAMPLE⊑ + ⊒ 2292 Sm SQUARE ORIGINAL OF OR EQUAL TO + ⊑ 2291 Sm SQUARE IMAGE OF OR EQUAL TO

⊘delimited example⦸   ⊘ 2298 Sm CIRCLED DIVISION SLASH   ⦸ 29B8 Sm CIRCLED REVERSE SOLIDUS

⊢DELIMITED EXAMPLE⊣ ! ⊢ 22A2 Sm RIGHT TACK ! ⊣ 22A3 Sm LEFT TACK

⊣DELIMITED EXAMPLE⊢ + ⊣ 22A3 Sm LEFT TACK + ⊢ 22A2 Sm RIGHT TACK

⊦delimited EXAMPLE⫞   ⊦ 22A6 Sm ASSERTION ! ⫞ 2ADE Sm SHORT LEFT TACK

⊨delimited EXAMPLE⫤   ⊨ 22A8 Sm TRUE ! ⫤ 2AE4 Sm VERTICAL BAR DOUBLE LEFT TURNSTILE

⊩delimited EXAMPLE⫣   ⊩ 22A9 Sm FORCES ! ⫣ 2AE3 Sm DOUBLE VERTICAL BAR LEFT TURNSTILE

⊫DELIMITED EXAMPLE⫥ ! ⊫ 22AB Sm DOUBLE VERTICAL BAR DOUBLE RIGHT TURNSTILE ! ⫥ 2AE5 Sm DOUBLE VERTICAL BAR DOUBLE LEFT TURNSTILE

⊰delimited example⊱   ⊰ 22B0 Sm PRECEDES UNDER RELATION   ⊱ 22B1 Sm SUCCEEDS UNDER RELATION

⊱DELIMITED EXAMPLE⊰ + ⊱ 22B1 Sm SUCCEEDS UNDER RELATION + ⊰ 22B0 Sm PRECEDES UNDER RELATION

⊲delimited example⊳   ⊲ 22B2 Sm NORMAL SUBGROUP OF   ⊳ 22B3 Sm CONTAINS AS NORMAL SUBGROUP

⊳DELIMITED EXAMPLE⊲ + ⊳ 22B3 Sm CONTAINS AS NORMAL SUBGROUP + ⊲ 22B2 Sm NORMAL SUBGROUP OF

⊴delimited example⊵   ⊴ 22B4 Sm NORMAL SUBGROUP OF OR EQUAL TO   ⊵ 22B5 Sm CONTAINS AS NORMAL SUBGROUP OR EQUAL TO

⊵DELIMITED EXAMPLE⊴ + ⊵ 22B5 Sm CONTAINS AS NORMAL SUBGROUP OR EQUAL TO + ⊴ 22B4 Sm NORMAL SUBGROUP OF OR EQUAL TO

⊶delimited example⊷   ⊶ 22B6 Sm ORIGINAL OF   ⊷ 22B7 Sm IMAGE OF

⊷DELIMITED EXAMPLE⊶ + ⊷ 22B7 Sm IMAGE OF + ⊶ 22B6 Sm ORIGINAL OF

⋉delimited example⋊   ⋉ 22C9 Sm LEFT NORMAL FACTOR SEMIDIRECT PRODUCT   ⋊ 22CA Sm RIGHT NORMAL FACTOR SEMIDIRECT PRODUCT

⋊DELIMITED EXAMPLE⋉ +! ⋊ 22CA Sm RIGHT NORMAL FACTOR SEMIDIRECT PRODUCT +! ⋉ 22C9 Sm LEFT NORMAL FACTOR SEMIDIRECT PRODUCT

⋋delimited example⋌   ⋋ 22CB Sm LEFT SEMIDIRECT PRODUCT   ⋌ 22CC Sm RIGHT SEMIDIRECT PRODUCT

⋌DELIMITED EXAMPLE⋋ +! ⋌ 22CC Sm RIGHT SEMIDIRECT PRODUCT +! ⋋ 22CB Sm LEFT SEMIDIRECT PRODUCT

⋍DELIMITED EXAMPLE≃ + ⋍ 22CD Sm REVERSED TILDE EQUALS + ≃ 2243 Sm ASYMPTOTICALLY EQUAL TO

⋐delimited example⋑   ⋐ 22D0 Sm DOUBLE SUBSET   ⋑ 22D1 Sm DOUBLE SUPERSET

⋑DELIMITED EXAMPLE⋐ + ⋑ 22D1 Sm DOUBLE SUPERSET + ⋐ 22D0 Sm DOUBLE SUBSET

⋖delimited example⋗   ⋖ 22D6 Sm LESS-THAN WITH DOT   ⋗ 22D7 Sm GREATER-THAN WITH DOT

⋗DELIMITED EXAMPLE⋖ + ⋗ 22D7 Sm GREATER-THAN WITH DOT + ⋖ 22D6 Sm LESS-THAN WITH DOT

⋘delimited example⋙   ⋘ 22D8 Sm VERY MUCH LESS-THAN   ⋙ 22D9 Sm VERY MUCH GREATER-THAN

⋙DELIMITED EXAMPLE⋘ + ⋙ 22D9 Sm VERY MUCH GREATER-THAN + ⋘ 22D8 Sm VERY MUCH LESS-THAN

⋚delimited example⋛   ⋚ 22DA Sm LESS-THAN EQUAL TO OR GREATER-THAN   ⋛ 22DB Sm GREATER-THAN EQUAL TO OR LESS-THAN

⋛DELIMITED EXAMPLE⋚ + ⋛ 22DB Sm GREATER-THAN EQUAL TO OR LESS-THAN + ⋚ 22DA Sm LESS-THAN EQUAL TO OR GREATER-THAN

⋜delimited example⋝   ⋜ 22DC Sm EQUAL TO OR LESS-THAN   ⋝ 22DD Sm EQUAL TO OR GREATER-THAN

⋝DELIMITED EXAMPLE⋜ + ⋝ 22DD Sm EQUAL TO OR GREATER-THAN + ⋜ 22DC Sm EQUAL TO OR LESS-THAN

⋞delimited example⋟   ⋞ 22DE Sm EQUAL TO OR PRECEDES   ⋟ 22DF Sm EQUAL TO OR SUCCEEDS

⋟DELIMITED EXAMPLE⋞ + ⋟ 22DF Sm EQUAL TO OR SUCCEEDS + ⋞ 22DE Sm EQUAL TO OR PRECEDES

⋠DELIMITED EXAMPLE⋡ ~ ⋠ 22E0 Sm DOES NOT PRECEDE OR EQUAL ~ ⋡ 22E1 Sm DOES NOT SUCCEED OR EQUAL

⋡DELIMITED EXAMPLE⋠ +~ ⋡ 22E1 Sm DOES NOT SUCCEED OR EQUAL +~ ⋠ 22E0 Sm DOES NOT PRECEDE OR EQUAL

⋢DELIMITED EXAMPLE⋣ ~ ⋢ 22E2 Sm NOT SQUARE IMAGE OF OR EQUAL TO ~ ⋣ 22E3 Sm NOT SQUARE ORIGINAL OF OR EQUAL TO

⋣DELIMITED EXAMPLE⋢ +~ ⋣ 22E3 Sm NOT SQUARE ORIGINAL OF OR EQUAL TO +~ ⋢ 22E2 Sm NOT SQUARE IMAGE OF OR EQUAL TO

⋤DELIMITED EXAMPLE⋥ ~ ⋤ 22E4 Sm SQUARE IMAGE OF OR NOT EQUAL TO ~ ⋥ 22E5 Sm SQUARE ORIGINAL OF OR NOT EQUAL TO

⋥DELIMITED EXAMPLE⋤ +~ ⋥ 22E5 Sm SQUARE ORIGINAL OF OR NOT EQUAL TO +~ ⋤ 22E4 Sm SQUARE IMAGE OF OR NOT EQUAL TO

⋦DELIMITED EXAMPLE⋧ ~ ⋦ 22E6 Sm LESS-THAN BUT NOT EQUIVALENT TO ~ ⋧ 22E7 Sm GREATER-THAN BUT NOT EQUIVALENT TO

⋧DELIMITED EXAMPLE⋦ +~ ⋧ 22E7 Sm GREATER-THAN BUT NOT EQUIVALENT TO +~ ⋦ 22E6 Sm LESS-THAN BUT NOT EQUIVALENT TO

⋨DELIMITED EXAMPLE⋩ ~ ⋨ 22E8 Sm PRECEDES BUT NOT EQUIVALENT TO ~ ⋩ 22E9 Sm SUCCEEDS BUT NOT EQUIVALENT TO

⋩DELIMITED EXAMPLE⋨ +~ ⋩ 22E9 Sm SUCCEEDS BUT NOT EQUIVALENT TO +~ ⋨ 22E8 Sm PRECEDES BUT NOT EQUIVALENT TO

⋪DELIMITED EXAMPLE⋫ ~ ⋪ 22EA Sm NOT NORMAL SUBGROUP OF ~ ⋫ 22EB Sm DOES NOT CONTAIN AS NORMAL SUBGROUP

⋫DELIMITED EXAMPLE⋪ +~ ⋫ 22EB Sm DOES NOT CONTAIN AS NORMAL SUBGROUP +~ ⋪ 22EA Sm NOT NORMAL SUBGROUP OF

⋬DELIMITED EXAMPLE⋭ ~ ⋬ 22EC Sm NOT NORMAL SUBGROUP OF OR EQUAL TO ~ ⋭ 22ED Sm DOES NOT CONTAIN AS NORMAL SUBGROUP OR EQUAL

⋭DELIMITED EXAMPLE⋬ +~ ⋭ 22ED Sm DOES NOT CONTAIN AS NORMAL SUBGROUP OR EQUAL +~ ⋬ 22EC Sm NOT NORMAL SUBGROUP OF OR EQUAL TO

⋰DELIMITED example⋱ ! ⋰ 22F0 Sm UP RIGHT DIAGONAL ELLIPSIS   ⋱ 22F1 Sm DOWN RIGHT DIAGONAL ELLIPSIS

⋱DELIMITED EXAMPLE⋰ +! ⋱ 22F1 Sm DOWN RIGHT DIAGONAL ELLIPSIS + ⋰ 22F0 Sm UP RIGHT DIAGONAL ELLIPSIS

⋲delimited example⋺   ⋲ 22F2 Sm ELEMENT OF WITH LONG HORIZONTAL STROKE   ⋺ 22FA Sm CONTAINS WITH LONG HORIZONTAL STROKE

⋳delimited example⋻   ⋳ 22F3 Sm ELEMENT OF WITH VERTICAL BAR AT END OF HORIZONTAL STROKE   ⋻ 22FB Sm CONTAINS WITH VERTICAL BAR AT END OF HORIZONTAL STROKE

⋴delimited example⋼   ⋴ 22F4 Sm SMALL ELEMENT OF WITH VERTICAL BAR AT END OF HORIZONTAL STROKE   ⋼ 22FC Sm SMALL CONTAINS WITH VERTICAL BAR AT END OF HORIZONTAL STROKE

⋶delimited example⋽   ⋶ 22F6 Sm ELEMENT OF WITH OVERBAR   ⋽ 22FD Sm CONTAINS WITH OVERBAR

⋷delimited example⋾   ⋷ 22F7 Sm SMALL ELEMENT OF WITH OVERBAR   ⋾ 22FE Sm SMALL CONTAINS WITH OVERBAR

⋺DELIMITED EXAMPLE⋲ + ⋺ 22FA Sm CONTAINS WITH LONG HORIZONTAL STROKE + ⋲ 22F2 Sm ELEMENT OF WITH LONG HORIZONTAL STROKE

⋻DELIMITED EXAMPLE⋳ + ⋻ 22FB Sm CONTAINS WITH VERTICAL BAR AT END OF HORIZONTAL STROKE + ⋳ 22F3 Sm ELEMENT OF WITH VERTICAL BAR AT END OF HORIZONTAL STROKE

⋼DELIMITED EXAMPLE⋴ + ⋼ 22FC Sm SMALL CONTAINS WITH VERTICAL BAR AT END OF HORIZONTAL STROKE + ⋴ 22F4 Sm SMALL ELEMENT OF WITH VERTICAL BAR AT END OF HORIZONTAL STROKE

⋽DELIMITED EXAMPLE⋶ + ⋽ 22FD Sm CONTAINS WITH OVERBAR + ⋶ 22F6 Sm ELEMENT OF WITH OVERBAR

⋾DELIMITED EXAMPLE⋷ + ⋾ 22FE Sm SMALL CONTAINS WITH OVERBAR + ⋷ 22F7 Sm SMALL ELEMENT OF WITH OVERBAR

⌈delimited example⌉   ⌈ 2308 Sm LEFT CEILING   ⌉ 2309 Sm RIGHT CEILING

⌉DELIMITED EXAMPLE⌈ +! ⌉ 2309 Sm RIGHT CEILING +! ⌈ 2308 Sm LEFT CEILING

⌊delimited example⌋   ⌊ 230A Sm LEFT FLOOR   ⌋ 230B Sm RIGHT FLOOR

⌋DELIMITED EXAMPLE⌊ +! ⌋ 230B Sm RIGHT FLOOR +! ⌊ 230A Sm LEFT FLOOR

〈delimited example〉   〈 2329 Ps LEFT-POINTING ANGLE BRACKET   〉 232A Pe RIGHT-POINTING ANGLE BRACKET

❨delimited example❩   ❨ 2768 Ps MEDIUM LEFT PARENTHESIS ORNAMENT   ❩ 2769 Pe MEDIUM RIGHT PARENTHESIS ORNAMENT

❪delimited example❫   ❪ 276A Ps MEDIUM FLATTENED LEFT PARENTHESIS ORNAMENT   ❫ 276B Pe MEDIUM FLATTENED RIGHT PARENTHESIS ORNAMENT

❬delimited example❭   ❬ 276C Ps MEDIUM LEFT-POINTING ANGLE BRACKET ORNAMENT   ❭ 276D Pe MEDIUM RIGHT-POINTING ANGLE BRACKET ORNAMENT

❮delimited example❯   ❮ 276E Ps HEAVY LEFT-POINTING ANGLE QUOTATION MARK ORNAMENT   ❯ 276F Pe HEAVY RIGHT-POINTING ANGLE QUOTATION MARK ORNAMENT

❰delimited example❱   ❰ 2770 Ps HEAVY LEFT-POINTING ANGLE BRACKET ORNAMENT   ❱ 2771 Pe HEAVY RIGHT-POINTING ANGLE BRACKET ORNAMENT

❲delimited example❳   ❲ 2772 Ps LIGHT LEFT TORTOISE SHELL BRACKET ORNAMENT   ❳ 2773 Pe LIGHT RIGHT TORTOISE SHELL BRACKET ORNAMENT

❴delimited example❵   ❴ 2774 Ps MEDIUM LEFT CURLY BRACKET ORNAMENT   ❵ 2775 Pe MEDIUM RIGHT CURLY BRACKET ORNAMENT

⟃delimited example⟄   ⟃ 27C3 Sm OPEN SUBSET   ⟄ 27C4 Sm OPEN SUPERSET

⟄DELIMITED EXAMPLE⟃ + ⟄ 27C4 Sm OPEN SUPERSET + ⟃ 27C3 Sm OPEN SUBSET

⟅delimited example⟆   ⟅ 27C5 Ps LEFT S-SHAPED BAG DELIMITER   ⟆ 27C6 Pe RIGHT S-SHAPED BAG DELIMITER

⟈delimited example⟉   ⟈ 27C8 Sm REVERSE SOLIDUS PRECEDING SUBSET   ⟉ 27C9 Sm SUPERSET PRECEDING SOLIDUS

⟉DELIMITED EXAMPLE⟈ + ⟉ 27C9 Sm SUPERSET PRECEDING SOLIDUS + ⟈ 27C8 Sm REVERSE SOLIDUS PRECEDING SUBSET

⟕delimited example⟖   ⟕ 27D5 Sm LEFT OUTER JOIN   ⟖ 27D6 Sm RIGHT OUTER JOIN

⟖DELIMITED EXAMPLE⟕ +! ⟖ 27D6 Sm RIGHT OUTER JOIN +! ⟕ 27D5 Sm LEFT OUTER JOIN

⟝DELIMITED EXAMPLE⟞ ! ⟝ 27DD Sm LONG RIGHT TACK ! ⟞ 27DE Sm LONG LEFT TACK

⟞DELIMITED EXAMPLE⟝ + ⟞ 27DE Sm LONG LEFT TACK + ⟝ 27DD Sm LONG RIGHT TACK

⟢delimited example⟣   ⟢ 27E2 Sm WHITE CONCAVE-SIDED DIAMOND WITH LEFTWARDS TICK   ⟣ 27E3 Sm WHITE CONCAVE-SIDED DIAMOND WITH RIGHTWARDS TICK

⟣DELIMITED EXAMPLE⟢ + ⟣ 27E3 Sm WHITE CONCAVE-SIDED DIAMOND WITH RIGHTWARDS TICK + ⟢ 27E2 Sm WHITE CONCAVE-SIDED DIAMOND WITH LEFTWARDS TICK

⟤delimited example⟥   ⟤ 27E4 Sm WHITE SQUARE WITH LEFTWARDS TICK   ⟥ 27E5 Sm WHITE SQUARE WITH RIGHTWARDS TICK

⟥DELIMITED EXAMPLE⟤ + ⟥ 27E5 Sm WHITE SQUARE WITH RIGHTWARDS TICK + ⟤ 27E4 Sm WHITE SQUARE WITH LEFTWARDS TICK

⟦delimited example⟧   ⟦ 27E6 Ps MATHEMATICAL LEFT WHITE SQUARE BRACKET   ⟧ 27E7 Pe MATHEMATICAL RIGHT WHITE SQUARE BRACKET

⟨delimited example⟩   ⟨ 27E8 Ps MATHEMATICAL LEFT ANGLE BRACKET   ⟩ 27E9 Pe MATHEMATICAL RIGHT ANGLE BRACKET

⟪delimited example⟫   ⟪ 27EA Ps MATHEMATICAL LEFT DOUBLE ANGLE BRACKET   ⟫ 27EB Pe MATHEMATICAL RIGHT DOUBLE ANGLE BRACKET

⟬delimited example⟭   ⟬ 27EC Ps MATHEMATICAL LEFT WHITE TORTOISE SHELL BRACKET   ⟭ 27ED Pe MATHEMATICAL RIGHT WHITE TORTOISE SHELL BRACKET

⟮delimited example⟯   ⟮ 27EE Ps MATHEMATICAL LEFT FLATTENED PARENTHESIS   ⟯ 27EF Pe MATHEMATICAL RIGHT FLATTENED PARENTHESIS

⦃delimited example⦄   ⦃ 2983 Ps LEFT WHITE CURLY BRACKET   ⦄ 2984 Pe RIGHT WHITE CURLY BRACKET

⦅delimited example⦆   ⦅ 2985 Ps LEFT WHITE PARENTHESIS   ⦆ 2986 Pe RIGHT WHITE PARENTHESIS

⦇delimited example⦈   ⦇ 2987 Ps Z NOTATION LEFT IMAGE BRACKET   ⦈ 2988 Pe Z NOTATION RIGHT IMAGE BRACKET

⦉delimited example⦊   ⦉ 2989 Ps Z NOTATION LEFT BINDING BRACKET   ⦊ 298A Pe Z NOTATION RIGHT BINDING BRACKET

⦋delimited example⦌   ⦋ 298B Ps LEFT SQUARE BRACKET WITH UNDERBAR   ⦌ 298C Pe RIGHT SQUARE BRACKET WITH UNDERBAR

⦍delimited example⦐   ⦍ 298D Ps LEFT SQUARE BRACKET WITH TICK IN TOP CORNER   ⦐ 2990 Pe RIGHT SQUARE BRACKET WITH TICK IN TOP CORNER

⦏delimited example⦎   ⦏ 298F Ps LEFT SQUARE BRACKET WITH TICK IN BOTTOM CORNER   ⦎ 298E Pe RIGHT SQUARE BRACKET WITH TICK IN BOTTOM CORNER

⦑delimited example⦒   ⦑ 2991 Ps LEFT ANGLE BRACKET WITH DOT   ⦒ 2992 Pe RIGHT ANGLE BRACKET WITH DOT

⦓delimited example⦔   ⦓ 2993 Ps LEFT ARC LESS-THAN BRACKET   ⦔ 2994 Pe RIGHT ARC GREATER-THAN BRACKET

⦕delimited example⦖   ⦕ 2995 Ps DOUBLE LEFT ARC GREATER-THAN BRACKET   ⦖ 2996 Pe DOUBLE RIGHT ARC LESS-THAN BRACKET

⦗delimited example⦘   ⦗ 2997 Ps LEFT BLACK TORTOISE SHELL BRACKET   ⦘ 2998 Pe RIGHT BLACK TORTOISE SHELL BRACKET

⦸DELIMITED EXAMPLE⊘ + ⦸ 29B8 Sm CIRCLED REVERSE SOLIDUS + ⊘ 2298 Sm CIRCLED DIVISION SLASH

⧀delimited example⧁   ⧀ 29C0 Sm CIRCLED LESS-THAN   ⧁ 29C1 Sm CIRCLED GREATER-THAN

⧁DELIMITED EXAMPLE⧀ + ⧁ 29C1 Sm CIRCLED GREATER-THAN + ⧀ 29C0 Sm CIRCLED LESS-THAN

⧄delimited example⧅   ⧄ 29C4 Sm SQUARED RISING DIAGONAL SLASH   ⧅ 29C5 Sm SQUARED FALLING DIAGONAL SLASH

⧅DELIMITED EXAMPLE⧄ + ⧅ 29C5 Sm SQUARED FALLING DIAGONAL SLASH + ⧄ 29C4 Sm SQUARED RISING DIAGONAL SLASH

⧏delimited example⧐   ⧏ 29CF Sm LEFT TRIANGLE BESIDE VERTICAL BAR   ⧐ 29D0 Sm VERTICAL BAR BESIDE RIGHT TRIANGLE

⧐DELIMITED EXAMPLE⧏ +! ⧐ 29D0 Sm VERTICAL BAR BESIDE RIGHT TRIANGLE +! ⧏ 29CF Sm LEFT TRIANGLE BESIDE VERTICAL BAR

⧑delimited example⧒   ⧑ 29D1 Sm BOWTIE WITH LEFT HALF BLACK   ⧒ 29D2 Sm BOWTIE WITH RIGHT HALF BLACK

⧒DELIMITED EXAMPLE⧑ +! ⧒ 29D2 Sm BOWTIE WITH RIGHT HALF BLACK +! ⧑ 29D1 Sm BOWTIE WITH LEFT HALF BLACK

⧔delimited example⧕   ⧔ 29D4 Sm TIMES WITH LEFT HALF BLACK   ⧕ 29D5 Sm TIMES WITH RIGHT HALF BLACK

⧕DELIMITED EXAMPLE⧔ +! ⧕ 29D5 Sm TIMES WITH RIGHT HALF BLACK +! ⧔ 29D4 Sm TIMES WITH LEFT HALF BLACK

⧘delimited example⧙   ⧘ 29D8 Ps LEFT WIGGLY FENCE   ⧙ 29D9 Pe RIGHT WIGGLY FENCE

⧚delimited example⧛   ⧚ 29DA Ps LEFT DOUBLE WIGGLY FENCE   ⧛ 29DB Pe RIGHT DOUBLE WIGGLY FENCE

⧵DELIMITED EXAMPLE∕ + ⧵ 29F5 Sm REVERSE SOLIDUS OPERATOR + ∕ 2215 Sm DIVISION SLASH

⧸delimited example⧹   ⧸ 29F8 Sm BIG SOLIDUS   ⧹ 29F9 Sm BIG REVERSE SOLIDUS

⧹DELIMITED EXAMPLE⧸ + ⧹ 29F9 Sm BIG REVERSE SOLIDUS + ⧸ 29F8 Sm BIG SOLIDUS

⧼delimited example⧽   ⧼ 29FC Ps LEFT-POINTING CURVED ANGLE BRACKET   ⧽ 29FD Pe RIGHT-POINTING CURVED ANGLE BRACKET

⨫delimited example⨬   ⨫ 2A2B Sm MINUS SIGN WITH FALLING DOTS   ⨬ 2A2C Sm MINUS SIGN WITH RISING DOTS

⨬DELIMITED EXAMPLE⨫ + ⨬ 2A2C Sm MINUS SIGN WITH RISING DOTS + ⨫ 2A2B Sm MINUS SIGN WITH FALLING DOTS

⨭delimited example⨮   ⨭ 2A2D Sm PLUS SIGN IN LEFT HALF CIRCLE   ⨮ 2A2E Sm PLUS SIGN IN RIGHT HALF CIRCLE

⨮DELIMITED EXAMPLE⨭ +! ⨮ 2A2E Sm PLUS SIGN IN RIGHT HALF CIRCLE +! ⨭ 2A2D Sm PLUS SIGN IN LEFT HALF CIRCLE

⨴delimited example⨵   ⨴ 2A34 Sm MULTIPLICATION SIGN IN LEFT HALF CIRCLE   ⨵ 2A35 Sm MULTIPLICATION SIGN IN RIGHT HALF CIRCLE

⨵DELIMITED EXAMPLE⨴ +! ⨵ 2A35 Sm MULTIPLICATION SIGN IN RIGHT HALF CIRCLE +! ⨴ 2A34 Sm MULTIPLICATION SIGN IN LEFT HALF CIRCLE

⨼delimited example⨽   ⨼ 2A3C Sm INTERIOR PRODUCT   ⨽ 2A3D Sm RIGHTHAND INTERIOR PRODUCT

⨽DELIMITED EXAMPLE⨼ + ⨽ 2A3D Sm RIGHTHAND INTERIOR PRODUCT + ⨼ 2A3C Sm INTERIOR PRODUCT

⩤delimited example⩥   ⩤ 2A64 Sm Z NOTATION DOMAIN ANTIRESTRICTION   ⩥ 2A65 Sm Z NOTATION RANGE ANTIRESTRICTION

⩥DELIMITED EXAMPLE⩤ + ⩥ 2A65 Sm Z NOTATION RANGE ANTIRESTRICTION + ⩤ 2A64 Sm Z NOTATION DOMAIN ANTIRESTRICTION

⩹delimited example⩺   ⩹ 2A79 Sm LESS-THAN WITH CIRCLE INSIDE   ⩺ 2A7A Sm GREATER-THAN WITH CIRCLE INSIDE

⩺DELIMITED EXAMPLE⩹ + ⩺ 2A7A Sm GREATER-THAN WITH CIRCLE INSIDE + ⩹ 2A79 Sm LESS-THAN WITH CIRCLE INSIDE

⩽delimited example⩾   ⩽ 2A7D Sm LESS-THAN OR SLANTED EQUAL TO   ⩾ 2A7E Sm GREATER-THAN OR SLANTED EQUAL TO

⩾DELIMITED EXAMPLE⩽ + ⩾ 2A7E Sm GREATER-THAN OR SLANTED EQUAL TO + ⩽ 2A7D Sm LESS-THAN OR SLANTED EQUAL TO

⩿delimited example⪀   ⩿ 2A7F Sm LESS-THAN OR SLANTED EQUAL TO WITH DOT INSIDE   ⪀ 2A80 Sm GREATER-THAN OR SLANTED EQUAL TO WITH DOT INSIDE

⪀DELIMITED EXAMPLE⩿ + ⪀ 2A80 Sm GREATER-THAN OR SLANTED EQUAL TO WITH DOT INSIDE + ⩿ 2A7F Sm LESS-THAN OR SLANTED EQUAL TO WITH DOT INSIDE

⪁delimited example⪂   ⪁ 2A81 Sm LESS-THAN OR SLANTED EQUAL TO WITH DOT ABOVE   ⪂ 2A82 Sm GREATER-THAN OR SLANTED EQUAL TO WITH DOT ABOVE

⪂DELIMITED EXAMPLE⪁ + ⪂ 2A82 Sm GREATER-THAN OR SLANTED EQUAL TO WITH DOT ABOVE + ⪁ 2A81 Sm LESS-THAN OR SLANTED EQUAL TO WITH DOT ABOVE

⪃DELIMITED EXAMPLE⪄ ! ⪃ 2A83 Sm LESS-THAN OR SLANTED EQUAL TO WITH DOT ABOVE RIGHT ! ⪄ 2A84 Sm GREATER-THAN OR SLANTED EQUAL TO WITH DOT ABOVE LEFT

⪄DELIMITED EXAMPLE⪃ + ⪄ 2A84 Sm GREATER-THAN OR SLANTED EQUAL TO WITH DOT ABOVE LEFT + ⪃ 2A83 Sm LESS-THAN OR SLANTED EQUAL TO WITH DOT ABOVE RIGHT

⪋delimited example⪌   ⪋ 2A8B Sm LESS-THAN ABOVE DOUBLE-LINE EQUAL ABOVE GREATER-THAN   ⪌ 2A8C Sm GREATER-THAN ABOVE DOUBLE-LINE EQUAL ABOVE LESS-THAN

⪌DELIMITED EXAMPLE⪋ + ⪌ 2A8C Sm GREATER-THAN ABOVE DOUBLE-LINE EQUAL ABOVE LESS-THAN + ⪋ 2A8B Sm LESS-THAN ABOVE DOUBLE-LINE EQUAL ABOVE GREATER-THAN

⪑delimited example⪒   ⪑ 2A91 Sm LESS-THAN ABOVE GREATER-THAN ABOVE DOUBLE-LINE EQUAL   ⪒ 2A92 Sm GREATER-THAN ABOVE LESS-THAN ABOVE DOUBLE-LINE EQUAL

⪒DELIMITED EXAMPLE⪑ + ⪒ 2A92 Sm GREATER-THAN ABOVE LESS-THAN ABOVE DOUBLE-LINE EQUAL + ⪑ 2A91 Sm LESS-THAN ABOVE GREATER-THAN ABOVE DOUBLE-LINE EQUAL

⪓delimited example⪔   ⪓ 2A93 Sm LESS-THAN ABOVE SLANTED EQUAL ABOVE GREATER-THAN ABOVE SLANTED EQUAL   ⪔ 2A94 Sm GREATER-THAN ABOVE SLANTED EQUAL ABOVE LESS-THAN ABOVE SLANTED EQUAL

⪔DELIMITED EXAMPLE⪓ + ⪔ 2A94 Sm GREATER-THAN ABOVE SLANTED EQUAL ABOVE LESS-THAN ABOVE SLANTED EQUAL + ⪓ 2A93 Sm LESS-THAN ABOVE SLANTED EQUAL ABOVE GREATER-THAN ABOVE SLANTED EQUAL

⪕delimited example⪖   ⪕ 2A95 Sm SLANTED EQUAL TO OR LESS-THAN   ⪖ 2A96 Sm SLANTED EQUAL TO OR GREATER-THAN

⪖DELIMITED EXAMPLE⪕ + ⪖ 2A96 Sm SLANTED EQUAL TO OR GREATER-THAN + ⪕ 2A95 Sm SLANTED EQUAL TO OR LESS-THAN

⪗delimited example⪘   ⪗ 2A97 Sm SLANTED EQUAL TO OR LESS-THAN WITH DOT INSIDE   ⪘ 2A98 Sm SLANTED EQUAL TO OR GREATER-THAN WITH DOT INSIDE

⪘DELIMITED EXAMPLE⪗ + ⪘ 2A98 Sm SLANTED EQUAL TO OR GREATER-THAN WITH DOT INSIDE + ⪗ 2A97 Sm SLANTED EQUAL TO OR LESS-THAN WITH DOT INSIDE

⪙delimited example⪚   ⪙ 2A99 Sm DOUBLE-LINE EQUAL TO OR LESS-THAN   ⪚ 2A9A Sm DOUBLE-LINE EQUAL TO OR GREATER-THAN

⪚DELIMITED EXAMPLE⪙ + ⪚ 2A9A Sm DOUBLE-LINE EQUAL TO OR GREATER-THAN + ⪙ 2A99 Sm DOUBLE-LINE EQUAL TO OR LESS-THAN

⪛delimited example⪜   ⪛ 2A9B Sm DOUBLE-LINE SLANTED EQUAL TO OR LESS-THAN   ⪜ 2A9C Sm DOUBLE-LINE SLANTED EQUAL TO OR GREATER-THAN

⪜DELIMITED EXAMPLE⪛ + ⪜ 2A9C Sm DOUBLE-LINE SLANTED EQUAL TO OR GREATER-THAN + ⪛ 2A9B Sm DOUBLE-LINE SLANTED EQUAL TO OR LESS-THAN

⪡delimited example⪢   ⪡ 2AA1 Sm DOUBLE NESTED LESS-THAN   ⪢ 2AA2 Sm DOUBLE NESTED GREATER-THAN

⪢DELIMITED EXAMPLE⪡ + ⪢ 2AA2 Sm DOUBLE NESTED GREATER-THAN + ⪡ 2AA1 Sm DOUBLE NESTED LESS-THAN

⪦delimited example⪧   ⪦ 2AA6 Sm LESS-THAN CLOSED BY CURVE   ⪧ 2AA7 Sm GREATER-THAN CLOSED BY CURVE

⪧DELIMITED EXAMPLE⪦ + ⪧ 2AA7 Sm GREATER-THAN CLOSED BY CURVE + ⪦ 2AA6 Sm LESS-THAN CLOSED BY CURVE

⪨delimited example⪩   ⪨ 2AA8 Sm LESS-THAN CLOSED BY CURVE ABOVE SLANTED EQUAL   ⪩ 2AA9 Sm GREATER-THAN CLOSED BY CURVE ABOVE SLANTED EQUAL

⪩DELIMITED EXAMPLE⪨ + ⪩ 2AA9 Sm GREATER-THAN CLOSED BY CURVE ABOVE SLANTED EQUAL + ⪨ 2AA8 Sm LESS-THAN CLOSED BY CURVE ABOVE SLANTED EQUAL

⪪delimited example⪫   ⪪ 2AAA Sm SMALLER THAN   ⪫ 2AAB Sm LARGER THAN

⪫DELIMITED EXAMPLE⪪ + ⪫ 2AAB Sm LARGER THAN + ⪪ 2AAA Sm SMALLER THAN

⪬delimited example⪭   ⪬ 2AAC Sm SMALLER THAN OR EQUAL TO   ⪭ 2AAD Sm LARGER THAN OR EQUAL TO

⪭DELIMITED EXAMPLE⪬ + ⪭ 2AAD Sm LARGER THAN OR EQUAL TO + ⪬ 2AAC Sm SMALLER THAN OR EQUAL TO

⪯delimited example⪰   ⪯ 2AAF Sm PRECEDES ABOVE SINGLE-LINE EQUALS SIGN   ⪰ 2AB0 Sm SUCCEEDS ABOVE SINGLE-LINE EQUALS SIGN

⪰DELIMITED EXAMPLE⪯ + ⪰ 2AB0 Sm SUCCEEDS ABOVE SINGLE-LINE EQUALS SIGN + ⪯ 2AAF Sm PRECEDES ABOVE SINGLE-LINE EQUALS SIGN

⪳delimited example⪴   ⪳ 2AB3 Sm PRECEDES ABOVE EQUALS SIGN   ⪴ 2AB4 Sm SUCCEEDS ABOVE EQUALS SIGN

⪴DELIMITED EXAMPLE⪳ + ⪴ 2AB4 Sm SUCCEEDS ABOVE EQUALS SIGN + ⪳ 2AB3 Sm PRECEDES ABOVE EQUALS SIGN

⪻delimited example⪼   ⪻ 2ABB Sm DOUBLE PRECEDES   ⪼ 2ABC Sm DOUBLE SUCCEEDS

⪼DELIMITED EXAMPLE⪻ + ⪼ 2ABC Sm DOUBLE SUCCEEDS + ⪻ 2ABB Sm DOUBLE PRECEDES

⪽delimited example⪾   ⪽ 2ABD Sm SUBSET WITH DOT   ⪾ 2ABE Sm SUPERSET WITH DOT

⪾DELIMITED EXAMPLE⪽ + ⪾ 2ABE Sm SUPERSET WITH DOT + ⪽ 2ABD Sm SUBSET WITH DOT

⪿delimited example⫀   ⪿ 2ABF Sm SUBSET WITH PLUS SIGN BELOW   ⫀ 2AC0 Sm SUPERSET WITH PLUS SIGN BELOW

⫀DELIMITED EXAMPLE⪿ + ⫀ 2AC0 Sm SUPERSET WITH PLUS SIGN BELOW + ⪿ 2ABF Sm SUBSET WITH PLUS SIGN BELOW

⫁delimited example⫂   ⫁ 2AC1 Sm SUBSET WITH MULTIPLICATION SIGN BELOW   ⫂ 2AC2 Sm SUPERSET WITH MULTIPLICATION SIGN BELOW

⫂DELIMITED EXAMPLE⫁ + ⫂ 2AC2 Sm SUPERSET WITH MULTIPLICATION SIGN BELOW + ⫁ 2AC1 Sm SUBSET WITH MULTIPLICATION SIGN BELOW

⫃delimited example⫄   ⫃ 2AC3 Sm SUBSET OF OR EQUAL TO WITH DOT ABOVE   ⫄ 2AC4 Sm SUPERSET OF OR EQUAL TO WITH DOT ABOVE

⫄DELIMITED EXAMPLE⫃ + ⫄ 2AC4 Sm SUPERSET OF OR EQUAL TO WITH DOT ABOVE + ⫃ 2AC3 Sm SUBSET OF OR EQUAL TO WITH DOT ABOVE

⫅delimited example⫆   ⫅ 2AC5 Sm SUBSET OF ABOVE EQUALS SIGN   ⫆ 2AC6 Sm SUPERSET OF ABOVE EQUALS SIGN

⫆DELIMITED EXAMPLE⫅ + ⫆ 2AC6 Sm SUPERSET OF ABOVE EQUALS SIGN + ⫅ 2AC5 Sm SUBSET OF ABOVE EQUALS SIGN

⫍delimited example⫎   ⫍ 2ACD Sm SQUARE LEFT OPEN BOX OPERATOR   ⫎ 2ACE Sm SQUARE RIGHT OPEN BOX OPERATOR

⫎DELIMITED EXAMPLE⫍ +! ⫎ 2ACE Sm SQUARE RIGHT OPEN BOX OPERATOR +! ⫍ 2ACD Sm SQUARE LEFT OPEN BOX OPERATOR

⫏delimited example⫐   ⫏ 2ACF Sm CLOSED SUBSET   ⫐ 2AD0 Sm CLOSED SUPERSET

⫐DELIMITED EXAMPLE⫏ + ⫐ 2AD0 Sm CLOSED SUPERSET + ⫏ 2ACF Sm CLOSED SUBSET

⫑delimited example⫒   ⫑ 2AD1 Sm CLOSED SUBSET OR EQUAL TO   ⫒ 2AD2 Sm CLOSED SUPERSET OR EQUAL TO

⫒DELIMITED EXAMPLE⫑ + ⫒ 2AD2 Sm CLOSED SUPERSET OR EQUAL TO + ⫑ 2AD1 Sm CLOSED SUBSET OR EQUAL TO

⫓delimited example⫔   ⫓ 2AD3 Sm SUBSET ABOVE SUPERSET   ⫔ 2AD4 Sm SUPERSET ABOVE SUBSET

⫔DELIMITED EXAMPLE⫓ + ⫔ 2AD4 Sm SUPERSET ABOVE SUBSET + ⫓ 2AD3 Sm SUBSET ABOVE SUPERSET

⫕delimited example⫖   ⫕ 2AD5 Sm SUBSET ABOVE SUBSET   ⫖ 2AD6 Sm SUPERSET ABOVE SUPERSET

⫖DELIMITED EXAMPLE⫕ + ⫖ 2AD6 Sm SUPERSET ABOVE SUPERSET + ⫕ 2AD5 Sm SUBSET ABOVE SUBSET

⫞DELIMITED EXAMPLE⊦ + ⫞ 2ADE Sm SHORT LEFT TACK + ⊦ 22A6 Sm ASSERTION

⫣DELIMITED EXAMPLE⊩ + ⫣ 2AE3 Sm DOUBLE VERTICAL BAR LEFT TURNSTILE + ⊩ 22A9 Sm FORCES

⫤DELIMITED EXAMPLE⊨ + ⫤ 2AE4 Sm VERTICAL BAR DOUBLE LEFT TURNSTILE + ⊨ 22A8 Sm TRUE

⫥DELIMITED EXAMPLE⊫ + ⫥ 2AE5 Sm DOUBLE VERTICAL BAR DOUBLE LEFT TURNSTILE + ⊫ 22AB Sm DOUBLE VERTICAL BAR DOUBLE RIGHT TURNSTILE

⫬delimited example⫭   ⫬ 2AEC Sm DOUBLE STROKE NOT SIGN   ⫭ 2AED Sm REVERSED DOUBLE STROKE NOT SIGN

⫭DELIMITED EXAMPLE⫬ + ⫭ 2AED Sm REVERSED DOUBLE STROKE NOT SIGN + ⫬ 2AEC Sm DOUBLE STROKE NOT SIGN

⫷delimited example⫸   ⫷ 2AF7 Sm TRIPLE NESTED LESS-THAN   ⫸ 2AF8 Sm TRIPLE NESTED GREATER-THAN

⫸DELIMITED EXAMPLE⫷ + ⫸ 2AF8 Sm TRIPLE NESTED GREATER-THAN + ⫷ 2AF7 Sm TRIPLE NESTED LESS-THAN

⫹delimited example⫺   ⫹ 2AF9 Sm DOUBLE-LINE SLANTED LESS-THAN OR EQUAL TO   ⫺ 2AFA Sm DOUBLE-LINE SLANTED GREATER-THAN OR EQUAL TO

⫺DELIMITED EXAMPLE⫹ + ⫺ 2AFA Sm DOUBLE-LINE SLANTED GREATER-THAN OR EQUAL TO + ⫹ 2AF9 Sm DOUBLE-LINE SLANTED LESS-THAN OR EQUAL TO

⸂delimited example⸃   ⸂ 2E02 Pi LEFT SUBSTITUTION BRACKET   ⸃ 2E03 Pf RIGHT SUBSTITUTION BRACKET

⸄delimited example⸅   ⸄ 2E04 Pi LEFT DOTTED SUBSTITUTION BRACKET   ⸅ 2E05 Pf RIGHT DOTTED SUBSTITUTION BRACKET

⸉delimited example⸊   ⸉ 2E09 Pi LEFT TRANSPOSITION BRACKET   ⸊ 2E0A Pf RIGHT TRANSPOSITION BRACKET

⸌delimited example⸍   ⸌ 2E0C Pi LEFT RAISED OMISSION BRACKET   ⸍ 2E0D Pf RIGHT RAISED OMISSION BRACKET

⸜delimited example⸝   ⸜ 2E1C Pi LEFT LOW PARAPHRASE BRACKET   ⸝ 2E1D Pf RIGHT LOW PARAPHRASE BRACKET

⸠delimited example⸡   ⸠ 2E20 Pi LEFT VERTICAL BAR WITH QUILL   ⸡ 2E21 Pf RIGHT VERTICAL BAR WITH QUILL

⸢delimited example⸣   ⸢ 2E22 Ps TOP LEFT HALF BRACKET   ⸣ 2E23 Pe TOP RIGHT HALF BRACKET

⸤delimited example⸥   ⸤ 2E24 Ps BOTTOM LEFT HALF BRACKET   ⸥ 2E25 Pe BOTTOM RIGHT HALF BRACKET

⸦delimited example⸧   ⸦ 2E26 Ps LEFT SIDEWAYS U BRACKET   ⸧ 2E27 Pe RIGHT SIDEWAYS U BRACKET

⸨delimited example⸩   ⸨ 2E28 Ps LEFT DOUBLE PARENTHESIS   ⸩ 2E29 Pe RIGHT DOUBLE PARENTHESIS

〈delimited example〉   〈 3008 Ps LEFT ANGLE BRACKET   〉 3009 Pe RIGHT ANGLE BRACKET

《delimited example》   《 300A Ps LEFT DOUBLE ANGLE BRACKET   》 300B Pe RIGHT DOUBLE ANGLE BRACKET

「DELIMITED EXAMPLE」 ~ 「 300C Ps LEFT CORNER BRACKET ~ 」 300D Pe RIGHT CORNER BRACKET

『DELIMITED EXAMPLE』 ~ 『 300E Ps LEFT WHITE CORNER BRACKET ~ 』 300F Pe RIGHT WHITE CORNER BRACKET

【delimited example】   【 3010 Ps LEFT BLACK LENTICULAR BRACKET   】 3011 Pe RIGHT BLACK LENTICULAR BRACKET

〔delimited example〕   〔 3014 Ps LEFT TORTOISE SHELL BRACKET   〕 3015 Pe RIGHT TORTOISE SHELL BRACKET

〖delimited example〗   〖 3016 Ps LEFT WHITE LENTICULAR BRACKET   〗 3017 Pe RIGHT WHITE LENTICULAR BRACKET

〘delimited example〙   〘 3018 Ps LEFT WHITE TORTOISE SHELL BRACKET   〙 3019 Pe RIGHT WHITE TORTOISE SHELL BRACKET

〚delimited example〛   〚 301A Ps LEFT WHITE SQUARE BRACKET   〛 301B Pe RIGHT WHITE SQUARE BRACKET

﹙DELIMITED EXAMPLE﹚ * ﹙ FE59 Ps SMALL LEFT PARENTHESIS * ﹚ FE5A Pe SMALL RIGHT PARENTHESIS

﹛DELIMITED EXAMPLE﹜ * ﹛ FE5B Ps SMALL LEFT CURLY BRACKET * ﹜ FE5C Pe SMALL RIGHT CURLY BRACKET

﹝DELIMITED EXAMPLE﹞ * ﹝ FE5D Ps SMALL LEFT TORTOISE SHELL BRACKET * ﹞ FE5E Pe SMALL RIGHT TORTOISE SHELL BRACKET

﹤DELIMITED EXAMPLE﹥ * ﹤ FE64 Sm SMALL LESS-THAN SIGN * ﹥ FE65 Sm SMALL GREATER-THAN SIGN

﹥DELIMITED EXAMPLE﹤ +* ﹥ FE65 Sm SMALL GREATER-THAN SIGN +* ﹤ FE64 Sm SMALL LESS-THAN SIGN

（DELIMITED EXAMPLE） * （ FF08 Ps FULLWIDTH LEFT PARENTHESIS * ） FF09 Pe FULLWIDTH RIGHT PARENTHESIS

＜DELIMITED EXAMPLE＞ * ＜ FF1C Sm FULLWIDTH LESS-THAN SIGN * ＞ FF1E Sm FULLWIDTH GREATER-THAN SIGN

＞DELIMITED EXAMPLE＜ +* ＞ FF1E Sm FULLWIDTH GREATER-THAN SIGN +* ＜ FF1C Sm FULLWIDTH LESS-THAN SIGN

［DELIMITED EXAMPLE］ * ［ FF3B Ps FULLWIDTH LEFT SQUARE BRACKET * ］ FF3D Pe FULLWIDTH RIGHT SQUARE BRACKET

｛DELIMITED EXAMPLE｝ * ｛ FF5B Ps FULLWIDTH LEFT CURLY BRACKET * ｝ FF5D Pe FULLWIDTH RIGHT CURLY BRACKET

｟DELIMITED EXAMPLE｠ * ｟ FF5F Ps FULLWIDTH LEFT WHITE PARENTHESIS * ｠ FF60 Pe FULLWIDTH RIGHT WHITE PARENTHESIS

｢DELIMITED EXAMPLE｣ ~* ｢ FF62 Ps HALFWIDTH LEFT CORNER BRACKET ~* ｣ FF63 Pe HALFWIDTH RIGHT CORNER BRACKET

293 pairs.
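
As a rough cross-check of the Ps/Pe rows in the list above, here is a minimal sketch that pairs openers with closers purely by their symmetric LEFT/RIGHT names. It is an illustration only, not how the list above was generated: it uses Python's `unicodedata` module rather than BidiMirroring.txt, the helper name `bracket_pairs` is made up for this example, and it deliberately ignores the Sm and Pi/Pf rows, so it will not reproduce all 293 pairs.

```python
import sys
import unicodedata


def bracket_pairs():
    """Map each Ps (open punctuation) character whose name contains LEFT
    to the Pe (close punctuation) character carrying the mirrored name.

    Hypothetical helper for illustration; name-based pairing is an
    assumption, not the BidiMirroring-based approach discussed here.
    """
    pairs = {}
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.category(ch) != "Ps":
            continue
        name = unicodedata.name(ch, "")
        if "LEFT" not in name:
            continue
        try:
            closer = unicodedata.lookup(name.replace("LEFT", "RIGHT"))
        except KeyError:
            continue  # no character carries the mirrored name
        if unicodedata.category(closer) == "Pe":
            pairs[ch] = closer
    return pairs


if __name__ == "__main__":
    for opener, closer in sorted(bracket_pairs().items()):
        print(f"{opener}delimited example{closer}  "
              f"U+{ord(opener):04X} U+{ord(closer):04X}")
```

The exact set of pairs depends on the Unicode version bundled with the Python build, which is the same version-skew question raised in the replies below.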

p5pRT commented 12 years ago

From @cpansprout

On Sun Sep 04 13​:20​:10 2011\, tom christiansen wrote​:

"Father Chrysostomos via RT" <perlbug-followup@perl.org> wrote on Sun, 04 Sep 2011 12:15:50 PDT:

How do you define ‘unused’? How many people know what version of Unicode their perl and their editor (actually, their fonts) are using, or whether they are even the same? What if I use a new character as a delimiter, not realising that perl considers it to be ‘unused’, and the next upgrade will break my script?

I don't understand what you're so worried about. The Unicode Pattern_Syntax character property has strong stability guarantees.

The stability guarantees do not guarantee anything if one’s editor and one’s perl installation have differing Unicode versions.

Why not use just the open/left thingies in BidiMirroring that are pattern syntax as openers and the corresponding mirrored bit for closers? Point this at BidiMirroring.txt...

My point was that I might not be trying to use paired delimiters at all. If I have shiny new fonts with something that looks like a nice delimiter as the glyph for U+10F001 (some time in the future)\, but my perl installation has the *previous* version of Unicode (before the one that introduced U+10F001)\, then a perl upgrade may break my code if it turns out U+10F001 is one of those paired delimiters and I was not aware of it.

This is not FUD\, either\, as there was a CPAN module that had to change to work in 5.14\, because of the way Unicode identifiers are parsed.

So use of any Unicode (non-ASCII) outside of comments and strings is going to cause problems. I don’t know of an elegant solution to that\, but until we have such a solution\, I don’t think we should spread the problem further by introducing paired delimiters.

Actually\, I do have an elegant solution​: Provide a plug-in mechanism that allows *modules* to do their own delimiter pairing.

p5pRT commented 12 years ago

From tchrist@perl.com

On Sun Sep 04 13​:20​:10 2011\, tom christiansen wrote​:

"Father Chrysostomos via RT" <perlbug-followup@perl.org> wrote on Sun, 04 Sep 2011 12:15:50 PDT:

How do you define ‘unused’? How many people know what version of Unicode their perl and their editor (actually, their fonts) are using, or whether they are even the same? What if I use a new character as a delimiter, not realising that perl considers it to be ‘unused’, and the next upgrade will break my script?

I don't understand what you're so worried about. The Unicode Pattern_Syntax character property has strong stability guarantees.

The stability guarantees do not guarantee anything if one’s editor and one’s perl installation have differing Unicode versions.

No\, it does *NOT* matter​:

  The Pattern_Syntax and Pattern_White_Space properties are immutable   code point properties\, which means that their property values for   all Unicode code points will never change.

Therefore the rest of what you said doesn't matter either, because it is based on a false premise. That premise is that the Pattern_Syntax property might change from one release to the next.

It won't.

--tom

p5pRT commented 12 years ago

From tchrist@perl.com

This is not FUD\, either\, as there was a CPAN module that had to change to work in 5.14\, because of the way Unicode identifiers are parsed.

Yes, it *is* FUD. What changed is that Perl stopped applying its own hack and started following the rules. We now apply the IDC property, while before we didn't, which caused a Pattern Syntax collision.

The rules are very clear. Perl wasn't following them.

Please familiarize yourself with this​:

  http://unicode.org/policies/stability_policy.html

  3.0.1+
  * The Case_Folding property value is limited so that no string when case folded expands to more than 3× in length (measured in code units).
======> * Once a character is ID_Continue, it must continue to be so in all future versions.
======> * If a character is ID_Start then it must also be ID_Continue.
======> * Once a character is ID_Start, it must continue to be so in all future versions.
======> * Once a character is XID_Continue, it must continue to be so in all future versions.
======> * If a character is XID_Start then it must also be XID_Continue.
======> * Once a character is XID_Start, it must continue to be so in all future versions.

  3.1.0+
  * The Noncharacter_Code_Point property is an immutable code point property, which means that its property values for all Unicode code points will never change.

  4.0.0+
  * The property values for the bidirectional properties Bidi_Class and Bidi_Mirrored preserve canonical equivalence. The set of characters having General_Category=Nd will always be the same as the set of characters having Numeric_Type=de.
  * Once a character is assigned, its Decomposition_Mapping will never change.

  4.1.0+
  * All characters with the Lowercase property and all characters with the Uppercase property have the Alphabetic property.
======> * The Pattern_Syntax and Pattern_White_Space properties are immutable code point properties, which means that their property values for all Unicode code points will never change.
======> * If a character has the Pattern_Syntax or Pattern_White_Space property, then it cannot have the ID_Continue or XID_Continue property.

That suffices.
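
The last guarantee quoted above (a Pattern_Syntax character can never be ID_Continue or XID_Continue) can be spot-checked with a small sketch. It rests on two assumptions: Python's `str.isidentifier()` is used as a stand-in for an XID_Start/XID_Continue test, and the sample delimiters are a handful copied from the table earlier in this thread. It does not query the Pattern_Syntax property itself, which the Python standard library does not expose.

```python
# Sample openers and closers copied from the table earlier in this thread.
SAMPLE_DELIMITERS = "⟦⟧⟨⟩⦃⦄〈〉《》"


def can_continue_identifier(ch):
    """Return True if ch may follow 'a' inside a Python identifier.

    'a' is a valid identifier start, so the result depends only on ch.
    Hypothetical helper name, used here for illustration only.
    """
    return ("a" + ch).isidentifier()


# Any delimiter that could also continue an identifier would be a clash.
clashes = [ch for ch in SAMPLE_DELIMITERS if can_continue_identifier(ch)]
print(clashes)
```

An empty list is what the stability policy leads one to expect for these characters; a non-empty list would indicate a collision between delimiter syntax and identifier syntax.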

--tom

p5pRT commented 12 years ago

From perl-diddler@tlinx.org

Father Chrysostomos via RT wrote​:

My point was that I might not be trying to use paired delimiters at all. If I have shiny new fonts with something that looks like a nice delimiter as the glyph for U+10F001 (some time in the future)\, but my perl installation has the *previous* version of Unicode (before the one that introduced U+10F001)\, then a perl upgrade may break my code if it turns out U+10F001 is one of those paired delimiters and I was not aware of it.

So you are using a character that's in the 'Reserved' range and NOT in the User-defined area. And your previous version says\, ok\, that's a char in the reserved range. I can either (A)\, throw an error because the user is using an undefined character with unknown properties\, or (B) allow them to use it assuming that they know what they are doing.

Which behavior do you prefer?

If you claim to know what you are doing, and you use characters that are in the reserved range of the perl you are running before you install a version of perl that defines those ranges (and their properties), then you are taking responsibility for any problems that occur.

If you expect perl to throw an error any time it sees a character from an undefined range (much as it would if it encountered an illegal encoding), I think that would cause much larger headaches.

But you admit to using a version of perl that is outdated for the characters that you are using\, and then want to complain when perl is updated and those characters 'gain' properties?

Perl might also be upgraded\, and those characters are chosen to be standard operators in a 'next generation' as well -- with the designers believing that since those characters were never defined before (i.e. were reserved)\, no one should be using them. Then you want them to hold things up because you saw a shiny new glyph in a font that is using some undefined area?

This is how it sounds to me\, am I misunderstanding something?

p5pRT commented 12 years ago

From @cpansprout

On Sun Sep 11 15​:10​:16 2011\, tom christiansen wrote​:

This is not FUD\, either\, as there was a CPAN module that had to change to work in 5.14\, because of the way Unicode identifiers are parsed.

Yes, it *is* FUD. What changed is that Perl stopped applying its own hack and started following the rules. We now apply the IDC property, while before we didn't, which caused a Pattern Syntax collision.

The rules are very clear. Perl wasn't following them.

Please familiarize yourself with this​:

http://unicode.org/policies/stability_policy.html

3.0.1+
* The Case_Folding property value is limited so that no string when case folded expands to more than 3× in length (measured in code units).
======> * Once a character is ID_Continue, it must continue to be so in all future versions.
======> * If a character is ID_Start then it must also be ID_Continue.
======> * Once a character is ID_Start, it must continue to be so in all future versions.
======> * Once a character is XID_Continue, it must continue to be so in all future versions.
======> * If a character is XID_Start then it must also be XID_Continue.
======> * Once a character is XID_Start, it must continue to be so in all future versions.

3.1.0+
* The Noncharacter_Code_Point property is an immutable code point property, which means that its property values for all Unicode code points will never change.

4.0.0+
* The property values for the bidirectional properties Bidi_Class and Bidi_Mirrored preserve canonical equivalence. The set of characters having General_Category=Nd will always be the same as the set of characters having Numeric_Type=de.
* Once a character is assigned, its Decomposition_Mapping will never change.

4.1.0+
* All characters with the Lowercase property and all characters with the Uppercase property have the Alphabetic property.
======> * The Pattern_Syntax and Pattern_White_Space properties are immutable code point properties, which means that their property values for all Unicode code points will never change.

Does that mean no new characters will ever become Pattern_Syntax?

If that’s the case\, it addresses my concerns. However\, I *still* think this should be a plugin/module thing.

======> * If a character has the Pattern_Syntax or Pattern_White_Space property\, then it cannot have the ID_Continue or XID_Continue property.

That suffices.

--tom

p5pRT commented 12 years ago

From @cpansprout

On Sun Sep 11 15​:15​:24 2011\, LAWalsh wrote​:

Perl might also be upgraded\, and those characters are chosen to be standard operators in a 'next generation' as well -- with the designers believing that since those characters were never defined before (i.e. were reserved)\, no one should be using them. Then you want them to hold things up because you saw a shiny new glyph in a font that is using some undefined area?

Not an undefined area\, but the next version of Unicode that Perl hasn’t upgraded to yet.

But from what Tom says\, it sounds as though this is not actually a concern.

p5pRT commented 12 years ago

From tchrist@perl.com

[Karl will correct me if I'm wrong.]

So you are using a character that's in the 'Reserved' range and NOT in the User-defined area. And your previous version says\, ok\, that's a char in the reserved range. I can either (A)\, throw an error because the user is using an undefined character with unknown properties\, or (B) allow them to use it assuming that they know what they are doing.

Linda\, I agree with the gist of what you're saying.

I should point out however that (A) is not possible. *All* Unicode code points\, **including even unassigned code points**\, have all properties defined on them.

However\, Father Chrysostomos's would-be demo codepoint\, 0x10_F001\, is *not* an unassigned code point​:

  $ uniprops -agnat 10F001   U+10F001 ‹U+10F001› \N{U+10F001}   \pC \p{Co}   \W \S \D \H \V   All Any Assigned Private_Use Is_Private_Use In_Supplementary_Private_Use_Area_B C Other Co Graph Print Zzzz   Supplementary_Private_Use_Area_B Unknown X_POSIX_Graph X_POSIX_Print   Age=2.0 AHex=NO ASCII_Hex_Digit=NO Alpha=NO Alphabetic=NO Alpha=N AHex=N Bidi_C=NO Bidi_Control=NO Bidi_Class=L   Bidi_Class=Left_to_Right BC=L Bidi_C=N Bidi_M=NO Bidi_Mirrored=NO Bidi_M=N Block=Supplementary_Private_Use_Area_B   General_Category=Other Canonical_Combining_Class=0 Canonical_Combining_Class=Not_Reordered CCC=NR   Canonical_Combining_Class=NR Case_Ignorable=NO CI=NO CI=N Cased=NO CE=NO Composition_Exclusion=NO   Changes_When_Casefolded=NO CWCF=NO CWCF=N Changes_When_Casemapped=NO CWCM=NO CWCM=N Changes_When_Lowercased=NO CWL=NO   CWL=N Changes_When_NFKC_Casefolded=NO CWKCF=NO CWKCF=N Changes_When_Titlecased=NO CWT=NO CWT=N   Changes_When_Uppercased=NO CWU=NO CWU=N General_Category=Private_Use Comp_Ex=NO Full_Composition_Exclusion=NO CE=N   Dash=NO Decomposition_Type=None DT=None Default_Ignorable_Code_Point=NO DI=NO DI=N Dep=NO Deprecated=NO Dep=N Dia=NO   Diacritic=NO Dia=N East_Asian_Width=A East_Asian_Width=Ambiguous EA=A Ext=NO Extender=NO Ext=N Comp_Ex=N   General_Category=C General_Category=Co GC=C GC=Co Gr_Base=NO Grapheme_Base=NO Gr_Ext=NO Grapheme_Extend=NO Gr_Base=N   Grapheme_Cluster_Break=Other GCB=XX Grapheme_Cluster_Break=XX Gr_Ext=N Hangul_Syllable_Type=NA   Hangul_Syllable_Type=Not_Applicable HST=NA Hex=NO Hex_Digit=NO Hex=N ID_Continue=NO IDC=NO IDC=N ID_Start=NO IDS=NO   IDS=N Ideo=NO Ideographic=NO Ideo=N IDS_Binary_Operator=NO IDSB=NO IDSB=N IDS_Trinary_Operator=NO IDST=NO IDST=N   Join_C=NO Join_Control=NO Join_C=N Joining_Group=No_Joining_Group JG=No_Joining_Group Joining_Type=Non_Joining JT=U   Joining_Type=U Line_Break=Unknown LB=XX Line_Break=XX LOE=NO Logical_Order_Exception=NO LOE=N Lower=NO Lowercase=NO   Lower=N Math=NO NChar=NO 
Noncharacter_Code_Point=NO NFC_Quick_Check=Y NFC_Quick_Check=Yes NFCQC=Y NFD_Quick_Check=Y   NFD_Quick_Check=Yes NFDQC=Y NFKC_Quick_Check=Y NFKC_Quick_Check=Yes NFKCQC=Y NFKD_Quick_Check=Y NFKD_Quick_Check=Yes   NFKDQC=Y NChar=N Numeric_Type=None NT=None Numeric_Value=NaN NV=NaN Pat_Syn=NO Pattern_Syntax=NO Pat_WS=NO   Pattern_White_Space=NO Pat_Syn=N Pat_WS=N Present_In=2.0 IN=2.0 Present_In=2.1 IN=2.1 Present_In=3.0 IN=3.0   Present_In=3.1 IN=3.1 Present_In=3.2 IN=3.2 Present_In=4.0 IN=4.0 Present_In=4.1 IN=4.1 Present_In=5.0 IN=5.0   Present_In=5.1 IN=5.1 Present_In=5.2 IN=5.2 Present_In=6.0 IN=6.0 QMark=NO Quotation_Mark=NO QMark=N Radical=NO   Script=Unknown SC=Zzzz Script=Zzzz SD=NO Soft_Dotted=NO Sentence_Break=Other SB=XX Sentence_Break=XX SD=N Space=NO   STerm=NO Term=NO Terminal_Punctuation=NO Term=N UIdeo=NO Unified_Ideograph=NO UIdeo=N Upper=NO Uppercase=NO Upper=N   Variation_Selector=NO VS=NO VS=N White_Space=NO WSpace=NO Space=N Word_Break=Other WB=XX Word_Break=XX XID_Continue=NO   XIDC=NO XIDC=N XID_Start=NO XIDS=NO XIDS=N

It is an assigned code point in the block called Supplementary Private Use Area B\, and has been with us since Unicode 2.0 as its age property shows.

Perhaps a better example would be U+09F001, as that one is actually unassigned:

  $ uniprops -agnat 09F001   U+9F001 ‹U+9F001› \N{U+9F001}   \pC \p{Cn}   \W \S \D \H \V   All Any In_NoBlock C Other Cn Unassigned No_Block Zzzz Unknown   Age=Unassigned AHex=NO ASCII_Hex_Digit=NO Alpha=NO Alphabetic=NO Alpha=N AHex=N Bidi_C=NO Bidi_Control=NO Bidi_Class=L   Bidi_Class=Left_to_Right BC=L Bidi_C=N Bidi_M=NO Bidi_Mirrored=NO Bidi_M=N Block=No_Block General_Category=Other   Canonical_Combining_Class=0 Canonical_Combining_Class=Not_Reordered CCC=NR Canonical_Combining_Class=NR   Case_Ignorable=NO CI=NO CI=N Cased=NO CE=NO Composition_Exclusion=NO Changes_When_Casefolded=NO CWCF=NO CWCF=N   Changes_When_Casemapped=NO CWCM=NO CWCM=N Changes_When_Lowercased=NO CWL=NO CWL=N Changes_When_NFKC_Casefolded=NO   CWKCF=NO CWKCF=N Changes_When_Titlecased=NO CWT=NO CWT=N Changes_When_Uppercased=NO CWU=NO CWU=N   General_Category=Unassigned Comp_Ex=NO Full_Composition_Exclusion=NO CE=N Dash=NO Decomposition_Type=None DT=None   Default_Ignorable_Code_Point=NO DI=NO DI=N Dep=NO Deprecated=NO Dep=N Dia=NO Diacritic=NO Dia=N East_Asian_Width=N   East_Asian_Width=Neutral EA=N Ext=NO Extender=NO Ext=N Comp_Ex=N General_Category=C General_Category=Cn GC=C GC=Cn   Gr_Base=NO Grapheme_Base=NO Gr_Ext=NO Grapheme_Extend=NO Gr_Base=N Grapheme_Cluster_Break=Other GCB=XX   Grapheme_Cluster_Break=XX Gr_Ext=N Hangul_Syllable_Type=NA Hangul_Syllable_Type=Not_Applicable HST=NA Hex=NO   Hex_Digit=NO Hex=N ID_Continue=NO IDC=NO IDC=N ID_Start=NO IDS=NO IDS=N Ideo=NO Ideographic=NO Ideo=N   IDS_Binary_Operator=NO IDSB=NO IDSB=N IDS_Trinary_Operator=NO IDST=NO IDST=N Join_C=NO Join_Control=NO Join_C=N   Joining_Group=No_Joining_Group JG=No_Joining_Group Joining_Type=Non_Joining JT=U Joining_Type=U Line_Break=Unknown   LB=XX Line_Break=XX LOE=NO Logical_Order_Exception=NO LOE=N Lower=NO Lowercase=NO Lower=N Math=NO NChar=NO   Noncharacter_Code_Point=NO NFC_Quick_Check=Y NFC_Quick_Check=Yes NFCQC=Y NFD_Quick_Check=Y NFD_Quick_Check=Yes NFDQC=Y   NFKC_Quick_Check=Y NFKC_Quick_Check=Yes 
NFKCQC=Y NFKD_Quick_Check=Y NFKD_Quick_Check=Yes NFKDQC=Y NChar=N   Numeric_Type=None NT=None Numeric_Value=NaN NV=NaN Pat_Syn=NO Pattern_Syntax=NO Pat_WS=NO Pattern_White_Space=NO   Pat_Syn=N Pat_WS=N Present_In=Unassigned IN=Unassigned QMark=NO Quotation_Mark=NO QMark=N Radical=NO Script=Unknown   SC=Zzzz Script=Zzzz SD=NO Soft_Dotted=NO Sentence_Break=Other SB=XX Sentence_Break=XX SD=N Space=NO STerm=NO Term=NO   Terminal_Punctuation=NO Term=N UIdeo=NO Unified_Ideograph=NO UIdeo=N Upper=NO Uppercase=NO Upper=N   Variation_Selector=NO VS=NO VS=N White_Space=NO WSpace=NO Space=N Word_Break=Other WB=XX Word_Break=XX XID_Continue=NO   XIDC=NO XIDC=N XID_Start=NO XIDS=NO XIDS=N

Both those code points have the property Pattern_Syntax=No. This is guaranteed never to become Pattern_Syntax=Yes. The values of the Pattern_Syntax and Pattern_White_Space properties are immutable due to the stability guarantee. *THAT IS WHY I SELECTED IT.*

I believe that the private code point can never have its IDC property value changed from the current No\, because I don't think they can assign properties to private code points. The unassigned one might become an IDC=Yes code point some day\, but that's a one-way trip\, due to the stability guarantee.
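
The point that every code point, assigned or not, has property values can be illustrated with a short sketch. It uses Python's `unicodedata` module instead of `uniprops`, so it shows only the General_Category and an identifier test, not the Pattern_Syntax property; the results assume the Unicode data bundled with current Python releases, in which U+9F001 is still unassigned.

```python
import unicodedata

private_use = chr(0x10F001)  # Supplementary Private Use Area B
unassigned = chr(0x9F001)    # no block, unassigned at the time of writing

# Both lookups succeed: an unassigned code point still has a category.
print(unicodedata.category(private_use))
print(unicodedata.category(unassigned))

# Neither code point may continue an identifier ('a' is a valid start).
print(("a" + private_use).isidentifier(), ("a" + unassigned).isidentifier())
```

The categories reported should match the `\p{Co}` and `\p{Cn}` entries in the two uniprops listings above.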

Telling us we cannot use Unicode because Unicode changes is not reasonable. That's why there are stability guarantees. Stay within them and you are fine.

Also\, I feel that having individual modules have their own pairs is a non-solution that would be in the problem set\, not the solution set.

--tom

p5pRT commented 12 years ago

From tchrist@perl.com

Does that mean no new characters will ever become Pattern_Syntax?

(By "new" character\, you probably mean one that goes from being Unassigned=Yes and Assigned=No in one Unicode release to one that in a later release is now Unassigned=No and Assigned=Yes.)

Yes\, that is my understanding. It says that *all* Pattern_Syntax values are immutable. It doesn't just say that the Yes values are immutable\, the way it does with IDC. That means that No values are also immutable.

If that’s the case\, it addresses my concerns.

I'm glad to hear that.

However\, I *still* think this should be a plugin/module thing.

That would be a mess. We're talking about the lexical texture of the language. It is inconceivable that what is or is not a string, or is or is not an identifier *SYNTACTICALLY*, should be up to the discretion of the module. This is pure Perl.

--tom

p5pRT commented 12 years ago

From tchrist@perl.com

"Father Chrysostomos via RT" perlbug-followup@perl.org wrote on Sun, 11 Sep 2011 14:15:52 PDT:

Please take a look at

  UAX#31: Unicode Identifier and Pattern Syntax
  http://www.unicode.org/reports/tr31/

That spells out stability issues such as those that concerned you.

It also has formal conformance requirements\, some but not all of which I think we should consider meeting and claiming.

  R1 Default Identifiers
  R1a Restricted Format Characters
  R1b Stable Identifiers
  R2 Alternative Identifiers
  R3 Pattern_White_Space and Pattern_Syntax Characters
  R4 Equivalent Normalized Identifiers
  R5 Equivalent Case-Insensitive Identifiers
  R6 Filtered Normalized Identifiers
  R7 Filtered Case-Insensitive Identifiers

If we follow those guidelines\, I do not believe we will get into trouble.

Note that R2 deals with forwards compatibility regarding Unassigned characters\, something you were wondering about.

--tom

  R1 Default Identifiers

  To meet this requirement\, an implementation shall use definition   D1 and the properties ID_Start and ID_Continue (or XID_Start and   XID_Continue) to determine whether a string is an identifier.

  Alternatively\, it shall declare that it uses a profile and define that   profile with a precise specification of the characters that are added to   or removed from the above properties and/or provide a list of additional   constraints on identifiers.
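As a rough illustration of R1 (not a claim about how perl's tokenizer works), a default identifier can be tested with the `\p{XID_Start}` and `\p{XID_Continue}` regex properties. The function name `is_default_identifier` is hypothetical.

```perl
use strict;
use warnings;

# R1 sketch: one XID_Start character followed by zero or more XID_Continue
# characters. (is_default_identifier is a hypothetical name.)
sub is_default_identifier {
    my ($s) = @_;
    return $s =~ /\A\p{XID_Start}\p{XID_Continue}*\z/ ? 1 : 0;
}

# "\x{3B1}\x{3B2}" is Greek alpha, beta.
for my $name ('foo', '9foo', '_foo', "\x{3B1}\x{3B2}") {
    printf "%-6s %s\n",
        join('', map { $_ < 128 ? chr : sprintf('\\x{%X}', $_) } unpack 'U*', $name),
        is_default_identifier($name) ? 'identifier' : 'not an identifier';
}
```

Note that `_foo` fails here because the underscore is XID_Continue but not XID_Start. Perl's own identifiers allow a leading underscore, which is the kind of deviation the "profile" clause exists to describe.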

  R1a Restricted Format Characters

  To meet this requirement\, an implementation shall define a profile for   R1 which allows format characters as described in Section 2.3\, Layout   and Format Control Characters. An implementation may further restrict   the context for ZWJ or ZWNJ\, such as by limiting the scripts\, if a clear   specification for such a further restriction is supplied.

  R1b Stable Identifiers

  To meet this requirement\, an implementation shall guarantee that   identifiers are stable across versions of the Unicode Standard​: that is\,   once a string qualifies as an identifier\, it does so in all future   versions.

  * This is typically achieved by using grandfathered characters.

  R2 Alternative Identifiers

  To meet this requirement\, an implementation shall define identifiers to   be any non-empty string of characters that contains no character having   any of the following property values​:

  * Pattern_White_Space=True
  * Pattern_Syntax=True
  * General_Category=Private_Use, Surrogate, or Control
  * Noncharacter_Code_Point=True

  Alternatively\, it shall declare that it uses a profile and define that   profile with a precise specification of the characters that are added to   or removed from the sets of code points defined by these properties.
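The R2 rule translates fairly directly into a negative check. The following is only a sketch under that reading: a string qualifies if it is non-empty and contains no character from the four excluded sets. The name `is_alternative_identifier` is invented for the example.

```perl
use strict;
use warnings;

# Characters that R2 forbids anywhere in an identifier.
my $R2_EXCLUDED = qr/[\p{Pattern_White_Space}\p{Pattern_Syntax}\p{Co}\p{Cs}\p{Cc}\p{Noncharacter_Code_Point}]/;

# R2 sketch: non-empty, and no excluded character present.
# (is_alternative_identifier is a hypothetical name.)
sub is_alternative_identifier {
    my ($s) = @_;
    return (length($s) && $s !~ $R2_EXCLUDED) ? 1 : 0;
}

for my $s ('foo', '123', 'snake_case', 'a b', 'a+b') {
    printf "%-12s %s\n", "'$s'",
        is_alternative_identifier($s) ? 'ok under R2' : 'rejected under R2';
}
```

Because the test is purely exclusionary, a code point that is unassigned today and later becomes a letter is already permitted, which is where the forward compatibility comes from.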

  R3 Pattern_White_Space and Pattern_Syntax Characters

  To meet this requirement\, an implementation shall use   Pattern_White_Space characters as all and only those characters   interpreted as whitespace in parsing\, and shall use Pattern_Syntax   characters as all and only those characters with syntactic use.

  Alternatively\, it shall declare that it uses a profile and define that   profile with a precise specification of the characters that are added to   or removed from the sets of code points defined by these properties.

  * All characters except those that have these properties are available   for use as identifiers or literals.

  R4 Equivalent Normalized Identifiers

  To meet this requirement\, an implementation shall specify the   Normalization Form and shall provide a precise specification of the   characters that are excluded from normalization\, if any. If the   Normalization Form is NFKC\, the implementation shall apply the   modifications in Section 5.1\, NFKC Modifications\, given by the   properties XID_Start and XID_Continue. Except for identifiers containing   excluded characters\, any two identifiers that have the same   Normalization Form shall be treated as equivalent by the implementation.
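To make R4 concrete, here is a small sketch that treats two identifiers as equivalent when their NFC forms are equal, using the core `Unicode::Normalize` module. Choosing NFC is an assumption made for the example (R4 leaves the form to the implementation), and `same_identifier` is a hypothetical name.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC);

# R4 sketch: identifiers are equivalent if they agree after normalization.
# NFC is an arbitrary choice for illustration.
sub same_identifier {
    my ($x, $y) = @_;
    return NFC($x) eq NFC($y) ? 1 : 0;
}

# Precomposed e-acute versus "e" followed by COMBINING ACUTE ACCENT.
print same_identifier("caf\x{E9}", "cafe\x{301}") ? "equivalent\n" : "different\n";
print same_identifier("cafe",      "caf\x{E9}")   ? "equivalent\n" : "different\n";
```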

  R5 Equivalent Case-Insensitive Identifiers

  To meet this requirement\, an implementation shall specify either simple   or full case folding\, and adhere to the Unicode specification for that   folding. Any two identifiers that have the same case-folded form shall   be treated as equivalent by the implementation.
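R5 can be sketched the same way with full case folding. This assumes a perl new enough to provide the `fc` function (5.16 or later); `same_identifier_ci` is a hypothetical name.

```perl
use v5.16;      # enables strict and the fc feature
use warnings;

# R5 sketch: identifiers are equivalent if their full case foldings match.
sub same_identifier_ci {
    my ($x, $y) = @_;
    return fc($x) eq fc($y) ? 1 : 0;
}

# GREEK CAPITAL LETTER SIGMA and GREEK SMALL LETTER FINAL SIGMA share a
# case folding, which a plain lc() comparison would not detect.
say same_identifier_ci("Foo", "FOO")           ? 'equivalent' : 'different';
say same_identifier_ci("\x{3A3}", "\x{3C2}")   ? 'equivalent' : 'different';
say same_identifier_ci("foo", "bar")           ? 'equivalent' : 'different';
```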

  R6 Filtered Normalized Identifiers

  To meet this requirement\, an implementation shall specify the   Normalization Form and shall provide a precise specification of the   characters that are excluded from normalization\, if any. If the   Normalization Form is NFKC\, the implementation shall apply the   modifications in Section 5.1\, NFKC Modifications\, given by the   properties XID_Start and XID_Continue. Except for identifiers containing   excluded characters\, allowed identifiers must be in the specified   Normalization Form.

  R7 Filtered Case-Insensitive Identifiers

  To meet this requirement\, an implementation shall specify either simple   or full case folding\, and adhere to the Unicode specification for that   folding. Except for identifiers containing excluded characters\, allowed   identifiers must be in the specified Normalization Form.

p5pRT commented 12 years ago

From @khwilliamson

Here is my proposal to go forward on this.

In 5.16 we document an intent to deprecate, as regular expression pattern delimiters, the code points that are pattern syntax and that are unassigned or have the categories initial, open, final, or close punctuation.

At some later point we start using the assigned ones as paired brackets.

This gives us the 53 pairs mentioned earlier in the thread\, and we grandfather in the less-than and greater-than signs.
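To make the affected set concrete, the following sketch tests whether a code point falls under the proposed deprecation as described above: Pattern_Syntax, and either unassigned (Cn) or in one of the categories Ps, Pe, Pi, Pf. It reflects one reading of the proposal, not actual perl behaviour, and the function name is invented.

```perl
use strict;
use warnings;

# Proposal sketch: Pattern_Syntax AND (unassigned OR open/close/initial/final
# punctuation). (is_proposed_bracket_candidate is a hypothetical name.)
sub is_proposed_bracket_candidate {
    my ($cp) = @_;
    return chr($cp) =~ /\A(?=\p{Pattern_Syntax})[\p{Cn}\p{Ps}\p{Pe}\p{Pi}\p{Pf}]\z/
        ? 1 : 0;
}

# "(" and U+2329 are Ps, U+201C is Pi; "<" is a math symbol (Sm).
for my $cp (0x28, 0x3C, 0x201C, 0x2329) {
    printf "U+%04X %s\n", $cp,
        is_proposed_bracket_candidate($cp) ? 'candidate' : 'not a candidate';
}
```

The less-than sign comes out as "not a candidate" because its general category is Sm, which is why it and the greater-than sign would have to be grandfathered in separately.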