Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

Teach Perl about Unicode named character sequences #10629

Closed p5pRT closed 14 years ago

p5pRT commented 14 years ago

Migrated from rt.perl.org#77818 (status was 'resolved')

Searchable as RT77818$

p5pRT commented 14 years ago

From @khwilliamson

This series of commits mainly adds support to Perl for Unicode named character sequences. They still aren't accepted in regular expression bracketed character classes, as those currently can handle only single-character elements.

There are also some small performance enhancements to charnames, and more randomized testing.

This patch is also on GitHub at git://github.com/khwilliamson/perl.git, branch charnames.
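
For readers unfamiliar with the feature: a named character sequence is a single Unicode name that stands for more than one code point. Below is a minimal sketch, assuming a perl in which this work has landed (5.14 or later, as far as I know). `KATAKANA LETTER AINU P` comes from Unicode's NamedSequences.txt; the helper name `ainu_p` is purely illustrative and not part of the patches.

```perl
use strict;
use warnings;
use charnames ':full';

# Illustrative helper (not part of the patch series): returns the string
# that the named sequence expands to.
sub ainu_p {
    return "\N{KATAKANA LETTER AINU P}";
}

# Show how many characters the single name produced, and which ones.
my $str = ainu_p();
printf "length=%d code points=%s\n",
    length($str),
    join ' ', map { sprintf 'U+%04X', ord } split //, $str;
```

The point of the sketch is only that one `\N{...}` can yield a multi-character string, which is why the result can no longer be described as a single ordinal.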

p5pRT commented 14 years ago

From @khwilliamson

0001-mktables-Remove-stubbed-out-code.patch
```diff
From a9d896ba00e3080d31f3aae09651a48166a97f2f Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Thu, 9 Sep 2010 16:34:55 -0600
Subject: [PATCH] mktables: Remove stubbed out code

This commented out code will never be used, as a different solution was
done in charnames. It was to automatically handle parenthesized
character names. Unicode is extremely unlikely to ever add new names
like this, and all the existing names are now hard-coded in charnames.pm
---
 lib/unicore/mktables | 27 ---------------------------
 1 files changed, 0 insertions(+), 27 deletions(-)

diff --git a/lib/unicore/mktables b/lib/unicore/mktables
index 505e2ab..b88483c 100644
--- a/lib/unicore/mktables
+++ b/lib/unicore/mktables
@@ -8978,23 +8978,6 @@ END
     }
 }
 
-# XXX Unused until revise charnames;
-#sub check_and_handle_compound_name {
-#    This looks at Name properties for parenthesized components and splits
-#    them off. Thus it finds FF as an equivalent to Form Feed.
-#    my $code_point = shift;
-#    my $name = shift;
-#    if ($name =~ /^ ( .*? ) ( \s* ) \( ( [^)]* ) \) (.*) $/x) {
-#        #local $to_trace = 1 if main::DEBUG;
-#        trace $1, $2, $3, $4 if main::DEBUG && $to_trace;
-#        push @more_Names, "$code_point; $1";
-#        push @more_Names, "$code_point; $3";
-#        Carp::my_carp_bug("Expecting blank space before left parenthesis in '$_'. Proceeding and assuming it was there;") if $2 ne " ";
-#        Carp::my_carp_bug("Not expecting anything after the right parenthesis in '$_'. Proceeding and ignoring that;") if $4 ne "";
-#    }
-#    return;
-#}
-
 { # Closure for UnicodeData.txt handling
 
     # This file was the first one in the UCD; its design leads to some
@@ -9377,16 +9360,6 @@ END
                     . $fields[$CHARNAME];
             }
             $fields[$NAME] = $fields[$CHARNAME];
-
-            # Some official names are really two alternate names with one in
-            # parentheses. What we do here is use the full official one for
-            # the standard property (stored just above), but for the charnames
-            # table, we add two more entries, one for each of the alternate
-            # ones.
-            # elsif name ne ""
-            #check_and_handle_compound_name($cp, $fields[$CHARNAME]);
-            #check_and_handle_compound_name($cp, $unicode_1_name);
-            # XXX until charnames catches up.
         }
         elsif ($fields[$CHARNAME] =~ /^<(.+), First>$/) {
             $fields[$CHARNAME] = $fields[$NAME] = $1;
-- 
1.5.6.3
```

p5pRT commented 14 years ago

From @khwilliamson

0002-charnames.pm-Small-performance-enhancements.patch
```diff
From 3374a80ac373aa2097cf60c1137e409e89feca10 Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Thu, 9 Sep 2010 17:16:53 -0600
Subject: [PATCH] charnames.pm: Small performance enhancements

mktables is changed to output 5 digit code points, which means that
charnames doesn't have to go looking for the boundaries, which gives a
slight performance enhancement.
---
 lib/charnames.pm     | 30 ++++++++----------------------
 lib/unicore/mktables | 38 ++++++++++++++++++++++++++++----------
 2 files changed, 36 insertions(+), 32 deletions(-)

diff --git a/lib/charnames.pm b/lib/charnames.pm
index 4123578..29eb8e8 100644
--- a/lib/charnames.pm
+++ b/lib/charnames.pm
@@ -464,7 +464,7 @@ sub alias (@) # Set up a single alias
       $^H{charnames_ord_aliases}{$name} = $value;
 
       # Use a canonical form.
-      $^H{charnames_inverse_ords}{sprintf("%04X", $value)} = $name;
+      $^H{charnames_inverse_ords}{sprintf("%05X", $value)} = $name;
     }
     else {
       # XXX validate syntax when deprecation cycle complete. ie. start
@@ -578,7 +578,7 @@ sub lookup_name ($;$) {
 
       ## Suck in the code/name list as a big string.
       ## Lines look like:
-      ## "0052\t\tLATIN CAPITAL LETTER R\n"
+      ## "00052\t\tLATIN CAPITAL LETTER R\n"
       $txt = do "unicore/Name.pl" unless $txt;
 
       ## @off will hold the index into the code/name string of the start and
@@ -639,24 +639,10 @@ sub lookup_name ($;$) {
       }
 
       if (! defined $ord) {
-        ##
-        ## Now know where in the string the name starts.
-        ## The code, in hex, is before that.
-        ##
-        ## The code can be 4-6 characters long, so we've got to sort of
-        ## go look for it, just after the newline that comes before $off[0].
-        ##
-        ## This would be much easier if unicore/Name.pl had info in
-        ## a name/code order, instead of code/name order.
-        ##
-        ## The +1 after the rindex() is to skip past the newline we're finding,
-        ## or, if the rindex() fails, to put us to an offset of zero.
-        ##
-        my $hexstart = rindex($txt, "\n", $off[0]) + 1;
-
-        ## we know where it starts, so turn into number -
-        ## the ordinal for the char.
-        $ord = CORE::hex substr($txt, $hexstart, $off[0] - 2 - $hexstart);
+
+        # Now know where in the string the name starts.
+        # The code, 5 hex digits long (and 2 tabs) is before that.
+        $ord = CORE::hex substr($txt, $off[0] - 7, 5);
       }
 
       # Cache the input so as to not have to search the large table
@@ -792,10 +778,10 @@ sub viacode {
   # Must check if decimal first; see comments at that definition
   my $hex;
   if ($arg =~ $decimal_qr) {
-    $hex = sprintf "%04X", $arg;
+    $hex = sprintf "%05X", $arg;
   } elsif ($arg =~ $hex_qr) {
     # Below is the line that differs from the _getcode() source
-    $hex = sprintf "%04X", hex $1;
+    $hex = sprintf "%05X", hex $1;
   } else {
     carp("unexpected arg \"$arg\" to charnames::viacode()");
     return;
diff --git a/lib/unicore/mktables b/lib/unicore/mktables
index b88483c..b959bd3 100644
--- a/lib/unicore/mktables
+++ b/lib/unicore/mktables
@@ -4675,16 +4675,23 @@ sub trace { return main::trace(@_); }
 
             # If has or wants a single point range output
             if ($start == $end || $range_size_1) {
-                for my $i ($start .. $end) {
-                    push @OUT, sprintf "%04X\t\t%s\n", $i, $value;
-                    if ($output_names) {
-                        if (! defined $viacode[$i]) {
-                            $viacode[$i] =
-                                Property::property_ref('Perl_Charnames')
-                                    ->value_of($i)
-                                || "";
+                if (ref $range_size_1 eq 'CODE') {
+                    for my $i ($start .. $end) {
+                        push @OUT, &$range_size_1($i, $value);
+                    }
+                }
+                else {
+                    for my $i ($start .. $end) {
+                        push @OUT, sprintf "%04X\t\t%s\n", $i, $value;
+                        if ($output_names) {
+                            if (! defined $viacode[$i]) {
+                                $viacode[$i] =
+                                    Property::property_ref('Perl_Charnames')
+                                        ->value_of($i)
+                                    || "";
+                            }
+                            $OUT[-1] =~ s/\n/\t# $viacode[$i]\n/;
                         }
-                        $OUT[-1] =~ s/\n/\t# $viacode[$i]\n/;
                     }
                 }
             }
@@ -8536,6 +8543,17 @@ END
     return @return;
 }
 
+sub output_perl_charnames_line ($$) {
+
+    # Output the entries in Perl_charnames specially, using 5 digits instead
+    # of four. This makes the entries a constant length, and simplifies
+    # charnames.pm which this table is for. Unicode can have 6 digit
+    # ordinals, but they are all private use or noncharacters which do not
+    # have names, so won't be in this table.
+
+    return sprintf "%05X\t\t%s\n", $_[0], $_[1];
+}
+
 { # Closure
     # This is used to store the range list of all the code points usable when
     # the little used $compare_versions feature is enabled.
@@ -9120,7 +9138,7 @@ END
                     File => 'Name',
                     Internal_Only_Warning => 1,
                     Perl_Extension => 1,
-                    Range_Size_1 => 1,
+                    Range_Size_1 => \&output_perl_charnames_line,
                     Type => $STRING,
                     );
 
-- 
1.5.6.3
```

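
The effect of the fixed-width change can be sketched on a toy table. This is only an illustration under stated assumptions: `$txt` here is a three-line stand-in for `unicore/Name.pl`, and `code_point_for` is a hypothetical helper, not a function in charnames.pm. The `%05X\t\t%s\n` record format and the `- 7, 5` offsets are taken from the patch above.

```perl
use strict;
use warnings;

# Toy stand-in for unicore/Name.pl: "HHHHH\t\tNAME\n" per line, as this
# patch has mktables emit. The three entries are illustrative only.
my $txt = join '', map { sprintf "%05X\t\t%s\n", @$_ } (
    [ 0x00041, 'LATIN CAPITAL LETTER A' ],
    [ 0x00052, 'LATIN CAPITAL LETTER R' ],
    [ 0x10330, 'GOTHIC LETTER AHSA' ],
);

# Hypothetical helper mirroring the table lookup in lookup_name().
sub code_point_for {
    my ($name) = @_;
    return unless $txt =~ /\t\t\Q$name\E$/m;
    my $name_start = $-[0] + 2;    # skip the two tabs
    # Five hex digits plus two tabs always precede the name, so no
    # rindex() search for the previous newline is needed.
    return hex substr($txt, $name_start - 7, 5);
}

printf "U+%05X\n", code_point_for('GOTHIC LETTER AHSA');
```

Because every record has the same prefix length, the code point is found by a constant offset from the start of the name rather than by scanning backwards for a newline.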
p5pRT commented 14 years ago

From @khwilliamson

0003-charnames-Remove-unnecessary-t-in-Name.pl.patch
```diff
From 9ccea0dd37137622c691e0453939dc6d5d7283d4 Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Thu, 9 Sep 2010 17:50:16 -0600
Subject: [PATCH] charnames: Remove unnecessary \t in Name.pl

The double \t\t is unnecessary, and so we can remove one of them,
shortening the table.
---
 lib/charnames.pm     | 18 +++++++++---------
 lib/unicore/mktables |  2 +-
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/lib/charnames.pm b/lib/charnames.pm
index 29eb8e8..88eefef 100644
--- a/lib/charnames.pm
+++ b/lib/charnames.pm
@@ -578,7 +578,7 @@ sub lookup_name ($;$) {
 
       ## Suck in the code/name list as a big string.
       ## Lines look like:
-      ## "00052\t\tLATIN CAPITAL LETTER R\n"
+      ## "00052\tLATIN CAPITAL LETTER R\n"
       $txt = do "unicore/Name.pl" unless $txt;
 
       ## @off will hold the index into the code/name string of the start and
@@ -600,8 +600,8 @@ sub lookup_name ($;$) {
         if (! defined ($ord = name_to_code_point_special($name))) {
 
           # Not algorthmically determinable; look up in the table.
-          if ($txt =~ /\t\t\Q$name\E$/m) {
-            @off = ($-[0] + 2, $+[0]); # The 2 is for the 2 tabs
+          if ($txt =~ /\t\Q$name\E$/m) {
+            @off = ($-[0] + 1, $+[0]); # The 1 is for the tab
             $found_full_in_table = 1;
           }
         }
@@ -624,7 +624,7 @@ sub lookup_name ($;$) {
 
         my $case = $name =~ /[[:upper:]]/ ? "CAPITAL" : "SMALL";
         if ($txt !~
-          /\t\t (?: $scripts_trie ) \ (?:$case\ )? LETTER \ \U\Q$name\E $/xm)
+          /\t (?: $scripts_trie ) \ (?:$case\ )? LETTER \ \U\Q$name\E $/xm)
         {
           # Here we still don't have it, give up.
           return if $runtime;
@@ -635,14 +635,14 @@ sub lookup_name ($;$) {
           return 0xFFFD;
         }
 
-        @off = ($-[0] + 2, $+[0]);
+        @off = ($-[0] + 1, $+[0]); # The 1 is for the tab
       }
 
       if (! defined $ord) {
 
         # Now know where in the string the name starts.
-        # The code, 5 hex digits long (and 2 tabs) is before that.
-        $ord = CORE::hex substr($txt, $off[0] - 7, 5);
+        # The code, 5 hex digits long (and a tab), is before that.
+        $ord = CORE::hex substr($txt, $off[0] - 6, 5);
       }
 
       # Cache the input so as to not have to search the large table
@@ -740,7 +740,7 @@ sub import
     $txt = do "unicore/Name.pl" unless $txt;
 
     for my $script (@scripts) {
-      if (not $txt =~ m/\t\t$script (?:CAPITAL |SMALL )?LETTER /) {
+      if (not $txt =~ m/\t$script (?:CAPITAL |SMALL )?LETTER /) {
         warnings::warn('utf8', "No such script: '$script'");
         $script = quotemeta $script; # Escape it, for use in the re.
       }
@@ -804,7 +804,7 @@ sub viacode {
   # Return the official name, if exists. It's unclear to me (khw) at
   # this juncture if it is better to return a user-defined override, so
   # leaving it as is for now.
-  if ($txt =~ m/^$hex\t\t/m) {
+  if ($txt =~ m/^$hex\t/m) {
 
     # The name starts with the next character and goes up to the
     # next new-line. Using capturing parentheses above instead of
diff --git a/lib/unicore/mktables b/lib/unicore/mktables
index b959bd3..4e0b4c6 100644
--- a/lib/unicore/mktables
+++ b/lib/unicore/mktables
@@ -8551,7 +8551,7 @@ sub output_perl_charnames_line ($$) {
     # ordinals, but they are all private use or noncharacters which do not
     # have names, so won't be in this table.
 
-    return sprintf "%05X\t\t%s\n", $_[0], $_[1];
+    return sprintf "%05X\t%s\n", $_[0], $_[1];
 }
 
 { # Closure
-- 
1.5.6.3
```

p5pRT commented 14 years ago

From @khwilliamson

0004-charnames.t-Don-t-call-srand-undef.patch
```diff
From be013ea281072a80d8b9e4f839bf2a683e023394 Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Thu, 9 Sep 2010 18:06:22 -0600
Subject: [PATCH] charnames.t: Don't call srand(undef)

srand(undef) is the same as srand(0). The code is trying to get random
seeds, not a fixed one.
---
 lib/charnames.t | 12 +++++++++---
 1 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/lib/charnames.t b/lib/charnames.t
index eb7358c..43f4857 100644
--- a/lib/charnames.t
+++ b/lib/charnames.t
@@ -776,9 +776,15 @@ is("\N{U+1D0C5}", "\N{BYZANTINE MUSICAL SYMBOL FTHORA SKLIRON CHROMA VASIS}");
 
     # For randomized tests below.
     my $seed;
-    $seed = $ENV{PERL_TEST_CHARNAMES_SEED} if
-                                defined $ENV{PERL_TEST_CHARNAMES_SEED};
-    $seed = srand($seed);
+    if (defined $ENV{PERL_TEST_CHARNAMES_SEED}) {
+        $seed = srand($ENV{PERL_TEST_CHARNAMES_SEED});
+        if ($seed != $ENV{PERL_TEST_CHARNAMES_SEED}) {
+            die "srand returned '$seed' instead of '$ENV{PERL_TEST_CHARNAMES_SEED}'";
+        };
+    }
+    else {
+        $seed = srand;
+    }
 
     # We will look at the data grouped in "blocks" of the following
     # size.
-- 
1.5.6.3
```

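
The seeding pattern this patch switches to can be sketched on its own. This assumes a perl whose `srand` returns the seed it used (5.14 or later, as far as I know); `seed_rng` is a hypothetical helper name, while `PERL_TEST_CHARNAMES_SEED` is the variable the test file reads.

```perl
use strict;
use warnings;

# Hypothetical helper: seed from an override when one is given, otherwise
# let perl choose. Calling srand(undef) would not do the latter; undef
# numifies to 0, giving the same sequence on every run.
sub seed_rng {
    my ($override) = @_;
    return defined $override ? srand($override) : srand;
}

# Report the seed so that a failing randomized run can be reproduced.
my $seed = seed_rng($ENV{PERL_TEST_CHARNAMES_SEED});
print "seed: $seed\n";
```

Printing the seed is what makes the randomized tests repeatable: rerunning with the reported value in the environment variable replays the same sample.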
p5pRT commented 14 years ago

From @khwilliamson

0005-charnames.t-clarify-comments.patch ```diff From 09a3b67c1b32b3910e5df47d38d4483e681d74ca Mon Sep 17 00:00:00 2001 From: Karl Williamson Date: Fri, 10 Sep 2010 10:47:15 -0600 Subject: [PATCH] charnames.t: clarify comments --- lib/charnames.t | 11 +++++++---- 1 files changed, 7 insertions(+), 4 deletions(-) diff --git a/lib/charnames.t b/lib/charnames.t index 43f4857..822053d 100644 --- a/lib/charnames.t +++ b/lib/charnames.t @@ -795,12 +795,14 @@ is("\N{U+1D0C5}", "\N{BYZANTINE MUSICAL SYMBOL FTHORA SKLIRON CHROMA VASIS}"); # that are algorithmically determinable, such as "CKJ UNIFIED # IDEOGRAPH-hhhh" where the hhhh is the actual hex code point number # of the character. The percentage of each type to test is - # independently settable. + # fuzzily independently settable. This breaks down when the block size is + # 1 or is large enough that both types of names occur in the same block my $percentage_of_regular_names = 25; my $percentage_of_algorithmic_names = 100 / $block_size; # 1 test/block # Changing the block size doesn't change anything with regards to - # testing the regular names, but will affect the algorithmic names. + # testing the regular names (except if you set it to 1 so that each code + # point is in its own block), but will affect the algorithmic names. # If you make the size too big so that blocks include both regular # names and algorithmic, the whole block will be sampled at the sum # of the two rates. If you make it too small, then more algorithmic @@ -844,8 +846,9 @@ is("\N{U+1D0C5}", "\N{BYZANTINE MUSICAL SYMBOL FTHORA SKLIRON CHROMA VASIS}"); /^(.*?);/; my $end_decimal = hex $1; - # Only the CJK ones have names, and they all have the code - # point as part of the name, which we can construct + # Only the CJK (and the Hangul which are instead dealt with below) + # ones have names, and they all have the code point as part of the + # name, which we can construct if ($name =~ /^
p5pRT commented 14 years ago

From @khwilliamson

0006-charnames.t-Add-code-so-can-test-100-of-names.patch
```diff
From 660f1064b7d7dd21590bff7a65916cedb9383ee5 Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Fri, 10 Sep 2010 10:47:47 -0600
Subject: [PATCH] charnames.t: Add code so can test 100% of names

If the percentage of characters to test is changed to 100%, add code to
make the block size 1. This guarantees each character gets tested in
spite of randomness
---
 lib/charnames.t | 13 ++++++++++++-
 1 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/lib/charnames.t b/lib/charnames.t
index 822053d..d909824 100644
--- a/lib/charnames.t
+++ b/lib/charnames.t
@@ -798,7 +798,18 @@ is("\N{U+1D0C5}", "\N{BYZANTINE MUSICAL SYMBOL FTHORA SKLIRON CHROMA VASIS}");
     # fuzzily independently settable. This breaks down when the block size is
     # 1 or is large enough that both types of names occur in the same block
     my $percentage_of_regular_names = 25;
-    my $percentage_of_algorithmic_names = 100 / $block_size; # 1 test/block
+    my $percentage_of_algorithmic_names = (100 / $block_size); # 1 test/block
+
+    # If wants everything tested, do so by changing the block size to 1 so
+    # every character is in its own block, otherwise there is a risk that the
+    # randomness will cause something to be tested more than once at the
+    # expense of testing something else not at all.
+    if ($percentage_of_regular_names >= 100
+        || $percentage_of_algorithmic_names >= 100)
+    {
+        $block_size_bits = 0;
+        $block_size = 2**$block_size_bits;
+    }
 
     # Changing the block size doesn't change anything with regards to
     # testing the regular names (except if you set it to 1 so that each code
-- 
1.5.6.3
```

p5pRT commented 14 years ago

From @khwilliamson

0007-charnames.pm-Change-variable-name.patch
```diff
From c0da3c143ce10651c0012430bfc45739437f9c0c Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Fri, 10 Sep 2010 11:38:25 -0600
Subject: [PATCH] charnames.pm: Change variable name

This is an intermediate commit in preparation for handling named
sequences
---
 lib/charnames.pm | 26 +++++++++++++-------------
 1 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/lib/charnames.pm b/lib/charnames.pm
index 88eefef..b8d1593 100644
--- a/lib/charnames.pm
+++ b/lib/charnames.pm
@@ -523,7 +523,7 @@ sub lookup_name ($;$) {
 
   my ($name, $hints_ref) = @_;
 
-  my $ord;
+  my $utf8;
   my $save_input;
 
   if ($runtime) {
@@ -551,28 +551,28 @@ sub lookup_name ($;$) {
   # User alias should be checked first or else can't override ours, and if we
   # add any, could conflict with theirs.
   if (exists $^H{charnames_ord_aliases}{$name}) {
-    $ord = $^H{charnames_ord_aliases}{$name};
+    $utf8 = $^H{charnames_ord_aliases}{$name};
   }
   elsif (exists $^H{charnames_name_aliases}{$name}) {
     $name = $^H{charnames_name_aliases}{$name};
     $save_input = $name; # Cache the result for any error message
   }
   elsif (exists $system_aliases{$name}) {
-    $ord = $system_aliases{$name};
+    $utf8 = $system_aliases{$name};
   }
   elsif (exists $deprecated_aliases{$name}) {
     require warnings;
     warnings::warnif('deprecated', "Unicode character name \"$name\" is deprecated, use \"" . viacode($deprecated_aliases{$name}) . "\" instead");
-    $ord = $deprecated_aliases{$name};
+    $utf8 = $deprecated_aliases{$name};
   }
 
   my @off;
 
-  if (! defined $ord) {
+  if (! defined $utf8) {
 
     # See if has looked this up earlier.
     if ($^H{charnames_full} && exists $full_names_cache{$name}) {
-      $ord = $full_names_cache{$name};
+      $utf8 = $full_names_cache{$name};
     }
     else {
 
@@ -597,7 +597,7 @@ sub lookup_name ($;$) {
         # Algorithmically determinables are not placed in the cache (that
         # $found_full_in_table indicates) because that uses up memory,
         # and finding these again is fast.
-        if (! defined ($ord = name_to_code_point_special($name))) {
+        if (! defined ($utf8 = name_to_code_point_special($name))) {
 
           # Not algorthmically determinable; look up in the table.
           if ($txt =~ /\t\Q$name\E$/m) {
@@ -608,7 +608,7 @@ sub lookup_name ($;$) {
         }
       }
       # If we didn't get it above, keep looking
-      if (! $found_full_in_table && ! defined $ord) {
+      if (! $found_full_in_table && ! defined $utf8) {
 
         # If :short is allowed, see if input is like "greek:Sigma".
         my $scripts_trie;
@@ -638,20 +638,20 @@ sub lookup_name ($;$) {
         @off = ($-[0] + 1, $+[0]); # The 1 is for the tab
       }
 
-      if (! defined $ord) {
+      if (! defined $utf8) {
 
         # Now know where in the string the name starts.
         # The code, 5 hex digits long (and a tab), is before that.
-        $ord = CORE::hex substr($txt, $off[0] - 6, 5);
+        $utf8 = CORE::hex substr($txt, $off[0] - 6, 5);
       }
 
       # Cache the input so as to not have to search the large table
       # again, but only if it came from the one search that we cache.
-      $full_names_cache{$name} = $ord if $found_full_in_table;
+      $full_names_cache{$name} = $utf8 if $found_full_in_table;
     }
   }
 
-  return $ord if $runtime || $ord <= 255 || ! ($^H & $bytes::hint_bits);
+  return $utf8 if $runtime || $utf8 <= 255 || ! ($^H & $bytes::hint_bits);
 
   # Here is compile time, "use bytes" is in effect, and the character
   # won't fit in a byte
@@ -662,7 +662,7 @@ sub lookup_name ($;$) {
   else {
     $name = (defined $save_input) ? $save_input : $_[0];
   }
-  croak not_legal_use_bytes_msg($name, $ord);
+  croak not_legal_use_bytes_msg($name, $utf8);
 } # lookup_name
 
 sub charnames {
-- 
1.5.6.3
```

p5pRT commented 14 years ago

From @khwilliamson

0008-Fix-spelling.patch
```diff
From b2dd023bc0ca0ed9d05fb0b5998e1f1b39b0f163 Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Sat, 11 Sep 2010 09:20:04 -0600
Subject: [PATCH] Fix spelling

---
 t/lib/charnames/alias | 6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/t/lib/charnames/alias b/t/lib/charnames/alias
index eba6d1b..bcbd2a6 100644
--- a/t/lib/charnames/alias
+++ b/t/lib/charnames/alias
@@ -127,7 +127,7 @@ EXPECT
 OPTIONS regex
 $
 ########
-# alias with hashref using mixed aliasses
+# alias with hashref using mixed aliases
 use warnings;
 use charnames ":short", ":alias" => {
     e_ACUTE => "LATIN:e WITH ACUTE",
@@ -138,7 +138,7 @@ EXPECT
 OPTIONS regex
 Unknown charname 'LATIN SMALL LETTER A WITH ACUT' at
 ########
-# alias with hashref using mixed aliasses
+# alias with hashref using mixed aliases
 use warnings;
 use charnames ":short", ":alias" => {
     e_ACUTE => "LATIN:e WITH ACUTE",
@@ -149,7 +149,7 @@ EXPECT
 OPTIONS regex
 Unknown charname 'LATIN SMALL LETTER A WITH ACUTE' at
 ########
-# alias with hashref using mixed aliasses
+# alias with hashref using mixed aliases
 use warnings;
 no warnings 'void';
 use charnames ":full", ":alias" => {
-- 
1.5.6.3
```

p5pRT commented 14 years ago

From @khwilliamson

0009-Fix-casing-wording.patch ```diff From 8f2c2981be6d640a75f91e83ab07758ea9ed0342 Mon Sep 17 00:00:00 2001 From: Karl Williamson Date: Sat, 11 Sep 2010 11:17:03 -0600 Subject: [PATCH] Fix casing, wording --- pod/perlrebackslash.pod | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/pod/perlrebackslash.pod b/pod/perlrebackslash.pod index d460f7f..eb51d94 100644 --- a/pod/perlrebackslash.pod +++ b/pod/perlrebackslash.pod @@ -174,12 +174,12 @@ To specify by name, the name of the character goes between the curly braces. In this case, you have to C to load the Unicode names of the characters, otherwise Perl will complain. -To specify by Unicode ordinal number, use the form +To specify a character by Unicode code point, use the form C<\N{U+I}>, where I is a number in hexadecimal that gives the ordinal number that Unicode has assigned to the desired character. It is customary (but not required) to use leading zeros to pad the number to 4 digits. Thus C<\N{U+0041}> means -C, and you will rarely see it written without the two +C, and you will rarely see it written without the two leading zeros. C<\N{U+0041}> means "A" even on EBCDIC machines (where the ordinal value of "A" is not 0x41). -- 1.5.6.3 ```
p5pRT commented 14 years ago

From @khwilliamson

0010-charnames.t-Add-tests-for-NameAliases.patch
```diff
From 9c3e02cfb4634f7d666f948ea4e7cf3380b64532 Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Sat, 11 Sep 2010 13:10:37 -0600
Subject: [PATCH] charnames.t: Add tests for NameAliases

---
 lib/charnames.t | 10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/lib/charnames.t b/lib/charnames.t
index d909824..11fe818 100644
--- a/lib/charnames.t
+++ b/lib/charnames.t
@@ -885,6 +885,16 @@ is("\N{U+1D0C5}", "\N{BYZANTINE MUSICAL SYMBOL FTHORA SKLIRON CHROMA VASIS}");
         $algorithmic_names_count[$block] = 1;
     }
 
+    open $fh, "<", "../../lib/unicore/NameAliases.txt" or
+        die "Can't open ../../lib/unicore/NameAliases.txt: $!";
+    while (<$fh>) {
+        chomp;
+        s/^\s*#.*//;
+        next unless $_;
+        my ($hex, $name) = split ";";
+        is(charnames::vianame($name), hex $hex, "Verify vianame(\"$name\") is 0x$hex");
+    }
+    close $fh;
 
     # Now, have all the names populated. Do the tests
 
-- 
1.5.6.3
```

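
The record handling in this test loop can be tried without the Unicode data file. Below is a minimal sketch, assuming the two-field `HEX;NAME` layout the patch splits on; `parse_name_aliases` is a hypothetical helper and the here-doc is an illustrative sample rather than the real NameAliases.txt. Unlike the patch, this version also strips trailing `#` comments.

```perl
use strict;
use warnings;

# Hypothetical helper: parse "HEX;NAME" records, skipping blank lines and
# '#' comments, and return a name => code point hash.
sub parse_name_aliases {
    my ($text) = @_;
    my %alias;
    for my $line (split /\n/, $text) {
        $line =~ s/\s*#.*//;          # strip comments
        next unless $line =~ /\S/;    # skip blank lines
        my ($hex, $name) = split /;/, $line;
        $alias{$name} = hex $hex;
    }
    return %alias;
}

my %alias = parse_name_aliases(<<'END');
# illustrative sample in the two-field format
01A2;LATIN CAPITAL LETTER GHA
END
printf "%s => U+%04X\n", $_, $alias{$_} for sort keys %alias;
```

In the real test each parsed pair is then checked with `charnames::vianame($name)` against `hex $hex`, as the diff above shows.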
p5pRT commented 14 years ago

From @khwilliamson

0011-charnames.t-Clarify-value-is-hex.patch
```diff
From f01c10fd60e2052486401681829bb7526f36eb33 Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Sat, 11 Sep 2010 13:11:44 -0600
Subject: [PATCH] charnames.t: Clarify value is hex

---
 lib/charnames.t | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/lib/charnames.t b/lib/charnames.t
index 11fe818..c211f65 100644
--- a/lib/charnames.t
+++ b/lib/charnames.t
@@ -957,7 +957,7 @@ is("\N{U+1D0C5}", "\N{BYZANTINE MUSICAL SYMBOL FTHORA SKLIRON CHROMA VASIS}");
 
                 # Otherwise, test that the name and code point map
                 # correctly
-                $all_pass &= is(charnames::vianame($names[$i]), $i, "Verify vianame(\"$names[$i]\") is $hex");
+                $all_pass &= is(charnames::vianame($names[$i]), $i, "Verify vianame(\"$names[$i]\") is 0x$hex");
                 $all_pass &= is(charnames::viacode($i), $names[$i], "Verify viacode(0x$hex) is \"$names[$i]\"");
 
                 # And make sure that a non-algorithmically named code
-- 
1.5.6.3
```

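
The two assertions touched here check a round trip between a name and its code point. A minimal sketch of that round trip follows, using `GOTHIC LETTER AHSA` (a name that appears elsewhere in charnames.t); `round_trip` is a hypothetical helper, while `charnames::vianame` and `charnames::viacode` are the functions under test.

```perl
use strict;
use warnings;
use charnames ':full';

# Hypothetical helper: map a full character name to its code point and
# back to the official name.
sub round_trip {
    my ($name) = @_;
    my $cp   = charnames::vianame($name);    # name -> ordinal
    my $back = charnames::viacode($cp);      # ordinal -> official name
    return ($cp, $back);
}

my ($cp, $back) = round_trip('GOTHIC LETTER AHSA');
printf "0x%X %s\n", $cp, $back;
```

Printing the code point with a `0x` prefix, as the patch does in the test description, avoids mistaking the hex value for decimal in the test output.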
p5pRT commented 14 years ago

From @khwilliamson

0012-charnames.t-Clarify-message.patch
```diff
From eae477751c6cd6ddc1a0db69621144334dc9f2bd Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Sat, 11 Sep 2010 13:35:23 -0600
Subject: [PATCH] charnames.t: Clarify message

---
 lib/charnames.t | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/lib/charnames.t b/lib/charnames.t
index c211f65..8dfb383 100644
--- a/lib/charnames.t
+++ b/lib/charnames.t
@@ -169,7 +169,7 @@ sub to_bytes {
     use bytes;
     is(charnames::vianame("GOTHIC LETTER AHSA"), 0x10330, "Verify vianame \\N{name} is unaffected by 'use bytes'");
     is(charnames::vianame("U+FF"), chr(0xFF), "Verify vianame \\N{U+FF} is unaffected by 'use bytes'");
-    cmp_ok($warning_count, '==', scalar @WARN, "Verify vianame doesn't warn on legal inputs");
+    cmp_ok($warning_count, '==', scalar @WARN, "Verify vianame doesn't warn on legal inputs under 'use bytes'");
     ok(! defined charnames::vianame("U+100"), "Verify vianame \\N{U+100} is undef under 'use bytes'");
     ok($warning_count == scalar @WARN - 1 && $WARN[-1] =~ /above 0xFF/, "Verify vianame gives appropriate warning for previous test");
 }
-- 
1.5.6.3
```

p5pRT commented 14 years ago

From @khwilliamson

0013-charnames.t-Add-output-message.patch
```diff
From ead5458c50d07a595d9169a859d7dd40249c56e1 Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Sun, 12 Sep 2010 09:50:15 -0600
Subject: [PATCH] charnames.t: Add output message

---
 lib/charnames.t | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/lib/charnames.t b/lib/charnames.t
index 8dfb383..66be9a3 100644
--- a/lib/charnames.t
+++ b/lib/charnames.t
@@ -163,7 +163,7 @@ sub to_bytes {
     is(charnames::vianame("U+10330"), "\x{10330}", "Verify vianame \\N{U+hex} returns a chr");
     use warnings;
     my $warning_count = @WARN;
-    ok (! defined charnames::vianame("NONE SUCH"));
+    ok (! defined charnames::vianame("NONE SUCH"), "Verify vianame returns undef for an undefined name");
     cmp_ok($warning_count, '==', scalar @WARN, "Verify vianame doesn't warn on unknown names");
 
     use bytes;
-- 
1.5.6.3
```

p5pRT commented 14 years ago

From @khwilliamson

0014-charnames.pm-Clarify-comments.patch
```diff
From 7f77ab4dc0d2fa2d2f192d9d602aa9120addcb80 Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Sun, 12 Sep 2010 10:27:36 -0600
Subject: [PATCH] charnames.pm: Clarify comments

---
 lib/charnames.pm | 12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/lib/charnames.pm b/lib/charnames.pm
index b8d1593..925dccb 100644
--- a/lib/charnames.pm
+++ b/lib/charnames.pm
@@ -523,7 +523,7 @@ sub lookup_name ($;$) {
 
   my ($name, $hints_ref) = @_;
 
-  my $utf8;
+  my $utf8; # The string result
   my $save_input;
 
   if ($runtime) {
@@ -549,7 +549,7 @@ sub lookup_name ($;$) {
   }
 
   # User alias should be checked first or else can't override ours, and if we
-  # add any, could conflict with theirs.
+  # were to add any, could conflict with theirs.
   if (exists $^H{charnames_ord_aliases}{$name}) {
     $utf8 = $^H{charnames_ord_aliases}{$name};
   }
@@ -570,7 +570,7 @@ sub lookup_name ($;$) {
 
   if (! defined $utf8) {
 
-    # See if has looked this up earlier.
+    # See if has looked this input up earlier.
     if ($^H{charnames_full} && exists $full_names_cache{$name}) {
       $utf8 = $full_names_cache{$name};
     }
@@ -618,7 +618,7 @@ sub lookup_name ($;$) {
           $scripts_trie = "\U\Q$1";
           $name = $2;
         }
-        else {
+        else { # Otherwise look in allowed scripts
           $scripts_trie = $^H{charnames_scripts};
         }
 
@@ -668,8 +668,8 @@ sub lookup_name ($;$) {
 sub charnames {
   my $name = shift;
 
-  # For \N{...}. Looks up the character name and returns its ordinal if
-  # found, undef otherwise. If not in 'use bytes', forces into utf8
+  # For \N{...}. Looks up the character name and returns the string
+  # representation of it.
 
   my $ord = lookup_name($name);
   return if ! defined $ord;
-- 
1.5.6.3
```

p5pRT commented 14 years ago

From @khwilliamson

0015-perlrecharclass.pod-Add-caveat-about-multi-char-se.patch
```diff
From b3df87a576f911b6c5b7bced45d7c4fcf027b509 Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Sun, 12 Sep 2010 10:47:56 -0600
Subject: [PATCH] perlrecharclass.pod: Add caveat about multi-char sequences

Inside a bracketed character class, any \N{name} which expands to more
than one character will have only the first one considered. This doesn't
need named character sequences, as user-defined aliases have long been
able to be multi-char.
---
 pod/perlrecharclass.pod | 6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/pod/perlrecharclass.pod b/pod/perlrecharclass.pod
index 7fcb92d..5aa9348 100644
--- a/pod/perlrecharclass.pod
+++ b/pod/perlrecharclass.pod
@@ -350,8 +350,10 @@ C<\r>,
 C<\t>,
 and
 C<\x>
-are also special and have the same meanings as they do outside a bracketed character
-class.
+are also special and have the same meanings as they do outside a
+bracketed character class. (However, inside a bracketed character
+class, if C<\N{I}> expands to a sequence of characters, only the first
+one in the sequence is used, with a warning.)
 
 Also, a backslash followed by two or three octal digits is considered an octal
 number.
-- 
1.5.6.3
```

p5pRT commented 14 years ago

From @khwilliamson

0016-charnames.pm-Nits-in-pod.patch ```diff From 9f6614d9620692a0b0284ed07bb54eda173ec384 Mon Sep 17 00:00:00 2001 From: Karl Williamson Date: Sun, 12 Sep 2010 12:46:07 -0600 Subject: [PATCH] charnames.pm: Nits in pod --- lib/charnames.pm | 29 ++++++++++++++++------------- 1 files changed, 16 insertions(+), 13 deletions(-) diff --git a/lib/charnames.pm b/lib/charnames.pm index 925dccb..82f7903 100644 --- a/lib/charnames.pm +++ b/lib/charnames.pm @@ -900,7 +900,7 @@ function, L)>. Forms other than C> enable the use of of C<\N{I}> sequences to compile a Unicode character into a -string based on its name. +string, based on its name. Note that C<\N{U+I<...>}>, where the I<...> is a hexadecimal number, also inserts a character into a string, but doesn't require the use of @@ -910,8 +910,8 @@ the Unicode (white background, black foreground) smiley face; it doesn't require this pragma, whereas the equivalent, C<"\N{WHITE SMILING FACE}"> does. Also, C<\N{I<...>}> can mean a regex quantifier instead of a character -name, when the I<...> is a number (or comma separated pair of numbers; -see L), and is not related to this pragma. +name, when the I<...> is a number (or comma separated pair of numbers +(see L), and is not related to this pragma. The C pragma supports arguments C<:full>, C<:short>, script names and customized aliases. If C<:full> is present, for expansion of @@ -949,9 +949,9 @@ place, and ISO 6429 was updated, see L. If the input name is unknown, C<\N{NAME}> raises a warning and substitutes the Unicode REPLACEMENT CHARACTER (U+FFFD). -It is a fatal error if C is in effect and the input name is -that of a character that won't fit into a byte (i.e., whose ordinal is -above 255). +For C<\N{NAME}>, it is a fatal error if C is in effect and the +input name is that of a character that won't fit into a byte (i.e., whose +ordinal is above 255). 
Otherwise, any string that includes a C<\N{I}> or C}>> will automatically have Unicode semantics (see @@ -1095,7 +1095,7 @@ or by using a file containing aliases: use charnames ":alias" => "pro"; -will try to read C<"unicore/pro_alias.pl"> from the C<@INC> path. This +This will try to read C<"unicore/pro_alias.pl"> from the C<@INC> path. This file should return a list in plain perl: ( @@ -1115,6 +1115,10 @@ well, like use charnames ":full", ":alias" => "pro"; +Also, both these methods currently allow only a single character to be named. +To name a sequence of characters, use a +L (described below). + =head1 charnames::viacode(I) Returns the full name of the character indicated by the numeric code. @@ -1125,7 +1129,7 @@ For example, prints "FOUR TEARDROP-SPOKED ASTERISK". The name returned is the official name for the code point, if -available, otherwise your custom alias for it. This means that your +available; otherwise your custom alias for it. This means that your alias will only be returned for code points that don't have an official Unicode name (nor Unicode version 1 name), such as private use code points, and the 4 control characters U+0080, U+0081, U+0084, and U+0099. @@ -1208,11 +1212,10 @@ well. =head1 BUGS -vianame returns a chr if the input name is of the form C, and an ord -otherwise. It is proposed to change this to always return an ord. Send email -to C to comment on this proposal. If S> is in effect when a chr is returned, and if that chr won't fit -into a byte, C is returned instead. +vianame normally returns an ordinal code point, but when the input name is of +the form C, it returns a chr instead. In this case, if C is +in effect and the character won't fit into a byte, it returns C and +raises a warning. Names must be ASCII characters only, which means that you are out of luck if you want to create aliases in a language where some or all the characters of -- 1.5.6.3 ```
p5pRT commented 14 years ago

From @khwilliamson

0018-charnames.pm-indent-less-to-fit-in-80-columns.patch ```diff From 1f2ec513a6ebb68ec2febbf13ec576b035bf7603 Mon Sep 17 00:00:00 2001 From: Karl Williamson Date: Sun, 12 Sep 2010 22:17:35 -0600 Subject: [PATCH] charnames.pm: indent less to fit in 80 columns This patch changes white space only. It lessens the indent of certain lines that were made longer in an earlier commit, and now most of them fit into 80 columns. --- lib/charnames.pm | 830 +++++++++++++++++++++++++++--------------------------- 1 files changed, 415 insertions(+), 415 deletions(-) diff --git a/lib/charnames.pm b/lib/charnames.pm index 80b4d6e..2ff0c1d 100644 --- a/lib/charnames.pm +++ b/lib/charnames.pm @@ -13,397 +13,397 @@ use bytes (); # for $bytes::hint_bits # it alone, but since that is harder for a human to parse, I left it as-is. my %system_aliases = ( - # Icky 3.2 names with parentheses. - 'LINE FEED' => pack("U", 0x0A), # LINE FEED (LF) - 'FORM FEED' => pack("U", 0x0C), # FORM FEED (FF) - 'CARRIAGE RETURN' => pack("U", 0x0D), # CARRIAGE RETURN (CR) - 'NEXT LINE' => pack("U", 0x85), # NEXT LINE (NEL) - - # Some variant names from Wikipedia - 'SINGLE-SHIFT 2' => pack("U", 0x8E), - 'SINGLE-SHIFT 3' => pack("U", 0x8F), - 'PRIVATE USE 1' => pack("U", 0x91), - 'PRIVATE USE 2' => pack("U", 0x92), - 'START OF PROTECTED AREA' => pack("U", 0x96), - 'END OF PROTECTED AREA' => pack("U", 0x97), - - # Convenience. 
Standard abbreviations for the controls - 'NUL' => pack("U", 0x00), # NULL - 'SOH' => pack("U", 0x01), # START OF HEADING - 'STX' => pack("U", 0x02), # START OF TEXT - 'ETX' => pack("U", 0x03), # END OF TEXT - 'EOT' => pack("U", 0x04), # END OF TRANSMISSION - 'ENQ' => pack("U", 0x05), # ENQUIRY - 'ACK' => pack("U", 0x06), # ACKNOWLEDGE - 'BEL' => pack("U", 0x07), # BELL - 'BS' => pack("U", 0x08), # BACKSPACE - 'HT' => pack("U", 0x09), # HORIZONTAL TABULATION - 'LF' => pack("U", 0x0A), # LINE FEED (LF) - 'VT' => pack("U", 0x0B), # VERTICAL TABULATION - 'FF' => pack("U", 0x0C), # FORM FEED (FF) - 'CR' => pack("U", 0x0D), # CARRIAGE RETURN (CR) - 'SO' => pack("U", 0x0E), # SHIFT OUT - 'SI' => pack("U", 0x0F), # SHIFT IN - 'DLE' => pack("U", 0x10), # DATA LINK ESCAPE - 'DC1' => pack("U", 0x11), # DEVICE CONTROL ONE - 'DC2' => pack("U", 0x12), # DEVICE CONTROL TWO - 'DC3' => pack("U", 0x13), # DEVICE CONTROL THREE - 'DC4' => pack("U", 0x14), # DEVICE CONTROL FOUR - 'NAK' => pack("U", 0x15), # NEGATIVE ACKNOWLEDGE - 'SYN' => pack("U", 0x16), # SYNCHRONOUS IDLE - 'ETB' => pack("U", 0x17), # END OF TRANSMISSION BLOCK - 'CAN' => pack("U", 0x18), # CANCEL - 'EOM' => pack("U", 0x19), # END OF MEDIUM - 'SUB' => pack("U", 0x1A), # SUBSTITUTE - 'ESC' => pack("U", 0x1B), # ESCAPE - 'FS' => pack("U", 0x1C), # FILE SEPARATOR - 'GS' => pack("U", 0x1D), # GROUP SEPARATOR - 'RS' => pack("U", 0x1E), # RECORD SEPARATOR - 'US' => pack("U", 0x1F), # UNIT SEPARATOR - 'DEL' => pack("U", 0x7F), # DELETE - 'BPH' => pack("U", 0x82), # BREAK PERMITTED HERE - 'NBH' => pack("U", 0x83), # NO BREAK HERE - 'NEL' => pack("U", 0x85), # NEXT LINE (NEL) - 'SSA' => pack("U", 0x86), # START OF SELECTED AREA - 'ESA' => pack("U", 0x87), # END OF SELECTED AREA - 'HTS' => pack("U", 0x88), # CHARACTER TABULATION SET - 'HTJ' => pack("U", 0x89), # CHARACTER TABULATION WITH JUSTIFICATION - 'VTS' => pack("U", 0x8A), # LINE TABULATION SET - 'PLD' => pack("U", 0x8B), # PARTIAL LINE FORWARD - 'PLU' => pack("U", 
0x8C), # PARTIAL LINE BACKWARD - 'RI ' => pack("U", 0x8D), # REVERSE LINE FEED - 'SS2' => pack("U", 0x8E), # SINGLE SHIFT TWO - 'SS3' => pack("U", 0x8F), # SINGLE SHIFT THREE - 'DCS' => pack("U", 0x90), # DEVICE CONTROL STRING - 'PU1' => pack("U", 0x91), # PRIVATE USE ONE - 'PU2' => pack("U", 0x92), # PRIVATE USE TWO - 'STS' => pack("U", 0x93), # SET TRANSMIT STATE - 'CCH' => pack("U", 0x94), # CANCEL CHARACTER - 'MW ' => pack("U", 0x95), # MESSAGE WAITING - 'SPA' => pack("U", 0x96), # START OF GUARDED AREA - 'EPA' => pack("U", 0x97), # END OF GUARDED AREA - 'SOS' => pack("U", 0x98), # START OF STRING - 'SCI' => pack("U", 0x9A), # SINGLE CHARACTER INTRODUCER - 'CSI' => pack("U", 0x9B), # CONTROL SEQUENCE INTRODUCER - 'ST ' => pack("U", 0x9C), # STRING TERMINATOR - 'OSC' => pack("U", 0x9D), # OPERATING SYSTEM COMMAND - 'PM ' => pack("U", 0x9E), # PRIVACY MESSAGE - 'APC' => pack("U", 0x9F), # APPLICATION PROGRAM COMMAND - - # There are no names for these in the Unicode standard; - # perhaps should be deprecated, but then again there are - # no alternative names, so am not deprecating. And if - # did, the code would have to change to not recommend an - # alternative for these. - 'PADDING CHARACTER' => pack("U", 0x80), - 'PAD' => pack("U", 0x80), - 'HIGH OCTET PRESET' => pack("U", 0x81), - 'HOP' => pack("U", 0x81), - 'INDEX' => pack("U", 0x84), - 'IND' => pack("U", 0x84), - 'SINGLE GRAPHIC CHARACTER INTRODUCER' => pack("U", 0x99), - 'SGC' => pack("U", 0x99), - - # More convenience. 
For further convenience, - # it is suggested some way of using the NamesList - # aliases be implemented, but there are ambiguities in - # NamesList.txt - 'BOM' => pack("U", 0xFEFF), # BYTE ORDER MARK - 'BYTE ORDER MARK'=> pack("U", 0xFEFF), - 'CGJ' => pack("U", 0x034F), # COMBINING GRAPHEME JOINER - 'FVS1' => pack("U", 0x180B), # MONGOLIAN FREE VARIATION SELECTOR ONE - 'FVS2' => pack("U", 0x180C), # MONGOLIAN FREE VARIATION SELECTOR TWO - 'FVS3' => pack("U", 0x180D), # MONGOLIAN FREE VARIATION SELECTOR THREE - 'LRE' => pack("U", 0x202A), # LEFT-TO-RIGHT EMBEDDING - 'LRM' => pack("U", 0x200E), # LEFT-TO-RIGHT MARK - 'LRO' => pack("U", 0x202D), # LEFT-TO-RIGHT OVERRIDE - 'MMSP' => pack("U", 0x205F), # MEDIUM MATHEMATICAL SPACE - 'MVS' => pack("U", 0x180E), # MONGOLIAN VOWEL SEPARATOR - 'NBSP' => pack("U", 0x00A0), # NO-BREAK SPACE - 'NNBSP' => pack("U", 0x202F), # NARROW NO-BREAK SPACE - 'PDF' => pack("U", 0x202C), # POP DIRECTIONAL FORMATTING - 'RLE' => pack("U", 0x202B), # RIGHT-TO-LEFT EMBEDDING - 'RLM' => pack("U", 0x200F), # RIGHT-TO-LEFT MARK - 'RLO' => pack("U", 0x202E), # RIGHT-TO-LEFT OVERRIDE - 'SHY' => pack("U", 0x00AD), # SOFT HYPHEN - 'VS1' => pack("U", 0xFE00), # VARIATION SELECTOR-1 - 'VS2' => pack("U", 0xFE01), # VARIATION SELECTOR-2 - 'VS3' => pack("U", 0xFE02), # VARIATION SELECTOR-3 - 'VS4' => pack("U", 0xFE03), # VARIATION SELECTOR-4 - 'VS5' => pack("U", 0xFE04), # VARIATION SELECTOR-5 - 'VS6' => pack("U", 0xFE05), # VARIATION SELECTOR-6 - 'VS7' => pack("U", 0xFE06), # VARIATION SELECTOR-7 - 'VS8' => pack("U", 0xFE07), # VARIATION SELECTOR-8 - 'VS9' => pack("U", 0xFE08), # VARIATION SELECTOR-9 - 'VS10' => pack("U", 0xFE09), # VARIATION SELECTOR-10 - 'VS11' => pack("U", 0xFE0A), # VARIATION SELECTOR-11 - 'VS12' => pack("U", 0xFE0B), # VARIATION SELECTOR-12 - 'VS13' => pack("U", 0xFE0C), # VARIATION SELECTOR-13 - 'VS14' => pack("U", 0xFE0D), # VARIATION SELECTOR-14 - 'VS15' => pack("U", 0xFE0E), # VARIATION SELECTOR-15 - 'VS16' => pack("U", 0xFE0F), 
# VARIATION SELECTOR-16 - 'VS17' => pack("U", 0xE0100), # VARIATION SELECTOR-17 - 'VS18' => pack("U", 0xE0101), # VARIATION SELECTOR-18 - 'VS19' => pack("U", 0xE0102), # VARIATION SELECTOR-19 - 'VS20' => pack("U", 0xE0103), # VARIATION SELECTOR-20 - 'VS21' => pack("U", 0xE0104), # VARIATION SELECTOR-21 - 'VS22' => pack("U", 0xE0105), # VARIATION SELECTOR-22 - 'VS23' => pack("U", 0xE0106), # VARIATION SELECTOR-23 - 'VS24' => pack("U", 0xE0107), # VARIATION SELECTOR-24 - 'VS25' => pack("U", 0xE0108), # VARIATION SELECTOR-25 - 'VS26' => pack("U", 0xE0109), # VARIATION SELECTOR-26 - 'VS27' => pack("U", 0xE010A), # VARIATION SELECTOR-27 - 'VS28' => pack("U", 0xE010B), # VARIATION SELECTOR-28 - 'VS29' => pack("U", 0xE010C), # VARIATION SELECTOR-29 - 'VS30' => pack("U", 0xE010D), # VARIATION SELECTOR-30 - 'VS31' => pack("U", 0xE010E), # VARIATION SELECTOR-31 - 'VS32' => pack("U", 0xE010F), # VARIATION SELECTOR-32 - 'VS33' => pack("U", 0xE0110), # VARIATION SELECTOR-33 - 'VS34' => pack("U", 0xE0111), # VARIATION SELECTOR-34 - 'VS35' => pack("U", 0xE0112), # VARIATION SELECTOR-35 - 'VS36' => pack("U", 0xE0113), # VARIATION SELECTOR-36 - 'VS37' => pack("U", 0xE0114), # VARIATION SELECTOR-37 - 'VS38' => pack("U", 0xE0115), # VARIATION SELECTOR-38 - 'VS39' => pack("U", 0xE0116), # VARIATION SELECTOR-39 - 'VS40' => pack("U", 0xE0117), # VARIATION SELECTOR-40 - 'VS41' => pack("U", 0xE0118), # VARIATION SELECTOR-41 - 'VS42' => pack("U", 0xE0119), # VARIATION SELECTOR-42 - 'VS43' => pack("U", 0xE011A), # VARIATION SELECTOR-43 - 'VS44' => pack("U", 0xE011B), # VARIATION SELECTOR-44 - 'VS45' => pack("U", 0xE011C), # VARIATION SELECTOR-45 - 'VS46' => pack("U", 0xE011D), # VARIATION SELECTOR-46 - 'VS47' => pack("U", 0xE011E), # VARIATION SELECTOR-47 - 'VS48' => pack("U", 0xE011F), # VARIATION SELECTOR-48 - 'VS49' => pack("U", 0xE0120), # VARIATION SELECTOR-49 - 'VS50' => pack("U", 0xE0121), # VARIATION SELECTOR-50 - 'VS51' => pack("U", 0xE0122), # VARIATION SELECTOR-51 - 'VS52' => 
pack("U", 0xE0123), # VARIATION SELECTOR-52 - 'VS53' => pack("U", 0xE0124), # VARIATION SELECTOR-53 - 'VS54' => pack("U", 0xE0125), # VARIATION SELECTOR-54 - 'VS55' => pack("U", 0xE0126), # VARIATION SELECTOR-55 - 'VS56' => pack("U", 0xE0127), # VARIATION SELECTOR-56 - 'VS57' => pack("U", 0xE0128), # VARIATION SELECTOR-57 - 'VS58' => pack("U", 0xE0129), # VARIATION SELECTOR-58 - 'VS59' => pack("U", 0xE012A), # VARIATION SELECTOR-59 - 'VS60' => pack("U", 0xE012B), # VARIATION SELECTOR-60 - 'VS61' => pack("U", 0xE012C), # VARIATION SELECTOR-61 - 'VS62' => pack("U", 0xE012D), # VARIATION SELECTOR-62 - 'VS63' => pack("U", 0xE012E), # VARIATION SELECTOR-63 - 'VS64' => pack("U", 0xE012F), # VARIATION SELECTOR-64 - 'VS65' => pack("U", 0xE0130), # VARIATION SELECTOR-65 - 'VS66' => pack("U", 0xE0131), # VARIATION SELECTOR-66 - 'VS67' => pack("U", 0xE0132), # VARIATION SELECTOR-67 - 'VS68' => pack("U", 0xE0133), # VARIATION SELECTOR-68 - 'VS69' => pack("U", 0xE0134), # VARIATION SELECTOR-69 - 'VS70' => pack("U", 0xE0135), # VARIATION SELECTOR-70 - 'VS71' => pack("U", 0xE0136), # VARIATION SELECTOR-71 - 'VS72' => pack("U", 0xE0137), # VARIATION SELECTOR-72 - 'VS73' => pack("U", 0xE0138), # VARIATION SELECTOR-73 - 'VS74' => pack("U", 0xE0139), # VARIATION SELECTOR-74 - 'VS75' => pack("U", 0xE013A), # VARIATION SELECTOR-75 - 'VS76' => pack("U", 0xE013B), # VARIATION SELECTOR-76 - 'VS77' => pack("U", 0xE013C), # VARIATION SELECTOR-77 - 'VS78' => pack("U", 0xE013D), # VARIATION SELECTOR-78 - 'VS79' => pack("U", 0xE013E), # VARIATION SELECTOR-79 - 'VS80' => pack("U", 0xE013F), # VARIATION SELECTOR-80 - 'VS81' => pack("U", 0xE0140), # VARIATION SELECTOR-81 - 'VS82' => pack("U", 0xE0141), # VARIATION SELECTOR-82 - 'VS83' => pack("U", 0xE0142), # VARIATION SELECTOR-83 - 'VS84' => pack("U", 0xE0143), # VARIATION SELECTOR-84 - 'VS85' => pack("U", 0xE0144), # VARIATION SELECTOR-85 - 'VS86' => pack("U", 0xE0145), # VARIATION SELECTOR-86 - 'VS87' => pack("U", 0xE0146), # VARIATION 
SELECTOR-87 - 'VS88' => pack("U", 0xE0147), # VARIATION SELECTOR-88 - 'VS89' => pack("U", 0xE0148), # VARIATION SELECTOR-89 - 'VS90' => pack("U", 0xE0149), # VARIATION SELECTOR-90 - 'VS91' => pack("U", 0xE014A), # VARIATION SELECTOR-91 - 'VS92' => pack("U", 0xE014B), # VARIATION SELECTOR-92 - 'VS93' => pack("U", 0xE014C), # VARIATION SELECTOR-93 - 'VS94' => pack("U", 0xE014D), # VARIATION SELECTOR-94 - 'VS95' => pack("U", 0xE014E), # VARIATION SELECTOR-95 - 'VS96' => pack("U", 0xE014F), # VARIATION SELECTOR-96 - 'VS97' => pack("U", 0xE0150), # VARIATION SELECTOR-97 - 'VS98' => pack("U", 0xE0151), # VARIATION SELECTOR-98 - 'VS99' => pack("U", 0xE0152), # VARIATION SELECTOR-99 - 'VS100' => pack("U", 0xE0153), # VARIATION SELECTOR-100 - 'VS101' => pack("U", 0xE0154), # VARIATION SELECTOR-101 - 'VS102' => pack("U", 0xE0155), # VARIATION SELECTOR-102 - 'VS103' => pack("U", 0xE0156), # VARIATION SELECTOR-103 - 'VS104' => pack("U", 0xE0157), # VARIATION SELECTOR-104 - 'VS105' => pack("U", 0xE0158), # VARIATION SELECTOR-105 - 'VS106' => pack("U", 0xE0159), # VARIATION SELECTOR-106 - 'VS107' => pack("U", 0xE015A), # VARIATION SELECTOR-107 - 'VS108' => pack("U", 0xE015B), # VARIATION SELECTOR-108 - 'VS109' => pack("U", 0xE015C), # VARIATION SELECTOR-109 - 'VS110' => pack("U", 0xE015D), # VARIATION SELECTOR-110 - 'VS111' => pack("U", 0xE015E), # VARIATION SELECTOR-111 - 'VS112' => pack("U", 0xE015F), # VARIATION SELECTOR-112 - 'VS113' => pack("U", 0xE0160), # VARIATION SELECTOR-113 - 'VS114' => pack("U", 0xE0161), # VARIATION SELECTOR-114 - 'VS115' => pack("U", 0xE0162), # VARIATION SELECTOR-115 - 'VS116' => pack("U", 0xE0163), # VARIATION SELECTOR-116 - 'VS117' => pack("U", 0xE0164), # VARIATION SELECTOR-117 - 'VS118' => pack("U", 0xE0165), # VARIATION SELECTOR-118 - 'VS119' => pack("U", 0xE0166), # VARIATION SELECTOR-119 - 'VS120' => pack("U", 0xE0167), # VARIATION SELECTOR-120 - 'VS121' => pack("U", 0xE0168), # VARIATION SELECTOR-121 - 'VS122' => pack("U", 0xE0169), # 
VARIATION SELECTOR-122 - 'VS123' => pack("U", 0xE016A), # VARIATION SELECTOR-123 - 'VS124' => pack("U", 0xE016B), # VARIATION SELECTOR-124 - 'VS125' => pack("U", 0xE016C), # VARIATION SELECTOR-125 - 'VS126' => pack("U", 0xE016D), # VARIATION SELECTOR-126 - 'VS127' => pack("U", 0xE016E), # VARIATION SELECTOR-127 - 'VS128' => pack("U", 0xE016F), # VARIATION SELECTOR-128 - 'VS129' => pack("U", 0xE0170), # VARIATION SELECTOR-129 - 'VS130' => pack("U", 0xE0171), # VARIATION SELECTOR-130 - 'VS131' => pack("U", 0xE0172), # VARIATION SELECTOR-131 - 'VS132' => pack("U", 0xE0173), # VARIATION SELECTOR-132 - 'VS133' => pack("U", 0xE0174), # VARIATION SELECTOR-133 - 'VS134' => pack("U", 0xE0175), # VARIATION SELECTOR-134 - 'VS135' => pack("U", 0xE0176), # VARIATION SELECTOR-135 - 'VS136' => pack("U", 0xE0177), # VARIATION SELECTOR-136 - 'VS137' => pack("U", 0xE0178), # VARIATION SELECTOR-137 - 'VS138' => pack("U", 0xE0179), # VARIATION SELECTOR-138 - 'VS139' => pack("U", 0xE017A), # VARIATION SELECTOR-139 - 'VS140' => pack("U", 0xE017B), # VARIATION SELECTOR-140 - 'VS141' => pack("U", 0xE017C), # VARIATION SELECTOR-141 - 'VS142' => pack("U", 0xE017D), # VARIATION SELECTOR-142 - 'VS143' => pack("U", 0xE017E), # VARIATION SELECTOR-143 - 'VS144' => pack("U", 0xE017F), # VARIATION SELECTOR-144 - 'VS145' => pack("U", 0xE0180), # VARIATION SELECTOR-145 - 'VS146' => pack("U", 0xE0181), # VARIATION SELECTOR-146 - 'VS147' => pack("U", 0xE0182), # VARIATION SELECTOR-147 - 'VS148' => pack("U", 0xE0183), # VARIATION SELECTOR-148 - 'VS149' => pack("U", 0xE0184), # VARIATION SELECTOR-149 - 'VS150' => pack("U", 0xE0185), # VARIATION SELECTOR-150 - 'VS151' => pack("U", 0xE0186), # VARIATION SELECTOR-151 - 'VS152' => pack("U", 0xE0187), # VARIATION SELECTOR-152 - 'VS153' => pack("U", 0xE0188), # VARIATION SELECTOR-153 - 'VS154' => pack("U", 0xE0189), # VARIATION SELECTOR-154 - 'VS155' => pack("U", 0xE018A), # VARIATION SELECTOR-155 - 'VS156' => pack("U", 0xE018B), # VARIATION SELECTOR-156 - 
'VS157' => pack("U", 0xE018C), # VARIATION SELECTOR-157 - 'VS158' => pack("U", 0xE018D), # VARIATION SELECTOR-158 - 'VS159' => pack("U", 0xE018E), # VARIATION SELECTOR-159 - 'VS160' => pack("U", 0xE018F), # VARIATION SELECTOR-160 - 'VS161' => pack("U", 0xE0190), # VARIATION SELECTOR-161 - 'VS162' => pack("U", 0xE0191), # VARIATION SELECTOR-162 - 'VS163' => pack("U", 0xE0192), # VARIATION SELECTOR-163 - 'VS164' => pack("U", 0xE0193), # VARIATION SELECTOR-164 - 'VS165' => pack("U", 0xE0194), # VARIATION SELECTOR-165 - 'VS166' => pack("U", 0xE0195), # VARIATION SELECTOR-166 - 'VS167' => pack("U", 0xE0196), # VARIATION SELECTOR-167 - 'VS168' => pack("U", 0xE0197), # VARIATION SELECTOR-168 - 'VS169' => pack("U", 0xE0198), # VARIATION SELECTOR-169 - 'VS170' => pack("U", 0xE0199), # VARIATION SELECTOR-170 - 'VS171' => pack("U", 0xE019A), # VARIATION SELECTOR-171 - 'VS172' => pack("U", 0xE019B), # VARIATION SELECTOR-172 - 'VS173' => pack("U", 0xE019C), # VARIATION SELECTOR-173 - 'VS174' => pack("U", 0xE019D), # VARIATION SELECTOR-174 - 'VS175' => pack("U", 0xE019E), # VARIATION SELECTOR-175 - 'VS176' => pack("U", 0xE019F), # VARIATION SELECTOR-176 - 'VS177' => pack("U", 0xE01A0), # VARIATION SELECTOR-177 - 'VS178' => pack("U", 0xE01A1), # VARIATION SELECTOR-178 - 'VS179' => pack("U", 0xE01A2), # VARIATION SELECTOR-179 - 'VS180' => pack("U", 0xE01A3), # VARIATION SELECTOR-180 - 'VS181' => pack("U", 0xE01A4), # VARIATION SELECTOR-181 - 'VS182' => pack("U", 0xE01A5), # VARIATION SELECTOR-182 - 'VS183' => pack("U", 0xE01A6), # VARIATION SELECTOR-183 - 'VS184' => pack("U", 0xE01A7), # VARIATION SELECTOR-184 - 'VS185' => pack("U", 0xE01A8), # VARIATION SELECTOR-185 - 'VS186' => pack("U", 0xE01A9), # VARIATION SELECTOR-186 - 'VS187' => pack("U", 0xE01AA), # VARIATION SELECTOR-187 - 'VS188' => pack("U", 0xE01AB), # VARIATION SELECTOR-188 - 'VS189' => pack("U", 0xE01AC), # VARIATION SELECTOR-189 - 'VS190' => pack("U", 0xE01AD), # VARIATION SELECTOR-190 - 'VS191' => pack("U", 
0xE01AE), # VARIATION SELECTOR-191 - 'VS192' => pack("U", 0xE01AF), # VARIATION SELECTOR-192 - 'VS193' => pack("U", 0xE01B0), # VARIATION SELECTOR-193 - 'VS194' => pack("U", 0xE01B1), # VARIATION SELECTOR-194 - 'VS195' => pack("U", 0xE01B2), # VARIATION SELECTOR-195 - 'VS196' => pack("U", 0xE01B3), # VARIATION SELECTOR-196 - 'VS197' => pack("U", 0xE01B4), # VARIATION SELECTOR-197 - 'VS198' => pack("U", 0xE01B5), # VARIATION SELECTOR-198 - 'VS199' => pack("U", 0xE01B6), # VARIATION SELECTOR-199 - 'VS200' => pack("U", 0xE01B7), # VARIATION SELECTOR-200 - 'VS201' => pack("U", 0xE01B8), # VARIATION SELECTOR-201 - 'VS202' => pack("U", 0xE01B9), # VARIATION SELECTOR-202 - 'VS203' => pack("U", 0xE01BA), # VARIATION SELECTOR-203 - 'VS204' => pack("U", 0xE01BB), # VARIATION SELECTOR-204 - 'VS205' => pack("U", 0xE01BC), # VARIATION SELECTOR-205 - 'VS206' => pack("U", 0xE01BD), # VARIATION SELECTOR-206 - 'VS207' => pack("U", 0xE01BE), # VARIATION SELECTOR-207 - 'VS208' => pack("U", 0xE01BF), # VARIATION SELECTOR-208 - 'VS209' => pack("U", 0xE01C0), # VARIATION SELECTOR-209 - 'VS210' => pack("U", 0xE01C1), # VARIATION SELECTOR-210 - 'VS211' => pack("U", 0xE01C2), # VARIATION SELECTOR-211 - 'VS212' => pack("U", 0xE01C3), # VARIATION SELECTOR-212 - 'VS213' => pack("U", 0xE01C4), # VARIATION SELECTOR-213 - 'VS214' => pack("U", 0xE01C5), # VARIATION SELECTOR-214 - 'VS215' => pack("U", 0xE01C6), # VARIATION SELECTOR-215 - 'VS216' => pack("U", 0xE01C7), # VARIATION SELECTOR-216 - 'VS217' => pack("U", 0xE01C8), # VARIATION SELECTOR-217 - 'VS218' => pack("U", 0xE01C9), # VARIATION SELECTOR-218 - 'VS219' => pack("U", 0xE01CA), # VARIATION SELECTOR-219 - 'VS220' => pack("U", 0xE01CB), # VARIATION SELECTOR-220 - 'VS221' => pack("U", 0xE01CC), # VARIATION SELECTOR-221 - 'VS222' => pack("U", 0xE01CD), # VARIATION SELECTOR-222 - 'VS223' => pack("U", 0xE01CE), # VARIATION SELECTOR-223 - 'VS224' => pack("U", 0xE01CF), # VARIATION SELECTOR-224 - 'VS225' => pack("U", 0xE01D0), # VARIATION 
SELECTOR-225 - 'VS226' => pack("U", 0xE01D1), # VARIATION SELECTOR-226 - 'VS227' => pack("U", 0xE01D2), # VARIATION SELECTOR-227 - 'VS228' => pack("U", 0xE01D3), # VARIATION SELECTOR-228 - 'VS229' => pack("U", 0xE01D4), # VARIATION SELECTOR-229 - 'VS230' => pack("U", 0xE01D5), # VARIATION SELECTOR-230 - 'VS231' => pack("U", 0xE01D6), # VARIATION SELECTOR-231 - 'VS232' => pack("U", 0xE01D7), # VARIATION SELECTOR-232 - 'VS233' => pack("U", 0xE01D8), # VARIATION SELECTOR-233 - 'VS234' => pack("U", 0xE01D9), # VARIATION SELECTOR-234 - 'VS235' => pack("U", 0xE01DA), # VARIATION SELECTOR-235 - 'VS236' => pack("U", 0xE01DB), # VARIATION SELECTOR-236 - 'VS237' => pack("U", 0xE01DC), # VARIATION SELECTOR-237 - 'VS238' => pack("U", 0xE01DD), # VARIATION SELECTOR-238 - 'VS239' => pack("U", 0xE01DE), # VARIATION SELECTOR-239 - 'VS240' => pack("U", 0xE01DF), # VARIATION SELECTOR-240 - 'VS241' => pack("U", 0xE01E0), # VARIATION SELECTOR-241 - 'VS242' => pack("U", 0xE01E1), # VARIATION SELECTOR-242 - 'VS243' => pack("U", 0xE01E2), # VARIATION SELECTOR-243 - 'VS244' => pack("U", 0xE01E3), # VARIATION SELECTOR-244 - 'VS245' => pack("U", 0xE01E4), # VARIATION SELECTOR-245 - 'VS246' => pack("U", 0xE01E5), # VARIATION SELECTOR-246 - 'VS247' => pack("U", 0xE01E6), # VARIATION SELECTOR-247 - 'VS248' => pack("U", 0xE01E7), # VARIATION SELECTOR-248 - 'VS249' => pack("U", 0xE01E8), # VARIATION SELECTOR-249 - 'VS250' => pack("U", 0xE01E9), # VARIATION SELECTOR-250 - 'VS251' => pack("U", 0xE01EA), # VARIATION SELECTOR-251 - 'VS252' => pack("U", 0xE01EB), # VARIATION SELECTOR-252 - 'VS253' => pack("U", 0xE01EC), # VARIATION SELECTOR-253 - 'VS254' => pack("U", 0xE01ED), # VARIATION SELECTOR-254 - 'VS255' => pack("U", 0xE01EE), # VARIATION SELECTOR-255 - 'VS256' => pack("U", 0xE01EF), # VARIATION SELECTOR-256 - 'WJ' => pack("U", 0x2060), # WORD JOINER - 'ZWJ' => pack("U", 0x200D), # ZERO WIDTH JOINER - 'ZWNJ' => pack("U", 0x200C), # ZERO WIDTH NON-JOINER - 'ZWSP' => pack("U", 0x200B), # ZERO 
WIDTH SPACE - ); + # Icky 3.2 names with parentheses. + 'LINE FEED' => pack("U", 0x0A), # LINE FEED (LF) + 'FORM FEED' => pack("U", 0x0C), # FORM FEED (FF) + 'CARRIAGE RETURN' => pack("U", 0x0D), # CARRIAGE RETURN (CR) + 'NEXT LINE' => pack("U", 0x85), # NEXT LINE (NEL) + + # Some variant names from Wikipedia + 'SINGLE-SHIFT 2' => pack("U", 0x8E), + 'SINGLE-SHIFT 3' => pack("U", 0x8F), + 'PRIVATE USE 1' => pack("U", 0x91), + 'PRIVATE USE 2' => pack("U", 0x92), + 'START OF PROTECTED AREA' => pack("U", 0x96), + 'END OF PROTECTED AREA' => pack("U", 0x97), + + # Convenience. Standard abbreviations for the controls + 'NUL' => pack("U", 0x00), # NULL + 'SOH' => pack("U", 0x01), # START OF HEADING + 'STX' => pack("U", 0x02), # START OF TEXT + 'ETX' => pack("U", 0x03), # END OF TEXT + 'EOT' => pack("U", 0x04), # END OF TRANSMISSION + 'ENQ' => pack("U", 0x05), # ENQUIRY + 'ACK' => pack("U", 0x06), # ACKNOWLEDGE + 'BEL' => pack("U", 0x07), # BELL + 'BS' => pack("U", 0x08), # BACKSPACE + 'HT' => pack("U", 0x09), # HORIZONTAL TABULATION + 'LF' => pack("U", 0x0A), # LINE FEED (LF) + 'VT' => pack("U", 0x0B), # VERTICAL TABULATION + 'FF' => pack("U", 0x0C), # FORM FEED (FF) + 'CR' => pack("U", 0x0D), # CARRIAGE RETURN (CR) + 'SO' => pack("U", 0x0E), # SHIFT OUT + 'SI' => pack("U", 0x0F), # SHIFT IN + 'DLE' => pack("U", 0x10), # DATA LINK ESCAPE + 'DC1' => pack("U", 0x11), # DEVICE CONTROL ONE + 'DC2' => pack("U", 0x12), # DEVICE CONTROL TWO + 'DC3' => pack("U", 0x13), # DEVICE CONTROL THREE + 'DC4' => pack("U", 0x14), # DEVICE CONTROL FOUR + 'NAK' => pack("U", 0x15), # NEGATIVE ACKNOWLEDGE + 'SYN' => pack("U", 0x16), # SYNCHRONOUS IDLE + 'ETB' => pack("U", 0x17), # END OF TRANSMISSION BLOCK + 'CAN' => pack("U", 0x18), # CANCEL + 'EOM' => pack("U", 0x19), # END OF MEDIUM + 'SUB' => pack("U", 0x1A), # SUBSTITUTE + 'ESC' => pack("U", 0x1B), # ESCAPE + 'FS' => pack("U", 0x1C), # FILE SEPARATOR + 'GS' => pack("U", 0x1D), # GROUP SEPARATOR + 'RS' => pack("U", 0x1E), # RECORD SEPARATOR 
+ 'US' => pack("U", 0x1F), # UNIT SEPARATOR + 'DEL' => pack("U", 0x7F), # DELETE + 'BPH' => pack("U", 0x82), # BREAK PERMITTED HERE + 'NBH' => pack("U", 0x83), # NO BREAK HERE + 'NEL' => pack("U", 0x85), # NEXT LINE (NEL) + 'SSA' => pack("U", 0x86), # START OF SELECTED AREA + 'ESA' => pack("U", 0x87), # END OF SELECTED AREA + 'HTS' => pack("U", 0x88), # CHARACTER TABULATION SET + 'HTJ' => pack("U", 0x89), # CHARACTER TABULATION WITH JUSTIFICATION + 'VTS' => pack("U", 0x8A), # LINE TABULATION SET + 'PLD' => pack("U", 0x8B), # PARTIAL LINE FORWARD + 'PLU' => pack("U", 0x8C), # PARTIAL LINE BACKWARD + 'RI ' => pack("U", 0x8D), # REVERSE LINE FEED + 'SS2' => pack("U", 0x8E), # SINGLE SHIFT TWO + 'SS3' => pack("U", 0x8F), # SINGLE SHIFT THREE + 'DCS' => pack("U", 0x90), # DEVICE CONTROL STRING + 'PU1' => pack("U", 0x91), # PRIVATE USE ONE + 'PU2' => pack("U", 0x92), # PRIVATE USE TWO + 'STS' => pack("U", 0x93), # SET TRANSMIT STATE + 'CCH' => pack("U", 0x94), # CANCEL CHARACTER + 'MW ' => pack("U", 0x95), # MESSAGE WAITING + 'SPA' => pack("U", 0x96), # START OF GUARDED AREA + 'EPA' => pack("U", 0x97), # END OF GUARDED AREA + 'SOS' => pack("U", 0x98), # START OF STRING + 'SCI' => pack("U", 0x9A), # SINGLE CHARACTER INTRODUCER + 'CSI' => pack("U", 0x9B), # CONTROL SEQUENCE INTRODUCER + 'ST ' => pack("U", 0x9C), # STRING TERMINATOR + 'OSC' => pack("U", 0x9D), # OPERATING SYSTEM COMMAND + 'PM ' => pack("U", 0x9E), # PRIVACY MESSAGE + 'APC' => pack("U", 0x9F), # APPLICATION PROGRAM COMMAND + + # There are no names for these in the Unicode standard; + # perhaps should be deprecated, but then again there are + # no alternative names, so am not deprecating. And if + # did, the code would have to change to not recommend an + # alternative for these. 
+ 'PADDING CHARACTER' => pack("U", 0x80), + 'PAD' => pack("U", 0x80), + 'HIGH OCTET PRESET' => pack("U", 0x81), + 'HOP' => pack("U", 0x81), + 'INDEX' => pack("U", 0x84), + 'IND' => pack("U", 0x84), + 'SINGLE GRAPHIC CHARACTER INTRODUCER' => pack("U", 0x99), + 'SGC' => pack("U", 0x99), + + # More convenience. For further convenience, + # it is suggested some way of using the NamesList + # aliases be implemented, but there are ambiguities in + # NamesList.txt + 'BOM' => pack("U", 0xFEFF), # BYTE ORDER MARK + 'BYTE ORDER MARK'=> pack("U", 0xFEFF), + 'CGJ' => pack("U", 0x034F), # COMBINING GRAPHEME JOINER + 'FVS1' => pack("U", 0x180B), # MONGOLIAN FREE VARIATION SELECTOR ONE + 'FVS2' => pack("U", 0x180C), # MONGOLIAN FREE VARIATION SELECTOR TWO + 'FVS3' => pack("U", 0x180D), # MONGOLIAN FREE VARIATION SELECTOR THREE + 'LRE' => pack("U", 0x202A), # LEFT-TO-RIGHT EMBEDDING + 'LRM' => pack("U", 0x200E), # LEFT-TO-RIGHT MARK + 'LRO' => pack("U", 0x202D), # LEFT-TO-RIGHT OVERRIDE + 'MMSP' => pack("U", 0x205F), # MEDIUM MATHEMATICAL SPACE + 'MVS' => pack("U", 0x180E), # MONGOLIAN VOWEL SEPARATOR + 'NBSP' => pack("U", 0x00A0), # NO-BREAK SPACE + 'NNBSP' => pack("U", 0x202F), # NARROW NO-BREAK SPACE + 'PDF' => pack("U", 0x202C), # POP DIRECTIONAL FORMATTING + 'RLE' => pack("U", 0x202B), # RIGHT-TO-LEFT EMBEDDING + 'RLM' => pack("U", 0x200F), # RIGHT-TO-LEFT MARK + 'RLO' => pack("U", 0x202E), # RIGHT-TO-LEFT OVERRIDE + 'SHY' => pack("U", 0x00AD), # SOFT HYPHEN + 'VS1' => pack("U", 0xFE00), # VARIATION SELECTOR-1 + 'VS2' => pack("U", 0xFE01), # VARIATION SELECTOR-2 + 'VS3' => pack("U", 0xFE02), # VARIATION SELECTOR-3 + 'VS4' => pack("U", 0xFE03), # VARIATION SELECTOR-4 + 'VS5' => pack("U", 0xFE04), # VARIATION SELECTOR-5 + 'VS6' => pack("U", 0xFE05), # VARIATION SELECTOR-6 + 'VS7' => pack("U", 0xFE06), # VARIATION SELECTOR-7 + 'VS8' => pack("U", 0xFE07), # VARIATION SELECTOR-8 + 'VS9' => pack("U", 0xFE08), # VARIATION SELECTOR-9 + 'VS10' => pack("U", 0xFE09), # VARIATION 
SELECTOR-10 + 'VS11' => pack("U", 0xFE0A), # VARIATION SELECTOR-11 + 'VS12' => pack("U", 0xFE0B), # VARIATION SELECTOR-12 + 'VS13' => pack("U", 0xFE0C), # VARIATION SELECTOR-13 + 'VS14' => pack("U", 0xFE0D), # VARIATION SELECTOR-14 + 'VS15' => pack("U", 0xFE0E), # VARIATION SELECTOR-15 + 'VS16' => pack("U", 0xFE0F), # VARIATION SELECTOR-16 + 'VS17' => pack("U", 0xE0100), # VARIATION SELECTOR-17 + 'VS18' => pack("U", 0xE0101), # VARIATION SELECTOR-18 + 'VS19' => pack("U", 0xE0102), # VARIATION SELECTOR-19 + 'VS20' => pack("U", 0xE0103), # VARIATION SELECTOR-20 + 'VS21' => pack("U", 0xE0104), # VARIATION SELECTOR-21 + 'VS22' => pack("U", 0xE0105), # VARIATION SELECTOR-22 + 'VS23' => pack("U", 0xE0106), # VARIATION SELECTOR-23 + 'VS24' => pack("U", 0xE0107), # VARIATION SELECTOR-24 + 'VS25' => pack("U", 0xE0108), # VARIATION SELECTOR-25 + 'VS26' => pack("U", 0xE0109), # VARIATION SELECTOR-26 + 'VS27' => pack("U", 0xE010A), # VARIATION SELECTOR-27 + 'VS28' => pack("U", 0xE010B), # VARIATION SELECTOR-28 + 'VS29' => pack("U", 0xE010C), # VARIATION SELECTOR-29 + 'VS30' => pack("U", 0xE010D), # VARIATION SELECTOR-30 + 'VS31' => pack("U", 0xE010E), # VARIATION SELECTOR-31 + 'VS32' => pack("U", 0xE010F), # VARIATION SELECTOR-32 + 'VS33' => pack("U", 0xE0110), # VARIATION SELECTOR-33 + 'VS34' => pack("U", 0xE0111), # VARIATION SELECTOR-34 + 'VS35' => pack("U", 0xE0112), # VARIATION SELECTOR-35 + 'VS36' => pack("U", 0xE0113), # VARIATION SELECTOR-36 + 'VS37' => pack("U", 0xE0114), # VARIATION SELECTOR-37 + 'VS38' => pack("U", 0xE0115), # VARIATION SELECTOR-38 + 'VS39' => pack("U", 0xE0116), # VARIATION SELECTOR-39 + 'VS40' => pack("U", 0xE0117), # VARIATION SELECTOR-40 + 'VS41' => pack("U", 0xE0118), # VARIATION SELECTOR-41 + 'VS42' => pack("U", 0xE0119), # VARIATION SELECTOR-42 + 'VS43' => pack("U", 0xE011A), # VARIATION SELECTOR-43 + 'VS44' => pack("U", 0xE011B), # VARIATION SELECTOR-44 + 'VS45' => pack("U", 0xE011C), # VARIATION SELECTOR-45 + 'VS46' => pack("U", 0xE011D), # 
VARIATION SELECTOR-46 + 'VS47' => pack("U", 0xE011E), # VARIATION SELECTOR-47 + 'VS48' => pack("U", 0xE011F), # VARIATION SELECTOR-48 + 'VS49' => pack("U", 0xE0120), # VARIATION SELECTOR-49 + 'VS50' => pack("U", 0xE0121), # VARIATION SELECTOR-50 + 'VS51' => pack("U", 0xE0122), # VARIATION SELECTOR-51 + 'VS52' => pack("U", 0xE0123), # VARIATION SELECTOR-52 + 'VS53' => pack("U", 0xE0124), # VARIATION SELECTOR-53 + 'VS54' => pack("U", 0xE0125), # VARIATION SELECTOR-54 + 'VS55' => pack("U", 0xE0126), # VARIATION SELECTOR-55 + 'VS56' => pack("U", 0xE0127), # VARIATION SELECTOR-56 + 'VS57' => pack("U", 0xE0128), # VARIATION SELECTOR-57 + 'VS58' => pack("U", 0xE0129), # VARIATION SELECTOR-58 + 'VS59' => pack("U", 0xE012A), # VARIATION SELECTOR-59 + 'VS60' => pack("U", 0xE012B), # VARIATION SELECTOR-60 + 'VS61' => pack("U", 0xE012C), # VARIATION SELECTOR-61 + 'VS62' => pack("U", 0xE012D), # VARIATION SELECTOR-62 + 'VS63' => pack("U", 0xE012E), # VARIATION SELECTOR-63 + 'VS64' => pack("U", 0xE012F), # VARIATION SELECTOR-64 + 'VS65' => pack("U", 0xE0130), # VARIATION SELECTOR-65 + 'VS66' => pack("U", 0xE0131), # VARIATION SELECTOR-66 + 'VS67' => pack("U", 0xE0132), # VARIATION SELECTOR-67 + 'VS68' => pack("U", 0xE0133), # VARIATION SELECTOR-68 + 'VS69' => pack("U", 0xE0134), # VARIATION SELECTOR-69 + 'VS70' => pack("U", 0xE0135), # VARIATION SELECTOR-70 + 'VS71' => pack("U", 0xE0136), # VARIATION SELECTOR-71 + 'VS72' => pack("U", 0xE0137), # VARIATION SELECTOR-72 + 'VS73' => pack("U", 0xE0138), # VARIATION SELECTOR-73 + 'VS74' => pack("U", 0xE0139), # VARIATION SELECTOR-74 + 'VS75' => pack("U", 0xE013A), # VARIATION SELECTOR-75 + 'VS76' => pack("U", 0xE013B), # VARIATION SELECTOR-76 + 'VS77' => pack("U", 0xE013C), # VARIATION SELECTOR-77 + 'VS78' => pack("U", 0xE013D), # VARIATION SELECTOR-78 + 'VS79' => pack("U", 0xE013E), # VARIATION SELECTOR-79 + 'VS80' => pack("U", 0xE013F), # VARIATION SELECTOR-80 + 'VS81' => pack("U", 0xE0140), # VARIATION SELECTOR-81 + 'VS82' => 
pack("U", 0xE0141), # VARIATION SELECTOR-82 + 'VS83' => pack("U", 0xE0142), # VARIATION SELECTOR-83 + 'VS84' => pack("U", 0xE0143), # VARIATION SELECTOR-84 + 'VS85' => pack("U", 0xE0144), # VARIATION SELECTOR-85 + 'VS86' => pack("U", 0xE0145), # VARIATION SELECTOR-86 + 'VS87' => pack("U", 0xE0146), # VARIATION SELECTOR-87 + 'VS88' => pack("U", 0xE0147), # VARIATION SELECTOR-88 + 'VS89' => pack("U", 0xE0148), # VARIATION SELECTOR-89 + 'VS90' => pack("U", 0xE0149), # VARIATION SELECTOR-90 + 'VS91' => pack("U", 0xE014A), # VARIATION SELECTOR-91 + 'VS92' => pack("U", 0xE014B), # VARIATION SELECTOR-92 + 'VS93' => pack("U", 0xE014C), # VARIATION SELECTOR-93 + 'VS94' => pack("U", 0xE014D), # VARIATION SELECTOR-94 + 'VS95' => pack("U", 0xE014E), # VARIATION SELECTOR-95 + 'VS96' => pack("U", 0xE014F), # VARIATION SELECTOR-96 + 'VS97' => pack("U", 0xE0150), # VARIATION SELECTOR-97 + 'VS98' => pack("U", 0xE0151), # VARIATION SELECTOR-98 + 'VS99' => pack("U", 0xE0152), # VARIATION SELECTOR-99 + 'VS100' => pack("U", 0xE0153), # VARIATION SELECTOR-100 + 'VS101' => pack("U", 0xE0154), # VARIATION SELECTOR-101 + 'VS102' => pack("U", 0xE0155), # VARIATION SELECTOR-102 + 'VS103' => pack("U", 0xE0156), # VARIATION SELECTOR-103 + 'VS104' => pack("U", 0xE0157), # VARIATION SELECTOR-104 + 'VS105' => pack("U", 0xE0158), # VARIATION SELECTOR-105 + 'VS106' => pack("U", 0xE0159), # VARIATION SELECTOR-106 + 'VS107' => pack("U", 0xE015A), # VARIATION SELECTOR-107 + 'VS108' => pack("U", 0xE015B), # VARIATION SELECTOR-108 + 'VS109' => pack("U", 0xE015C), # VARIATION SELECTOR-109 + 'VS110' => pack("U", 0xE015D), # VARIATION SELECTOR-110 + 'VS111' => pack("U", 0xE015E), # VARIATION SELECTOR-111 + 'VS112' => pack("U", 0xE015F), # VARIATION SELECTOR-112 + 'VS113' => pack("U", 0xE0160), # VARIATION SELECTOR-113 + 'VS114' => pack("U", 0xE0161), # VARIATION SELECTOR-114 + 'VS115' => pack("U", 0xE0162), # VARIATION SELECTOR-115 + 'VS116' => pack("U", 0xE0163), # VARIATION SELECTOR-116 + 'VS117' => 
pack("U", 0xE0164), # VARIATION SELECTOR-117 + 'VS118' => pack("U", 0xE0165), # VARIATION SELECTOR-118 + 'VS119' => pack("U", 0xE0166), # VARIATION SELECTOR-119 + 'VS120' => pack("U", 0xE0167), # VARIATION SELECTOR-120 + 'VS121' => pack("U", 0xE0168), # VARIATION SELECTOR-121 + 'VS122' => pack("U", 0xE0169), # VARIATION SELECTOR-122 + 'VS123' => pack("U", 0xE016A), # VARIATION SELECTOR-123 + 'VS124' => pack("U", 0xE016B), # VARIATION SELECTOR-124 + 'VS125' => pack("U", 0xE016C), # VARIATION SELECTOR-125 + 'VS126' => pack("U", 0xE016D), # VARIATION SELECTOR-126 + 'VS127' => pack("U", 0xE016E), # VARIATION SELECTOR-127 + 'VS128' => pack("U", 0xE016F), # VARIATION SELECTOR-128 + 'VS129' => pack("U", 0xE0170), # VARIATION SELECTOR-129 + 'VS130' => pack("U", 0xE0171), # VARIATION SELECTOR-130 + 'VS131' => pack("U", 0xE0172), # VARIATION SELECTOR-131 + 'VS132' => pack("U", 0xE0173), # VARIATION SELECTOR-132 + 'VS133' => pack("U", 0xE0174), # VARIATION SELECTOR-133 + 'VS134' => pack("U", 0xE0175), # VARIATION SELECTOR-134 + 'VS135' => pack("U", 0xE0176), # VARIATION SELECTOR-135 + 'VS136' => pack("U", 0xE0177), # VARIATION SELECTOR-136 + 'VS137' => pack("U", 0xE0178), # VARIATION SELECTOR-137 + 'VS138' => pack("U", 0xE0179), # VARIATION SELECTOR-138 + 'VS139' => pack("U", 0xE017A), # VARIATION SELECTOR-139 + 'VS140' => pack("U", 0xE017B), # VARIATION SELECTOR-140 + 'VS141' => pack("U", 0xE017C), # VARIATION SELECTOR-141 + 'VS142' => pack("U", 0xE017D), # VARIATION SELECTOR-142 + 'VS143' => pack("U", 0xE017E), # VARIATION SELECTOR-143 + 'VS144' => pack("U", 0xE017F), # VARIATION SELECTOR-144 + 'VS145' => pack("U", 0xE0180), # VARIATION SELECTOR-145 + 'VS146' => pack("U", 0xE0181), # VARIATION SELECTOR-146 + 'VS147' => pack("U", 0xE0182), # VARIATION SELECTOR-147 + 'VS148' => pack("U", 0xE0183), # VARIATION SELECTOR-148 + 'VS149' => pack("U", 0xE0184), # VARIATION SELECTOR-149 + 'VS150' => pack("U", 0xE0185), # VARIATION SELECTOR-150 + 'VS151' => pack("U", 0xE0186), # 
VARIATION SELECTOR-151 + 'VS152' => pack("U", 0xE0187), # VARIATION SELECTOR-152 + 'VS153' => pack("U", 0xE0188), # VARIATION SELECTOR-153 + 'VS154' => pack("U", 0xE0189), # VARIATION SELECTOR-154 + 'VS155' => pack("U", 0xE018A), # VARIATION SELECTOR-155 + 'VS156' => pack("U", 0xE018B), # VARIATION SELECTOR-156 + 'VS157' => pack("U", 0xE018C), # VARIATION SELECTOR-157 + 'VS158' => pack("U", 0xE018D), # VARIATION SELECTOR-158 + 'VS159' => pack("U", 0xE018E), # VARIATION SELECTOR-159 + 'VS160' => pack("U", 0xE018F), # VARIATION SELECTOR-160 + 'VS161' => pack("U", 0xE0190), # VARIATION SELECTOR-161 + 'VS162' => pack("U", 0xE0191), # VARIATION SELECTOR-162 + 'VS163' => pack("U", 0xE0192), # VARIATION SELECTOR-163 + 'VS164' => pack("U", 0xE0193), # VARIATION SELECTOR-164 + 'VS165' => pack("U", 0xE0194), # VARIATION SELECTOR-165 + 'VS166' => pack("U", 0xE0195), # VARIATION SELECTOR-166 + 'VS167' => pack("U", 0xE0196), # VARIATION SELECTOR-167 + 'VS168' => pack("U", 0xE0197), # VARIATION SELECTOR-168 + 'VS169' => pack("U", 0xE0198), # VARIATION SELECTOR-169 + 'VS170' => pack("U", 0xE0199), # VARIATION SELECTOR-170 + 'VS171' => pack("U", 0xE019A), # VARIATION SELECTOR-171 + 'VS172' => pack("U", 0xE019B), # VARIATION SELECTOR-172 + 'VS173' => pack("U", 0xE019C), # VARIATION SELECTOR-173 + 'VS174' => pack("U", 0xE019D), # VARIATION SELECTOR-174 + 'VS175' => pack("U", 0xE019E), # VARIATION SELECTOR-175 + 'VS176' => pack("U", 0xE019F), # VARIATION SELECTOR-176 + 'VS177' => pack("U", 0xE01A0), # VARIATION SELECTOR-177 + 'VS178' => pack("U", 0xE01A1), # VARIATION SELECTOR-178 + 'VS179' => pack("U", 0xE01A2), # VARIATION SELECTOR-179 + 'VS180' => pack("U", 0xE01A3), # VARIATION SELECTOR-180 + 'VS181' => pack("U", 0xE01A4), # VARIATION SELECTOR-181 + 'VS182' => pack("U", 0xE01A5), # VARIATION SELECTOR-182 + 'VS183' => pack("U", 0xE01A6), # VARIATION SELECTOR-183 + 'VS184' => pack("U", 0xE01A7), # VARIATION SELECTOR-184 + 'VS185' => pack("U", 0xE01A8), # VARIATION SELECTOR-185 + 
'VS186' => pack("U", 0xE01A9), # VARIATION SELECTOR-186 + 'VS187' => pack("U", 0xE01AA), # VARIATION SELECTOR-187 + 'VS188' => pack("U", 0xE01AB), # VARIATION SELECTOR-188 + 'VS189' => pack("U", 0xE01AC), # VARIATION SELECTOR-189 + 'VS190' => pack("U", 0xE01AD), # VARIATION SELECTOR-190 + 'VS191' => pack("U", 0xE01AE), # VARIATION SELECTOR-191 + 'VS192' => pack("U", 0xE01AF), # VARIATION SELECTOR-192 + 'VS193' => pack("U", 0xE01B0), # VARIATION SELECTOR-193 + 'VS194' => pack("U", 0xE01B1), # VARIATION SELECTOR-194 + 'VS195' => pack("U", 0xE01B2), # VARIATION SELECTOR-195 + 'VS196' => pack("U", 0xE01B3), # VARIATION SELECTOR-196 + 'VS197' => pack("U", 0xE01B4), # VARIATION SELECTOR-197 + 'VS198' => pack("U", 0xE01B5), # VARIATION SELECTOR-198 + 'VS199' => pack("U", 0xE01B6), # VARIATION SELECTOR-199 + 'VS200' => pack("U", 0xE01B7), # VARIATION SELECTOR-200 + 'VS201' => pack("U", 0xE01B8), # VARIATION SELECTOR-201 + 'VS202' => pack("U", 0xE01B9), # VARIATION SELECTOR-202 + 'VS203' => pack("U", 0xE01BA), # VARIATION SELECTOR-203 + 'VS204' => pack("U", 0xE01BB), # VARIATION SELECTOR-204 + 'VS205' => pack("U", 0xE01BC), # VARIATION SELECTOR-205 + 'VS206' => pack("U", 0xE01BD), # VARIATION SELECTOR-206 + 'VS207' => pack("U", 0xE01BE), # VARIATION SELECTOR-207 + 'VS208' => pack("U", 0xE01BF), # VARIATION SELECTOR-208 + 'VS209' => pack("U", 0xE01C0), # VARIATION SELECTOR-209 + 'VS210' => pack("U", 0xE01C1), # VARIATION SELECTOR-210 + 'VS211' => pack("U", 0xE01C2), # VARIATION SELECTOR-211 + 'VS212' => pack("U", 0xE01C3), # VARIATION SELECTOR-212 + 'VS213' => pack("U", 0xE01C4), # VARIATION SELECTOR-213 + 'VS214' => pack("U", 0xE01C5), # VARIATION SELECTOR-214 + 'VS215' => pack("U", 0xE01C6), # VARIATION SELECTOR-215 + 'VS216' => pack("U", 0xE01C7), # VARIATION SELECTOR-216 + 'VS217' => pack("U", 0xE01C8), # VARIATION SELECTOR-217 + 'VS218' => pack("U", 0xE01C9), # VARIATION SELECTOR-218 + 'VS219' => pack("U", 0xE01CA), # VARIATION SELECTOR-219 + 'VS220' => pack("U", 
0xE01CB), # VARIATION SELECTOR-220 + 'VS221' => pack("U", 0xE01CC), # VARIATION SELECTOR-221 + 'VS222' => pack("U", 0xE01CD), # VARIATION SELECTOR-222 + 'VS223' => pack("U", 0xE01CE), # VARIATION SELECTOR-223 + 'VS224' => pack("U", 0xE01CF), # VARIATION SELECTOR-224 + 'VS225' => pack("U", 0xE01D0), # VARIATION SELECTOR-225 + 'VS226' => pack("U", 0xE01D1), # VARIATION SELECTOR-226 + 'VS227' => pack("U", 0xE01D2), # VARIATION SELECTOR-227 + 'VS228' => pack("U", 0xE01D3), # VARIATION SELECTOR-228 + 'VS229' => pack("U", 0xE01D4), # VARIATION SELECTOR-229 + 'VS230' => pack("U", 0xE01D5), # VARIATION SELECTOR-230 + 'VS231' => pack("U", 0xE01D6), # VARIATION SELECTOR-231 + 'VS232' => pack("U", 0xE01D7), # VARIATION SELECTOR-232 + 'VS233' => pack("U", 0xE01D8), # VARIATION SELECTOR-233 + 'VS234' => pack("U", 0xE01D9), # VARIATION SELECTOR-234 + 'VS235' => pack("U", 0xE01DA), # VARIATION SELECTOR-235 + 'VS236' => pack("U", 0xE01DB), # VARIATION SELECTOR-236 + 'VS237' => pack("U", 0xE01DC), # VARIATION SELECTOR-237 + 'VS238' => pack("U", 0xE01DD), # VARIATION SELECTOR-238 + 'VS239' => pack("U", 0xE01DE), # VARIATION SELECTOR-239 + 'VS240' => pack("U", 0xE01DF), # VARIATION SELECTOR-240 + 'VS241' => pack("U", 0xE01E0), # VARIATION SELECTOR-241 + 'VS242' => pack("U", 0xE01E1), # VARIATION SELECTOR-242 + 'VS243' => pack("U", 0xE01E2), # VARIATION SELECTOR-243 + 'VS244' => pack("U", 0xE01E3), # VARIATION SELECTOR-244 + 'VS245' => pack("U", 0xE01E4), # VARIATION SELECTOR-245 + 'VS246' => pack("U", 0xE01E5), # VARIATION SELECTOR-246 + 'VS247' => pack("U", 0xE01E6), # VARIATION SELECTOR-247 + 'VS248' => pack("U", 0xE01E7), # VARIATION SELECTOR-248 + 'VS249' => pack("U", 0xE01E8), # VARIATION SELECTOR-249 + 'VS250' => pack("U", 0xE01E9), # VARIATION SELECTOR-250 + 'VS251' => pack("U", 0xE01EA), # VARIATION SELECTOR-251 + 'VS252' => pack("U", 0xE01EB), # VARIATION SELECTOR-252 + 'VS253' => pack("U", 0xE01EC), # VARIATION SELECTOR-253 + 'VS254' => pack("U", 0xE01ED), # VARIATION 
SELECTOR-254 + 'VS255' => pack("U", 0xE01EE), # VARIATION SELECTOR-255 + 'VS256' => pack("U", 0xE01EF), # VARIATION SELECTOR-256 + 'WJ' => pack("U", 0x2060), # WORD JOINER + 'ZWJ' => pack("U", 0x200D), # ZERO WIDTH JOINER + 'ZWNJ' => pack("U", 0x200C), # ZERO WIDTH NON-JOINER + 'ZWSP' => pack("U", 0x200B), # ZERO WIDTH SPACE +); my %deprecated_aliases = ( - # Pre-3.2 compatibility (only for the first 256 characters). - # Use of these gives deprecated message. - 'HORIZONTAL TABULATION' => pack("U", 0x09), # CHARACTER TABULATION - 'VERTICAL TABULATION' => pack("U", 0x0B), # LINE TABULATION - 'FILE SEPARATOR' => pack("U", 0x1C), # INFORMATION SEPARATOR FOUR - 'GROUP SEPARATOR' => pack("U", 0x1D), # INFORMATION SEPARATOR THREE - 'RECORD SEPARATOR' => pack("U", 0x1E), # INFORMATION SEPARATOR TWO - 'UNIT SEPARATOR' => pack("U", 0x1F), # INFORMATION SEPARATOR ONE - 'HORIZONTAL TABULATION SET' => pack("U", 0x88), # CHARACTER TABULATION SET - 'HORIZONTAL TABULATION WITH JUSTIFICATION' => pack("U", 0x89), # CHARACTER TABULATION WITH JUSTIFICATION - 'PARTIAL LINE DOWN' => pack("U", 0x8B), # PARTIAL LINE FORWARD - 'PARTIAL LINE UP' => pack("U", 0x8C), # PARTIAL LINE BACKWARD - 'VERTICAL TABULATION SET' => pack("U", 0x8A), # LINE TABULATION SET - 'REVERSE INDEX' => pack("U", 0x8D), # REVERSE LINE FEED - ); + # Pre-3.2 compatibility (only for the first 256 characters). + # Use of these gives deprecated message. 
+ 'HORIZONTAL TABULATION' => pack("U", 0x09), # CHARACTER TABULATION + 'VERTICAL TABULATION' => pack("U", 0x0B), # LINE TABULATION + 'FILE SEPARATOR' => pack("U", 0x1C), # INFORMATION SEPARATOR FOUR + 'GROUP SEPARATOR' => pack("U", 0x1D), # INFORMATION SEPARATOR THREE + 'RECORD SEPARATOR' => pack("U", 0x1E), # INFORMATION SEPARATOR TWO + 'UNIT SEPARATOR' => pack("U", 0x1F), # INFORMATION SEPARATOR ONE + 'HORIZONTAL TABULATION SET' => pack("U", 0x88), # CHARACTER TABULATION SET + 'HORIZONTAL TABULATION WITH JUSTIFICATION' => pack("U", 0x89), # CHARACTER TABULATION WITH JUSTIFICATION + 'PARTIAL LINE DOWN' => pack("U", 0x8B), # PARTIAL LINE FORWARD + 'PARTIAL LINE UP' => pack("U", 0x8C), # PARTIAL LINE BACKWARD + 'VERTICAL TABULATION SET' => pack("U", 0x8A), # LINE TABULATION SET + 'REVERSE INDEX' => pack("U", 0x8D), # REVERSE LINE FEED +); my $txt; # The table of official character names @@ -970,32 +970,32 @@ charnames - access to Unicode character names and named character sequences; als =head1 SYNOPSIS - use charnames ':full'; - print "\N{GREEK SMALL LETTER SIGMA} is called sigma.\n"; - print "\N{LATIN CAPITAL LETTER E WITH VERTICAL LINE BELOW}", - " is an officially named sequence of two Unicode characters\n"; - - use charnames ':short'; - print "\N{greek:Sigma} is an upper-case sigma.\n"; - - use charnames qw(cyrillic greek); - print "\N{sigma} is Greek sigma, and \N{be} is Cyrillic b.\n"; - - use charnames ":full", ":alias" => { - e_ACUTE => "LATIN SMALL LETTER E WITH ACUTE", - mychar => 0xE8000, # Private use area - }; - print "\N{e_ACUTE} is a small letter e with an acute.\n"; - print "\\N{mychar} allows me to name private use characters.\n"; - - use charnames (); - print charnames::viacode(0x1234); # prints "ETHIOPIC SYLLABLE SEE" - printf "%04X", charnames::vianame("GOTHIC LETTER AHSA"); # prints - # "10330" - print charnames::vianame("LATIN CAPITAL LETTER A"); # prints 65 on - # ASCII platforms; - # 193 on EBCDIC - print charnames::string_vianame("LATIN 
CAPITAL LETTER A"); # prints "A" + use charnames ':full'; + print "\N{GREEK SMALL LETTER SIGMA} is called sigma.\n"; + print "\N{LATIN CAPITAL LETTER E WITH VERTICAL LINE BELOW}", + " is an officially named sequence of two Unicode characters\n"; + + use charnames ':short'; + print "\N{greek:Sigma} is an upper-case sigma.\n"; + + use charnames qw(cyrillic greek); + print "\N{sigma} is Greek sigma, and \N{be} is Cyrillic b.\n"; + + use charnames ":full", ":alias" => { + e_ACUTE => "LATIN SMALL LETTER E WITH ACUTE", + mychar => 0xE8000, # Private use area + }; + print "\N{e_ACUTE} is a small letter e with an acute.\n"; + print "\\N{mychar} allows me to name private use characters.\n"; + + use charnames (); + print charnames::viacode(0x1234); # prints "ETHIOPIC SYLLABLE SEE" + printf "%04X", charnames::vianame("GOTHIC LETTER AHSA"); # prints + # "10330" + print charnames::vianame("LATIN CAPITAL LETTER A"); # prints 65 on + # ASCII platforms; + # 193 on EBCDIC + print charnames::string_vianame("LATIN CAPITAL LETTER A"); # prints "A" =head1 DESCRIPTION -- 1.5.6.3 ```
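The named-sequence line in the SYNOPSIS above can be tried as a standalone script. This is only a minimal sketch, and it assumes a perl that already includes this patch series, so that `charnames::string_vianame` and named sequences in `\N{}` are available. The variable names are illustrative and not taken from the patches. The code-point values in the comments hold on ASCII platforms.

```perl
use strict;
use warnings;
use charnames ':full';

# Named sequence taken from the SYNOPSIS: one name, two code points.
my $seq = "\N{LATIN CAPITAL LETTER E WITH VERTICAL LINE BELOW}";

printf "length: %d\n", length $seq;    # 2
printf "code points: %s\n",
    join ' ', map { sprintf 'U+%04X', ord } split //, $seq;    # U+0045 U+0329

# Run-time lookup of the same name; string_vianame returns a string,
# so it can carry the whole sequence.
my $runtime = charnames::string_vianame(
    "LATIN CAPITAL LETTER E WITH VERTICAL LINE BELOW");
print "run-time lookup agrees\n"
    if defined $runtime && $runtime eq $seq;

# Outside a bracketed character class the sequence matches as a unit.
# Inside [...] it is not accepted, as the ticket description notes.
print "pattern matched\n"
    if "E\x{0329}" =~ /^\N{LATIN CAPITAL LETTER E WITH VERTICAL LINE BELOW}$/;
```

Where a sequence would otherwise go in a bracketed class, an alternation such as `(?:\N{...}|[abc])` is one way to work around the restriction.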
p5pRT commented 14 years ago

From @khwilliamson

0019-charnames.pm-reformat-comments.patch

```diff
From 033e18638fea6316688ec9657c9817bd9c9c0e37 Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Sun, 12 Sep 2010 22:19:54 -0600
Subject: [PATCH] charnames.pm: reformat comments

Now that have less indent, don't need so many lines. The only changes in
this commit are several blocks of comments to occupy more of each line.
No wording changes are involved.
---
 lib/charnames.pm |   14 ++++++--------
 1 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/lib/charnames.pm b/lib/charnames.pm
index 2ff0c1d..677edfc 100644
--- a/lib/charnames.pm
+++ b/lib/charnames.pm
@@ -90,11 +90,10 @@ my %system_aliases = (
     'PM ' => pack("U", 0x9E), # PRIVACY MESSAGE
     'APC' => pack("U", 0x9F), # APPLICATION PROGRAM COMMAND
 
-    # There are no names for these in the Unicode standard;
-    # perhaps should be deprecated, but then again there are
-    # no alternative names, so am not deprecating. And if
-    # did, the code would have to change to not recommend an
-    # alternative for these.
+    # There are no names for these in the Unicode standard; perhaps should be
+    # deprecated, but then again there are no alternative names, so am not
+    # deprecating. And if did, the code would have to change to not recommend
+    # an alternative for these.
     'PADDING CHARACTER' => pack("U", 0x80),
     'PAD' => pack("U", 0x80),
     'HIGH OCTET PRESET' => pack("U", 0x81),
@@ -104,9 +103,8 @@ my %system_aliases = (
     'SINGLE GRAPHIC CHARACTER INTRODUCER' => pack("U", 0x99),
     'SGC' => pack("U", 0x99),
 
-    # More convenience. For further convenience,
-    # it is suggested some way of using the NamesList
-    # aliases be implemented, but there are ambiguities in
+    # More convenience. For further convenience, it is suggested some way of
+    # using the NamesList aliases be implemented, but there are ambiguities in
     # NamesList.txt
     'BOM' => pack("U", 0xFEFF), # BYTE ORDER MARK
     'BYTE ORDER MARK'=> pack("U", 0xFEFF),
-- 
1.5.6.3
```
p5pRT commented 14 years ago

From @cpansprout

On Sun Sep 12 21:29:22 2010, public@khwilliamson.com wrote:

> This series of commits mainly adds to Perl support for Unicode named character sequences. They still aren't accepted in regular expression bracketed character classes, as those currently can handle only single character elements.
>
> There are also some small performance enhancements to charnames, and more randomized testing.
>
> This patch is also on github at git://github.com/khwilliamson/perl.git branch charnames.

These patches look fine to me, so I’ve just pushed them as:

a79b922baa417139b1a0a4393e181b72d5ebc030
b1c167a3f17cc65c27981e99ce05526cb080220d
73d9566fc50e22faa4e73bd2038ccdce8321e862
f3397f68aadcbcd597a7272af2d183071061cb63
bc13d9e29250cbf25d87d52f7cca07e5d2a63b1d
a6769a7827140e1470d8ef0f4e57f7643757ca72
f1ccd77d455bc9db83c3cf6395f791c0a78f8954
2b03cdd54aa6cdc31eb146937438e523f8691096
835df198d2bff566733a20d3ed123d32883ebe29
92a56f4b8cc8311af66b29cae4f69855ee059544
34ee1bc7e38c47ad1d9c71d70c57deb3a31bcefa
35a5ace5b48bd7ca4aef7f20f0f5843da3365ec3
a2cfe56dbcb3330a37926b668c136f00360986a0
9deebca376903e87a5f8496ce3baf418d3e9d0b7
06ee63cd8792bb62ac70a693a5f6e7af1a16ea05
8ebef31d4feab4b7c35ff0eb427632a67b1abdd9
fb121860c2407cd1d1566d63a95a5220fa93d8e4
bcc08981dcac9739492da9cf6f4de3830c6a6a66
81965e2b9c462d35b04a8c01631dbb664e3b9431

I hope nobody kills me for doing that!

p5pRT commented 14 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 14 years ago

@cpansprout - Status changed from 'open' to 'resolved'