Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

There doesn't exist a way for a user to create a custom \N{} alias for a private use code point #10410

Closed p5pRT closed 14 years ago

p5pRT commented 14 years ago

Migrated from rt.perl.org#75450 (status was 'resolved')

Searchable as RT75450$

p5pRT commented 14 years ago

From @khwilliamson

This is a bug report for perl from khw@khw-desktop.nonet, generated with the help of perlbug 1.39 running under perl 5.13.1.


A user-defined \N{} alias must currently map to a known Unicode name. This precludes using the Unicode private use areas.



Flags:
  category=library
  severity=medium
  module=charnames

Site configuration information for perl 5.13.1:

Configured by khw at Mon May 24 07:06:34 MDT 2010.

Summary of my perl5 (revision 5 version 13 subversion 1) configuration:
  Commit id: b081dd7eaaec2b6ee43335645ab40cff0ca3f91a
  Platform:
    osname=linux, osvers=2.6.27-17-generic, archname=i686-linux
    uname='linux khw-desktop 2.6.27-17-generic #1 smp fri mar 12 03:09:00 utc 2010 i686 gnulinux '
    config_args='-s -d -Dprefix=/home/khw/blead -Dusedevel -D'optimize=-g3' -A'optimize=-g3' -A'optimize=-O0''
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=undef, usemultiplicity=undef
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=undef, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-DDEBUGGING -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O0 -g3',
    cppflags='-DDEBUGGING -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='4.3.2', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -ldl -lm -lcrypt -lutil -lc
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
    libc=/lib/libc-2.8.90.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.8.90'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -g3 -g3 -O0 -L/usr/local/lib -fstack-protector'

Locally applied patches:


@INC for perl 5.13.1:
  lib
  /home/khw/blead/lib/perl5/site_perl/5.13.1/i686-linux
  /home/khw/blead/lib/perl5/site_perl/5.13.1
  /home/khw/blead/lib/perl5/5.13.1/i686-linux
  /home/khw/blead/lib/perl5/5.13.1
  /home/khw/blead/lib/perl5/site_perl
  .


Environment for perl 5.13.1:
  HOME=/home/khw
  LANG=en_US.UTF-8
  LANGUAGE (unset)
  LD_LIBRARY_PATH (unset)
  LOGDIR (unset)
  PATH=/home/khw/bin:/home/khw/print/bin:/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/usr/games:/opt/real/RealPlayer:/home/khw/cxoffice/bin
  PERL_BADLANG (unset)
  SHELL=/bin/ksh

p5pRT commented 14 years ago

From @khwilliamson

This series of commits does most of my planned changes to charnames for 5.14.

It adds the abbreviations of the controls, like BEL, to its repertoire, and makes sure vianame() operates on the same domain as \N{}. It also adds all the commonly accepted abbreviations that Unicode publishes, so things like NBSP and SHY join BOM as being recognized.
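For illustration, here is a minimal sketch of what that looks like from user code. It assumes a Perl that already includes these charnames changes (5.14 or later); the variable names are only for the example.

```perl
use strict;
use warnings;
use charnames ":full";

# Abbreviations published by Unicode resolve inside \N{}.
my $nbsp = "\N{NBSP}";   # NO-BREAK SPACE
my $shy  = "\N{SHY}";    # SOFT HYPHEN
printf "NBSP is U+%04X\n", ord $nbsp;
printf "SHY  is U+%04X\n", ord $shy;

# vianame() accepts the same names at run time and returns the code point.
my $via = charnames::vianame("NBSP");
printf "vianame('NBSP') is U+%04X\n", $via;

# viacode() goes the other way: code point to official Unicode name.
my $official = charnames::viacode(0x00A0);
print "U+00A0 is named $official\n";
```

The point of the sketch is that the compile-time escape and the run-time lookup agree on the same abbreviation.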

It fixes #75450 so that a user can now name private use code points, as well as any other. It fixes the bug whereby if a user already had defined NBSP to be something else, our new abbreviation would have clobbered it.
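A minimal sketch of the capability requested in this ticket, assuming the `:alias` syntax as documented for charnames in Perl 5.14 and later, where an alias may map to a code point as well as to an official name. The alias name `MY_LOGO` and the choice of U+E000 are hypothetical, picked only for the example.

```perl
use strict;
use warnings;
use charnames ":full", ":alias" => {
    # Hypothetical alias: U+E000 is the first code point of the
    # BMP private use area and has no official Unicode name.
    MY_LOGO => "U+E000",
};

# The alias is usable in \N{} within this lexical scope.
my $logo = "\N{MY_LOGO}";
printf "MY_LOGO is U+%04X\n", ord $logo;
```

Only the code point is printed here, since a private use character has no standard glyph and printing it raw would depend on the output encoding.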

It more than doubles the speed of viacode().

The pod is extensively cleaned up, and perldelta is updated to correspond with this patch.

Many new tests are added.

p5pRT commented 14 years ago

From @khwilliamson

0001-Add-a-number-of-abbrs-and-variants-to-N.patch

```diff
From 94a6f3f033be57239a9ad63583ca3765e049e640 Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Tue, 29 Jun 2010 12:54:33 -0600
Subject: [PATCH] Add a number of abbrs and variants to \N{}

This patch adds the standard abbreviations for the control characters
(such as ACK, BEL, etc) to the repertoire that \N{} knows about. It
also adds a few common variants of their full names, and the old names
for the 4 controls that Unicode has chosen not to have any names at all
for.

The patch also adds all the abbreviations that Unicode lists in 5.2 for
longer characters, such as NBSP, SHY, LRE, ...

To preserve complete backward compatibilty for these and future changes,
user-defined aliases are now checked first, before these are.

As a performance enhancement, these aliases are mapped to their actual
code values instead of their full names which then had to be looked up
in the large table. Now that is avoided, and the table is not loaded at
all until a name is encountered that is not one of these aliases.

The pod and .t are updated.
---
 lib/charnames.pm      |  539 +++++++++++++++++++++++++++++++++++++++++++------
 lib/charnames.t       |  430 +++++++++++++++++++++++++++++++++++++++-
 pod/perl5133delta.pod |   11 +
 3 files changed, 918 insertions(+), 62 deletions(-)

diff --git a/lib/charnames.pm b/lib/charnames.pm
index eddf66a..578ca46 100644
--- a/lib/charnames.pm
+++ b/lib/charnames.pm
@@ -2,44 +2,406 @@ package charnames;
 use strict;
 use warnings;
 use File::Spec;
-our $VERSION = '1.08';
+our $VERSION = '1.09';
 
 use bytes ();          # for $bytes::hint_bits
 
 my %alias1 = (
-                # Icky 3.2 names with parentheses.
-                'LINE FEED' => 'LINE FEED (LF)',
-                'FORM FEED' => 'FORM FEED (FF)',
-                'CARRIAGE RETURN' => 'CARRIAGE RETURN (CR)',
-                'NEXT LINE' => 'NEXT LINE (NEL)',
-                # Convenience.
-                'LF' => 'LINE FEED (LF)',
-                'FF' => 'FORM FEED (FF)',
-                'CR' => 'CARRIAGE RETURN (CR)',
-                'NEL' => 'NEXT LINE (NEL)',
-                # More convenience.  For futher convencience,
-                # it is suggested some way using using the NamesList
-                # aliases is implemented.
-                'ZWNJ' => 'ZERO WIDTH NON-JOINER',
-                'ZWJ' => 'ZERO WIDTH JOINER',
-                'BOM' => 'BYTE ORDER MARK',
-            );
+                # Icky 3.2 names with parentheses.
+                'LINE FEED' => 0x0A, # LINE FEED (LF)
+                'FORM FEED' => 0x0C, # FORM FEED (FF)
+                'CARRIAGE RETURN' => 0x0D, # CARRIAGE RETURN (CR)
+                'NEXT LINE' => 0x85, # NEXT LINE (NEL)
+
+                # Some variant names from Wikipedia
+                'SINGLE-SHIFT 2' => 0x8E,
+                'SINGLE-SHIFT 3' => 0x8F,
+                'PRIVATE USE 1' => 0x91,
+                'PRIVATE USE 2' => 0x92,
+                'START OF PROTECTED AREA' => 0x96,
+                'END OF PROTECTED AREA' => 0x97,
+
+                # Convenience.  Standard abbreviations for the controls
+                'NUL' => 0x00, # NULL
+                'SOH' => 0x01, # START OF HEADING
+                'STX' => 0x02, # START OF TEXT
+                'ETX' => 0x03, # END OF TEXT
+                'EOT' => 0x04, # END OF TRANSMISSION
+                'ENQ' => 0x05, # ENQUIRY
+                'ACK' => 0x06, # ACKNOWLEDGE
+                'BEL' => 0x07, # BELL
+                'BS' => 0x08, # BACKSPACE
+                'HT' => 0x09, # HORIZONTAL TABULATION
+                'LF' => 0x0A, # LINE FEED (LF)
+                'VT' => 0x0B, # VERTICAL TABULATION
+                'FF' => 0x0C, # FORM FEED (FF)
+                'CR' => 0x0D, # CARRIAGE RETURN (CR)
+                'SO' => 0x0E, # SHIFT OUT
+                'SI' => 0x0F, # SHIFT IN
+                'DLE' => 0x10, # DATA LINK ESCAPE
+                'DC1' => 0x11, # DEVICE CONTROL ONE
+                'DC2' => 0x12, # DEVICE CONTROL TWO
+                'DC3' => 0x13, # DEVICE CONTROL THREE
+                'DC4' => 0x14, # DEVICE CONTROL FOUR
+                'NAK' => 0x15, # NEGATIVE ACKNOWLEDGE
+                'SYN' => 0x16, # SYNCHRONOUS IDLE
+                'ETB' => 0x17, # END OF TRANSMISSION BLOCK
+                'CAN' => 0x18, # CANCEL
+                'EOM' => 0x19, # END OF MEDIUM
+                'SUB' => 0x1A, # SUBSTITUTE
+                'ESC' => 0x1B, # ESCAPE
+                'FS' => 0x1C, # FILE SEPARATOR
+                'GS' => 0x1D, # GROUP SEPARATOR
+                'RS' => 0x1E, # RECORD SEPARATOR
+                'US' => 0x1F, # UNIT SEPARATOR
+                'DEL' => 0x7F, # DELETE
+                'BPH' => 0x82, # BREAK PERMITTED HERE
+                'NBH' => 0x83, # NO BREAK HERE
+                'NEL' => 0x85, # NEXT LINE (NEL)
+                'SSA' => 0x86, # START OF SELECTED AREA
+                'ESA' => 0x87, # END OF SELECTED AREA
+                'HTS' => 0x88, # CHARACTER TABULATION SET
+                'HTJ' => 0x89, # CHARACTER TABULATION WITH JUSTIFICATION
+                'VTS' => 0x8A, # LINE TABULATION SET
+                'PLD' => 0x8B, # PARTIAL LINE FORWARD
+                'PLU' => 0x8C, # PARTIAL LINE BACKWARD
+                'RI ' => 0x8D, # REVERSE LINE FEED
+                'SS2' => 0x8E, # SINGLE SHIFT TWO
+                'SS3' => 0x8F, # SINGLE SHIFT THREE
+                'DCS' => 0x90, # DEVICE CONTROL STRING
+                'PU1' => 0x91, # PRIVATE USE ONE
+                'PU2' => 0x92, # PRIVATE USE TWO
+                'STS' => 0x93, # SET TRANSMIT STATE
+                'CCH' => 0x94, # CANCEL CHARACTER
+                'MW ' => 0x95, # MESSAGE WAITING
+                'SPA' => 0x96, # START OF GUARDED AREA
+                'EPA' => 0x97, # END OF GUARDED AREA
+                'SOS' => 0x98, # START OF STRING
+                'SCI' => 0x9A, # SINGLE CHARACTER INTRODUCER
+                'CSI' => 0x9B, # CONTROL SEQUENCE INTRODUCER
+                'ST ' => 0x9C, # STRING TERMINATOR
+                'OSC' => 0x9D, # OPERATING SYSTEM COMMAND
+                'PM ' => 0x9E, # PRIVACY MESSAGE
+                'APC' => 0x9F, # APPLICATION PROGRAM COMMAND
+
+                # There are no names for these in the Unicode standard;
+                # perhaps should be deprecated, but then again there are
+                # no alternative names, so am not deprecating.  And if
+                # did, the code would have to change to not recommend an
+                # alternative for these.
+                'PADDING CHARACTER' => 0x80,
+                'PAD' => 0x80,
+                'HIGH OCTET PRESET' => 0x81,
+                'HOP' => 0x81,
+                'INDEX' => 0x84,
+                'IND' => 0x84,
+                'SINGLE GRAPHIC CHARACTER INTRODUCER' => 0x99,
+                'SGC' => 0x99,
+
+                # More convenience.  For further convenience,
+                # it is suggested some way of using the NamesList
+                # aliases be implemented, but there are ambiguities in
+                # NamesList.txt)
+                'BOM' => 0xFEFF, # BYTE ORDER MARK
+                'BYTE ORDER MARK'=> 0xFEFF,
+                'CGJ' => 0x034F, # COMBINING GRAPHEME JOINER
+                'FVS1' => 0x180B, # MONGOLIAN FREE VARIATION SELECTOR ONE
+                'FVS2' => 0x180C, # MONGOLIAN FREE VARIATION SELECTOR TWO
+                'FVS3' => 0x180D, # MONGOLIAN FREE VARIATION SELECTOR THREE
+                'LRE' => 0x202A, # LEFT-TO-RIGHT EMBEDDING
+                'LRM' => 0x200E, # LEFT-TO-RIGHT MARK
+                'LRO' => 0x202D, # LEFT-TO-RIGHT OVERRIDE
+                'MMSP' => 0x205F, # MEDIUM MATHEMATICAL SPACE
+                'MVS' => 0x180E, # MONGOLIAN VOWEL SEPARATOR
+                'NBSP' => 0x00A0, # NO-BREAK SPACE
+                'NNBSP' => 0x202F, # NARROW NO-BREAK SPACE
+                'PDF' => 0x202C, # POP DIRECTIONAL FORMATTING
+                'RLE' => 0x202B, # RIGHT-TO-LEFT EMBEDDING
+                'RLM' => 0x200F, # RIGHT-TO-LEFT MARK
+                'RLO' => 0x202E, # RIGHT-TO-LEFT OVERRIDE
+                'SHY' => 0x00AD, # SOFT HYPHEN
+                'VS1' => 0xFE00, # VARIATION SELECTOR-1
+                'VS2' => 0xFE01, # VARIATION SELECTOR-2
+                'VS3' => 0xFE02, # VARIATION SELECTOR-3
+                'VS4' => 0xFE03, # VARIATION SELECTOR-4
+                'VS5' => 0xFE04, # VARIATION SELECTOR-5
+                'VS6' => 0xFE05, # VARIATION SELECTOR-6
+                'VS7' => 0xFE06, # VARIATION SELECTOR-7
+                'VS8' => 0xFE07, # VARIATION SELECTOR-8
+                'VS9' => 0xFE08, # VARIATION SELECTOR-9
+                'VS10' => 0xFE09, # VARIATION SELECTOR-10
+                'VS11' => 0xFE0A, # VARIATION SELECTOR-11
+                'VS12' => 0xFE0B, # VARIATION SELECTOR-12
+                'VS13' => 0xFE0C, # VARIATION SELECTOR-13
+                'VS14' => 0xFE0D, # VARIATION SELECTOR-14
+                'VS15' => 0xFE0E, # VARIATION SELECTOR-15
+                'VS16' => 0xFE0F, # VARIATION SELECTOR-16
+                'VS17' => 0xE0100, # VARIATION SELECTOR-17
+                'VS18' => 0xE0101, # VARIATION SELECTOR-18
+                'VS19' => 0xE0102, # VARIATION SELECTOR-19
+                'VS20' => 0xE0103, # VARIATION SELECTOR-20
+                'VS21' => 0xE0104, # VARIATION SELECTOR-21
+                'VS22' => 0xE0105, # VARIATION SELECTOR-22
+                'VS23' => 0xE0106, # VARIATION SELECTOR-23
+                'VS24' => 0xE0107, # VARIATION SELECTOR-24
+                'VS25' => 0xE0108, # VARIATION SELECTOR-25
+                'VS26' => 0xE0109, # VARIATION SELECTOR-26
+                'VS27' => 0xE010A, # VARIATION SELECTOR-27
+                'VS28' => 0xE010B, # VARIATION SELECTOR-28
+                'VS29' => 0xE010C, # VARIATION SELECTOR-29
+                'VS30' => 0xE010D, # VARIATION SELECTOR-30
+                'VS31' => 0xE010E, # VARIATION SELECTOR-31
+                'VS32' => 0xE010F, # VARIATION SELECTOR-32
+                'VS33' => 0xE0110, # VARIATION SELECTOR-33
+                'VS34' => 0xE0111, # VARIATION SELECTOR-34
+                'VS35' => 0xE0112, # VARIATION SELECTOR-35
+                'VS36' => 0xE0113, # VARIATION SELECTOR-36
+                'VS37' => 0xE0114, # VARIATION SELECTOR-37
+                'VS38' => 0xE0115, # VARIATION SELECTOR-38
+                'VS39' => 0xE0116, # VARIATION SELECTOR-39
+                'VS40' => 0xE0117, # VARIATION SELECTOR-40
+                'VS41' => 0xE0118, # VARIATION SELECTOR-41
+                'VS42' => 0xE0119, # VARIATION SELECTOR-42
+                'VS43' => 0xE011A, # VARIATION SELECTOR-43
+                'VS44' => 0xE011B, # VARIATION SELECTOR-44
+                'VS45' => 0xE011C, # VARIATION SELECTOR-45
+                'VS46' => 0xE011D, # VARIATION SELECTOR-46
+                'VS47' => 0xE011E, # VARIATION SELECTOR-47
+                'VS48' => 0xE011F, # VARIATION SELECTOR-48
+                'VS49' => 0xE0120, # VARIATION SELECTOR-49
+                'VS50' => 0xE0121, # VARIATION SELECTOR-50
+                'VS51' => 0xE0122, # VARIATION SELECTOR-51
+                'VS52' => 0xE0123, # VARIATION SELECTOR-52
+                'VS53' => 0xE0124, # VARIATION SELECTOR-53
+                'VS54' => 0xE0125, # VARIATION SELECTOR-54
+                'VS55' => 0xE0126, # VARIATION SELECTOR-55
+                'VS56' => 0xE0127, # VARIATION SELECTOR-56
+                'VS57' => 0xE0128, # VARIATION SELECTOR-57
+                'VS58' => 0xE0129, # VARIATION SELECTOR-58
+                'VS59' => 0xE012A, # VARIATION SELECTOR-59
+                'VS60' => 0xE012B, # VARIATION SELECTOR-60
+                'VS61' => 0xE012C, # VARIATION SELECTOR-61
+                'VS62' => 0xE012D, # VARIATION SELECTOR-62
+                'VS63' => 0xE012E, # VARIATION SELECTOR-63
+                'VS64' => 0xE012F, # VARIATION SELECTOR-64
+                'VS65' => 0xE0130, # VARIATION SELECTOR-65
+                'VS66' => 0xE0131, # VARIATION SELECTOR-66
+                'VS67' => 0xE0132, # VARIATION SELECTOR-67
+                'VS68' => 0xE0133, # VARIATION SELECTOR-68
+                'VS69' => 0xE0134, # VARIATION SELECTOR-69
+                'VS70' => 0xE0135, # VARIATION SELECTOR-70
+                'VS71' => 0xE0136, # VARIATION SELECTOR-71
+                'VS72' => 0xE0137, # VARIATION SELECTOR-72
+                'VS73' => 0xE0138, # VARIATION SELECTOR-73
+                'VS74' => 0xE0139, # VARIATION SELECTOR-74
+                'VS75' => 0xE013A, # VARIATION SELECTOR-75
+                'VS76' => 0xE013B, # VARIATION SELECTOR-76
+                'VS77' => 0xE013C, # VARIATION SELECTOR-77
+                'VS78' => 0xE013D, # VARIATION SELECTOR-78
+                'VS79' => 0xE013E, # VARIATION SELECTOR-79
+                'VS80' => 0xE013F, # VARIATION SELECTOR-80
+                'VS81' => 0xE0140, # VARIATION SELECTOR-81
+                'VS82' => 0xE0141, # VARIATION SELECTOR-82
+                'VS83' => 0xE0142, # VARIATION SELECTOR-83
+                'VS84' => 0xE0143, # VARIATION SELECTOR-84
+                'VS85' => 0xE0144, # VARIATION SELECTOR-85
+                'VS86' => 0xE0145, # VARIATION SELECTOR-86
+                'VS87' => 0xE0146, # VARIATION SELECTOR-87
+                'VS88' => 0xE0147, # VARIATION SELECTOR-88
+                'VS89' => 0xE0148, # VARIATION SELECTOR-89
+                'VS90' => 0xE0149, # VARIATION SELECTOR-90
+                'VS91' => 0xE014A, # VARIATION SELECTOR-91
+                'VS92' => 0xE014B, # VARIATION SELECTOR-92
+                'VS93' => 0xE014C, # VARIATION SELECTOR-93
+                'VS94' => 0xE014D, # VARIATION SELECTOR-94
+                'VS95' => 0xE014E, # VARIATION SELECTOR-95
+                'VS96' => 0xE014F, # VARIATION SELECTOR-96
+                'VS97' => 0xE0150, # VARIATION SELECTOR-97
+                'VS98' => 0xE0151, # VARIATION SELECTOR-98
+                'VS99' => 0xE0152, # VARIATION SELECTOR-99
+                'VS100' => 0xE0153, # VARIATION SELECTOR-100
+                'VS101' => 0xE0154, # VARIATION SELECTOR-101
+                'VS102' => 0xE0155, # VARIATION SELECTOR-102
+                'VS103' => 0xE0156, # VARIATION SELECTOR-103
+                'VS104' => 0xE0157, # VARIATION SELECTOR-104
+                'VS105' => 0xE0158, # VARIATION SELECTOR-105
+                'VS106' => 0xE0159, # VARIATION SELECTOR-106
+                'VS107' => 0xE015A, # VARIATION SELECTOR-107
+                'VS108' => 0xE015B, # VARIATION SELECTOR-108
+                'VS109' => 0xE015C, # VARIATION SELECTOR-109
+                'VS110' => 0xE015D, # VARIATION SELECTOR-110
+                'VS111' => 0xE015E, # VARIATION SELECTOR-111
+                'VS112' => 0xE015F, # VARIATION SELECTOR-112
+                'VS113' => 0xE0160, # VARIATION SELECTOR-113
+                'VS114' => 0xE0161, # VARIATION SELECTOR-114
+                'VS115' => 0xE0162, # VARIATION SELECTOR-115
+                'VS116' => 0xE0163, # VARIATION SELECTOR-116
+                'VS117' => 0xE0164, # VARIATION SELECTOR-117
+                'VS118' => 0xE0165, # VARIATION SELECTOR-118
+                'VS119' => 0xE0166, # VARIATION SELECTOR-119
+                'VS120' => 0xE0167, # VARIATION SELECTOR-120
+                'VS121' => 0xE0168, # VARIATION SELECTOR-121
+                'VS122' => 0xE0169, # VARIATION SELECTOR-122
+                'VS123' => 0xE016A, # VARIATION SELECTOR-123
+                'VS124' => 0xE016B, # VARIATION SELECTOR-124
+                'VS125' => 0xE016C, # VARIATION SELECTOR-125
+                'VS126' => 0xE016D, # VARIATION SELECTOR-126
+                'VS127' => 0xE016E, # VARIATION SELECTOR-127
+                'VS128' => 0xE016F, # VARIATION SELECTOR-128
+                'VS129' => 0xE0170, # VARIATION SELECTOR-129
+                'VS130' => 0xE0171, # VARIATION SELECTOR-130
+                'VS131' => 0xE0172, # VARIATION SELECTOR-131
+                'VS132' => 0xE0173, # VARIATION SELECTOR-132
+                'VS133' => 0xE0174, # VARIATION SELECTOR-133
+                'VS134' => 0xE0175, # VARIATION SELECTOR-134
+                'VS135' => 0xE0176, # VARIATION SELECTOR-135
+                'VS136' => 0xE0177, # VARIATION SELECTOR-136
+                'VS137' => 0xE0178, # VARIATION SELECTOR-137
+                'VS138' => 0xE0179, # VARIATION SELECTOR-138
+                'VS139' => 0xE017A, # VARIATION SELECTOR-139
+                'VS140' => 0xE017B, # VARIATION SELECTOR-140
+                'VS141' => 0xE017C, # VARIATION SELECTOR-141
+                'VS142' => 0xE017D, # VARIATION SELECTOR-142
+                'VS143' => 0xE017E, # VARIATION SELECTOR-143
+                'VS144' => 0xE017F, # VARIATION SELECTOR-144
+                'VS145' => 0xE0180, # VARIATION SELECTOR-145
+                'VS146' => 0xE0181, # VARIATION SELECTOR-146
+                'VS147' => 0xE0182, # VARIATION SELECTOR-147
+                'VS148' => 0xE0183, # VARIATION SELECTOR-148
+                'VS149' => 0xE0184, # VARIATION SELECTOR-149
+                'VS150' => 0xE0185, # VARIATION SELECTOR-150
+                'VS151' => 0xE0186, # VARIATION SELECTOR-151
+                'VS152' => 0xE0187, # VARIATION SELECTOR-152
+                'VS153' => 0xE0188, # VARIATION SELECTOR-153
+                'VS154' => 0xE0189, # VARIATION SELECTOR-154
+                'VS155' => 0xE018A, # VARIATION SELECTOR-155
+                'VS156' => 0xE018B, # VARIATION SELECTOR-156
+                'VS157' => 0xE018C, # VARIATION SELECTOR-157
+                'VS158' => 0xE018D, # VARIATION SELECTOR-158
+                'VS159' => 0xE018E, # VARIATION SELECTOR-159
+                'VS160' => 0xE018F, # VARIATION SELECTOR-160
+                'VS161' => 0xE0190, # VARIATION SELECTOR-161
+                'VS162' => 0xE0191, # VARIATION SELECTOR-162
+                'VS163' => 0xE0192, # VARIATION SELECTOR-163
+                'VS164' => 0xE0193, # VARIATION SELECTOR-164
+                'VS165' => 0xE0194, # VARIATION SELECTOR-165
+                'VS166' => 0xE0195, # VARIATION SELECTOR-166
+                'VS167' => 0xE0196, # VARIATION SELECTOR-167
+                'VS168' => 0xE0197, # VARIATION SELECTOR-168
+                'VS169' => 0xE0198, # VARIATION SELECTOR-169
+                'VS170' => 0xE0199, # VARIATION SELECTOR-170
+                'VS171' => 0xE019A, # VARIATION SELECTOR-171
+                'VS172' => 0xE019B, # VARIATION SELECTOR-172
+                'VS173' => 0xE019C, # VARIATION SELECTOR-173
+                'VS174' => 0xE019D, # VARIATION SELECTOR-174
+                'VS175' => 0xE019E, # VARIATION SELECTOR-175
+                'VS176' => 0xE019F, # VARIATION SELECTOR-176
+                'VS177' => 0xE01A0, # VARIATION SELECTOR-177
+                'VS178' => 0xE01A1, # VARIATION SELECTOR-178
+                'VS179' => 0xE01A2, # VARIATION SELECTOR-179
+                'VS180' => 0xE01A3, # VARIATION SELECTOR-180
+                'VS181' => 0xE01A4, # VARIATION SELECTOR-181
+                'VS182' => 0xE01A5, # VARIATION SELECTOR-182
+                'VS183' => 0xE01A6, # VARIATION SELECTOR-183
+                'VS184' => 0xE01A7, # VARIATION SELECTOR-184
+                'VS185' => 0xE01A8, # VARIATION SELECTOR-185
+                'VS186' => 0xE01A9, # VARIATION SELECTOR-186
+                'VS187' => 0xE01AA, # VARIATION SELECTOR-187
+                'VS188' => 0xE01AB, # VARIATION SELECTOR-188
+                'VS189' => 0xE01AC, # VARIATION SELECTOR-189
+                'VS190' => 0xE01AD, # VARIATION SELECTOR-190
+                'VS191' => 0xE01AE, # VARIATION SELECTOR-191
+                'VS192' => 0xE01AF, # VARIATION SELECTOR-192
+                'VS193' => 0xE01B0, # VARIATION SELECTOR-193
+                'VS194' => 0xE01B1, # VARIATION SELECTOR-194
+                'VS195' => 0xE01B2, # VARIATION SELECTOR-195
+                'VS196' => 0xE01B3, # VARIATION SELECTOR-196
+                'VS197' => 0xE01B4, # VARIATION SELECTOR-197
+                'VS198' => 0xE01B5, # VARIATION SELECTOR-198
+                'VS199' => 0xE01B6, # VARIATION SELECTOR-199
+                'VS200' => 0xE01B7, # VARIATION SELECTOR-200
+                'VS201' => 0xE01B8, # VARIATION SELECTOR-201
+                'VS202' => 0xE01B9, # VARIATION SELECTOR-202
+                'VS203' => 0xE01BA, # VARIATION SELECTOR-203
+                'VS204' => 0xE01BB, # VARIATION SELECTOR-204
+                'VS205' => 0xE01BC, # VARIATION SELECTOR-205
+                'VS206' => 0xE01BD, # VARIATION SELECTOR-206
+                'VS207' => 0xE01BE, # VARIATION SELECTOR-207
+                'VS208' => 0xE01BF, # VARIATION SELECTOR-208
+                'VS209' => 0xE01C0, # VARIATION SELECTOR-209
+                'VS210' => 0xE01C1, # VARIATION SELECTOR-210
+                'VS211' => 0xE01C2, # VARIATION SELECTOR-211
+                'VS212' => 0xE01C3, # VARIATION SELECTOR-212
+                'VS213' => 0xE01C4, # VARIATION SELECTOR-213
+                'VS214' => 0xE01C5, # VARIATION SELECTOR-214
+                'VS215' => 0xE01C6, # VARIATION SELECTOR-215
+                'VS216' => 0xE01C7, # VARIATION SELECTOR-216
+                'VS217' => 0xE01C8, # VARIATION SELECTOR-217
+                'VS218' => 0xE01C9, # VARIATION SELECTOR-218
+                'VS219' => 0xE01CA, # VARIATION SELECTOR-219
+                'VS220' => 0xE01CB, # VARIATION SELECTOR-220
+                'VS221' => 0xE01CC, # VARIATION SELECTOR-221
+                'VS222' => 0xE01CD, # VARIATION SELECTOR-222
+                'VS223' => 0xE01CE, # VARIATION SELECTOR-223
+                'VS224' => 0xE01CF, # VARIATION SELECTOR-224
+                'VS225' => 0xE01D0, # VARIATION SELECTOR-225
+                'VS226' => 0xE01D1, # VARIATION SELECTOR-226
+                'VS227' => 0xE01D2, # VARIATION SELECTOR-227
+                'VS228' => 0xE01D3, # VARIATION SELECTOR-228
+                'VS229' => 0xE01D4, # VARIATION SELECTOR-229
+                'VS230' => 0xE01D5, # VARIATION SELECTOR-230
+                'VS231' => 0xE01D6, # VARIATION SELECTOR-231
+                'VS232' => 0xE01D7, # VARIATION SELECTOR-232
+                'VS233' => 0xE01D8, # VARIATION SELECTOR-233
+                'VS234' => 0xE01D9, # VARIATION SELECTOR-234
+                'VS235' => 0xE01DA, # VARIATION SELECTOR-235
+                'VS236' => 0xE01DB, # VARIATION SELECTOR-236
+                'VS237' => 0xE01DC, # VARIATION SELECTOR-237
+                'VS238' => 0xE01DD, # VARIATION SELECTOR-238
+                'VS239' => 0xE01DE, # VARIATION SELECTOR-239
+                'VS240' => 0xE01DF, # VARIATION SELECTOR-240
+                'VS241' => 0xE01E0, # VARIATION SELECTOR-241
+                'VS242' => 0xE01E1, # VARIATION SELECTOR-242
+                'VS243' => 0xE01E2, # VARIATION SELECTOR-243
+                'VS244' => 0xE01E3, # VARIATION SELECTOR-244
+                'VS245' => 0xE01E4, # VARIATION SELECTOR-245
+                'VS246' => 0xE01E5, # VARIATION SELECTOR-246
+                'VS247' => 0xE01E6, # VARIATION SELECTOR-247
+                'VS248' => 0xE01E7, # VARIATION SELECTOR-248
+                'VS249' => 0xE01E8, # VARIATION SELECTOR-249
+                'VS250' => 0xE01E9, # VARIATION SELECTOR-250
+                'VS251' => 0xE01EA, # VARIATION SELECTOR-251
+                'VS252' => 0xE01EB, # VARIATION SELECTOR-252
+                'VS253' => 0xE01EC, # VARIATION SELECTOR-253
+                'VS254' => 0xE01ED, # VARIATION SELECTOR-254
+                'VS255' => 0xE01EE, # VARIATION SELECTOR-255
+                'VS256' => 0xE01EF, # VARIATION SELECTOR-256
+                'WJ' => 0x2060, # WORD JOINER
+                'ZWJ' => 0x200D, # ZERO WIDTH JOINER
+                'ZWNJ' => 0x200C, # ZERO WIDTH NON-JOINER
+                'ZWSP' => 0x200B, # ZERO WIDTH SPACE
+            );
 
 my %alias2 = (
-                # Pre-3.2 compatibility (only for the first 256 characters).
-                'HORIZONTAL TABULATION' => 'CHARACTER TABULATION',
-                'VERTICAL TABULATION' => 'LINE TABULATION',
-                'FILE SEPARATOR' => 'INFORMATION SEPARATOR FOUR',
-                'GROUP SEPARATOR' => 'INFORMATION SEPARATOR THREE',
-                'RECORD SEPARATOR' => 'INFORMATION SEPARATOR TWO',
-                'UNIT SEPARATOR' => 'INFORMATION SEPARATOR ONE',
-                'PARTIAL LINE DOWN' => 'PARTIAL LINE FORWARD',
-                'PARTIAL LINE UP' => 'PARTIAL LINE BACKWARD',
-            );
+                # Pre-3.2 compatibility (only for the first 256 characters).
+                # Use of these gives deprecated message.
+                'HORIZONTAL TABULATION' => 0x09, # CHARACTER TABULATION
+                'VERTICAL TABULATION' => 0x0B, # LINE TABULATION
+                'FILE SEPARATOR' => 0x1C, # INFORMATION SEPARATOR FOUR
+                'GROUP SEPARATOR' => 0x1D, # INFORMATION SEPARATOR THREE
+                'RECORD SEPARATOR' => 0x1E, # INFORMATION SEPARATOR TWO
+                'UNIT SEPARATOR' => 0x1F, # INFORMATION SEPARATOR ONE
+                'HORIZONTAL TABULATION SET' => 0x88, # CHARACTER TABULATION SET
+                'HORIZONTAL TABULATION WITH JUSTIFICATION' => 0x89, # CHARACTER TABULATION WITH JUSTIFICATION
+                'PARTIAL LINE DOWN' => 0x8B, # PARTIAL LINE FORWARD
+                'PARTIAL LINE UP' => 0x8C, # PARTIAL LINE BACKWARD
+                'VERTICAL TABULATION SET' => 0x8A, # LINE TABULATION SET
+                'REVERSE INDEX' => 0x8D, # REVERSE LINE FEED
+            );
 
 my %alias3 = (
-                # User defined aliasses. Even more convenient :)
-            );
+                # User defined aliases. Even more convenient :)
+            );
 my $txt;
 
 sub croak
@@ -86,27 +448,28 @@ sub alias_file ($)
 sub charnames
 {
   my $name = shift;
+  my $ord;
+  my $fname;
 
-  if (exists $alias1{$name}) {
-    $name = $alias1{$name};
+  if (exists $alias3{$name}) {  # User alias should be checked first, or else
+                                # can't override ours, and if we add any,
+                                # could conflict with theirs.
+    $name = $alias3{$name};
+  }
+  elsif (exists $alias1{$name}) {
+    $ord = $alias1{$name};
+    $fname = $name;
   }
   elsif (exists $alias2{$name}) {
     require warnings;
-    warnings::warnif('deprecated', qq{Unicode character name "$name" is deprecated, use "$alias2{$name}" instead});
-    $name = $alias2{$name};
-  }
-  elsif (exists $alias3{$name}) {
-    $name = $alias3{$name};
+    warnings::warnif('deprecated', "Unicode character name \"$name\" is deprecated, use \"" . viacode($alias2{$name}) . "\" instead");
+    $ord = $alias2{$name};
+    $fname = $name;
   }
 
-  my $ord;
   my @off;
-  my $fname;
 
-  if ($name eq "BYTE ORDER MARK") {
-    $fname = $name;
-    $ord = 0xFEFF;
-  } else {
+  if (! defined $ord) {
     ## Suck in the code/name list as a big string.
     ## Lines look like:
     ##     "0052\t\tLATIN CAPITAL LETTER R\n"
@@ -347,7 +710,8 @@ charnames - define character names for C<\N{named}> string literal escapes
 
   use charnames ();
   print charnames::viacode(0x1234); # prints "ETHIOPIC SYLLABLE SEE"
-  printf "%04X", charnames::vianame("GOTHIC LETTER AHSA"); # prints "10330"
+  printf "%04X", charnames::vianame("GOTHIC LETTER AHSA"); # prints
+                                                           # "10330"
 
 =head1 DESCRIPTION
 
@@ -359,7 +723,8 @@ C has the form C, then C is looked up as a letter in script C
```