Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

Problem with Unicode class \p{ccc} and a proposed fix #9666

Closed p5pRT closed 14 years ago

p5pRT commented 15 years ago

Migrated from rt.perl.org#63550 (status was 'resolved')

Searchable as RT63550$

p5pRT commented 15 years ago

From john.imrie@vodafoneemail.co.uk

Created by john.imrie@vodafoneemail.co.uk

This is a bug report for perl from john.imrie@vodafoneemail.co.uk, generated with the help of perlbug 1.35 running under perl v5.8.8.

-----------------------------------------------------------------

Currently the Unicode character class \p{ccc} either dies when you use a numeric code or does not correctly match when you use a letter code.

The diffs that follow

a) Allow numeric codes to work correctly, so that \p{ccc=0} and \p{ccc=000} both work as expected.

and

b) Allow both numeric and alphabetic codes to match as expected
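
To make the expected behaviour concrete, here is a minimal sketch of the matches this report is asking for. It assumes a perl in which `\p{ccc=...}` accepts both numeric and alphabetic values (perl 5.8.8, as described above, does not), and it relies on two facts from the Unicode Character Database: U+0301 COMBINING ACUTE ACCENT has combining class 230 (short name `A`, Above), and ordinary letters have class 0 (short name `NR`, Not_Reordered). The `%matched` hash is purely illustrative.

```perl
use strict;
use warnings;

# U+0301 COMBINING ACUTE ACCENT: Canonical_Combining_Class 230 ("A", Above).
my $acute = "\x{0301}";
# A plain ASCII letter: Canonical_Combining_Class 0 ("NR", Not_Reordered).
my $letter = "a";

# Record whether each form of the property matches (1) or not (0).
my %matched = (
    acute_numeric  => ($acute  =~ /\p{ccc=230}/ ? 1 : 0),
    acute_alpha    => ($acute  =~ /\p{ccc=A}/   ? 1 : 0),
    letter_numeric => ($letter =~ /\p{ccc=0}/   ? 1 : 0),
    letter_alpha   => ($letter =~ /\p{ccc=NR}/  ? 1 : 0),
);

printf "%-15s %d\n", $_, $matched{$_} for sort keys %matched;
```

On a perl with working ccc support, all four entries should come out as 1; the zero-padded form `\p{ccc=000}` is deliberately left out because whether it is accepted depends on the perl version.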

Inline Patch

```diff
--- /usr/lib/perl5/5.8.8/utf8_heavy.pl 2009-02-27 17:26:09.000000000 +0000
+++ utf8_heavy.pl 2009-02-27 17:23:22.000000000 +0000
@@ -84,9 +84,11 @@
     $val =~ tr/ _-//d;
     my $pa = $PropertyAlias{$enum} ? $enum : $PA_reverse{$enum};
-    my $f = $PropValueAlias{$pa}{$val} ? $val : $PVA_reverse{$pa}{lc $val};
+    $val+=0 if $val=~/^\d+$/;
+    my $f = defined $PropValueAlias{$pa}{$val} ? $val : $PVA_reverse{$pa}{lc $val};

-    if ($pa and $f) {
+    if ($pa and defined $f) {
+        $f+=0 if $f=~/^\d+$/;
         $pa = "gc_sc" if $pa eq "gc" or $pa eq "sc";
         $file = "unicore/lib/$pa/$PVA_abbr_map{$pa}{lc $f}.pl";
         last GETFILE;

--- /usr/lib/perl5/5.8.8/unicore/mktables 2009-02-19 17:50:16.000000000 +0000
+++ mktables 2009-02-27 17:09:23.000000000 +0000
@@ -288,6 +288,11 @@
     if ($prop eq 'ccc') {
         $PropValueAlias{$prop}{$data[1]} = [ @data[0,2] ];
         $PVA_reverse{$prop}{$data[2]} = [ @data[0,1] ];
+        # Fixup for numeric CCC
+        $utf8::PropValueAlias{$prop}{lc $data[0]} = $data[1];
+        $utf8::PropValueAlias{$prop}{lc $data[0]} = $data[0];
+        $utf8::PVA_abbr_map{$prop}{lc $data[1]} = $data[0];
+        $utf8::PVA_abbr_map{$prop}{lc $data[0]} = $data[0];
     }
     else {
         next if $data[0] eq "n/a";
@@ -302,6 +307,7 @@
         $utf8::PropValueAlias{$prop}{lc $data[0]} = $data[1];
         $utf8::PVA_reverse{$prop}{lc $data[1]} = $data[0];

+        next if $prop = 'ccc';
         my $abbr_class = ($prop eq 'gc' or $prop eq 'sc') ? 'gc_sc' : $prop;
         $utf8::PVA_abbr_map{$abbr_class}{lc $data[0]} = $data[0];
     }
@@ -775,7 +781,6 @@
 {
     my $Bidi = Table->New();
     my $Deco = Table->New();
-    my $Comb = Table->New();
     my $Number = Table->New();
     my $Mirrored = Table->New();#Is => 'Mirrored',
                                 #Desc => "Mirrored in bidirectional text",
@@ -784,6 +789,7 @@
     my %DC;
     my %Bidi;
     my %Number;
+    my %Comb;
     $DC{can} = Table->New();
     $DC{com} = Table->New();

@@ -983,7 +989,12 @@
     $To{Digit}->Append($code, $decimal) if length $decimal;

     $Bidi->Append($code, $bidi);
-    $Comb->Append($code, $comb) if $comb;
+    # Fixup for CCC
+    if (defined $comb) { # $comb can be 0
+        $Comb{$comb} ||= Table->New();
+        $Comb{$comb}->Append($code)
+    }
+
     $Number->Append($code, $number) if length $number;

     length($decimal) and ($Number{De} ||= Table->New())->Append($code)
@@ -1125,13 +1136,11 @@
     );
     }

-    $Comb->Write("CombiningClass.pl");
-    for (keys %{ $PropValueAlias{ccc} }) {
-        my ($code, $name) = @{ $PropValueAlias{ccc}{$_} };
-        (my $c = Table->New())->Append($code);
-        $c->Write(
+    # $Comb->Write("CombiningClass.pl");
+    for (keys %Comb) {
+        $Comb{$_}->Write(
         ["lib","ccc","$_.pl"],
-        "CombiningClass category '$name'"
+        "CombiningClass category '$_'"
         );
     }
```
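
The utf8_heavy.pl part of the patch leans on two small Perl idioms: adding 0 to an all-digit string so that `000` and `0` name the same key, and testing the lookup result with `defined` so that a legitimate value of 0 is not treated as a miss. The following is only a sketch of those idioms; `%code_for` and `lookup_ccc` are hypothetical stand-ins, not the real `%PropValueAlias` tables.

```perl
use strict;
use warnings;

# Hypothetical alias table: every spelling maps to the numeric class.
my %code_for = (0 => 0, nr => 0, 230 => 230, a => 230);

sub lookup_ccc {
    my ($val) = @_;
    $val += 0 if $val =~ /^\d+$/;    # "000" becomes 0, "0230" becomes 230
    return $code_for{lc $val};       # undef when the spelling is unknown
}

for my $input ('0', '000', 'NR', '0230', 'A', '999') {
    my $code = lookup_ccc($input);
    printf "%-5s truthy=%-3s defined=%s\n",
        $input,
        ($code         ? 'yes' : 'no'),
        (defined $code ? 'yes' : 'no');
}
```

For class 0 the truthiness test says "no" even though the lookup succeeded, which is the failure mode the switch to `defined` is meant to avoid.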

John Imrie

Perl Info

```
Flags:
    category=core
    severity=high

Site configuration information for perl v5.8.8:

Configured by Gentoo at Thu Feb 19 17:44:25 GMT 2009.

Summary of my perl5 (revision 5 version 8 subversion 8) configuration:
  Platform:
    osname=linux, osvers=2.6.22-gentoo-r9, archname=i686-linux
    uname='linux john 2.6.22-gentoo-r9 #2 smp sat apr 26 07:28:23 bst 2008 i686 intel(r) core(tm)2 cpu 6400 @ 2.13ghz genuineintel gnulinux '
    config_args='-des -Darchname=i686-linux -Dcccdlflags=-fPIC -Dccdlflags=-rdynamic -Dcc=i686-pc-linux-gnu-gcc -Dprefix=/usr -Dvendorprefix=/usr -Dsiteprefix=/usr -Dlocincpth= -Doptimize=-O2 -march=i686 -pipe -Duselargefiles -Dd_semctl_semun -Dscriptdir=/usr/bin -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dinstallman1dir=/usr/share/man/man1 -Dinstallman3dir=/usr/share/man/man3 -Dman1ext=1 -Dman3ext=3pm -Dinc_version_list=5.8.0 5.8.0/i686-linux 5.8.2 5.8.2/i686-linux 5.8.4 5.8.4/i686-linux 5.8.5 5.8.5/i686-linux 5.8.6 5.8.6/i686-linux 5.8.7 5.8.7/i686-linux -Dcf_by=Gentoo -Ud_csh -Dusenm -Di_ndbm -Di_gdbm -Di_db'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='i686-pc-linux-gnu-gcc', ccflags ='-fno-strict-aliasing -pipe -Wdeclaration-after-statement -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
    optimize='-O2 -march=i686 -pipe',
    cppflags='-fno-strict-aliasing -pipe -Wdeclaration-after-statement -I/usr/include/gdbm'
    ccversion='', gccversion='4.1.2 (Gentoo 4.1.2 p1.3)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='i686-pc-linux-gnu-gcc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lpthread -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc
    perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
    libc=/lib/libc-2.6.1.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.6.1'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:

@INC for perl v5.8.8:
    /etc/perl
    /usr/lib/perl5/vendor_perl/5.8.8/i686-linux
    /usr/lib/perl5/vendor_perl/5.8.8
    /usr/lib/perl5/vendor_perl
    /usr/lib/perl5/site_perl/5.8.8/i686-linux
    /usr/lib/perl5/site_perl/5.8.8
    /usr/lib/perl5/site_perl
    /usr/lib/perl5/5.8.8/i686-linux
    /usr/lib/perl5/5.8.8
    /usr/local/lib/site_perl
    .

Environment for perl v5.8.8:
    HOME=/home/john
    LANG=en_GB.UTF-8
    LANGUAGE (unset)
    LC_ALL=en_GB.UTF-8
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/usr/kde/3.5/bin:/usr/local/bin:/usr/bin:/bin:/opt/bin:/usr/i686-pc-linux-gnu/gcc-bin/4.1.2:/opt/blackdown-jdk-1.4.2.03/bin:/opt/blackdown-jdk-1.4.2.03/jre/bin:/usr/qt/3/bin:/usr/games/bin:/opt/real/RealPlayer
    PERL_BADLANG (unset)
    SHELL=/bin/bash

______________________________________________
This email has been scanned by Netintelligence
http://www.netintelligence.com/email
```

p5pRT commented 15 years ago

From john.imrie@vodafoneemail.co.uk

Patch for mktables is incorrect.

Corrected patch follows

    --- /usr/lib/perl5/5.8.8/unicore/mktables 2009-02-19 17:50:16.000000000 +0000
    +++ mktables 2009-02-27 17:09:23.000000000 +0000
    @@ -288,6 +288,11 @@
         if ($prop eq 'ccc') {
             $PropValueAlias{$prop}{$data[1]} = [ @data[0,2] ];
             $PVA_reverse{$prop}{$data[2]} = [ @data[0,1] ];
    +        # Fixup for numeric CCC
    +        $utf8::PropValueAlias{$prop}{lc $data[0]} = $data[1];
    +        $utf8::PropValueAlias{$prop}{lc $data[0]} = $data[0];
    +        $utf8::PVA_abbr_map{$prop}{lc $data[1]} = $data[0];
    +        $utf8::PVA_abbr_map{$prop}{lc $data[0]} = $data[0];
         }
         else {
             next if $data[0] eq "n/a";
    @@ -302,6 +307,7 @@
             $utf8::PropValueAlias{$prop}{lc $data[0]} = $data[1];
             $utf8::PVA_reverse{$prop}{lc $data[1]} = $data[0];

    +        next if $prop eq 'ccc';
             my $abbr_class = ($prop eq 'gc' or $prop eq 'sc') ? 'gc_sc' : $prop;
             $utf8::PVA_abbr_map{$abbr_class}{lc $data[0]} = $data[0];
         }
    @@ -775,7 +781,6 @@
     {
         my $Bidi = Table->New();
         my $Deco = Table->New();
    -    my $Comb = Table->New();
         my $Number = Table->New();
         my $Mirrored = Table->New();#Is => 'Mirrored',
                                     #Desc => "Mirrored in bidirectional text",
    @@ -784,6 +789,7 @@
         my %DC;
         my %Bidi;
         my %Number;
    +    my %Comb;
         $DC{can} = Table->New();
         $DC{com} = Table->New();

    @@ -983,7 +989,12 @@
         $To{Digit}->Append($code, $decimal) if length $decimal;

         $Bidi->Append($code, $bidi);
    -    $Comb->Append($code, $comb) if $comb;
    +    # Fixup for CCC
    +    if (defined $comb) { # $comb can be 0
    +        $Comb{$comb} ||= Table->New();
    +        $Comb{$comb}->Append($code)
    +    }
    +
         $Number->Append($code, $number) if length $number;

         length($decimal) and ($Number{De} ||= Table->New())->Append($code)
    @@ -1125,13 +1136,11 @@
         );
         }

    -    $Comb->Write("CombiningClass.pl");
    -    for (keys %{ $PropValueAlias{ccc} }) {
    -        my ($code, $name) = @{ $PropValueAlias{ccc}{$_} };
    -        (my $c = Table->New())->Append($code);
    -        $c->Write(
    +    # $Comb->Write("CombiningClass.pl");
    +    for (keys %Comb) {
    +        $Comb{$_}->Write(
             ["lib","ccc","$_.pl"],
    -        "CombiningClass category '$name'"
    +        "CombiningClass category '$_'"
             );
         }
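
The mktables change from `if $comb` to `if (defined $comb)` matters because class 0 is a false value in Perl. A minimal sketch of the difference, using plain arrays in place of the script's `Table` objects and a hand-written `@records` list standing in for fields read from UnicodeData.txt (both are assumptions made for illustration):

```perl
use strict;
use warnings;

# Hypothetical (code point, ccc) pairs; the ccc field arrives as a string.
my @records = (
    [0x0041, '0'],      # LATIN CAPITAL LETTER A
    [0x0301, '230'],    # COMBINING ACUTE ACCENT
    [0x0062, '0'],      # LATIN SMALL LETTER B
    [0x0316, '220'],    # COMBINING GRAVE ACCENT BELOW
);

my (%kept_if_true, %kept_if_defined);
for my $record (@records) {
    my ($code, $comb) = @$record;
    push @{ $kept_if_true{$comb} },    $code if $comb;           # old test: "0" is false
    push @{ $kept_if_defined{$comb} }, $code if defined $comb;   # patched test
}

printf "truthy test kept classes:  %s\n", join ', ', sort { $a <=> $b } keys %kept_if_true;
printf "defined test kept classes: %s\n", join ', ', sort { $a <=> $b } keys %kept_if_defined;
```

With the truthiness test no table for class 0 is ever built, so there is nothing for `\p{ccc=0}` to load; the `defined` test keeps it.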


p5pRT commented 15 years ago

From @khwilliamson

John wrote:

Patch for mktables is incorrect.

Corrected patch follows

FYI,

There are a number of problems in mktables besides the ccc ones. I've been working on revamping mktables to correct all of these, and expect to finish it in a week.

Note that some ccc values have no names, but should still be referable in regular expressions, hence the file names should be something like 0.pl, 240.pl
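
As a rough illustration of the numeric naming suggested here, the sketch below derives a file name from a character's combining class using `charinfo` from the core `Unicode::UCD` module. The helper `ccc_file_name` is hypothetical and is not how mktables itself builds its tables.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charinfo);

# Hypothetical helper: map a code point to a numerically named table file.
sub ccc_file_name {
    my ($code) = @_;
    my $info = charinfo($code) or return;    # unassigned code point
    my $ccc  = $info->{combining};
    return unless defined $ccc && length $ccc;
    return ($ccc + 0) . '.pl';               # class 0 yields "0.pl"
}

for my $code (0x0041, 0x05B0, 0x0301, 0x0345) {
    my $file = ccc_file_name($code);
    printf "U+%04X -> %s\n", $code, defined $file ? $file : '(none)';
}
```

Keying the files on the number means a class needs no alphabetic alias to be reachable.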

p5pRT commented 15 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 15 years ago

From john.imrie@vodafoneemail.co.uk

Karl,

Are these changes going to have an impact on Perl 5.10.0?

The reason I ask is that I am working on getting the Common Locale Data Repository (CLDR) http://unicode.org/cldr/ into Perl, and the CLDR requires some properties listed in the auxiliary directory of the Unicode 5.0 spec. Will your improvements include that and the extracted directory?

John

PS If you are interested in the CLDR, my code is currently publicly available at http://github.com/ThePilgrim/perlcldr/tree/master

p5pRT commented 15 years ago

From @khwilliamson

I'm working to get all the Unicode 5.1 database files (not the test nor documentation ones) processed by mktables, including those in the auxiliary and extracted subdirectories, but not including the Unihan, for which there is a CPAN module.

Since Perl 5.10.0 has already been released, this wouldn't affect it, but one could use this to transparently change the tables it uses in any given installation. I hope that this would be considered for inclusion in 5.10.1.

Below are code and comments that I've added to my working version of mktables that describe in detail the properties and files that aren't fully processed (I apologize for the email's folding these):

    # The following are properties that are in the files that we process, but we
    # don't use them. The reasons are in the comments
    my @skipped_properties = (
        qr/^FC_NFKC$/,          # Unimplemented, but in Unicode::Normalize
        qr/^Grapheme_Link$/,    # Deprecated by Unicode
        qr/^Other_/,            # These are used by Unicode for constructing
                                # other properties, and should not be exposed
    );

    # Below are the properties that aren't fully accessible through the Perl core.
    # All the binary (True or False) properties are considered to be fully
    # accessible through regular expression property matching (\p{XX}) (so don't
    # appear here). Many of the rest are partially accessible through that
    # mechanism, and some fully through library modules. There are several that
    # are accessible through .pl files that this script creates (but which as of
    # this writing aren't documented). The comments give the accessibility

    my @ignored_properties = (
        'Age',                       # But \p{age:XX} works
        'Bidi_Class',                # But can access through Unicode::UCD 'charinfo',
                                     # and \p{bc:XX} works
        'Bidi_Mirroring_Glyph',      # Unimplemented
        'Block',                     # But can access through
                                     # Unicode::UCD 'charblock', and \p{IsBLOCK}
                                     # works
        'Canonical_Combining_Class', # But can access through
                                     # Unicode::UCD 'charinfo', and \p{ccc:XX} works
        'Case_Folding',              # But can access through
                                     # Unicode::UCD 'casefold', and /RE/i works
        'Decomposition_Mapping',     # But can access through Unicode::UCD 'charinfo'
                                     # and furnished in Decomposition.pl
        'Decomposition_Type',        # But can access through Unicode::UCD 'charinfo'
                                     # and furnished in Decomposition.pl, and
                                     # \p{dt:XX} works
        'East_Asian_Width',          # But \p{ea:XX} works
        'General_Category',          # But can access through
                                     # Unicode::UCD 'charinfo', and /\p{IsCATEGORY}/
                                     # works
        'Grapheme_Cluster_Break',    # But \p{gcb:XX} works
        'Hangul_Syllable_Type',      # But \p{hst:XX} works
        'ISO_Comment',               # But can access through Unicode::UCD 'charinfo'
        'Joining_Group',             # But \p{jg:XX} works
        'Joining_Type',              # But \p{jt:XX} works
        'Line_Break',                # But \p{lb:XX} works
        'Lowercase_Mapping',         # But can access through lc() and
                                     # Unicode::UCD 'charinfo'
        'Name',                      # But can access through Unicode::UCD 'charinfo' and
                                     # Name.pl, and inverse through \N{}
        'NFC_Quick_Check',           # But can access through Unicode::Normalize checkNFC
        'NFD_Quick_Check',           # But can access through Unicode::Normalize checkNFD
        'NFKC_Quick_Check',          # But can access through Unicode::Normalize checkNFKC
        'NFKD_Quick_Check',          # But can access through Unicode::Normalize checkNFKD
        'Numeric_Type',              # But can access through Unicode::UCD 'charinfo', and
                                     # \p{nt:XX} works
        'Numeric_Value',             # But can access through Unicode::UCD 'charinfo'
        'Script',                    # But can access through Unicode::UCD 'charscript', and
                                     # \p{InSCRIPT} works
        'Sentence_Break',            # But \p{sb:XX} works

        # For all the 'Simple_XXX' properties, Perl uses the non-Simple mapping
        # internally for things like lc()

        'Simple_Case_Folding',       # But can access through Unicode::UCD 'casefold'
        'Simple_Lowercase_Mapping',  # But can access through Unicode::UCD 'charinfo'
        'Simple_Titlecase_Mapping',  # But can access through Unicode::UCD 'charinfo'
        'Simple_Uppercase_Mapping',  # But can access through Unicode::UCD 'charinfo'

        'Titlecase_Mapping',         # But can access through ucfirst() and
                                     # Unicode::UCD 'charinfo'
        'Unicode_1_Name',            # But can access through Unicode::UCD 'charinfo'
        'Unicode_Radical_Stroke',    # Unimplemented, but is in CPAN: Unicode::Unihan
        'Uppercase_Mapping',         # But can access through uc() and
                                     # Unicode::UCD 'charinfo'
        'Word_Break',                # But \p{XXX} works
    );

    # Below are files that Unicode furnishes, but this program ignores.

    my @ignored_files = (
        'ArabicShaping.txt',            # Unimplemented, but derived file gives \p access
        'BidiMirroring.txt',            # For glyph rendering.
        'EastAsianWidth.txt',           # Unimplemented, but derived file gives \p access
        'Index.txt',                    # An index for UnicodeData.txt
        'LineBreak.txt',                # Unimplemented, but derived file gives \p access
        'NamedSequences.txt',           # Unimplemented, but can be accessed through
                                        # Unicode::UCD 'namedseq'
        'NamedSqProv.txt',              # Not officially part of the Unicode standard
        'NamesList.txt',                # Just adds commentary
        'NormalizationCorrections.txt', # Data is already in other files.
        'ReadMe.txt',                   # Just comments
        'StandardizedVariants.txt',     # Only for glyph changes
    );
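
Several of the comments above point to `Unicode::UCD` as the way to reach properties that are not exposed through `\p{}`. As a small, non-authoritative sketch of what those accessors look like in use (the choice of code points is arbitrary and only for illustration):

```perl
use strict;
use warnings;
use Unicode::UCD qw(charinfo charblock charscript casefold);

# charinfo() returns a hash reference of UnicodeData.txt fields.
my $acute = charinfo(0x0301);    # COMBINING ACUTE ACCENT
print "name:      $acute->{name}\n";
print "combining: $acute->{combining}\n";    # Canonical_Combining_Class
print "bidi:      $acute->{bidi}\n";         # Bidi_Class

my $upper_a = charinfo(0x0041);  # LATIN CAPITAL LETTER A
print "lower:     U+$upper_a->{lower}\n";    # simple lowercase mapping, as hex

# Block and Script have their own accessors.
print "block:     ", charblock(0x0301), "\n";
print "script:    ", charscript(0x0041), "\n";

# casefold() returns a hash reference describing the case folding.
my $fold = casefold(0x0041);
print "casefold:  U+$fold->{mapping}\n";
```

These calls read the same Unicode data files that mktables processes, so the values reported depend on the Unicode version shipped with the perl in use.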

p5pRT commented 14 years ago

From @obra

Resolved per 4B1C2CFB.9020002@khwilliamson.com from Karl Williamson

p5pRT commented 14 years ago

@obra - Status changed from 'open' to 'resolved'