Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

[PATCH] Convert some files from Latin-1 to UTF-8 #11637

Closed p5pRT closed 13 years ago

p5pRT commented 13 years ago

Migrated from rt.perl.org#98666 (status was 'resolved')

Searchable as RT98666$

p5pRT commented 13 years ago

From keithsthompson@gmail.com

As I discussed a few weeks ago with Father Chrysostomos, I volunteered to submit a patch converting Latin-1 files in the Perl source tree to UTF-8. I've just finished the patch (attached).

For a lot of the files, this was straightforward; most of the changes involved either copyright symbols or Lord of the Rings quotations in header comments.

I do have some doubts about some of the changes I made, and I encourage anyone more familiar with this than I am to review them. In particular, I'm not sure about the files under the cpan/ directory; I suppose there needs to be some coordination with the corresponding sources on CPAN itself, and I don't know how that works. Perhaps it would be easier to leave those changes out.

There are a number of files that have encodings other than UTF-8 that I didn't touch, mostly because there seem to be specific requirements to use those encodings. The files are listed in the attached file "skipped.txt".

This patch did cause one test failure, in porting/cmp_version. Running the test by itself shows the following:

    # diff --git a/ext/attributes/attributes.xs b/ext/attributes/attributes.xs
    # index 24f5f61..3900c36 100644
    # --- a/ext/attributes/attributes.xs
    # +++ b/ext/attributes/attributes.xs
    # @@ -12,7 +12,7 @@
    #   * 'Perilous to us all are the devices of an art deeper than we possess
    #   * ourselves.' --Gandalf
    #   *
    # - * [p.597 of _The Lord of the Rings_, III/xi: "The Palant�r"]
    # + * [p.597 of _The Lord of the Rings_, III/xi: "The Palantír"]
    #   */
    #
    #  #define PERL_NO_GET_CONTEXT
    not ok 25 - ext/attributes/attributes.pm

It's complaining about the change in the "attributes/attributes.xs" file, which should be resolved if this patch is committed.

-- Keith Thompson <Keith.S.Thompson@gmail.com>
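For readers unfamiliar with the conversion the message above describes, here is a minimal sketch in C of what turning Latin-1 bytes into UTF-8 involves. It is not the tool used to produce the patch (the thread does not say which one was used), and the function name `latin1_to_utf8` is hypothetical. The sketch relies only on the fact that Latin-1 byte values equal Unicode code points U+0000 to U+00FF, so every byte at or above 0x80 becomes a two-byte UTF-8 sequence.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of the Perl source tree.
 *
 * Convert `len` Latin-1 bytes from `src` into UTF-8 in `dst`.
 * `dst` must have room for 2 * len + 1 bytes (worst case plus NUL).
 * Returns the number of bytes written, not counting the trailing NUL.
 */
size_t latin1_to_utf8(const unsigned char *src, size_t len, unsigned char *dst)
{
    size_t out = 0;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = src[i];

        if (c < 0x80) {
            /* ASCII is encoded identically in Latin-1 and UTF-8. */
            dst[out++] = c;
        } else {
            /* U+0080..U+00FF: lead byte 110000xx, continuation 10xxxxxx. */
            dst[out++] = (unsigned char)(0xC0 | (c >> 6));
            dst[out++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }

    dst[out] = '\0';
    return out;
}
```

Applied to the comment in ext/attributes/attributes.xs, the single byte 0xED in "Palantír" becomes the two bytes 0xC3 0xAD, which is why each converted line in the patch grows by one byte per accented character. Running such a conversion on a file that is already UTF-8 would double-encode it, so input should be checked first.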

p5pRT commented 13 years ago

From keithsthompson@gmail.com

    README.cn
    README.jp
    README.ko
    cpan/CGI/t/html.t
    cpan/CGI/t/upload_post_text.txt
    cpan/Encode/lib/Encode/CJKConstants.pm
    cpan/Encode/lib/Encode/JP/H2Z.pm
    cpan/Encode/t/Mod_EUCJP.pm
    cpan/Encode/t/at-cn.t
    cpan/Encode/t/at-tw.t
    cpan/Encode/t/big5-eten.enc
    cpan/Encode/t/big5-hkscs.enc
    cpan/Encode/t/enc_data.t
    cpan/Encode/t/enc_module.enc
    cpan/Encode/t/enc_module.t
    cpan/Encode/t/gb2312.enc
    cpan/Encode/t/jisx0201.enc
    cpan/Encode/t/jisx0208.enc
    cpan/Encode/t/jisx0212.enc
    cpan/Encode/t/jperl.t
    cpan/Encode/t/ksc5601.enc
    cpan/Encode/t/mime_header_iso2022jp.t
    cpan/PerlIO-via-QuotedPrint/t/QuotedPrint.t
    cpan/Pod-Parser/lib/Pod/Checker.pm
    cpan/Pod-Simple/t/corpus/8859_7.pod
    cpan/Pod-Simple/t/corpus/cp1256.txt
    cpan/Pod-Simple/t/corpus/fet_cont.txt
    cpan/Pod-Simple/t/corpus/fet_dup.txt
    cpan/Pod-Simple/t/corpus/iso6.txt
    cpan/Pod-Simple/t/corpus/koi8r.txt
    cpan/Pod-Simple/t/corpus/laozi38.txt
    cpan/Pod-Simple/t/corpus/laozi38b.txt
    cpan/Pod-Simple/t/corpus/laozi38p.pod
    cpan/Pod-Simple/t/corpus/lat1fr.txt
    cpan/Pod-Simple/t/corpus/lat1frim.txt
    cpan/Pod-Simple/t/corpus/pasternak_cp1251.txt
    cpan/Pod-Simple/t/corpus/s2763_sjis.txt
    cpan/Pod-Simple/t/corpus/thai_iso11.txt
    cpan/Pod-Simple/t/corpus2/fiqhakbar_iso6.txt
    cpan/Pod-Simple/t/encod02.t
    cpan/Pod-Simple/t/pulltitl.t
    cpan/Pod-Simple/t/testlib1/Zonk/Pronk.pm
    cpan/Sys-Syslog/win32/PerlLog.mc
    cpan/Unicode-Collate/t/loc_test.t
    cpan/podlators/t/man.t
    dist/Storable/t/utf8hash.t
    lib/utf8.t
    t/io/utf8.t
    t/lib/locale/latin1
    t/lib/warnings/utf8
    t/op/lc.t
    t/op/utfhash.t
    t/uni/greek.t
    t/uni/latin2.t
    t/uni/tr_eucjp.t
    t/uni/tr_sjis.t
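The thread does not say how the files in this list were identified. One plausible first step for this kind of triage, sketched below in C as an assumption rather than the method actually used, is to check whether a file's bytes already form well-formed UTF-8. Files that fail the check and contain bytes at or above 0x80 are in some legacy encoding (Latin-1, EUC-JP, KOI8-R and so on) and need a human decision. The function name `is_valid_utf8` is hypothetical.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of the Perl source tree.
 *
 * Return 1 if the `len` bytes at `s` are well-formed UTF-8, else 0.
 * Rejects stray continuation bytes, truncated sequences, overlong
 * encodings, UTF-16 surrogates and code points above U+10FFFF.
 */
int is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t need;        /* number of continuation bytes expected */
        unsigned long cp;   /* code point being assembled */

        if (c < 0x80) {
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            need = 1; cp = c & 0x1F;
        } else if ((c & 0xF0) == 0xE0) {
            need = 2; cp = c & 0x0F;
        } else if ((c & 0xF8) == 0xF0) {
            need = 3; cp = c & 0x07;
        } else {
            return 0;       /* stray continuation byte or invalid lead byte */
        }

        if (i + need >= len)
            return 0;       /* sequence runs past the end of the buffer */

        for (size_t k = 1; k <= need; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;   /* expected a 10xxxxxx continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }

        if (need == 1 && cp < 0x80)
            return 0;       /* overlong two-byte form */
        if (need == 2 && (cp < 0x800 || (cp >= 0xD800 && cp <= 0xDFFF)))
            return 0;       /* overlong three-byte form or surrogate */
        if (need == 3 && (cp < 0x10000 || cp > 0x10FFFF))
            return 0;       /* overlong four-byte form or out of range */

        i += need + 1;
    }

    return 1;
}
```

A check like this cannot tell which legacy encoding a failing file uses, and pure-ASCII files pass trivially, so it only narrows down the candidates. Files such as the `.enc` test data above are intentionally not UTF-8 and would still have to be excluded by hand.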

p5pRT commented 13 years ago

From keithsthompson@gmail.com

0001-Convert-some-files-from-Latin-1-to-UTF-8.patch

```diff
From 50b6ea28118a335a25be0c4a771e32f2d6c9f760 Mon Sep 17 00:00:00 2001
From: Keith Thompson
Date: Mon, 5 Sep 2011 16:37:46 -0700
Subject: [PATCH] Convert some files from Latin-1 to UTF-8

---
 NetWare/CLIBsdio.h | 2 +-
 NetWare/CLIBstr.h | 2 +-
 NetWare/CLIBstuf.c | 2 +-
 NetWare/CLIBstuf.h | 2 +-
 NetWare/Main.c | 2 +-
 NetWare/NWTInfo.c | 2 +-
 NetWare/NWUtil.c | 2 +-
 NetWare/Nwmain.c | 2 +-
 NetWare/Nwpipe.c | 2 +-
 NetWare/deb.h | 2 +-
 NetWare/intdef.h | 2 +-
 NetWare/interface.c | 2 +-
 NetWare/interface.cpp | 2 +-
 NetWare/interface.h | 2 +-
 NetWare/iperlhost.h | 2 +-
 NetWare/netware.h | 2 +-
 NetWare/nw5.c | 2 +-
 NetWare/nw5iop.h | 2 +-
 NetWare/nw5sck.c | 2 +-
 NetWare/nw5sck.h | 2 +-
 NetWare/nw5thread.c | 2 +-
 NetWare/nw5thread.h | 2 +-
 NetWare/nwhashcls.cpp | 2 +-
 NetWare/nwhashcls.h | 2 +-
 NetWare/nwperlhost.h | 2 +-
 NetWare/nwperlsys.c | 2 +-
 NetWare/nwperlsys.h | 2 +-
 NetWare/nwpipe.h | 2 +-
 NetWare/nwplglob.c | 2 +-
 NetWare/nwplglob.h | 2 +-
 NetWare/nwstdio.h | 2 +-
 NetWare/nwtinfo.h | 2 +-
 NetWare/nwutil.h | 2 +-
 NetWare/nwvmem.h | 2 +-
 NetWare/perllib.cpp | 2 +-
 NetWare/win32ish.h | 2 +-
 Porting/Maintainers.pl | 8 ++-
 Porting/checkAUTHORS.pl | 3 +-
 cpan/Encode/encengine.c | 2 +-
 cpan/Module-Metadata/lib/Module/Metadata.pm | 2 +-
 cpan/Sys-Syslog/Changes | 2 +-
 cpan/Unicode-Collate/Collate.pm | 9 ++--
 cpan/Win32/Changes | 2 +-
 ext/attributes/attributes.xs | 2 +-
 gv.c | 2 +-
 lib/unicore/NamesList.txt | 70 +++++++++++++-------------
 locale.c | 4 +-
 malloc.c | 2 +-
 perl.c | 2 +-
 run.c | 2 +-
 taint.c | 2 +-
 utf8.c | 4 +-
 util.c | 2 +-
 53 files changed, 98 insertions(+), 94 deletions(-)

diff --git a/NetWare/CLIBsdio.h b/NetWare/CLIBsdio.h
index 76aba02..b2db369 100644
--- a/NetWare/CLIBsdio.h
+++ b/NetWare/CLIBsdio.h
@@ -1,6 +1,6 @@
 /*
- * Copyright � 2001 Novell, Inc. All Rights Reserved.
+ * Copyright © 2001 Novell, Inc. All Rights Reserved.
* * You may distribute under the terms of either the GNU General Public * License or the Artistic License, as specified in the README file. diff --git a/NetWare/CLIBstr.h b/NetWare/CLIBstr.h index e025c04..4b26fc9 100644 --- a/NetWare/CLIBstr.h +++ b/NetWare/CLIBstr.h @@ -1,6 +1,6 @@ /* - * Copyright � 2001 Novell, Inc. All Rights Reserved. + * Copyright �� 2001 Novell, Inc. All Rights Reserved. * * You may distribute under the terms of either the GNU General Public * License or the Artistic License, as specified in the README file. diff --git a/NetWare/CLIBstuf.c b/NetWare/CLIBstuf.c index 0e649dc..26a4a4b 100644 --- a/NetWare/CLIBstuf.c +++ b/NetWare/CLIBstuf.c @@ -1,6 +1,6 @@ /* - * Copyright � 2001 Novell, Inc. All Rights Reserved. + * Copyright �� 2001 Novell, Inc. All Rights Reserved. * * You may distribute under the terms of either the GNU General Public * License or the Artistic License, as specified in the README file. diff --git a/NetWare/CLIBstuf.h b/NetWare/CLIBstuf.h index 90f3557..78671fd 100644 --- a/NetWare/CLIBstuf.h +++ b/NetWare/CLIBstuf.h @@ -1,6 +1,6 @@ /* - * Copyright � 2001 Novell, Inc. All Rights Reserved. + * Copyright �� 2001 Novell, Inc. All Rights Reserved. * * You may distribute under the terms of either the GNU General Public * License or the Artistic License, as specified in the README file. diff --git a/NetWare/Main.c b/NetWare/Main.c index d23ce68..5116cbc 100644 --- a/NetWare/Main.c +++ b/NetWare/Main.c @@ -1,6 +1,6 @@ /* - * Copyright � 2001 Novell, Inc. All Rights Reserved. + * Copyright �� 2001 Novell, Inc. All Rights Reserved. * * You may distribute under the terms of either the GNU General Public * License or the Artistic License, as specified in the README file. diff --git a/NetWare/NWTInfo.c b/NetWare/NWTInfo.c index 4180fa7..b057d56 100644 --- a/NetWare/NWTInfo.c +++ b/NetWare/NWTInfo.c @@ -1,6 +1,6 @@ /* - * Copyright � 2001 Novell, Inc. All Rights Reserved. + * Copyright �� 2001 Novell, Inc. All Rights Reserved. 
* * You may distribute under the terms of either the GNU General Public * License or the Artistic License, as specified in the README file. diff --git a/NetWare/NWUtil.c b/NetWare/NWUtil.c index 8db93c6..15e90cb 100644 --- a/NetWare/NWUtil.c +++ b/NetWare/NWUtil.c @@ -1,6 +1,6 @@ /* - * Copyright � 2001 Novell, Inc. All Rights Reserved. + * Copyright �� 2001 Novell, Inc. All Rights Reserved. * * You may distribute under the terms of either the GNU General Public * License or the Artistic License, as specified in the README file. diff --git a/NetWare/Nwmain.c b/NetWare/Nwmain.c index a64534e..0b9728a 100644 --- a/NetWare/Nwmain.c +++ b/NetWare/Nwmain.c @@ -1,6 +1,6 @@ /* - * Copyright � 2001 Novell, Inc. All Rights Reserved. + * Copyright �� 2001 Novell, Inc. All Rights Reserved. * * You may distribute under the terms of either the GNU General Public * License or the Artistic License, as specified in the README file. diff --git a/NetWare/Nwpipe.c b/NetWare/Nwpipe.c index 9caf2da..ce9c198 100644 --- a/NetWare/Nwpipe.c +++ b/NetWare/Nwpipe.c @@ -1,6 +1,6 @@ /* - * Copyright � 2001 Novell, Inc. All Rights Reserved. + * Copyright �� 2001 Novell, Inc. All Rights Reserved. * * You may distribute under the terms of either the GNU General Public * License or the Artistic License, as specified in the README file. diff --git a/NetWare/deb.h b/NetWare/deb.h index ece19c2..e79a8f4 100644 --- a/NetWare/deb.h +++ b/NetWare/deb.h @@ -1,6 +1,6 @@ /* - * Copyright � 2001 Novell, Inc. All Rights Reserved. + * Copyright �� 2001 Novell, Inc. All Rights Reserved. * * You may distribute under the terms of either the GNU General Public * License or the Artistic License, as specified in the README file. diff --git a/NetWare/intdef.h b/NetWare/intdef.h index ca84746..4c566c4 100644 --- a/NetWare/intdef.h +++ b/NetWare/intdef.h @@ -1,6 +1,6 @@ /* - * Copyright � 2001 Novell, Inc. All Rights Reserved. + * Copyright �� 2001 Novell, Inc. All Rights Reserved. 
* * You may distribute under the terms of either the GNU General Public * License or the Artistic License, as specified in the README file. diff --git a/NetWare/interface.c b/NetWare/interface.c index 2cdadca..29a8dc0 100644 --- a/NetWare/interface.c +++ b/NetWare/interface.c @@ -1,6 +1,6 @@ /* - * Copyright � 2001 Novell, Inc. All Rights Reserved. + * Copyright �� 2001 Novell, Inc. All Rights Reserved. * * You may distribute under the terms of either the GNU General Public * License or the Artistic License, as specified in the README file. diff --git a/NetWare/interface.cpp b/NetWare/interface.cpp index 47fef67..aef71f9 100644 --- a/NetWare/interface.cpp +++ b/NetWare/interface.cpp @@ -1,6 +1,6 @@ /* - * Copyright � 2001 Novell, Inc. All Rights Reserved. + * Copyright �� 2001 Novell, Inc. All Rights Reserved. * * You may distribute under the terms of either the GNU General Public * License or the Artistic License, as specified in the README file. diff --git a/NetWare/interface.h b/NetWare/interface.h index b6dd4a0..9897993 100644 --- a/NetWare/interface.h +++ b/NetWare/interface.h @@ -1,6 +1,6 @@ /* - * Copyright � 2001 Novell, Inc. All Rights Reserved. + * Copyright �� 2001 Novell, Inc. All Rights Reserved. * * You may distribute under the terms of either the GNU General Public * License or the Artistic License, as specified in the README file. diff --git a/NetWare/iperlhost.h b/NetWare/iperlhost.h index 3204c2c..cc1754a 100644 --- a/NetWare/iperlhost.h +++ b/NetWare/iperlhost.h @@ -1,6 +1,6 @@ /* - * Copyright � 2001 Novell, Inc. All Rights Reserved. + * Copyright �� 2001 Novell, Inc. All Rights Reserved. * * You may distribute under the terms of either the GNU General Public * License or the Artistic License, as specified in the README file. diff --git a/NetWare/netware.h b/NetWare/netware.h index 18089d5..c106476 100644 --- a/NetWare/netware.h +++ b/NetWare/netware.h @@ -1,6 +1,6 @@ /* - * Copyright � 2001 Novell, Inc. All Rights Reserved. 
+ * Copyright �� 2001 Novell, Inc. All Rights Reserved. * * You may distribute under the terms of either the GNU General Public * License or the Artistic License, as specified in the README file. diff --git a/NetWare/nw5.c b/NetWare/nw5.c index 7f9eebe..531b308 100644 --- a/NetWare/nw5.c +++ b/NetWare/nw5.c @@ -1,6 +1,6 @@ /* - * Copyright � 2001 Novell, Inc. All Rights Reserved. + * Copyright �� 2001 Novell, Inc. All Rights Reserved. * * You may distribute under the terms of either the GNU General Public * License or the Artistic License, as specified in the README file. diff --git a/NetWare/nw5iop.h b/NetWare/nw5iop.h index 27cd0a1..391c899 100644 --- a/NetWare/nw5iop.h +++ b/NetWare/nw5iop.h @@ -1,6 +1,6 @@ /* - * Copyright � 2001 Novell, Inc. All Rights Reserved. + * Copyright �� 2001 Novell, Inc. All Rights Reserved. * * You may distribute under the terms of either the GNU General Public * License or the Artistic License, as specified in the README file. diff --git a/NetWare/nw5sck.c b/NetWare/nw5sck.c index 46069a3..35dee92 100644 --- a/NetWare/nw5sck.c +++ b/NetWare/nw5sck.c @@ -1,6 +1,6 @@ /* - * Copyright � 2001 Novell, Inc. All Rights Reserved. + * Copyright �� 2001 Novell, Inc. All Rights Reserved. * * You may distribute under the terms of either the GNU General Public * License or the Artistic License, as specified in the README file. diff --git a/NetWare/nw5sck.h b/NetWare/nw5sck.h index 5c0e333..afe2f93 100644 --- a/NetWare/nw5sck.h +++ b/NetWare/nw5sck.h @@ -1,6 +1,6 @@ /* - * Copyright � 2001 Novell, Inc. All Rights Reserved. + * Copyright �� 2001 Novell, Inc. All Rights Reserved. * * You may distribute under the terms of either the GNU General Public * License or the Artistic License, as specified in the README file. diff --git a/NetWare/nw5thread.c b/NetWare/nw5thread.c index 9ff2c32..abedb5c 100644 --- a/NetWare/nw5thread.c +++ b/NetWare/nw5thread.c @@ -1,6 +1,6 @@ /* - * Copyright � 2001 Novell, Inc. All Rights Reserved. 
+ * Copyright �� 2001 Novell, Inc. All Rights Reserved. * * You may distribute under the terms of either the GNU General Public * License or the Artistic License, as specified in the README file. diff --git a/NetWare/nw5thread.h b/NetWare/nw5thread.h index ad70db0..6bdba24 100644 --- a/NetWare/nw5thread.h +++ b/NetWare/nw5thread.h @@ -1,6 +1,6 @@ /* - * Copyright � 2001 Novell, Inc. All Rights Reserved. + * Copyright �� 2001 Novell, Inc. All Rights Reserved. * * You may distribute under the terms of either the GNU General Public * License or the Artistic License, as specified in the README file. diff --git a/NetWare/nwhashcls.cpp b/NetWare/nwhashcls.cpp index 1c582a5..2bf2485 100644 --- a/NetWare/nwhashcls.cpp +++ b/NetWare/nwhashcls.cpp @@ -1,5 +1,5 @@ /* - * Copyright � 2001 Novell, Inc. All Rights Reserved. + * Copyright �� 2001 Novell, Inc. All Rights Reserved. * * You may distribute under the terms of either the GNU General Public * License or the Artistic License, as specified in the README file. diff --git a/NetWare/nwhashcls.h b/NetWare/nwhashcls.h index 88956af..55ff200 100644 --- a/NetWare/nwhashcls.h +++ b/NetWare/nwhashcls.h @@ -1,6 +1,6 @@ /* - * Copyright � 2001 Novell, Inc. All Rights Reserved. + * Copyright �� 2001 Novell, Inc. All Rights Reserved. * * You may distribute under the terms of either the GNU General Public * License or the Artistic License, as specified in the README file. diff --git a/NetWare/nwperlhost.h b/NetWare/nwperlhost.h index 9839184..c69e554 100644 --- a/NetWare/nwperlhost.h +++ b/NetWare/nwperlhost.h @@ -1,6 +1,6 @@ /* - * Copyright � 2001 Novell, Inc. All Rights Reserved. + * Copyright �� 2001 Novell, Inc. All Rights Reserved. * * You may distribute under the terms of either the GNU General Public * License or the Artistic License, as specified in the README file. 
diff --git a/NetWare/nwperlsys.c b/NetWare/nwperlsys.c index 9eca522..32c15cb 100644 --- a/NetWare/nwperlsys.c +++ b/NetWare/nwperlsys.c @@ -1,5 +1,5 @@ /* - * Copyright � 2001 Novell, Inc. All Rights Reserved. + * Copyright �� 2001 Novell, Inc. All Rights Reserved. * * You may distribute under the terms of either the GNU General Public * License or the Artistic License, as specified in the README file. diff --git a/NetWare/nwperlsys.h b/NetWare/nwperlsys.h index ff41d69..3d82dd1 100644 --- a/NetWare/nwperlsys.h +++ b/NetWare/nwperlsys.h @@ -1,5 +1,5 @@ /* - * Copyright � 2001 Novell, Inc. All Rights Reserved. + * Copyright �� 2001 Novell, Inc. All Rights Reserved. * * You may distribute under the terms of either the GNU General Public * License or the Artistic License, as specified in the README file. diff --git a/NetWare/nwpipe.h b/NetWare/nwpipe.h index 4e9354a..462a73d 100644 --- a/NetWare/nwpipe.h +++ b/NetWare/nwpipe.h @@ -1,6 +1,6 @@ /* - * Copyright � 2001 Novell, Inc. All Rights Reserved. + * Copyright �� 2001 Novell, Inc. All Rights Reserved. * * You may distribute under the terms of either the GNU General Public * License or the Artistic License, as specified in the README file. diff --git a/NetWare/nwplglob.c b/NetWare/nwplglob.c index 51a3e5e..6810fd5 100644 --- a/NetWare/nwplglob.c +++ b/NetWare/nwplglob.c @@ -1,6 +1,6 @@ /* - * Copyright � 2001 Novell, Inc. All Rights Reserved. + * Copyright �� 2001 Novell, Inc. All Rights Reserved. * * You may distribute under the terms of either the GNU General Public * License or the Artistic License, as specified in the README file. diff --git a/NetWare/nwplglob.h b/NetWare/nwplglob.h index 1c9d9e4..cf60e73 100644 --- a/NetWare/nwplglob.h +++ b/NetWare/nwplglob.h @@ -1,6 +1,6 @@ /* - * Copyright � 2001 Novell, Inc. All Rights Reserved. + * Copyright �� 2001 Novell, Inc. All Rights Reserved. 
* * You may distribute under the terms of either the GNU General Public * License or the Artistic License, as specified in the README file. diff --git a/NetWare/nwstdio.h b/NetWare/nwstdio.h index 4aabb0a..50ab6f3 100644 --- a/NetWare/nwstdio.h +++ b/NetWare/nwstdio.h @@ -1,5 +1,5 @@ /* - * Copyright � 2001 Novell, Inc. All Rights Reserved. + * Copyright �� 2001 Novell, Inc. All Rights Reserved. * * You may distribute under the terms of either the GNU General Public * License or the Artistic License, as specified in the README file. diff --git a/NetWare/nwtinfo.h b/NetWare/nwtinfo.h index 25d78a7..a08d060 100644 --- a/NetWare/nwtinfo.h +++ b/NetWare/nwtinfo.h @@ -1,6 +1,6 @@ /* - * Copyright � 2001 Novell, Inc. All Rights Reserved. + * Copyright �� 2001 Novell, Inc. All Rights Reserved. * * You may distribute under the terms of either the GNU General Public * License or the Artistic License, as specified in the README file. diff --git a/NetWare/nwutil.h b/NetWare/nwutil.h index 6c8f144..ff05d18 100644 --- a/NetWare/nwutil.h +++ b/NetWare/nwutil.h @@ -1,6 +1,6 @@ /* - * Copyright � 2001 Novell, Inc. All Rights Reserved. + * Copyright �� 2001 Novell, Inc. All Rights Reserved. * * You may distribute under the terms of either the GNU General Public * License or the Artistic License, as specified in the README file. diff --git a/NetWare/nwvmem.h b/NetWare/nwvmem.h index da41afd..e82eaee 100644 --- a/NetWare/nwvmem.h +++ b/NetWare/nwvmem.h @@ -1,6 +1,6 @@ /* - * Copyright � 2001 Novell, Inc. All Rights Reserved. + * Copyright �� 2001 Novell, Inc. All Rights Reserved. * * You may distribute under the terms of either the GNU General Public * License or the Artistic License, as specified in the README file. diff --git a/NetWare/perllib.cpp b/NetWare/perllib.cpp index a9eb824..32f8ed2 100644 --- a/NetWare/perllib.cpp +++ b/NetWare/perllib.cpp @@ -1,6 +1,6 @@ /* - * Copyright � 2001 Novell, Inc. All Rights Reserved. + * Copyright �� 2001 Novell, Inc. All Rights Reserved. 
* * You may distribute under the terms of either the GNU General Public * License or the Artistic License, as specified in the README file. diff --git a/NetWare/win32ish.h b/NetWare/win32ish.h index a8fcbcd..f6603d5 100644 --- a/NetWare/win32ish.h +++ b/NetWare/win32ish.h @@ -1,6 +1,6 @@ /* - * Copyright � 2001 Novell, Inc. All Rights Reserved. + * Copyright �� 2001 Novell, Inc. All Rights Reserved. * * You may distribute under the terms of either the GNU General Public * License or the Artistic License, as specified in the README file. diff --git a/Porting/Maintainers.pl b/Porting/Maintainers.pl index 1114369..f903bc3 100755 --- a/Porting/Maintainers.pl +++ b/Porting/Maintainers.pl @@ -7,8 +7,10 @@ package Maintainers; +use utf8; use File::Glob qw(:case); + %Maintainers = ( 'abergman' => 'Arthur Bergman ', @@ -18,7 +20,7 @@ use File::Glob qw(:case); 'andya' => 'Andy Armstrong ', 'arandal' => 'Allison Randal ', 'audreyt' => 'Audrey Tang ', - 'avar' => '�var Arnfj�r� Bjarmason ', + 'avar' => '��var Arnfj��r�� Bjarmason ', 'bingos' => 'Chris Williams ', 'chorny' => 'Alexandr Ciornii ', 'corion' => 'Max Maischein ', @@ -74,8 +76,8 @@ use File::Glob qw(:case); 'rra' => 'Russ Allbery ', 'rurban' => 'Reini Urban ', 'sadahiro' => 'SADAHIRO Tomoyuki ', - 'salva' => 'Salvador Fandi�o Garc�a ', - 'saper' => 'S�bastien Aperghis-Tramoni ', + 'salva' => 'Salvador Fandi��o Garc��a ', + 'saper' => 'S��bastien Aperghis-Tramoni ', 'sartak' => 'Shawn M Moore ', 'sbeck' => 'Sullivan Beck ', 'sburke' => 'Sean Burke ', diff --git a/Porting/checkAUTHORS.pl b/Porting/checkAUTHORS.pl index d59e41a..a5b770c 100755 --- a/Porting/checkAUTHORS.pl +++ b/Porting/checkAUTHORS.pl @@ -1,6 +1,7 @@ #!/usr/bin/perl -w use strict; my ($committer, $patch, $author, $date); +use utf8; use Getopt::Long; use Text::Wrap; $Text::Wrap::columns = 80; @@ -222,7 +223,7 @@ sub read_authors_files { $name =~ s/\s*\z//; $raw{$email} = $name; $count{$email}++; - } elsif (/^([-A-Za-z0-9 .\'�-����-�]+)[\t\n]/) { + } 
elsif (/^([-A-Za-z0-9 .\'��-��������-��]+)[\t\n]/) { # Name only $untraced{$1}++; diff --git a/cpan/Encode/encengine.c b/cpan/Encode/encengine.c index 255e4d7..7a2adad 100644 --- a/cpan/Encode/encengine.c +++ b/cpan/Encode/encengine.c @@ -81,7 +81,7 @@ This scheme can also handle shift encodings. A slight enhancement to the scheme also allows for look-ahead - if we add a flag to re-add the removed byte to the source we could handle - a" -> � + a" -> �� ab -> a (and take b back please) */ diff --git a/cpan/Module-Metadata/lib/Module/Metadata.pm b/cpan/Module-Metadata/lib/Module/Metadata.pm index e2c83d3..5635fb7 100644 --- a/cpan/Module-Metadata/lib/Module/Metadata.pm +++ b/cpan/Module-Metadata/lib/Module/Metadata.pm @@ -43,7 +43,7 @@ my $VARNAME_REGEXP = qr{ # match fully-qualified VERSION name ([\$*]) # sigil - $ or * ( ( # optional leading package name - (?:::|\')? # possibly starting like just :: (� la $::VERSION) + (?:::|\')? # possibly starting like just :: (�� la $::VERSION) (?:\w+(?:::|\'))* # Foo::Bar:: ... )? VERSION diff --git a/cpan/Sys-Syslog/Changes b/cpan/Sys-Syslog/Changes index 81f8bb1..a7d8c96 100644 --- a/cpan/Sys-Syslog/Changes +++ b/cpan/Sys-Syslog/Changes @@ -242,7 +242,7 @@ Revision history for Sys-Syslog [BUGFIX] Better error messages (Jari Aalto). 0.03 -- 2002.03.23 - [BUGFIX] Fixed copious warnings from Sys::Syslog (Andreas K�nig). + [BUGFIX] Fixed copious warnings from Sys::Syslog (Andreas K��nig). [FEATURE] Failover to different communication modes by Nick Williams. 0.02 -- 2001.06.04 diff --git a/cpan/Unicode-Collate/Collate.pm b/cpan/Unicode-Collate/Collate.pm index fac2cce..6b77e5a 100644 --- a/cpan/Unicode-Collate/Collate.pm +++ b/cpan/Unicode-Collate/Collate.pm @@ -11,6 +11,7 @@ use strict; use warnings; use Carp; use File::Spec; +use utf8; no warnings 'utf8'; @@ -1656,15 +1657,15 @@ e.g. you say my $Collator = Unicode::Collate->new( normalization => undef, level => 1 ); # (normalization => undef) is REQUIRED. 
- my $str = "Ich mu� studieren Perl."; - my $sub = "M�SS"; + my $str = "Ich mu�� studieren Perl."; + my $sub = "M��SS"; my $match; if (my($pos,$len) = $Collator->index($str, $sub)) { $match = substr($str, $pos, $len); } -and get C<"mu�"> in C<$match> since C<"mu�"> -is primary equal to C<"M�SS">. +and get C<"mu��"> in C<$match> since C<"mu��"> +is primary equal to C<"M��SS">. =item C<$match_ref = $Collator-Ematch($string, $substring)> diff --git a/cpan/Win32/Changes b/cpan/Win32/Changes index dbbb4ff..f0a57bc 100644 --- a/cpan/Win32/Changes +++ b/cpan/Win32/Changes @@ -88,7 +88,7 @@ Revision history for the Perl extension Win32. - Use uppercase environment variable names in t/Unicode.t because the MSWin32 doesn't care, and Cygwin only works with the uppercased version. - - new t/Names.t test (from S�bastien Aperghis-Tramoni) + - new t/Names.t test (from S��bastien Aperghis-Tramoni) 0.30 [2007-06-25] - Fixed t/Unicode.t test for Cygwin (with help from Jerry D. Hedden). diff --git a/ext/attributes/attributes.xs b/ext/attributes/attributes.xs index 24f5f61..3900c36 100644 --- a/ext/attributes/attributes.xs +++ b/ext/attributes/attributes.xs @@ -12,7 +12,7 @@ * 'Perilous to us all are the devices of an art deeper than we possess * ourselves.' --Gandalf * - * [p.597 of _The Lord of the Rings_, III/xi: "The Palant�r"] + * [p.597 of _The Lord of the Rings_, III/xi: "The Palant��r"] */ #define PERL_NO_GET_CONTEXT diff --git a/gv.c b/gv.c index b3b628e..3427944 100644 --- a/gv.c +++ b/gv.c @@ -16,7 +16,7 @@ * history of Middle-earth and Over-heaven and of the Sundering Seas,' * laughed Pippin. 
* - * [p.599 of _The Lord of the Rings_, III/xi: "The Palant�r"] + * [p.599 of _The Lord of the Rings_, III/xi: "The Palant��r"] */ /* diff --git a/lib/unicore/NamesList.txt b/lib/unicore/NamesList.txt index 4f698c7..df80124 100644 --- a/lib/unicore/NamesList.txt +++ b/lib/unicore/NamesList.txt @@ -676,7 +676,7 @@ : 0061 030A 00E6 LATIN SMALL LETTER AE = latin small ligature ae (1.0) - = ash (from Old English �sc) + = ash (from Old English ��sc) * Danish, Norwegian, Icelandic, Faroese, Old English, French, IPA x (latin small ligature oe - 0153) x (cyrillic small ligature a ie - 04D5) @@ -979,7 +979,7 @@ : 006F 030B 0152 LATIN CAPITAL LIGATURE OE 0153 LATIN SMALL LIGATURE OE - = ethel (from Old English e�el) + = ethel (from Old English e��el) * French, IPA, Old Icelandic, Old English, ... x (latin small letter ae - 00E6) x (latin letter small capital oe - 0276) @@ -11804,7 +11804,7 @@ * editing mark 2051 TWO ASTERISKS ALIGNED VERTICALLY 2052 COMMERCIAL MINUS SIGN - = abz�glich (German), med avdrag av (Swedish), piska (Swedish, "whip") + = abz��glich (German), med avdrag av (Swedish), piska (Swedish, "whip") * a common glyph variant and fallback representation looks like ./. * may also be used as a dingbat to indicate correctness * used in Finno-Ugric Phonetic Alphabet to indicate a related borrowed form with different sound @@ -12031,7 +12031,7 @@ * Laos 20AE TUGRIK SIGN * Mongolia - * also transliterated as tugrug, tugric, tugrog, togrog, t�gr�g + * also transliterated as tugrug, tugric, tugrog, togrog, t��gr��g 20AF DRACHMA SIGN * Greece 20B0 GERMAN PENNY SIGN @@ -12263,7 +12263,7 @@ 212A KELVIN SIGN : 004B latin capital letter k 212B ANGSTROM SIGN - * non SI length unit (=0.1 nm) named after A. J. �ngstr�m, Swedish physicist + * non SI length unit (=0.1 nm) named after A. J. 
��ngstr��m, Swedish physicist * preferred representation is 00C5 : 00C5 latin capital letter a with ring above 212C SCRIPT CAPITAL B @@ -15014,7 +15014,7 @@ 271E SHADOWED WHITE LATIN CROSS 271F OUTLINED LATIN CROSS 2720 MALTESE CROSS - * Historically, the Maltese cross took many forms; the shape shown in the Zapf Dingbats is similar to one known as the Cross Form�e. + * Historically, the Maltese cross took many forms; the shape shown in the Zapf Dingbats is similar to one known as the Cross Form��e. @ Stars, asterisks and snowflakes 2721 STAR OF DAVID x (six pointed star with middle dot - 1F52F) @@ -21882,12 +21882,12 @@ A6E6 BAMUM LETTER MO A6E7 BAMUM LETTER MBAA * also used for digit two A6E8 BAMUM LETTER TET - * t�t + * t��t * also used for digit three A6E9 BAMUM LETTER KPA * also used for digit four A6EA BAMUM LETTER TEN - * t�n + * t��n * also used for digit five A6EB BAMUM LETTER NTUU * also used for digit six @@ -23238,7 +23238,7 @@ D7FB HANGUL JONGSEONG PHIEUPH-THIEUTH @@ F900 CJK Compatibility Ideographs FAFF @@+ @+ This block, despite its name, contains a number of unified CJK ideographs. Those characters are individually identified by annotations. 
-@ Pronunciation variants from KS�X�1001:1998 +@ Pronunciation variants from KS��X��1001:1998 F900 CJK COMPATIBILITY IDEOGRAPH-F900 : 8C48 F901 CJK COMPATIBILITY IDEOGRAPH-F901 @@ -31704,7 +31704,7 @@ FFFF 1D208 GREEK VOCAL NOTATION SYMBOL-9 = Greek instrumental notation symbol-44 * vocal second sharp of G - * instrumental first sharp of e� + * instrumental first sharp of e�� 1D209 GREEK VOCAL NOTATION SYMBOL-10 * vocal A * this is a modification of 039F and is therefore not the same as 03D8 @@ -31717,7 +31717,7 @@ FFFF 1D20D GREEK VOCAL NOTATION SYMBOL-14 = Greek instrumental notation symbol-41 * vocal first sharp of B - * instrumental first sharp of d� + * instrumental first sharp of d�� x (latin capital letter v - 0056) 1D20E GREEK VOCAL NOTATION SYMBOL-15 = Greek instrumental notation symbol-35 @@ -31749,16 +31749,16 @@ FFFF 1D217 GREEK VOCAL NOTATION SYMBOL-24 * vocal second sharp of e 1D218 GREEK VOCAL NOTATION SYMBOL-50 - * vocal first sharp of g� + * vocal first sharp of g�� 1D219 GREEK VOCAL NOTATION SYMBOL-51 - * vocal second sharp of g� + * vocal second sharp of g�� 1D21A GREEK VOCAL NOTATION SYMBOL-52 - * vocal a� + * vocal a�� 1D21B GREEK VOCAL NOTATION SYMBOL-53 - * vocal first sharp of a� + * vocal first sharp of a�� 1D21C GREEK VOCAL NOTATION SYMBOL-54 = Greek instrumental notation symbol-20 - * vocal second sharp of a� + * vocal second sharp of a�� * instrumental first sharp of d @ Ancient Greek instrumental notation 1D21D GREEK INSTRUMENTAL NOTATION SYMBOL-1 @@ -31806,37 +31806,37 @@ FFFF 1D232 GREEK INSTRUMENTAL NOTATION SYMBOL-36 * instrumental second sharp of b 1D233 GREEK INSTRUMENTAL NOTATION SYMBOL-37 - * instrumental c� + * instrumental c�� 1D234 GREEK INSTRUMENTAL NOTATION SYMBOL-38 - * instrumental first sharp of c� + * instrumental first sharp of c�� 1D235 GREEK INSTRUMENTAL NOTATION SYMBOL-39 - * instrumental second sharp of c� + * instrumental second sharp of c�� 1D236 GREEK INSTRUMENTAL NOTATION SYMBOL-40 - * instrumental d� + * 
instrumental d�� 1D237 GREEK INSTRUMENTAL NOTATION SYMBOL-42 - * instrumental second sharp of d� + * instrumental second sharp of d�� 1D238 GREEK INSTRUMENTAL NOTATION SYMBOL-43 - * instrumental e� + * instrumental e�� 1D239 GREEK INSTRUMENTAL NOTATION SYMBOL-45 - * instrumental second sharp of e� + * instrumental second sharp of e�� 1D23A GREEK INSTRUMENTAL NOTATION SYMBOL-47 - * instrumental first sharp of f� + * instrumental first sharp of f�� * similar but not identical to 002F 1D23B GREEK INSTRUMENTAL NOTATION SYMBOL-48 - * instrumental second sharp of f� + * instrumental second sharp of f�� * similar but not identical to 005C 1D23C GREEK INSTRUMENTAL NOTATION SYMBOL-49 - * instrumental g� + * instrumental g�� 1D23D GREEK INSTRUMENTAL NOTATION SYMBOL-50 - * instrumental first sharp of g� + * instrumental first sharp of g�� 1D23E GREEK INSTRUMENTAL NOTATION SYMBOL-51 - * instrumental second sharp of g� + * instrumental second sharp of g�� 1D23F GREEK INSTRUMENTAL NOTATION SYMBOL-52 - * instrumental a� + * instrumental a�� 1D240 GREEK INSTRUMENTAL NOTATION SYMBOL-53 - * instrumental first sharp of a� + * instrumental first sharp of a�� 1D241 GREEK INSTRUMENTAL NOTATION SYMBOL-54 - * instrumental second sharp of a� + * instrumental second sharp of a�� @ Further Greek musical notation symbols 1D242 COMBINING GREEK MUSICAL TRISEME x (metrical triseme - 23D7) @@ -34258,10 +34258,10 @@ FFFF = chevalier, Ober, Ritter, cavall, cavaliere = knight of swords 1F0AD PLAYING CARD QUEEN OF SPADES - = dame, Dame, K�nigin, regina + = dame, Dame, K��nigin, regina = queen of swords 1F0AE PLAYING CARD KING OF SPADES - = roi, K�nig, re + = roi, K��nig, re = king of swords @ Hearts or cups 1F0B1 PLAYING CARD ACE OF HEARTS @@ -34519,7 +34519,7 @@ FFFF = parking space (ARIB STD B24) 1F160 NEGATIVE CIRCLED LATIN CAPITAL LETTER Q 1F161 NEGATIVE CIRCLED LATIN CAPITAL LETTER R - = Rastst�tte (rest stop) + = Rastst��tte (rest stop) 1F162 NEGATIVE CIRCLED LATIN CAPITAL LETTER S = Stadtbahn 
(metropolitan railway) 1F163 NEGATIVE CIRCLED LATIN CAPITAL LETTER T @@ -36007,7 +36007,7 @@ FFFF @@ 2A700 CJK Unified Ideographs Extension C 2B734 @@ 2B740 CJK Unified Ideographs Extension D 2B81D @@ 2F800 CJK Compatibility Ideographs Supplement 2FA1F -@ Duplicate characters from CNS�11643-1992 +@ Duplicate characters from CNS��11643-1992 2F800 CJK COMPATIBILITY IDEOGRAPH-2F800 : 4E3D 2F801 CJK COMPATIBILITY IDEOGRAPH-2F801 diff --git a/locale.c b/locale.c index 4631b86..163f412 100644 --- a/locale.c +++ b/locale.c @@ -10,9 +10,9 @@ /* * A Elbereth Gilthoniel, - * silivren penna m�riel + * silivren penna m��riel * o menel aglar elenath! - * Na-chaered palan-d�riel + * Na-chaered palan-d��riel * o galadhremmin ennorath, * Fanuilos, le linnathon * nef aear, si nef aearon! diff --git a/malloc.c b/malloc.c index 3c2923a..64613ee 100644 --- a/malloc.c +++ b/malloc.c @@ -5,7 +5,7 @@ /* * 'The Chamber of Records,' said Gimli. 'I guess that is where we now stand.' * - * [p.321 of _The Lord of the Rings_, II/v: "The Bridge of Khazad-D�m"] + * [p.321 of _The Lord of the Rings_, II/v: "The Bridge of Khazad-D��m"] */ /* This file contains Perl's own implementation of the malloc library. diff --git a/perl.c b/perl.c index 9ebb3d2..8412d9c 100644 --- a/perl.c +++ b/perl.c @@ -13,7 +13,7 @@ /* * A ship then new they built for him * of mithril and of elven-glass - * --from Bilbo's song of E�rendil + * --from Bilbo's song of E��rendil * * [p.236 of _The Lord of the Rings_, II/i: "Many Meetings"] */ diff --git a/run.c b/run.c index 368ef03..7c1d0aa 100644 --- a/run.c +++ b/run.c @@ -30,7 +30,7 @@ * Now we are come to the lands where you were foaled, and every stone you * know. Run now! Hope is in speed!' 
--Gandalf * - * [p.600 of _The Lord of the Rings_, III/xi: "The Palant�r"] + * [p.600 of _The Lord of the Rings_, III/xi: "The Palant��r"] */ int diff --git a/taint.c b/taint.c index 62c171f..fa1366f 100644 --- a/taint.c +++ b/taint.c @@ -11,7 +11,7 @@ /* * '...we will have peace, when you and all your works have perished--and * the works of your dark master to whom you would deliver us. You are a - * liar, Saruman, and a corrupter of men's hearts.' --Th�oden + * liar, Saruman, and a corrupter of men's hearts.' --Th��oden * * [p.580 of _The Lord of the Rings_, III/x: "The Voice of Saruman"] */ diff --git a/utf8.c b/utf8.c index 797c811..4bab3a9 100644 --- a/utf8.c +++ b/utf8.c @@ -13,12 +13,12 @@ * heard of that we don't want to see any closer; and that's the one place * we're trying to get to! And that's just where we can't get, nohow.' * - * [p.603 of _The Lord of the Rings_, IV/I: "The Taming of Sm�agol"] + * [p.603 of _The Lord of the Rings_, IV/I: "The Taming of Sm��agol"] * * 'Well do I understand your speech,' he answered in the same language; * 'yet few strangers do so. Why then do you not speak in the Common Tongue, * as is the custom in the West, if you wish to be answered?' - * --Gandalf, addressing Th�oden's door wardens + * --Gandalf, addressing Th��oden's door wardens * * [p.508 of _The Lord of the Rings_, III/vi: "The King of the Golden Hall"] * diff --git a/util.c b/util.c index 6a53cff..70a1496 100644 --- a/util.c +++ b/util.c @@ -12,7 +12,7 @@ * 'Very useful, no doubt, that was to Saruman; yet it seems that he was * not content.' --Gandalf to Pippin * - * [p.598 of _The Lord of the Rings_, III/xi: "The Palant�r"] + * [p.598 of _The Lord of the Rings_, III/xi: "The Palant��r"] */ /* This file contains assorted utility routines. -- 1.7.4.5 ```
p5pRT commented 13 years ago

From @cpansprout

On Wed Sep 07 19:11:35 2011, keithsthompson@gmail.com wrote:

> As I discussed a few weeks ago with Father Chrysostomos, I volunteered to submit a patch converting Latin-1 files in the Perl source tree to UTF-8. I've just finished the patch (attached).
>
> For a lot of the files, this was straightforward; most of the changes involved either copyright symbols or Lord of the Rings quotations in header comments.
>
> I do have some doubts about some of the changes I made, and I encourage anyone who's more familiar with this than I am to review the changes I've made. In particular, I'm not sure about the files under the cpan/ directory; I suppose there needs to be some coordination with the corresponding sources in CPAN itself; I don't know how that works. Perhaps it would be easier to leave those changes out.

For those files, CPAN is upstream. They usually remain untouched between upgrades to newer versions. Any changes have to be made to the CPAN distributions first. We sometimes make exceptions for test failures or modules used to bootstrap perl itself.

> There are a number of files that have encodings other than UTF-8 that I didn't touch, mostly because there seem to be specific requirements to use those encodings. The files are listed in the attached file "skipped.txt".

I agree that those should be skipped.
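For anyone wanting to repeat this kind of conversion, here is a minimal sketch in Python of the Latin-1 to UTF-8 step. It is not the script used for the patch; the function name `latin1_to_utf8` is hypothetical. It assumes that any input which is not already valid UTF-8 is Latin-1, so files in other encodings (such as those in skipped.txt) still have to be excluded by hand.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Return `data` as UTF-8, assuming non-UTF-8 input is Latin-1.

    Hypothetical helper for illustration only.
    """
    try:
        # Pure ASCII and existing UTF-8 decode cleanly: leave them untouched
        # so that running the conversion twice does not double-encode.
        data.decode("utf-8")
        return data
    except UnicodeDecodeError:
        # Every byte value is a valid Latin-1 character, so this decode
        # cannot fail; the result is then re-encoded as UTF-8.
        return data.decode("latin-1").encode("utf-8")


# Example taken from the header comments in the patch: i-acute is the
# single byte 0xED in Latin-1 and the two bytes 0xC3 0xAD in UTF-8.
before = b'"The Palant\xedr"'
after = latin1_to_utf8(before)
```

The validity check is only a heuristic: a Latin-1 byte sequence that happens to form valid UTF-8 would be passed through unchanged, so the resulting diff still needs a manual review.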

> This patch did cause one test failure, in porting/cmp_version. Running the test by itself shows the following: ... It's complaining about the change in the "attributes/attributes.xs" file, which should be resolved if this patch is committed.

It’s complaining that attributes.xs changed without the version number in ext/attributes/attributes.pm changing.
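The rule behind that failure can be sketched roughly as follows. This is not how porting/cmp_version is implemented; `needs_version_bump` and its parameters are hypothetical, and the version strings are made up. It only illustrates the idea that any change to a module's files, even a re-encoded comment, requires a new version number.

```python
from typing import Dict


def needs_version_bump(old_files: Dict[str, bytes],
                       new_files: Dict[str, bytes],
                       old_version: str,
                       new_version: str) -> bool:
    """Return True when a module's files changed but its version did not.

    Hypothetical helper: the files are compared byte for byte, so a pure
    encoding change counts as a change.
    """
    files_changed = old_files != new_files
    return files_changed and old_version == new_version


# The comment text is the same, but the bytes differ after conversion.
old = {"attributes.xs": b'"The Palant\xedr"'}
new = {"attributes.xs": '"The Palant\u00edr"'.encode("utf-8")}

# Version strings below are placeholders, not the real attributes.pm values.
flagged = needs_version_bump(old, new, "1.00", "1.00")
resolved = needs_version_bump(old, new, "1.00", "1.01")
```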

I’ve applied your patch, but without the cpan/ changes and without the NamesList.txt changes (that file is from the Unicode Consortium and is simply plopped in as it is), as cdad3b53476, followed by a version bump for attributes.pm in commit 83e49ee07c6.

Thank you.

p5pRT commented 13 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 13 years ago

@cpansprout - Status changed from 'open' to 'resolved'