Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

PATCH: Make isFOO() O(1) performance, add variants for forcing ASCII, Latin1 interpretations #10661

Closed p5pRT closed 14 years ago

p5pRT commented 14 years ago

Migrated from rt.perl.org#78024 (status was 'resolved')

Searchable as RT78024$

p5pRT commented 14 years ago

From @khwilliamson

The attached series of commits changes the definitions of the character class macros in handy.h to use table lookup for all those that may require more than one comparison (leaving just isASCII() as not table lookup).

Variants of each macro are added: those whose names end in _A match only in the ASCII range, and those ending in _L1 match in the entire Latin1 range.

The first two commits are repeats of those in [perl #78022] PATCH: Add a couple of macros to handy.h. The whole series is available at git://github.com/khwilliamson/perl.git branch autodoc.

This patch both improves performance and extends capabilities. All calls to these macros are now O(1), and there are macros available for the /u and the proposed /a (or similar) regular expression modifiers. Previously the macros could execute many branches to classify the input: finding an ASCII word character could take 7 comparisons, and quite a few more for Latin1.
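
To illustrate the difference between branchy classification and table lookup, here is a minimal sketch. It is not the code from the patches: the `demo_` names and bit assignments are hypothetical, the table is filled at run time rather than generated by a script, and an ASCII platform is assumed. The Latin1 letter ranges are the ones used by isALPHAU() in handy.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit names for this sketch; not the names used in the patch. */
#define DEMO_WORDCHAR_A  (1u << 0)   /* word character, ASCII range only */
#define DEMO_WORDCHAR_L1 (1u << 1)   /* word character, full Latin1 range */

/* One 32-bit word of class bits per code point 0..255. */
static uint32_t demo_charclass[256];

/* Branchy classification: up to 7 comparisons per call. Assumes an ASCII
 * platform, where letters and digits are contiguous. */
static int demo_is_ascii_wordchar(unsigned c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
        || (c >= '0' && c <= '9') || c == '_';
}

/* Latin1 letters outside ASCII, following the isALPHAU() ranges:
 * U+AA, U+B5, U+BA, and U+C0..U+FF except U+D7 and U+F7. */
static int demo_is_high_latin1_alpha(unsigned c)
{
    return c == 0xAA || c == 0xB5 || c == 0xBA
        || (c >= 0xC0 && c <= 0xFF && c != 0xD7 && c != 0xF7);
}

/* Pay for the comparisons once, when the table is built. */
static void demo_init_charclass(void)
{
    for (unsigned c = 0; c < 256; c++) {
        uint32_t bits = 0;
        if (demo_is_ascii_wordchar(c))
            bits |= DEMO_WORDCHAR_A | DEMO_WORDCHAR_L1;
        else if (demo_is_high_latin1_alpha(c))
            bits |= DEMO_WORDCHAR_L1;
        demo_charclass[c] = bits;
    }
}

/* Every lookup is a range guard, one index, and one mask. */
#define demo_FITS_IN_8_BITS(c) \
    (((unsigned long)(c) & 0xFF) == (unsigned long)(c))
#define demo_isWORDCHAR_A(c) \
    (demo_FITS_IN_8_BITS(c) && (demo_charclass[(uint8_t)(c)] & DEMO_WORDCHAR_A))
#define demo_isWORDCHAR_L1(c) \
    (demo_FITS_IN_8_BITS(c) && (demo_charclass[(uint8_t)(c)] & DEMO_WORDCHAR_L1))

/* Usage: build the table once, then classify. */
static void demo_check_wordchar(void)
{
    demo_init_charclass();
    assert(demo_isWORDCHAR_A('w') && demo_isWORDCHAR_A('_'));
    assert(!demo_isWORDCHAR_A(0xE9));   /* U+E9 is outside ASCII */
    assert(demo_isWORDCHAR_L1(0xE9));   /* but it is a Latin1 letter */
    assert(!demo_isWORDCHAR_L1(0xD7));  /* U+D7, multiplication sign */
    assert(!demo_isWORDCHAR_L1(0x100)); /* out of range: table not touched */
}
```

The range guard short-circuits before the index, so a value above 255 never reads past the table.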

p5pRT commented 14 years ago

From @khwilliamson

0001-Subject-handy.h-Add-isSPACE_L1-with-Unicode-semant.patch
```diff
From c44689233680ebe74a0ec0bca2149fec5057c5ed Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Thu, 23 Sep 2010 11:32:13 -0600
Subject: [PATCH] Subject: handy.h: Add isSPACE_L1 with Unicode semantics

---
 handy.h | 4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/handy.h b/handy.h
index b41c1c8..7bacab3 100644
--- a/handy.h
+++ b/handy.h
@@ -515,6 +515,8 @@ patched there. The file as of this writing is cpan/Devel-PPPort/parts/inc/misc
 #define isALPHA(c) (isUPPER(c) || isLOWER(c))
 /* ALPHAU includes Unicode semantics for latin1 characters. It has an extra
  * >= AA test to speed up ASCII-only tests at the expense of the others */
+/* XXX decide whether to document the ALPHAU, ALNUMU and isSPACE_L1 functions.
+ * Most of these should be implemented as table lookup for speed */
 #define isALPHAU(c) (isALPHA(c) || (NATIVE_TO_UNI((U8) c) >= 0xAA \
     && ((NATIVE_TO_UNI((U8) c) >= 0xC0 \
     && NATIVE_TO_UNI((U8) c) != 0xD7 && NATIVE_TO_UNI((U8) c) != 0xF7) \
@@ -527,6 +529,8 @@ patched there. The file as of this writing is cpan/Devel-PPPort/parts/inc/misc
 #define isCHARNAME_CONT(c) (isALNUMU(c) || (c) == ' ' || (c) == '-' || (c) == '(' || (c) == ')' || (c) == ':' || NATIVE_TO_UNI((U8) c) == 0xA0)
 #define isSPACE(c) \
     ((c) == ' ' || (c) == '\t' || (c) == '\n' || (c) =='\r' || (c) == '\f')
+#define isSPACE_L1(c) (isSPACE(c) \
+    || (NATIVE_TO_UNI(c) == 0x85 || NATIVE_TO_UNI(c) == 0xA0))
 #define isPSXSPC(c) (isSPACE(c) || (c) == '\v')
 #define isBLANK(c) ((c) == ' ' || (c) == '\t')
 #define isDIGIT(c) ((c) >= '0' && (c) <= '9')
--
1.5.6.3
```
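
As a rough illustration of what isSPACE_L1() adds over isSPACE(), the sketch below restates both for an ASCII platform, where NATIVE_TO_UNI() is assumed to be the identity and is therefore left out. The `demo_` names are hypothetical.

```c
#include <assert.h>

/* ASCII-platform restatement of the two macros; NATIVE_TO_UNI() is
 * assumed to be the identity here, so it is omitted. */
#define demo_isSPACE(c) \
    ((c) == ' ' || (c) == '\t' || (c) == '\n' || (c) == '\r' || (c) == '\f')
#define demo_isSPACE_L1(c) \
    (demo_isSPACE(c) || (c) == 0x85 || (c) == 0xA0)

static void demo_check_space_l1(void)
{
    assert(demo_isSPACE(' ') && demo_isSPACE_L1(' '));
    assert(!demo_isSPACE(0x85) && demo_isSPACE_L1(0x85)); /* NEL  */
    assert(!demo_isSPACE(0xA0) && demo_isSPACE_L1(0xA0)); /* NBSP */
    assert(!demo_isSPACE_L1('\v')); /* vertical tab matches neither macro */
}
```
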
p5pRT commented 14 years ago

From @khwilliamson

0002-handy.h-Add-isWORDCHAR_L1-macro.patch
```diff
From fb39b92376355c8542ece6daa038bffcebf5c150 Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Thu, 23 Sep 2010 12:16:12 -0600
Subject: [PATCH] handy.h: Add isWORDCHAR_L1() macro

This is a synonym for isALNUMU
---
 handy.h | 1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/handy.h b/handy.h
index 7bacab3..c5c12b7 100644
--- a/handy.h
+++ b/handy.h
@@ -524,6 +524,7 @@ patched there. The file as of this writing is cpan/Devel-PPPort/parts/inc/misc
     || NATIVE_TO_UNI((U8) c) == 0xB5 \
     || NATIVE_TO_UNI((U8) c) == 0xBA)))
 #define isALNUMU(c) (isDIGIT(c) || isALPHAU(c) || (c) == '_')
+#define isWORDCHAR_L1(c) isALNUMU(c)
 /* continuation character for legal NAME in \N{NAME} */
 #define isCHARNAME_CONT(c) (isALNUMU(c) || (c) == ' ' || (c) == '-' || (c) == '(' || (c) == ')' || (c) == ':' || NATIVE_TO_UNI((U8) c) == 0xA0)
--
1.5.6.3
```
p5pRT commented 14 years ago

From @khwilliamson

0003-Subject-handy.h-Move-defn-s-outside-ifndef-EBCDIC.patch
```diff
From 990284df216e73c8ad7ca871704551df87b2d8b1 Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Thu, 23 Sep 2010 13:41:52 -0600
Subject: [PATCH] Subject: handy.h: Move defn's outside #ifndef EBCDIC

Commit 4125141464884619e852c7b0986a51eba8fe1636 improperly got rid of
EBCDIC handling, as it combined the ASCII and EBCDIC versions, but left
the result in the ASCII-only branch. Just move to the common code.
---
 handy.h | 30 +++++++++++++++---------------
 1 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/handy.h b/handy.h
index c5c12b7..00bc5e4 100644
--- a/handy.h
+++ b/handy.h
@@ -559,26 +559,26 @@ patched there. The file as of this writing is cpan/Devel-PPPort/parts/inc/misc
 # define isPUNCT(c) (((c) >= 33 && (c) <= 47) || ((c) >= 58 && (c) <= 64) || ((c) >= 91 && (c) <= 96) || ((c) >= 123 && (c) <= 126))
 # define isXDIGIT(c) (isDIGIT(c) || ((c) >= 'a' && (c) <= 'f') || ((c) >= 'A' && (c) <= 'F'))
-
-    /* Use table lookup for speed; return error character for input
-     * out-of-range */
-# define toLOWER_LATIN1(c) (FITS_IN_8_BITS(c) \
-        ? UNI_TO_NATIVE(PL_latin1_lc[ \
-        NATIVE_TO_UNI( (U8) (c)) ]) \
-        : UNICODE_REPLACEMENT)
-    /* Modified uc. Is correct uc except for three non-ascii chars which are
-     * all mapped to one of them, and these need special handling; error
-     * character for input out-of-range */
-# define toUPPER_LATIN1_MOD(c) (FITS_IN_8_BITS(c) \
-        ? UNI_TO_NATIVE(PL_mod_latin1_uc[ \
-        NATIVE_TO_UNI( (U8) (c)) ]) \
-        : UNICODE_REPLACEMENT)
-
     /* ASCII casing. */
 # define toUPPER(c) (isLOWER(c) ? (c) - ('a' - 'A') : (c))
 # define toLOWER(c) (isUPPER(c) ? (c) + ('a' - 'A') : (c))
 #endif
+
+/* Use table lookup for speed; return error character for input
+ * out-of-range */
+#define toLOWER_LATIN1(c) (FITS_IN_8_BITS(c) \
+        ? UNI_TO_NATIVE(PL_latin1_lc[ \
+        NATIVE_TO_UNI( (U8) (c)) ]) \
+        : UNICODE_REPLACEMENT)
+/* Modified uc. Is correct uc except for three non-ascii chars which are
+ * all mapped to one of them, and these need special handling; error
+ * character for input out-of-range */
+#define toUPPER_LATIN1_MOD(c) (FITS_IN_8_BITS(c) \
+        ? UNI_TO_NATIVE(PL_mod_latin1_uc[ \
+        NATIVE_TO_UNI( (U8) (c)) ]) \
+        : UNICODE_REPLACEMENT)
+
 #ifdef USE_NEXT_CTYPE
 # define isALNUM_LC(c) \
--
1.5.6.3
```
p5pRT commented 14 years ago

From @khwilliamson

0004-handy.h-isPSXSPC-is-wrong-for-EBCDIC.patch
```diff
From 8a6ef31341fa1cf69ac0185239f0a85b71800806 Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Thu, 23 Sep 2010 13:45:58 -0600
Subject: [PATCH] handy.h: isPSXSPC() is wrong for EBCDIC

The macro was using the ASCII definition, which doesn't include NEL nor
NBSP. But, libc contains the correct definition, which is usable on
EBCDIC since we don't worry about locales there.
---
 handy.h | 3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/handy.h b/handy.h
index 00bc5e4..e96fa48 100644
--- a/handy.h
+++ b/handy.h
@@ -532,7 +532,6 @@ patched there. The file as of this writing is cpan/Devel-PPPort/parts/inc/misc
     ((c) == ' ' || (c) == '\t' || (c) == '\n' || (c) =='\r' || (c) == '\f')
 #define isSPACE_L1(c) (isSPACE(c) \
     || (NATIVE_TO_UNI(c) == 0x85 || NATIVE_TO_UNI(c) == 0xA0))
-#define isPSXSPC(c) (isSPACE(c) || (c) == '\v')
 #define isBLANK(c) ((c) == ' ' || (c) == '\t')
 #define isDIGIT(c) ((c) >= '0' && (c) <= '9')
 #define isOCTAL(c) ((c) >= '0' && (c) <= '7')
@@ -545,6 +544,7 @@ patched there. The file as of this writing is cpan/Devel-PPPort/parts/inc/misc
 # define isCNTRL(c) iscntrl(c)
 # define isGRAPH(c) isgraph(c)
 # define isPRINT(c) isprint(c)
+# define isPSXSPC(c) isspace(c)
 # define isPUNCT(c) ispunct(c)
 # define isXDIGIT(c) isxdigit(c)
 # define toUPPER(c) toupper(c)
@@ -556,6 +556,7 @@ patched there. The file as of this writing is cpan/Devel-PPPort/parts/inc/misc
 # define isCNTRL(c) ((U8) (c) < ' ' || (c) == 127)
 # define isGRAPH(c) (isALNUM(c) || isPUNCT(c))
 # define isPRINT(c) (((c) >= 32 && (c) < 127))
+# define isPSXSPC(c) (isSPACE(c) || (c) == '\v')
 # define isPUNCT(c) (((c) >= 33 && (c) <= 47) || ((c) >= 58 && (c) <= 64) || ((c) >= 91 && (c) <= 96) || ((c) >= 123 && (c) <= 126))
 # define isXDIGIT(c) (isDIGIT(c) || ((c) >= 'a' && (c) <= 'f') || ((c) >= 'A' && (c) <= 'F'))
--
1.5.6.3
```
p5pRT commented 14 years ago

From @khwilliamson

0005-handy.h-EBCDIC-isBLANK-is-wrong.patch
```diff
From 1f6bc6f122d1bb353b44f0ff96dc55dea97ef677 Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Thu, 23 Sep 2010 13:57:51 -0600
Subject: [PATCH] handy.h: EBCDIC isBLANK() is wrong

It doesn't include NBSP
---
 handy.h | 3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/handy.h b/handy.h
index e96fa48..0dc9f39 100644
--- a/handy.h
+++ b/handy.h
@@ -532,7 +532,6 @@ patched there. The file as of this writing is cpan/Devel-PPPort/parts/inc/misc
     ((c) == ' ' || (c) == '\t' || (c) == '\n' || (c) =='\r' || (c) == '\f')
 #define isSPACE_L1(c) (isSPACE(c) \
     || (NATIVE_TO_UNI(c) == 0x85 || NATIVE_TO_UNI(c) == 0xA0))
-#define isBLANK(c) ((c) == ' ' || (c) == '\t')
 #define isDIGIT(c) ((c) >= '0' && (c) <= '9')
 #define isOCTAL(c) ((c) >= '0' && (c) <= '7')
 #define isASCII(c) (FITS_IN_8_BITS(c) ? NATIVE_TO_UNI((U8) c) <= 127 : 0)
@@ -541,6 +540,7 @@ patched there. The file as of this writing is cpan/Devel-PPPort/parts/inc/misc
 # define isUPPER(c) isupper(c)
 # define isLOWER(c) islower(c)
 # define isALNUMC(c) isalnum(c)
+# define isBLANK(c) ((c) == ' ' || (c) == '\t' || NATIVE_TO_UNI(c) == 0xA0)
 # define isCNTRL(c) iscntrl(c)
 # define isGRAPH(c) isgraph(c)
 # define isPRINT(c) isprint(c)
@@ -553,6 +553,7 @@ patched there. The file as of this writing is cpan/Devel-PPPort/parts/inc/misc
 # define isUPPER(c) ((c) >= 'A' && (c) <= 'Z')
 # define isLOWER(c) ((c) >= 'a' && (c) <= 'z')
 # define isALNUMC(c) (isALPHA(c) || isDIGIT(c))
+# define isBLANK(c) ((c) == ' ' || (c) == '\t')
 # define isCNTRL(c) ((U8) (c) < ' ' || (c) == 127)
 # define isGRAPH(c) (isALNUM(c) || isPUNCT(c))
 # define isPRINT(c) (((c) >= 32 && (c) < 127))
--
1.5.6.3
```
p5pRT commented 14 years ago

From @khwilliamson

0006-handy.h-isSPACE-is-wrong-for-EBCDIC.patch
```diff
From 939cd03d4c8e2326cb111a77a069f3f2a51a669d Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Thu, 23 Sep 2010 14:14:09 -0600
Subject: [PATCH] handy.h: isSPACE() is wrong for EBCDIC

It didn't include the Latin1 space components.
---
 handy.h | 5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/handy.h b/handy.h
index 0dc9f39..d6d77a4 100644
--- a/handy.h
+++ b/handy.h
@@ -528,8 +528,6 @@ patched there. The file as of this writing is cpan/Devel-PPPort/parts/inc/misc
 /* continuation character for legal NAME in \N{NAME} */
 #define isCHARNAME_CONT(c) (isALNUMU(c) || (c) == ' ' || (c) == '-' || (c) == '(' || (c) == ')' || (c) == ':' || NATIVE_TO_UNI((U8) c) == 0xA0)
-#define isSPACE(c) \
-    ((c) == ' ' || (c) == '\t' || (c) == '\n' || (c) =='\r' || (c) == '\f')
 #define isSPACE_L1(c) (isSPACE(c) \
     || (NATIVE_TO_UNI(c) == 0x85 || NATIVE_TO_UNI(c) == 0xA0))
 #define isDIGIT(c) ((c) >= '0' && (c) <= '9')
@@ -546,6 +544,7 @@ patched there. The file as of this writing is cpan/Devel-PPPort/parts/inc/misc
 # define isPRINT(c) isprint(c)
 # define isPSXSPC(c) isspace(c)
 # define isPUNCT(c) ispunct(c)
+# define isSPACE(c) (isPSXSPC(c) && (c) != '\v')
 # define isXDIGIT(c) isxdigit(c)
 # define toUPPER(c) toupper(c)
 # define toLOWER(c) tolower(c)
@@ -559,6 +558,8 @@ patched there. The file as of this writing is cpan/Devel-PPPort/parts/inc/misc
 # define isPRINT(c) (((c) >= 32 && (c) < 127))
 # define isPSXSPC(c) (isSPACE(c) || (c) == '\v')
 # define isPUNCT(c) (((c) >= 33 && (c) <= 47) || ((c) >= 58 && (c) <= 64) || ((c) >= 91 && (c) <= 96) || ((c) >= 123 && (c) <= 126))
+# define isSPACE(c) \
+    ((c) == ' ' || (c) == '\t' || (c) == '\n' || (c) =='\r' || (c) == '\f')
 # define isXDIGIT(c) (isDIGIT(c) || ((c) >= 'a' && (c) <= 'f') || ((c) >= 'A' && (c) <= 'F'))
     /* ASCII casing. */
--
1.5.6.3
```
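
The EBCDIC branch above derives isSPACE() from the libc-backed isPSXSPC() by excluding vertical tab. A small sketch of that relationship follows, using the C library's isspace() in the default "C" locale. The `demo_libc_` names are hypothetical, and the sketch runs on whatever platform compiles it rather than on EBCDIC specifically.

```c
#include <assert.h>
#include <ctype.h>

/* POSIX-style space: whatever isspace() accepts. */
#define demo_libc_isPSXSPC(c) (isspace((unsigned char)(c)) != 0)
/* Perl-style \s for this series: the same set minus vertical tab. */
#define demo_libc_isSPACE(c)  (demo_libc_isPSXSPC(c) && (c) != '\v')

static void demo_check_psxspc(void)
{
    assert(demo_libc_isPSXSPC('\v') && !demo_libc_isSPACE('\v'));
    assert(demo_libc_isPSXSPC('\n') && demo_libc_isSPACE('\n'));
    assert(!demo_libc_isPSXSPC('x') && !demo_libc_isSPACE('x'));
}
```
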
p5pRT commented 14 years ago

From @khwilliamson

0007-Subject-handy.h-Reorder-defines-alphabetically.patch
```diff
From 2e7fa2c4f6aaf266d9f0e4afa0b1ae9e2f11ab5e Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Thu, 23 Sep 2010 14:26:51 -0600
Subject: [PATCH] Subject: handy.h: Reorder #defines alphabetically

The only change here is that I sorted these #defines within their
groups, to make it much easier to follow what's going on.
---
 handy.h | 25 +++++++++++++------------
 1 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/handy.h b/handy.h
index d6d77a4..fcec7c8 100644
--- a/handy.h
+++ b/handy.h
@@ -511,7 +511,7 @@ patched there. The file as of this writing is cpan/Devel-PPPort/parts/inc/misc
 #endif
 #define isALNUM(c) (isALPHA(c) || isDIGIT(c) || (c) == '_')
-#define isIDFIRST(c) (isALPHA(c) || (c) == '_')
+#define isALNUMU(c) (isDIGIT(c) || isALPHAU(c) || (c) == '_')
 #define isALPHA(c) (isUPPER(c) || isLOWER(c))
 /* ALPHAU includes Unicode semantics for latin1 characters. It has an extra
  * >= AA test to speed up ASCII-only tests at the expense of the others */
@@ -523,48 +523,49 @@ patched there. The file as of this writing is cpan/Devel-PPPort/parts/inc/misc
     || NATIVE_TO_UNI((U8) c) == 0xAA \
     || NATIVE_TO_UNI((U8) c) == 0xB5 \
     || NATIVE_TO_UNI((U8) c) == 0xBA)))
-#define isALNUMU(c) (isDIGIT(c) || isALPHAU(c) || (c) == '_')
-#define isWORDCHAR_L1(c) isALNUMU(c)
+#define isASCII(c) (FITS_IN_8_BITS(c) ? NATIVE_TO_UNI((U8) c) <= 127 : 0)
 /* continuation character for legal NAME in \N{NAME} */
 #define isCHARNAME_CONT(c) (isALNUMU(c) || (c) == ' ' || (c) == '-' || (c) == '(' || (c) == ')' || (c) == ':' || NATIVE_TO_UNI((U8) c) == 0xA0)
-#define isSPACE_L1(c) (isSPACE(c) \
-    || (NATIVE_TO_UNI(c) == 0x85 || NATIVE_TO_UNI(c) == 0xA0))
+
 #define isDIGIT(c) ((c) >= '0' && (c) <= '9')
+#define isIDFIRST(c) (isALPHA(c) || (c) == '_')
 #define isOCTAL(c) ((c) >= '0' && (c) <= '7')
-#define isASCII(c) (FITS_IN_8_BITS(c) ? NATIVE_TO_UNI((U8) c) <= 127 : 0)
+#define isSPACE_L1(c) (isSPACE(c) \
+    || (NATIVE_TO_UNI(c) == 0x85 || NATIVE_TO_UNI(c) == 0xA0))
+#define isWORDCHAR_L1(c) isALNUMU(c)
 #ifdef EBCDIC
     /* In EBCDIC we do not do locales: therefore() isupper() is fine. */
-# define isUPPER(c) isupper(c)
-# define isLOWER(c) islower(c)
 # define isALNUMC(c) isalnum(c)
 # define isBLANK(c) ((c) == ' ' || (c) == '\t' || NATIVE_TO_UNI(c) == 0xA0)
 # define isCNTRL(c) iscntrl(c)
 # define isGRAPH(c) isgraph(c)
+# define isLOWER(c) islower(c)
 # define isPRINT(c) isprint(c)
 # define isPSXSPC(c) isspace(c)
 # define isPUNCT(c) ispunct(c)
 # define isSPACE(c) (isPSXSPC(c) && (c) != '\v')
+# define isUPPER(c) isupper(c)
 # define isXDIGIT(c) isxdigit(c)
-# define toUPPER(c) toupper(c)
 # define toLOWER(c) tolower(c)
+# define toUPPER(c) toupper(c)
 #else
-# define isUPPER(c) ((c) >= 'A' && (c) <= 'Z')
-# define isLOWER(c) ((c) >= 'a' && (c) <= 'z')
 # define isALNUMC(c) (isALPHA(c) || isDIGIT(c))
 # define isBLANK(c) ((c) == ' ' || (c) == '\t')
 # define isCNTRL(c) ((U8) (c) < ' ' || (c) == 127)
 # define isGRAPH(c) (isALNUM(c) || isPUNCT(c))
+# define isLOWER(c) ((c) >= 'a' && (c) <= 'z')
 # define isPRINT(c) (((c) >= 32 && (c) < 127))
 # define isPSXSPC(c) (isSPACE(c) || (c) == '\v')
 # define isPUNCT(c) (((c) >= 33 && (c) <= 47) || ((c) >= 58 && (c) <= 64) || ((c) >= 91 && (c) <= 96) || ((c) >= 123 && (c) <= 126))
 # define isSPACE(c) \
     ((c) == ' ' || (c) == '\t' || (c) == '\n' || (c) =='\r' || (c) == '\f')
+# define isUPPER(c) ((c) >= 'A' && (c) <= 'Z')
 # define isXDIGIT(c) (isDIGIT(c) || ((c) >= 'a' && (c) <= 'f') || ((c) >= 'A' && (c) <= 'F'))
 /* ASCII casing. */
-# define toUPPER(c) (isLOWER(c) ? (c) - ('a' - 'A') : (c))
 # define toLOWER(c) (isUPPER(c) ? (c) + ('a' - 'A') : (c))
+# define toUPPER(c) (isLOWER(c) ? (c) - ('a' - 'A') : (c))
 #endif
--
1.5.6.3
```
p5pRT commented 14 years ago

From @khwilliamson

0008-Indent-a-comment-better.patch
```diff
From 9fec4c5c6f263e134318d6195fb68d693f8f4365 Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Thu, 23 Sep 2010 14:27:57 -0600
Subject: [PATCH] Indent a comment better

---
 handy.h | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/handy.h b/handy.h
index fcec7c8..1f4dba5 100644
--- a/handy.h
+++ b/handy.h
@@ -563,7 +563,7 @@ patched there. The file as of this writing is cpan/Devel-PPPort/parts/inc/misc
 # define isUPPER(c) ((c) >= 'A' && (c) <= 'Z')
 # define isXDIGIT(c) (isDIGIT(c) || ((c) >= 'a' && (c) <= 'f') || ((c) >= 'A' && (c) <= 'F'))
-/* ASCII casing. */
+    /* ASCII casing. */
 # define toLOWER(c) (isUPPER(c) ? (c) + ('a' - 'A') : (c))
 # define toUPPER(c) (isLOWER(c) ? (c) - ('a' - 'A') : (c))
 #endif
--
1.5.6.3
```
p5pRT commented 14 years ago

From @khwilliamson

0009-Add-a-comment-clarify-another.patch
```diff
From d1c7eb5ad1b6e5a0c48389f4affa4df9f0091ba9 Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Thu, 23 Sep 2010 14:30:54 -0600
Subject: [PATCH] Add a comment; clarify another

---
 handy.h | 4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/handy.h b/handy.h
index 1f4dba5..761c51a 100644
--- a/handy.h
+++ b/handy.h
@@ -535,7 +535,7 @@ patched there. The file as of this writing is cpan/Devel-PPPort/parts/inc/misc
     || (NATIVE_TO_UNI(c) == 0x85 || NATIVE_TO_UNI(c) == 0xA0))
 #define isWORDCHAR_L1(c) isALNUMU(c)
 #ifdef EBCDIC
-    /* In EBCDIC we do not do locales: therefore() isupper() is fine. */
+    /* In EBCDIC we do not do locales: therefore can use native functions */
 # define isALNUMC(c) isalnum(c)
 # define isBLANK(c) ((c) == ' ' || (c) == '\t' || NATIVE_TO_UNI(c) == 0xA0)
 # define isCNTRL(c) iscntrl(c)
@@ -549,7 +549,7 @@ patched there. The file as of this writing is cpan/Devel-PPPort/parts/inc/misc
 # define isXDIGIT(c) isxdigit(c)
 # define toLOWER(c) tolower(c)
 # define toUPPER(c) toupper(c)
-#else
+#else /* Not EBCDIC */
 # define isALNUMC(c) (isALPHA(c) || isDIGIT(c))
 # define isBLANK(c) ((c) == ' ' || (c) == '\t')
 # define isCNTRL(c) ((U8) (c) < ' ' || (c) == 127)
--
1.5.6.3
```
p5pRT commented 14 years ago

From @khwilliamson

0010-Subject-handy.h-Add-isWORDCHAR-for-clarity.patch
```diff
From f7602b5694ea65b6e8e2f41d480a00f318701640 Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Thu, 23 Sep 2010 14:40:42 -0600
Subject: [PATCH] Subject: handy.h: Add isWORDCHAR() for clarity

The name isALNUM() is problematic, as it is very close to isALNUMC(),
and doesn't mean exactly what most people might think. I presume the C
in isALNUMC stands for C language or libc, but am not sure. Others don't
know either. But in any event, isALNUM is different from the C
isalnum(), in that it matches the Perl concept of \w, which differs from
the C definition in exactly one place. Perl includes the underscore
character, '_'.

So, I'm adding a isWORDCHAR() macro for future code to use to be more
clear. I thought also about isWORD(), but I think confusion can arise
from thinking that means a whole word.

isWORDCHAR_L1() matches in the Latin1 range, to be equivalent to
isALNUMU(). The motivation for using L1 instead of U will be explained
in a commit message for the other L1 macros that are to be added.
---
 handy.h | 7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/handy.h b/handy.h
index 761c51a..6878531 100644
--- a/handy.h
+++ b/handy.h
@@ -510,8 +510,8 @@ patched there. The file as of this writing is cpan/Devel-PPPort/parts/inc/misc
 # define FITS_IN_8_BITS(c) ((sizeof(c) == 1) || (((U32)(c) & 0xFF) == (U32)(c)))
 #endif
-#define isALNUM(c) (isALPHA(c) || isDIGIT(c) || (c) == '_')
-#define isALNUMU(c) (isDIGIT(c) || isALPHAU(c) || (c) == '_')
+#define isALNUM(c) isWORDCHAR(c)
+#define isALNUMU(c) isWORDCHAR_L1(c)
 #define isALPHA(c) (isUPPER(c) || isLOWER(c))
 /* ALPHAU includes Unicode semantics for latin1 characters. It has an extra
  * >= AA test to speed up ASCII-only tests at the expense of the others */
@@ -533,7 +533,8 @@ patched there. The file as of this writing is cpan/Devel-PPPort/parts/inc/misc
 #define isOCTAL(c) ((c) >= '0' && (c) <= '7')
 #define isSPACE_L1(c) (isSPACE(c) \
     || (NATIVE_TO_UNI(c) == 0x85 || NATIVE_TO_UNI(c) == 0xA0))
-#define isWORDCHAR_L1(c) isALNUMU(c)
+#define isWORDCHAR(c) (isALPHA(c) || isDIGIT(c) || (c) == '_')
+#define isWORDCHAR_L1(c) (isDIGIT(c) || isALPHAU(c) || (c) == '_')
 #ifdef EBCDIC
     /* In EBCDIC we do not do locales: therefore can use native functions */
 # define isALNUMC(c) isalnum(c)
--
1.5.6.3
```
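
The commit message above says that Perl's \w differs from C's isalnum() in exactly one place, the underscore. A short sketch that checks this over the ASCII range, using the C library in the default "C" locale, might look like the following. `demo_libc_isWORDCHAR` is a hypothetical name, not a macro from handy.h.

```c
#include <assert.h>
#include <ctype.h>

/* Word character in the Perl \w sense, ASCII range: isalnum() plus '_'. */
#define demo_libc_isWORDCHAR(c) (isalnum((unsigned char)(c)) || (c) == '_')

static void demo_check_wordchar_vs_alnum(void)
{
    /* The single point of disagreement. */
    assert(!isalnum('_'));
    assert(demo_libc_isWORDCHAR('_'));

    /* Everywhere else in ASCII the two agree. */
    for (int c = 0; c < 128; c++) {
        if (c != '_')
            assert(!!isalnum(c) == !!demo_libc_isWORDCHAR(c));
    }
}
```
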
p5pRT commented 14 years ago

From @khwilliamson

0011-handy.h-move-macro-in-file.patch
```diff
From b20fbe4a5dfcce812ebcdb5697561f85f596132b Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Thu, 23 Sep 2010 15:08:06 -0600
Subject: [PATCH] handy.h: move macro in file

---
 handy.h | 3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/handy.h b/handy.h
index 6878531..15a0687 100644
--- a/handy.h
+++ b/handy.h
@@ -510,6 +510,8 @@ patched there. The file as of this writing is cpan/Devel-PPPort/parts/inc/misc
 # define FITS_IN_8_BITS(c) ((sizeof(c) == 1) || (((U32)(c) & 0xFF) == (U32)(c)))
 #endif
+#define isASCII(c) (FITS_IN_8_BITS(c) ? NATIVE_TO_UNI((U8) c) <= 127 : 0)
+
 #define isALNUM(c) isWORDCHAR(c)
 #define isALNUMU(c) isWORDCHAR_L1(c)
 #define isALPHA(c) (isUPPER(c) || isLOWER(c))
@@ -523,7 +525,6 @@ patched there. The file as of this writing is cpan/Devel-PPPort/parts/inc/misc
     || NATIVE_TO_UNI((U8) c) == 0xAA \
     || NATIVE_TO_UNI((U8) c) == 0xB5 \
     || NATIVE_TO_UNI((U8) c) == 0xBA)))
-#define isASCII(c) (FITS_IN_8_BITS(c) ? NATIVE_TO_UNI((U8) c) <= 127 : 0)
 /* continuation character for legal NAME in \N{NAME} */
 #define isCHARNAME_CONT(c) (isALNUMU(c) || (c) == ' ' || (c) == '-' || (c) == '(' || (c) == ')' || (c) == ':' || NATIVE_TO_UNI((U8) c) == 0xA0)
--
1.5.6.3
```
p5pRT commented 14 years ago

From @khwilliamson

0012-handy.h-should-use-EBCDIC-libc-isdigit.patch
```diff
From 3e457a99a77e589980be387853e04ac014d480e6 Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Thu, 23 Sep 2010 15:40:21 -0600
Subject: [PATCH] handy.h: should use EBCDIC libc isdigit()

as is better optimized and suitable for the purpose.
---
 handy.h | 3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/handy.h b/handy.h
index 15a0687..342179b 100644
--- a/handy.h
+++ b/handy.h
@@ -529,7 +529,6 @@ patched there. The file as of this writing is cpan/Devel-PPPort/parts/inc/misc
 /* continuation character for legal NAME in \N{NAME} */
 #define isCHARNAME_CONT(c) (isALNUMU(c) || (c) == ' ' || (c) == '-' || (c) == '(' || (c) == ')' || (c) == ':' || NATIVE_TO_UNI((U8) c) == 0xA0)
-#define isDIGIT(c) ((c) >= '0' && (c) <= '9')
 #define isIDFIRST(c) (isALPHA(c) || (c) == '_')
 #define isOCTAL(c) ((c) >= '0' && (c) <= '7')
 #define isSPACE_L1(c) (isSPACE(c) \
@@ -541,6 +540,7 @@ patched there. The file as of this writing is cpan/Devel-PPPort/parts/inc/misc
 # define isALNUMC(c) isalnum(c)
 # define isBLANK(c) ((c) == ' ' || (c) == '\t' || NATIVE_TO_UNI(c) == 0xA0)
 # define isCNTRL(c) iscntrl(c)
+# define isDIGIT(c) isdigit(c)
 # define isGRAPH(c) isgraph(c)
 # define isLOWER(c) islower(c)
 # define isPRINT(c) isprint(c)
@@ -555,6 +555,7 @@ patched there. The file as of this writing is cpan/Devel-PPPort/parts/inc/misc
 # define isALNUMC(c) (isALPHA(c) || isDIGIT(c))
 # define isBLANK(c) ((c) == ' ' || (c) == '\t')
 # define isCNTRL(c) ((U8) (c) < ' ' || (c) == 127)
+# define isDIGIT(c) ((c) >= '0' && (c) <= '9')
 # define isGRAPH(c) (isALNUM(c) || isPUNCT(c))
 # define isLOWER(c) ((c) >= 'a' && (c) <= 'z')
 # define isPRINT(c) (((c) >= 32 && (c) < 127))
--
1.5.6.3
```
p5pRT commented 14 years ago

From @khwilliamson

0013-handy.h-Add-isFOO_A-macros-for-ASCII-range-matche.patch
```diff
From 2df6cc5e9b743a32f8f70785250fe30c13465d95 Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Thu, 23 Sep 2010 20:27:32 -0600
Subject: [PATCH] handy.h: Add isFOO_A() macros for ASCII range matches

These macros return true only if the parameter is an ASCII character.
---
 handy.h | 90 ++++++++++++++++++++++++++++++++++++++++++++------------------
 1 files changed, 64 insertions(+), 26 deletions(-)

diff --git a/handy.h b/handy.h
index 342179b..b2c4ced 100644
--- a/handy.h
+++ b/handy.h
@@ -510,31 +510,70 @@ patched there. The file as of this writing is cpan/Devel-PPPort/parts/inc/misc
 # define FITS_IN_8_BITS(c) ((sizeof(c) == 1) || (((U32)(c) & 0xFF) == (U32)(c)))
 #endif
-#define isASCII(c) (FITS_IN_8_BITS(c) ? NATIVE_TO_UNI((U8) c) <= 127 : 0)
+#define isASCII(c) (FITS_IN_8_BITS(c) ? NATIVE_TO_UNI((U8) c) <= 127 : 0)
+#define isASCII_A(c) isASCII(c)
-#define isALNUM(c) isWORDCHAR(c)
-#define isALNUMU(c) isWORDCHAR_L1(c)
-#define isALPHA(c) (isUPPER(c) || isLOWER(c))
+/* ASCII range only */
+#ifdef EBCDIC
+# define isALNUMC_A(c) (isASCII(c) && isALNUMC(c))
+# define isALPHA_A(c) (isASCII(c) && isALPHA(c))
+# define isBLANK_A(c) (isASCII(c) && isBLANK(c))
+# define isCNTRL_A(c) (isASCII(c) && isCNTRL(c))
+# define isDIGIT_A(c) (isASCII(c) && isDIGIT(c))
+# define isGRAPH_A(c) (isASCII(c) && isGRAPH(c))
+# define isLOWER_A(c) (isASCII(c) && isLOWER(c))
+# define isPRINT_A(c) (isASCII(c) && isPRINT(c))
+# define isPSXSPC_A(c) (isASCII(c) && isPSXSPC(c))
+# define isPUNCT_A(c) (isASCII(c) && isPUNCT(c))
+# define isSPACE_A(c) (isASCII(c) && isSPACE(c))
+# define isUPPER_A(c) (isASCII(c) && isUPPER(c))
+# define isWORDCHAR_A(c) (isASCII(c) && isWORDCHAR(c))
+# define isXDIGIT_A(c) (isASCII(c) && isXDIGIT(c))
+#else /* ASCII */
+# define isALNUMC_A(c) (isALPHA_A(c) || isDIGIT_A(c))
+# define isALPHA_A(c) (isUPPER_A(c) || isLOWER_A(c))
+# define isBLANK_A(c) ((c) == ' ' || (c) == '\t')
+# define isCNTRL_A(c) (FITS_IN_8_BITS(c) ? ((U8) (c) < ' ' || (c) == 127) : 0)
+# define isDIGIT_A(c) ((c) >= '0' && (c) <= '9')
+# define isGRAPH_A(c) (isWORDCHAR_A(c) || isPUNCT_A(c))
+# define isLOWER_A(c) ((c) >= 'a' && (c) <= 'z')
+# define isPRINT_A(c) (((c) >= 32 && (c) < 127))
+# define isPSXSPC_A(c) (isSPACE_A(c) || (c) == '\v')
+# define isPUNCT_A(c) (((c) >= 33 && (c) <= 47) || ((c) >= 58 && (c) <= 64) || ((c) >= 91 && (c) <= 96) || ((c) >= 123 && (c) <= 126))
+# define isSPACE_A(c) ((c) == ' ' || (c) == '\t' || (c) == '\n' || (c) =='\r' \
+    || (c) == '\f')
+# define isUPPER_A(c) ((c) >= 'A' && (c) <= 'Z')
+# define isWORDCHAR_A(c) (isALPHA_A(c) || isDIGIT_A(c) || (c) == '_')
+# define isXDIGIT_A(c) (isDIGIT_A(c) || ((c) >= 'a' && (c) <= 'f') || ((c) >= 'A' && (c) <= 'F'))
+#endif
+
+/* Latin1 definitions */
 /* ALPHAU includes Unicode semantics for latin1 characters. It has an extra
  * >= AA test to speed up ASCII-only tests at the expense of the others */
 /* XXX decide whether to document the ALPHAU, ALNUMU and isSPACE_L1 functions.
  * Most of these should be implemented as table lookup for speed */
-#define isALPHAU(c) (isALPHA(c) || (NATIVE_TO_UNI((U8) c) >= 0xAA \
-    && ((NATIVE_TO_UNI((U8) c) >= 0xC0 \
+#define isALPHAU(c) (isALPHA_A(c) || (NATIVE_TO_UNI((U8) c) >= 0xAA \
+    && ((NATIVE_TO_UNI((U8) c) >= 0xC0 \
     && NATIVE_TO_UNI((U8) c) != 0xD7 && NATIVE_TO_UNI((U8) c) != 0xF7) \
     || NATIVE_TO_UNI((U8) c) == 0xAA \
     || NATIVE_TO_UNI((U8) c) == 0xB5 \
     || NATIVE_TO_UNI((U8) c) == 0xBA)))
+#define isSPACE_L1(c) (isSPACE(c) \
+    || (NATIVE_TO_UNI(c) == 0x85 || NATIVE_TO_UNI(c) == 0xA0))
+#define isWORDCHAR_L1(c) (isDIGIT(c) || isALPHAU(c) || (c) == '_')
+/* Same macro in non-EBCDIC and EBCDIC. Called macros may evaluate
+ * differently between the two */
+#define isALNUM(c) isWORDCHAR(c)
+#define isALNUMU(c) isWORDCHAR_L1(c)
+#define isALPHA(c) (isUPPER(c) || isLOWER(c))
 /* continuation character for legal NAME in \N{NAME} */
-#define isCHARNAME_CONT(c) (isALNUMU(c) || (c) == ' ' || (c) == '-' || (c) == '(' || (c) == ')' || (c) == ':' || NATIVE_TO_UNI((U8) c) == 0xA0)
-
+#define isCHARNAME_CONT(c) (isWORDCHAR_L1(c) || (c) == ' ' || (c) == '-' || (c) == '(' || (c) == ')' || (c) == ':' || NATIVE_TO_UNI((U8) c) == 0xA0)
 #define isIDFIRST(c) (isALPHA(c) || (c) == '_')
-#define isOCTAL(c) ((c) >= '0' && (c) <= '7')
-#define isSPACE_L1(c) (isSPACE(c) \
-    || (NATIVE_TO_UNI(c) == 0x85 || NATIVE_TO_UNI(c) == 0xA0))
+#define isOCTAL_A(c) ((c) >= '0' && (c) <= '7')
+#define isOCTAL(c) isOCTAL_A(c)
 #define isWORDCHAR(c) (isALPHA(c) || isDIGIT(c) || (c) == '_')
-#define isWORDCHAR_L1(c) (isDIGIT(c) || isALPHAU(c) || (c) == '_')
+
 #ifdef EBCDIC
     /* In EBCDIC we do not do locales: therefore can use native functions */
 # define isALNUMC(c) isalnum(c)
@@ -551,20 +590,19 @@ patched there. The file as of this writing is cpan/Devel-PPPort/parts/inc/misc
 # define isXDIGIT(c) isxdigit(c)
 # define toLOWER(c) tolower(c)
 # define toUPPER(c) toupper(c)
-#else /* Not EBCDIC */
-# define isALNUMC(c) (isALPHA(c) || isDIGIT(c))
-# define isBLANK(c) ((c) == ' ' || (c) == '\t')
-# define isCNTRL(c) ((U8) (c) < ' ' || (c) == 127)
-# define isDIGIT(c) ((c) >= '0' && (c) <= '9')
-# define isGRAPH(c) (isALNUM(c) || isPUNCT(c))
-# define isLOWER(c) ((c) >= 'a' && (c) <= 'z')
-# define isPRINT(c) (((c) >= 32 && (c) < 127))
-# define isPSXSPC(c) (isSPACE(c) || (c) == '\v')
-# define isPUNCT(c) (((c) >= 33 && (c) <= 47) || ((c) >= 58 && (c) <= 64) || ((c) >= 91 && (c) <= 96) || ((c) >= 123 && (c) <= 126))
-# define isSPACE(c) \
-    ((c) == ' ' || (c) == '\t' || (c) == '\n' || (c) =='\r' || (c) == '\f')
-# define isUPPER(c) ((c) >= 'A' && (c) <= 'Z')
-# define isXDIGIT(c) (isDIGIT(c) || ((c) >= 'a' && (c) <= 'f') || ((c) >= 'A' && (c) <= 'F'))
+#else /* Not EBCDIC: ASCII-only matching */
+# define isALNUMC(c) isALNUMC_A(c)
+# define isBLANK(c) isBLANK_A(c)
+# define isCNTRL(c) isCNTRL_A(c)
+# define isDIGIT(c) isDIGIT_A(c)
+# define isGRAPH(c) isGRAPH_A(c)
+# define isLOWER(c) isLOWER_A(c)
+# define isPRINT(c) isPRINT_A(c)
+# define isPSXSPC(c) isPSXSPC_A(c)
+# define isPUNCT(c) isPUNCT_A(c)
+# define isSPACE(c) isSPACE_A(c)
+# define isUPPER(c) isUPPER_A(c)
+# define isXDIGIT(c) isXDIGIT_A(c)
     /* ASCII casing. */
 # define toLOWER(c) (isUPPER(c) ? (c) + ('a' - 'A') : (c))
--
1.5.6.3
```
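
The patch above builds the _A variants in two ways: by guarding a wider classifier with isASCII() (the EBCDIC branch) or by comparing against ASCII ranges directly (the other branch). The sketch below compares the two shapes on an ASCII platform. All `demo_` names are hypothetical, and `demo_wide_isALPHA()` is only a stand-in for a classifier that also matches Latin1 letters.

```c
#include <assert.h>

#define demo_isASCII(c) ((unsigned long)(c) <= 127)

/* Stand-in for a classifier that matches beyond ASCII (assumption:
 * ASCII letters plus the Latin1 letters listed in isALPHAU()). */
static int demo_wide_isALPHA(unsigned long c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
        || c == 0xAA || c == 0xB5 || c == 0xBA
        || (c >= 0xC0 && c <= 0xFF && c != 0xD7 && c != 0xF7);
}

/* Guarded shape: restrict a wider classifier to the ASCII range. */
#define demo_isALPHA_A_guarded(c) \
    (demo_isASCII(c) && demo_wide_isALPHA((unsigned long)(c)))

/* Direct shape: plain range comparisons. */
#define demo_isUPPER_A(c) ((c) >= 'A' && (c) <= 'Z')
#define demo_isLOWER_A(c) ((c) >= 'a' && (c) <= 'z')
#define demo_isALPHA_A_direct(c) (demo_isUPPER_A(c) || demo_isLOWER_A(c))

static void demo_check_alpha_a(void)
{
    /* Both shapes agree on every 8-bit value. */
    for (int c = 0; c < 256; c++)
        assert(!!demo_isALPHA_A_guarded(c) == !!demo_isALPHA_A_direct(c));

    /* U+E9 is a letter for the wide classifier but not for the _A form. */
    assert(demo_wide_isALPHA(0xE9));
    assert(!demo_isALPHA_A_guarded(0xE9));
}
```
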
p5pRT commented 14 years ago

From @khwilliamson

0014-handy.h-EBCDIC-should-use-native-isalpha.patch
```diff
From 28fb8f73aba68a0f357a0f27ad95a34a3186457a Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Thu, 23 Sep 2010 20:42:40 -0600
Subject: [PATCH] handy.h: EBCDIC should use native isalpha()

---
 handy.h | 3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/handy.h b/handy.h
index b2c4ced..8e929ef 100644
--- a/handy.h
+++ b/handy.h
@@ -566,7 +566,6 @@ patched there. The file as of this writing is cpan/Devel-PPPort/parts/inc/misc
  * differently between the two */
 #define isALNUM(c) isWORDCHAR(c)
 #define isALNUMU(c) isWORDCHAR_L1(c)
-#define isALPHA(c) (isUPPER(c) || isLOWER(c))
 /* continuation character for legal NAME in \N{NAME} */
 #define isCHARNAME_CONT(c) (isWORDCHAR_L1(c) || (c) == ' ' || (c) == '-' || (c) == '(' || (c) == ')' || (c) == ':' || NATIVE_TO_UNI((U8) c) == 0xA0)
 #define isIDFIRST(c) (isALPHA(c) || (c) == '_')
@@ -577,6 +576,7 @@ patched there. The file as of this writing is cpan/Devel-PPPort/parts/inc/misc
 #ifdef EBCDIC
     /* In EBCDIC we do not do locales: therefore can use native functions */
 # define isALNUMC(c) isalnum(c)
+# define isALPHA(c) isalpha(c)
 # define isBLANK(c) ((c) == ' ' || (c) == '\t' || NATIVE_TO_UNI(c) == 0xA0)
 # define isCNTRL(c) iscntrl(c)
 # define isDIGIT(c) isdigit(c)
@@ -592,6 +592,7 @@ patched there. The file as of this writing is cpan/Devel-PPPort/parts/inc/misc
 # define toUPPER(c) toupper(c)
 #else /* Not EBCDIC: ASCII-only matching */
 # define isALNUMC(c) isALNUMC_A(c)
+# define isALPHA(c) isALPHA_A(c)
 # define isBLANK(c) isBLANK_A(c)
 # define isCNTRL(c) isCNTRL_A(c)
 # define isDIGIT(c) isDIGIT_A(c)
--
1.5.6.3
```
p5pRT commented 14 years ago

From @khwilliamson

0015-Subject-Add-256-word-bit-table-of-character-classes.patch
```diff
From d4ca7fe7f47d027350da70b00b8ad199239b59fe Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Thu, 23 Sep 2010 20:47:03 -0600
Subject: [PATCH] Subject: Add 256 word bit table of character classes

This patch adds a table for looking up character classes. It is 256
words long, in perl.h, with each word corresponding to the ordinal of a
Latin1 character, and each word contains a bit map of all the properties
that character matches. Each property has a bit or two. Ones named
_CC_property_A are true only if the character is also in the ASCII
character set. Ones named _CC_property_L1 do not have this restriction.
(L1 stands for Latin1.) Also added is a script that generates the
table. It is not anticipated that this will need to be used often.
---
 MANIFEST | 1 +
 perl.h | 297 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 298 insertions(+), 0 deletions(-)

diff --git a/MANIFEST b/MANIFEST
index c935b4e..011943f 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -4183,6 +4183,7 @@ Porting/makerel Release making utility
 Porting/make_snapshot.pl Make a tgz snapshot of our tree with a .patch file in it
 Porting/manicheck Check against MANIFEST
 Porting/manisort Sort the MANIFEST
+Porting/mk_PL_charclass.pl Populate the PL_charclass table
 Porting/newtests-perldelta.pl Generate Perldelta stub for newly added tests
 Porting/perldelta_template.pod Template for creating new perldelta.pod files
 Porting/perlhist_calculate.pl Perform calculations to update perlhist
diff --git a/perl.h b/perl.h
index ccf89ad..6a2d835 100644
--- a/perl.h
+++ b/perl.h
@@ -4252,6 +4252,42 @@ extern char ** environ; /* environment variables supplied via exec */
 # endif
 #endif
 
+/* Bits for PL_charclass[] */
+#define _CC_ALNUMC_A (1<<0)
+#define _CC_ALNUMC_L1 (1<<1)
+#define _CC_ALPHA_A (1<<2)
+#define _CC_ALPHA_L1 (1<<3)
+#define _CC_BLANK_A (1<<4)
+#define _CC_BLANK_L1 (1<<5)
+#define _CC_CHARNAME_CONT (1<<6)
+#define _CC_CNTRL_A (1<<7)
+#define _CC_CNTRL_L1 (1<<8)
+#define _CC_DIGIT_A (1<<9)
+#define _CC_GRAPH_A (1<<10)
+#define _CC_GRAPH_L1 (1<<11)
+#define _CC_IDFIRST_A (1<<12)
+#define _CC_IDFIRST_L1 (1<<13)
+#define _CC_LOWER_A (1<<14)
+#define _CC_LOWER_L1 (1<<15)
+#define _CC_OCTAL_A (1<<16)
+#define _CC_PRINT_A (1<<17)
+#define _CC_PRINT_L1 (1<<18)
+#define _CC_PSXSPC_A (1<<19)
+#define _CC_PSXSPC_L1 (1<<20)
+#define _CC_PUNCT_A (1<<21)
+#define _CC_PUNCT_L1 (1<<22)
+#define _CC_SPACE_A (1<<23)
+#define _CC_SPACE_L1 (1<<24)
+#define _CC_UPPER_A (1<<25)
+#define _CC_UPPER_L1 (1<<26)
+#define _CC_WORDCHAR_A (1<<27)
+#define _CC_WORDCHAR_L1 (1<<28)
+#define _CC_XDIGIT_A (1<<29)
+/* Unused
+ * (1<<30)
+ * (1<<31)
+ */
+
 START_EXTERN_C
 
 /* handy constants */
@@ -4469,10 +4505,271 @@ EXTCONST unsigned char PL_mod_latin1_uc[] = {
 240-32, 241-32, 242-32, 243-32, 244-32, 245-32, 246-32, 247,
 248-32, 249-32, 250-32, 251-32, 252-32, 253-32, 254-32, 255
 };
+
+EXTCONST U32 PL_charclass[] = {
+/* !! MODIFY AND USE Porting/mk_PL_charclass.pl TO CHANGE THIS TABLE !!
*/ +/* U+00 NUL */ _CC_CNTRL_A|_CC_CNTRL_L1, +/* U+01 SOH */ _CC_CNTRL_A|_CC_CNTRL_L1, +/* U+02 STX */ _CC_CNTRL_A|_CC_CNTRL_L1, +/* U+03 ETX */ _CC_CNTRL_A|_CC_CNTRL_L1, +/* U+04 EOT */ _CC_CNTRL_A|_CC_CNTRL_L1, +/* U+05 ENQ */ _CC_CNTRL_A|_CC_CNTRL_L1, +/* U+06 ACK */ _CC_CNTRL_A|_CC_CNTRL_L1, +/* U+07 BEL */ _CC_CNTRL_A|_CC_CNTRL_L1, +/* U+08 BS */ _CC_CNTRL_A|_CC_CNTRL_L1, +/* U+09 HT */ _CC_BLANK_A|_CC_BLANK_L1|_CC_CNTRL_A|_CC_CNTRL_L1|_CC_PSXSPC_A|_CC_PSXSPC_L1|_CC_SPACE_A|_CC_SPACE_L1, +/* U+0A LF */ _CC_CNTRL_A|_CC_CNTRL_L1|_CC_PSXSPC_A|_CC_PSXSPC_L1|_CC_SPACE_A|_CC_SPACE_L1, +/* U+0B VT */ _CC_CNTRL_A|_CC_CNTRL_L1|_CC_PSXSPC_A|_CC_PSXSPC_L1, +/* U+0C FF */ _CC_CNTRL_A|_CC_CNTRL_L1|_CC_PSXSPC_A|_CC_PSXSPC_L1|_CC_SPACE_A|_CC_SPACE_L1, +/* U+0D CR */ _CC_CNTRL_A|_CC_CNTRL_L1|_CC_PSXSPC_A|_CC_PSXSPC_L1|_CC_SPACE_A|_CC_SPACE_L1, +/* U+0E SO */ _CC_CNTRL_A|_CC_CNTRL_L1, +/* U+0F SI */ _CC_CNTRL_A|_CC_CNTRL_L1, +/* U+10 DLE */ _CC_CNTRL_A|_CC_CNTRL_L1, +/* U+11 DC1 */ _CC_CNTRL_A|_CC_CNTRL_L1, +/* U+12 DC2 */ _CC_CNTRL_A|_CC_CNTRL_L1, +/* U+13 DC3 */ _CC_CNTRL_A|_CC_CNTRL_L1, +/* U+14 DC4 */ _CC_CNTRL_A|_CC_CNTRL_L1, +/* U+15 NAK */ _CC_CNTRL_A|_CC_CNTRL_L1, +/* U+16 SYN */ _CC_CNTRL_A|_CC_CNTRL_L1, +/* U+17 ETB */ _CC_CNTRL_A|_CC_CNTRL_L1, +/* U+18 CAN */ _CC_CNTRL_A|_CC_CNTRL_L1, +/* U+19 EOM */ _CC_CNTRL_A|_CC_CNTRL_L1, +/* U+1A SUB */ _CC_CNTRL_A|_CC_CNTRL_L1, +/* U+1B ESC */ _CC_CNTRL_A|_CC_CNTRL_L1, +/* U+1C FS */ _CC_CNTRL_A|_CC_CNTRL_L1, +/* U+1D GS */ _CC_CNTRL_A|_CC_CNTRL_L1, +/* U+1E RS */ _CC_CNTRL_A|_CC_CNTRL_L1, +/* U+1F US */ _CC_CNTRL_A|_CC_CNTRL_L1, +/* U+20 SPACE */ _CC_BLANK_A|_CC_BLANK_L1|_CC_CHARNAME_CONT|_CC_PRINT_A|_CC_PRINT_L1|_CC_PSXSPC_A|_CC_PSXSPC_L1|_CC_SPACE_A|_CC_SPACE_L1, +/* U+21 '!' 
*/ _CC_GRAPH_A|_CC_GRAPH_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_PUNCT_A|_CC_PUNCT_L1, +/* U+22 '"' */ _CC_GRAPH_A|_CC_GRAPH_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_PUNCT_A|_CC_PUNCT_L1, +/* U+23 '#' */ _CC_GRAPH_A|_CC_GRAPH_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_PUNCT_A|_CC_PUNCT_L1, +/* U+24 '$' */ _CC_GRAPH_A|_CC_GRAPH_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_PUNCT_A|_CC_PUNCT_L1, +/* U+25 '%' */ _CC_GRAPH_A|_CC_GRAPH_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_PUNCT_A|_CC_PUNCT_L1, +/* U+26 '&' */ _CC_GRAPH_A|_CC_GRAPH_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_PUNCT_A|_CC_PUNCT_L1, +/* U+27 ''' */ _CC_GRAPH_A|_CC_GRAPH_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_PUNCT_A|_CC_PUNCT_L1, +/* U+28 '(' */ _CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_PUNCT_A|_CC_PUNCT_L1, +/* U+29 ')' */ _CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_PUNCT_A|_CC_PUNCT_L1, +/* U+2A '*' */ _CC_GRAPH_A|_CC_GRAPH_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_PUNCT_A|_CC_PUNCT_L1, +/* U+2B '+' */ _CC_GRAPH_A|_CC_GRAPH_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_PUNCT_A|_CC_PUNCT_L1, +/* U+2C ',' */ _CC_GRAPH_A|_CC_GRAPH_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_PUNCT_A|_CC_PUNCT_L1, +/* U+2D '-' */ _CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_PUNCT_A|_CC_PUNCT_L1, +/* U+2E '.' 
*/ _CC_GRAPH_A|_CC_GRAPH_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_PUNCT_A|_CC_PUNCT_L1, +/* U+2F '/' */ _CC_GRAPH_A|_CC_GRAPH_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_PUNCT_A|_CC_PUNCT_L1, +/* U+30 '0' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_CHARNAME_CONT|_CC_DIGIT_A|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_OCTAL_A|_CC_PRINT_A|_CC_PRINT_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1|_CC_XDIGIT_A, +/* U+31 '1' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_CHARNAME_CONT|_CC_DIGIT_A|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_OCTAL_A|_CC_PRINT_A|_CC_PRINT_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1|_CC_XDIGIT_A, +/* U+32 '2' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_CHARNAME_CONT|_CC_DIGIT_A|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_OCTAL_A|_CC_PRINT_A|_CC_PRINT_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1|_CC_XDIGIT_A, +/* U+33 '3' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_CHARNAME_CONT|_CC_DIGIT_A|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_OCTAL_A|_CC_PRINT_A|_CC_PRINT_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1|_CC_XDIGIT_A, +/* U+34 '4' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_CHARNAME_CONT|_CC_DIGIT_A|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_OCTAL_A|_CC_PRINT_A|_CC_PRINT_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1|_CC_XDIGIT_A, +/* U+35 '5' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_CHARNAME_CONT|_CC_DIGIT_A|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_OCTAL_A|_CC_PRINT_A|_CC_PRINT_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1|_CC_XDIGIT_A, +/* U+36 '6' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_CHARNAME_CONT|_CC_DIGIT_A|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_OCTAL_A|_CC_PRINT_A|_CC_PRINT_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1|_CC_XDIGIT_A, +/* U+37 '7' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_CHARNAME_CONT|_CC_DIGIT_A|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_OCTAL_A|_CC_PRINT_A|_CC_PRINT_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1|_CC_XDIGIT_A, +/* U+38 '8' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_CHARNAME_CONT|_CC_DIGIT_A|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1|_CC_XDIGIT_A, +/* U+39 '9' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_CHARNAME_CONT|_CC_DIGIT_A|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1|_CC_XDIGIT_A, +/* U+3A ':' */ 
_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_PUNCT_A|_CC_PUNCT_L1, +/* U+3B ';' */ _CC_GRAPH_A|_CC_GRAPH_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_PUNCT_A|_CC_PUNCT_L1, +/* U+3C '<' */ _CC_GRAPH_A|_CC_GRAPH_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_PUNCT_A|_CC_PUNCT_L1, +/* U+3D '=' */ _CC_GRAPH_A|_CC_GRAPH_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_PUNCT_A|_CC_PUNCT_L1, +/* U+3E '>' */ _CC_GRAPH_A|_CC_GRAPH_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_PUNCT_A|_CC_PUNCT_L1, +/* U+3F '?' */ _CC_GRAPH_A|_CC_GRAPH_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_PUNCT_A|_CC_PUNCT_L1, +/* U+40 '@' */ _CC_GRAPH_A|_CC_GRAPH_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_PUNCT_A|_CC_PUNCT_L1, +/* U+41 'A' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_UPPER_A|_CC_UPPER_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1|_CC_XDIGIT_A, +/* U+42 'B' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_UPPER_A|_CC_UPPER_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1|_CC_XDIGIT_A, +/* U+43 'C' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_UPPER_A|_CC_UPPER_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1|_CC_XDIGIT_A, +/* U+44 'D' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_UPPER_A|_CC_UPPER_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1|_CC_XDIGIT_A, +/* U+45 'E' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_UPPER_A|_CC_UPPER_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1|_CC_XDIGIT_A, +/* U+46 'F' */ 
_CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_UPPER_A|_CC_UPPER_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1|_CC_XDIGIT_A, +/* U+47 'G' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_UPPER_A|_CC_UPPER_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1, +/* U+48 'H' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_UPPER_A|_CC_UPPER_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1, +/* U+49 'I' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_UPPER_A|_CC_UPPER_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1, +/* U+4A 'J' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_UPPER_A|_CC_UPPER_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1, +/* U+4B 'K' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_UPPER_A|_CC_UPPER_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1, +/* U+4C 'L' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_UPPER_A|_CC_UPPER_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1, +/* U+4D 'M' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_UPPER_A|_CC_UPPER_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1, +/* U+4E 'N' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_UPPER_A|_CC_UPPER_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1, +/* 
U+4F 'O' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_UPPER_A|_CC_UPPER_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1, +/* U+50 'P' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_UPPER_A|_CC_UPPER_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1, +/* U+51 'Q' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_UPPER_A|_CC_UPPER_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1, +/* U+52 'R' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_UPPER_A|_CC_UPPER_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1, +/* U+53 'S' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_UPPER_A|_CC_UPPER_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1, +/* U+54 'T' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_UPPER_A|_CC_UPPER_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1, +/* U+55 'U' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_UPPER_A|_CC_UPPER_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1, +/* U+56 'V' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_UPPER_A|_CC_UPPER_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1, +/* U+57 'W' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_UPPER_A|_CC_UPPER_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1, +/* 
U+58 'X' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_UPPER_A|_CC_UPPER_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1, +/* U+59 'Y' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_UPPER_A|_CC_UPPER_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1, +/* U+5A 'Z' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_UPPER_A|_CC_UPPER_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1, +/* U+5B '[' */ _CC_GRAPH_A|_CC_GRAPH_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_PUNCT_A|_CC_PUNCT_L1, +/* U+5C '\' */ _CC_GRAPH_A|_CC_GRAPH_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_PUNCT_A|_CC_PUNCT_L1, +/* U+5D ']' */ _CC_GRAPH_A|_CC_GRAPH_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_PUNCT_A|_CC_PUNCT_L1, +/* U+5E '^' */ _CC_GRAPH_A|_CC_GRAPH_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_PUNCT_A|_CC_PUNCT_L1, +/* U+5F '_' */ _CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_PUNCT_A|_CC_PUNCT_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1, +/* U+60 '`' */ _CC_GRAPH_A|_CC_GRAPH_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_PUNCT_A|_CC_PUNCT_L1, +/* U+61 'a' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_LOWER_A|_CC_LOWER_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1|_CC_XDIGIT_A, +/* U+62 'b' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_LOWER_A|_CC_LOWER_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1|_CC_XDIGIT_A, +/* U+63 'c' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_LOWER_A|_CC_LOWER_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1|_CC_XDIGIT_A, 
+/* U+64 'd' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_LOWER_A|_CC_LOWER_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1|_CC_XDIGIT_A, +/* U+65 'e' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_LOWER_A|_CC_LOWER_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1|_CC_XDIGIT_A, +/* U+66 'f' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_LOWER_A|_CC_LOWER_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1|_CC_XDIGIT_A, +/* U+67 'g' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_LOWER_A|_CC_LOWER_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1, +/* U+68 'h' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_LOWER_A|_CC_LOWER_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1, +/* U+69 'i' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_LOWER_A|_CC_LOWER_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1, +/* U+6A 'j' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_LOWER_A|_CC_LOWER_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1, +/* U+6B 'k' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_LOWER_A|_CC_LOWER_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1, +/* U+6C 'l' */ 
_CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_LOWER_A|_CC_LOWER_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1, +/* U+6D 'm' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_LOWER_A|_CC_LOWER_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1, +/* U+6E 'n' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_LOWER_A|_CC_LOWER_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1, +/* U+6F 'o' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_LOWER_A|_CC_LOWER_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1, +/* U+70 'p' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_LOWER_A|_CC_LOWER_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1, +/* U+71 'q' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_LOWER_A|_CC_LOWER_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1, +/* U+72 'r' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_LOWER_A|_CC_LOWER_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1, +/* U+73 's' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_LOWER_A|_CC_LOWER_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1, +/* U+74 't' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_LOWER_A|_CC_LOWER_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1, +/* U+75 'u' */ 
_CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_LOWER_A|_CC_LOWER_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1, +/* U+76 'v' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_LOWER_A|_CC_LOWER_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1, +/* U+77 'w' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_LOWER_A|_CC_LOWER_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1, +/* U+78 'x' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_LOWER_A|_CC_LOWER_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1, +/* U+79 'y' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_LOWER_A|_CC_LOWER_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1, +/* U+7A 'z' */ _CC_ALNUMC_A|_CC_ALNUMC_L1|_CC_ALPHA_A|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_A|_CC_GRAPH_L1|_CC_IDFIRST_A|_CC_IDFIRST_L1|_CC_LOWER_A|_CC_LOWER_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_WORDCHAR_A|_CC_WORDCHAR_L1, +/* U+7B '{' */ _CC_GRAPH_A|_CC_GRAPH_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_PUNCT_A|_CC_PUNCT_L1, +/* U+7C '|' */ _CC_GRAPH_A|_CC_GRAPH_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_PUNCT_A|_CC_PUNCT_L1, +/* U+7D '}' */ _CC_GRAPH_A|_CC_GRAPH_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_PUNCT_A|_CC_PUNCT_L1, +/* U+7E '~' */ _CC_GRAPH_A|_CC_GRAPH_L1|_CC_PRINT_A|_CC_PRINT_L1|_CC_PUNCT_A|_CC_PUNCT_L1, +/* U+7F DEL */ _CC_CNTRL_A|_CC_CNTRL_L1, +/* U+80 PAD */ _CC_CNTRL_L1, +/* U+81 HOP */ _CC_CNTRL_L1, +/* U+82 BPH */ _CC_CNTRL_L1, +/* U+83 NBH */ _CC_CNTRL_L1, +/* U+84 IND */ _CC_CNTRL_L1, +/* U+85 NEL */ _CC_CNTRL_L1|_CC_PSXSPC_L1|_CC_SPACE_L1, +/* U+86 SSA */ _CC_CNTRL_L1, +/* U+87 ESA */ _CC_CNTRL_L1, +/* 
U+88 HTS */ _CC_CNTRL_L1, +/* U+89 HTJ */ _CC_CNTRL_L1, +/* U+8A VTS */ _CC_CNTRL_L1, +/* U+8B PLD */ _CC_CNTRL_L1, +/* U+8C PLU */ _CC_CNTRL_L1, +/* U+8D RI */ _CC_CNTRL_L1, +/* U+8E SS2 */ _CC_CNTRL_L1, +/* U+8F SS3 */ _CC_CNTRL_L1, +/* U+90 DCS */ _CC_CNTRL_L1, +/* U+91 PU1 */ _CC_CNTRL_L1, +/* U+92 PU2 */ _CC_CNTRL_L1, +/* U+93 STS */ _CC_CNTRL_L1, +/* U+94 CCH */ _CC_CNTRL_L1, +/* U+95 MW */ _CC_CNTRL_L1, +/* U+96 SPA */ _CC_CNTRL_L1, +/* U+97 EPA */ _CC_CNTRL_L1, +/* U+98 SOS */ _CC_CNTRL_L1, +/* U+99 SGC */ _CC_CNTRL_L1, +/* U+9A SCI */ _CC_CNTRL_L1, +/* U+9B CSI */ _CC_CNTRL_L1, +/* U+9C ST */ _CC_CNTRL_L1, +/* U+9D OSC */ _CC_CNTRL_L1, +/* U+9E PM */ _CC_CNTRL_L1, +/* U+9F APC */ _CC_CNTRL_L1, +/* U+A0 NO-BREAK SPACE */ _CC_BLANK_L1|_CC_CHARNAME_CONT|_CC_PRINT_L1|_CC_PSXSPC_L1|_CC_SPACE_L1, +/* U+A1 INVERTED EXCLAMATION MARK */ _CC_GRAPH_L1|_CC_PRINT_L1|_CC_PUNCT_L1, +/* U+A2 CENT SIGN */ _CC_GRAPH_L1|_CC_PRINT_L1, +/* U+A3 POUND SIGN */ _CC_GRAPH_L1|_CC_PRINT_L1, +/* U+A4 CURRENCY SIGN */ _CC_GRAPH_L1|_CC_PRINT_L1, +/* U+A5 YEN SIGN */ _CC_GRAPH_L1|_CC_PRINT_L1, +/* U+A6 BROKEN BAR */ _CC_GRAPH_L1|_CC_PRINT_L1, +/* U+A7 SECTION SIGN */ _CC_GRAPH_L1|_CC_PRINT_L1, +/* U+A8 DIAERESIS */ _CC_GRAPH_L1|_CC_PRINT_L1, +/* U+A9 COPYRIGHT SIGN */ _CC_GRAPH_L1|_CC_PRINT_L1, +/* U+AA FEMININE ORDINAL INDICATOR */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_LOWER_L1|_CC_PRINT_L1|_CC_WORDCHAR_L1, +/* U+AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK */ _CC_GRAPH_L1|_CC_PRINT_L1|_CC_PUNCT_L1, +/* U+AC NOT SIGN */ _CC_GRAPH_L1|_CC_PRINT_L1, +/* U+AD SOFT HYPHEN */ _CC_GRAPH_L1|_CC_PRINT_L1, +/* U+AE REGISTERED SIGN */ _CC_GRAPH_L1|_CC_PRINT_L1, +/* U+AF MACRON */ _CC_GRAPH_L1|_CC_PRINT_L1, +/* U+B0 DEGREE SIGN */ _CC_GRAPH_L1|_CC_PRINT_L1, +/* U+B1 PLUS-MINUS SIGN */ _CC_GRAPH_L1|_CC_PRINT_L1, +/* U+B2 SUPERSCRIPT TWO */ _CC_GRAPH_L1|_CC_PRINT_L1, +/* U+B3 SUPERSCRIPT THREE */ _CC_GRAPH_L1|_CC_PRINT_L1, +/* U+B4 ACUTE ACCENT */ _CC_GRAPH_L1|_CC_PRINT_L1, 
+/* U+B5 MICRO SIGN */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_LOWER_L1|_CC_PRINT_L1|_CC_WORDCHAR_L1, +/* U+B6 PILCROW SIGN */ _CC_GRAPH_L1|_CC_PRINT_L1, +/* U+B7 MIDDLE DOT */ _CC_GRAPH_L1|_CC_PRINT_L1|_CC_PUNCT_L1, +/* U+B8 CEDILLA */ _CC_GRAPH_L1|_CC_PRINT_L1, +/* U+B9 SUPERSCRIPT ONE */ _CC_GRAPH_L1|_CC_PRINT_L1, +/* U+BA MASCULINE ORDINAL INDICATOR */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_LOWER_L1|_CC_PRINT_L1|_CC_WORDCHAR_L1, +/* U+BB RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK */ _CC_GRAPH_L1|_CC_PRINT_L1|_CC_PUNCT_L1, +/* U+BC VULGAR FRACTION ONE QUARTER */ _CC_GRAPH_L1|_CC_PRINT_L1, +/* U+BD VULGAR FRACTION ONE HALF */ _CC_GRAPH_L1|_CC_PRINT_L1, +/* U+BE VULGAR FRACTION THREE QUARTERS */ _CC_GRAPH_L1|_CC_PRINT_L1, +/* U+BF INVERTED QUESTION MARK */ _CC_GRAPH_L1|_CC_PRINT_L1|_CC_PUNCT_L1, +/* U+C0 A WITH GRAVE */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_PRINT_L1|_CC_UPPER_L1|_CC_WORDCHAR_L1, +/* U+C1 A WITH ACUTE */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_PRINT_L1|_CC_UPPER_L1|_CC_WORDCHAR_L1, +/* U+C2 A WITH CIRCUMFLEX */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_PRINT_L1|_CC_UPPER_L1|_CC_WORDCHAR_L1, +/* U+C3 A WITH TILDE */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_PRINT_L1|_CC_UPPER_L1|_CC_WORDCHAR_L1, +/* U+C4 A WITH DIAERESIS */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_PRINT_L1|_CC_UPPER_L1|_CC_WORDCHAR_L1, +/* U+C5 A WITH RING ABOVE */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_PRINT_L1|_CC_UPPER_L1|_CC_WORDCHAR_L1, +/* U+C6 AE */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_PRINT_L1|_CC_UPPER_L1|_CC_WORDCHAR_L1, +/* U+C7 C WITH CEDILLA */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_PRINT_L1|_CC_UPPER_L1|_CC_WORDCHAR_L1, +/* U+C8 E WITH GRAVE */ 
_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_PRINT_L1|_CC_UPPER_L1|_CC_WORDCHAR_L1, +/* U+C9 E WITH ACUTE */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_PRINT_L1|_CC_UPPER_L1|_CC_WORDCHAR_L1, +/* U+CA E WITH CIRCUMFLEX */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_PRINT_L1|_CC_UPPER_L1|_CC_WORDCHAR_L1, +/* U+CB E WITH DIAERESIS */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_PRINT_L1|_CC_UPPER_L1|_CC_WORDCHAR_L1, +/* U+CC I WITH GRAVE */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_PRINT_L1|_CC_UPPER_L1|_CC_WORDCHAR_L1, +/* U+CD I WITH ACUTE */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_PRINT_L1|_CC_UPPER_L1|_CC_WORDCHAR_L1, +/* U+CE I WITH CIRCUMFLEX */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_PRINT_L1|_CC_UPPER_L1|_CC_WORDCHAR_L1, +/* U+CF I WITH DIAERESIS */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_PRINT_L1|_CC_UPPER_L1|_CC_WORDCHAR_L1, +/* U+D0 ETH */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_PRINT_L1|_CC_UPPER_L1|_CC_WORDCHAR_L1, +/* U+D1 N WITH TILDE */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_PRINT_L1|_CC_UPPER_L1|_CC_WORDCHAR_L1, +/* U+D2 O WITH GRAVE */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_PRINT_L1|_CC_UPPER_L1|_CC_WORDCHAR_L1, +/* U+D3 O WITH ACUTE */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_PRINT_L1|_CC_UPPER_L1|_CC_WORDCHAR_L1, +/* U+D4 O WITH CIRCUMFLEX */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_PRINT_L1|_CC_UPPER_L1|_CC_WORDCHAR_L1, +/* U+D5 O WITH TILDE */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_PRINT_L1|_CC_UPPER_L1|_CC_WORDCHAR_L1, +/* U+D6 O WITH DIAERESIS */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_PRINT_L1|_CC_UPPER_L1|_CC_WORDCHAR_L1, +/* U+D7 MULTIPLICATION SIGN */ _CC_GRAPH_L1|_CC_PRINT_L1, +/* U+D8 O WITH STROKE */ 
_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_PRINT_L1|_CC_UPPER_L1|_CC_WORDCHAR_L1, +/* U+D9 U WITH GRAVE */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_PRINT_L1|_CC_UPPER_L1|_CC_WORDCHAR_L1, +/* U+DA U WITH ACUTE */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_PRINT_L1|_CC_UPPER_L1|_CC_WORDCHAR_L1, +/* U+DB U WITH CIRCUMFLEX */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_PRINT_L1|_CC_UPPER_L1|_CC_WORDCHAR_L1, +/* U+DC U WITH DIAERESIS */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_PRINT_L1|_CC_UPPER_L1|_CC_WORDCHAR_L1, +/* U+DD Y WITH ACUTE */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_PRINT_L1|_CC_UPPER_L1|_CC_WORDCHAR_L1, +/* U+DE THORN */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_PRINT_L1|_CC_UPPER_L1|_CC_WORDCHAR_L1, +/* U+DF sharp s */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_LOWER_L1|_CC_PRINT_L1|_CC_WORDCHAR_L1, +/* U+E0 a with grave */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_LOWER_L1|_CC_PRINT_L1|_CC_WORDCHAR_L1, +/* U+E1 a with acute */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_LOWER_L1|_CC_PRINT_L1|_CC_WORDCHAR_L1, +/* U+E2 a with circumflex */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_LOWER_L1|_CC_PRINT_L1|_CC_WORDCHAR_L1, +/* U+E3 a with tilde */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_LOWER_L1|_CC_PRINT_L1|_CC_WORDCHAR_L1, +/* U+E4 a with diaeresis */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_LOWER_L1|_CC_PRINT_L1|_CC_WORDCHAR_L1, +/* U+E5 a with ring above */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_LOWER_L1|_CC_PRINT_L1|_CC_WORDCHAR_L1, +/* U+E6 ae */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_LOWER_L1|_CC_PRINT_L1|_CC_WORDCHAR_L1, +/* U+E7 c with cedilla */ 
_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_LOWER_L1|_CC_PRINT_L1|_CC_WORDCHAR_L1, +/* U+E8 e with grave */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_LOWER_L1|_CC_PRINT_L1|_CC_WORDCHAR_L1, +/* U+E9 e with acute */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_LOWER_L1|_CC_PRINT_L1|_CC_WORDCHAR_L1, +/* U+EA e with circumflex */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_LOWER_L1|_CC_PRINT_L1|_CC_WORDCHAR_L1, +/* U+EB e with diaeresis */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_LOWER_L1|_CC_PRINT_L1|_CC_WORDCHAR_L1, +/* U+EC i with grave */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_LOWER_L1|_CC_PRINT_L1|_CC_WORDCHAR_L1, +/* U+ED i with acute */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_LOWER_L1|_CC_PRINT_L1|_CC_WORDCHAR_L1, +/* U+EE i with circumflex */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_LOWER_L1|_CC_PRINT_L1|_CC_WORDCHAR_L1, +/* U+EF i with diaeresis */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_LOWER_L1|_CC_PRINT_L1|_CC_WORDCHAR_L1, +/* U+F0 eth */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_LOWER_L1|_CC_PRINT_L1|_CC_WORDCHAR_L1, +/* U+F1 n with tilde */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_LOWER_L1|_CC_PRINT_L1|_CC_WORDCHAR_L1, +/* U+F2 o with grave */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_LOWER_L1|_CC_PRINT_L1|_CC_WORDCHAR_L1, +/* U+F3 o with acute */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_LOWER_L1|_CC_PRINT_L1|_CC_WORDCHAR_L1, +/* U+F4 o with circumflex */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_LOWER_L1|_CC_PRINT_L1|_CC_WORDCHAR_L1, +/* U+F5 o with tilde */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_LOWER_L1|_CC_PRINT_L1|_CC_WORDCHAR_L1, +/* U+F6 o with diaeresis */ 
_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_LOWER_L1|_CC_PRINT_L1|_CC_WORDCHAR_L1, +/* U+F7 DIVISION SIGN */ _CC_GRAPH_L1|_CC_PRINT_L1, +/* U+F8 o with stroke */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_LOWER_L1|_CC_PRINT_L1|_CC_WORDCHAR_L1, +/* U+F9 u with grave */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_LOWER_L1|_CC_PRINT_L1|_CC_WORDCHAR_L1, +/* U+FA u with acute */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_LOWER_L1|_CC_PRINT_L1|_CC_WORDCHAR_L1, +/* U+FB u with circumflex */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_LOWER_L1|_CC_PRINT_L1|_CC_WORDCHAR_L1, +/* U+FC u with diaeresis */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_LOWER_L1|_CC_PRINT_L1|_CC_WORDCHAR_L1, +/* U+FD y with acute */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_LOWER_L1|_CC_PRINT_L1|_CC_WORDCHAR_L1, +/* U+FE thorn */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_LOWER_L1|_CC_PRINT_L1|_CC_WORDCHAR_L1, +/* U+FF y with diaeresis */ _CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_LOWER_L1|_CC_PRINT_L1|_CC_WORDCHAR_L1, +}; #else /* ! DOINIT */ EXTCONST unsigned char PL_fold[]; EXTCONST unsigned char PL_mod_latin1_uc[]; EXTCONST unsigned char PL_latin1_lc[]; +EXTCONST U32 PL_charclass[]; #endif #ifndef PERL_GLOBAL_STRUCT /* or perlvars.h */ -- 1.5.6.3 ```
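
For readers following the thread, here is a minimal sketch of how a bit table like the one above might be consulted. It is not code from the patch: every `demo_`/`DEMO_` name is hypothetical, the bit positions are arbitrary, only a few entries are filled in, and the character-literal indices assume an ASCII platform (the real table is indexed by Latin1 ordinal).

```c
#include <stdint.h>

/* Hypothetical flag bits, named after the _CC_property_A / _CC_property_L1
 * scheme. The bit positions are illustrative, not those of the patch. */
#define DEMO_CC_DIGIT_A  (UINT32_C(1) << 0)
#define DEMO_CC_SPACE_A  (UINT32_C(1) << 1)
#define DEMO_CC_SPACE_L1 (UINT32_C(1) << 2)

/* Sparse stand-in for a 256-entry table indexed by code point.
 * Entries that are not listed are zero, i.e. match no class. */
static const uint32_t demo_charclass[256] = {
    ['\t'] = DEMO_CC_SPACE_A | DEMO_CC_SPACE_L1,
    ['\n'] = DEMO_CC_SPACE_A | DEMO_CC_SPACE_L1,
    ['\f'] = DEMO_CC_SPACE_A | DEMO_CC_SPACE_L1,
    ['\r'] = DEMO_CC_SPACE_A | DEMO_CC_SPACE_L1,
    [' ']  = DEMO_CC_SPACE_A | DEMO_CC_SPACE_L1,
    ['0']  = DEMO_CC_DIGIT_A, ['1'] = DEMO_CC_DIGIT_A,
    ['2']  = DEMO_CC_DIGIT_A, ['3'] = DEMO_CC_DIGIT_A,
    ['4']  = DEMO_CC_DIGIT_A, ['5'] = DEMO_CC_DIGIT_A,
    ['6']  = DEMO_CC_DIGIT_A, ['7'] = DEMO_CC_DIGIT_A,
    ['8']  = DEMO_CC_DIGIT_A, ['9'] = DEMO_CC_DIGIT_A,
    [0x85] = DEMO_CC_SPACE_L1,  /* NEL: space only under Latin1 rules */
    [0xA0] = DEMO_CC_SPACE_L1,  /* NO-BREAK SPACE: likewise */
};

/* One range check, one load and one mask, whichever class is asked for.
 * The range check keeps out-of-range inputs from indexing past the table. */
static int demo_has_class(unsigned long c, uint32_t mask)
{
    return c < 256 && (demo_charclass[c] & mask) != 0;
}
```

With these definitions, `demo_has_class(0xA0, DEMO_CC_SPACE_A)` is false while `demo_has_class(0xA0, DEMO_CC_SPACE_L1)` is true, which is the ASCII-versus-Latin1 distinction the commit message describes.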
p5pRT commented 14 years ago

From @khwilliamson

0016-handy.h-Change-isFOO_A-to-be-O-1-performance.patch
```diff
From 4e4f32f83a9f3cf51440baca15236f9028c76ae9 Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Thu, 23 Sep 2010 21:04:58 -0600
Subject: [PATCH] handy.h: Change isFOO_A() to be O(1) performance

This patch changes the macros whose names end in _A to use table lookup
except for the one (isASCII) which always has only one comparison. The
table is in perl.h. Some code does not #include perl.h. For those, the
previous definitions are retained. The advantage of this is speed. It
replaces some fairly complicated expressions with an O(1) look-up and a
mask. It uses the FITS_IN_8_BITS() macro to guarantee that the table
bounds are not exceeded. For legal inputs that are byte size, the
optimizer should get rid of this macro leaving only the lookup and mask.
---
 handy.h | 78 ++++++++++++++++++++++++++++++++++++++-------------------------
 1 files changed, 47 insertions(+), 31 deletions(-)

diff --git a/handy.h b/handy.h
index 8e929ef..124e8ea 100644
--- a/handy.h
+++ b/handy.h
@@ -514,37 +514,53 @@ patched there. The file as of this writing is cpan/Devel-PPPort/parts/inc/misc
 #define isASCII_A(c) isASCII(c)
 
 /* ASCII range only */
-#ifdef EBCDIC
-# define isALNUMC_A(c) (isASCII(c) && isALNUMC(c))
-# define isALPHA_A(c) (isASCII(c) && isALPHA(c))
-# define isBLANK_A(c) (isASCII(c) && isBLANK(c))
-# define isCNTRL_A(c) (isASCII(c) && isCNTRL(c))
-# define isDIGIT_A(c) (isASCII(c) && isDIGIT(c))
-# define isGRAPH_A(c) (isASCII(c) && isGRAPH(c))
-# define isLOWER_A(c) (isASCII(c) && isLOWER(c))
-# define isPRINT_A(c) (isASCII(c) && isPRINT(c))
-# define isPSXSPC_A(c) (isASCII(c) && isPSXSPC(c))
-# define isPUNCT_A(c) (isASCII(c) && isPUNCT(c))
-# define isSPACE_A(c) (isASCII(c) && isSPACE(c))
-# define isUPPER_A(c) (isASCII(c) && isUPPER(c))
-# define isWORDCHAR_A(c) (isASCII(c) && isWORDCHAR(c))
-# define isXDIGIT_A(c) (isASCII(c) && isXDIGIT(c))
-#else /* ASCII */
-# define isALNUMC_A(c) (isALPHA_A(c) || isDIGIT_A(c))
-# define isALPHA_A(c) (isUPPER_A(c) || isLOWER_A(c))
-# define isBLANK_A(c) ((c) == ' ' || (c) == '\t')
-# define isCNTRL_A(c) (FITS_IN_8_BITS(c) ? ((U8) (c) < ' ' || (c) == 127) : 0)
-# define isDIGIT_A(c) ((c) >= '0' && (c) <= '9')
-# define isGRAPH_A(c) (isWORDCHAR_A(c) || isPUNCT_A(c))
-# define isLOWER_A(c) ((c) >= 'a' && (c) <= 'z')
-# define isPRINT_A(c) (((c) >= 32 && (c) < 127))
-# define isPSXSPC_A(c) (isSPACE_A(c) || (c) == '\v')
-# define isPUNCT_A(c) (((c) >= 33 && (c) <= 47) || ((c) >= 58 && (c) <= 64) || ((c) >= 91 && (c) <= 96) || ((c) >= 123 && (c) <= 126))
-# define isSPACE_A(c) ((c) == ' ' || (c) == '\t' || (c) == '\n' || (c) =='\r' \
- || (c) == '\f')
-# define isUPPER_A(c) ((c) >= 'A' && (c) <= 'Z')
-# define isWORDCHAR_A(c) (isALPHA_A(c) || isDIGIT_A(c) || (c) == '_')
-# define isXDIGIT_A(c) (isDIGIT_A(c) || ((c) >= 'a' && (c) <= 'f') || ((c) >= 'A' && (c) <= 'F'))
+#ifdef H_PERL /* If have access to perl.h, lookup in its table */
+# define isALNUMC_A(c) cBOOL(FITS_IN_8_BITS(c) && (PL_charclass[(U8) NATIVE_TO_UNI(c)] & _CC_ALNUMC_A))
+# define isALPHA_A(c) cBOOL(FITS_IN_8_BITS(c) && (PL_charclass[(U8) NATIVE_TO_UNI(c)] & _CC_ALPHA_A))
+# define isBLANK_A(c) cBOOL(FITS_IN_8_BITS(c) && (PL_charclass[(U8) NATIVE_TO_UNI(c)] & _CC_BLANK_A))
+# define isCNTRL_A(c) cBOOL(FITS_IN_8_BITS(c) && (PL_charclass[(U8) NATIVE_TO_UNI(c)] & _CC_CNTRL_A))
+# define isDIGIT_A(c) cBOOL(FITS_IN_8_BITS(c) && (PL_charclass[(U8) NATIVE_TO_UNI(c)] & _CC_DIGIT_A))
+# define isGRAPH_A(c) cBOOL(FITS_IN_8_BITS(c) && (PL_charclass[(U8) NATIVE_TO_UNI(c)] & _CC_GRAPH_A))
+# define isLOWER_A(c) cBOOL(FITS_IN_8_BITS(c) && (PL_charclass[(U8) NATIVE_TO_UNI(c)] & _CC_LOWER_A))
+# define isPRINT_A(c) cBOOL(FITS_IN_8_BITS(c) && (PL_charclass[(U8) NATIVE_TO_UNI(c)] & _CC_PRINT_A))
+# define isPSXSPC_A(c) cBOOL(FITS_IN_8_BITS(c) && (PL_charclass[(U8) NATIVE_TO_UNI(c)] & _CC_PSXSPC_A))
+# define isPUNCT_A(c) cBOOL(FITS_IN_8_BITS(c) && (PL_charclass[(U8) NATIVE_TO_UNI(c)] & _CC_PUNCT_A))
+# define isSPACE_A(c) cBOOL(FITS_IN_8_BITS(c) && (PL_charclass[(U8) NATIVE_TO_UNI(c)] & _CC_SPACE_A))
+# define isUPPER_A(c) cBOOL(FITS_IN_8_BITS(c) && (PL_charclass[(U8) NATIVE_TO_UNI(c)] & _CC_UPPER_A))
+# define isWORDCHAR_A(c) cBOOL(FITS_IN_8_BITS(c) && (PL_charclass[(U8) NATIVE_TO_UNI(c)] & _CC_WORDCHAR_A))
+# define isXDIGIT_A(c) cBOOL(FITS_IN_8_BITS(c) && (PL_charclass[(U8) NATIVE_TO_UNI(c)] & _CC_XDIGIT_A))
+#else /* No perl.h. */
+# ifdef EBCDIC
+# define isALNUMC_A(c) (isASCII(c) && isALNUMC(c))
+# define isALPHA_A(c) (isASCII(c) && isALPHA(c))
+# define isBLANK_A(c) (isASCII(c) && isBLANK(c))
+# define isCNTRL_A(c) (isASCII(c) && isCNTRL(c))
+# define isDIGIT_A(c) (isASCII(c) && isDIGIT(c))
+# define isGRAPH_A(c) (isASCII(c) && isGRAPH(c))
+# define isLOWER_A(c) (isASCII(c) && isLOWER(c))
+# define isPRINT_A(c) (isASCII(c) && isPRINT(c))
+# define isPSXSPC_A(c) (isASCII(c) && isPSXSPC(c))
+# define isPUNCT_A(c) (isASCII(c) && isPUNCT(c))
+# define isSPACE_A(c) (isASCII(c) && isSPACE(c))
+# define isUPPER_A(c) (isASCII(c) && isUPPER(c))
+# define isWORDCHAR_A(c) (isASCII(c) && isWORDCHAR(c))
+# define isXDIGIT_A(c) (isASCII(c) && isXDIGIT(c))
+# else /* ASCII */
+# define isALNUMC_A(c) (isALPHA_A(c) || isDIGIT_A(c))
+# define isALPHA_A(c) (isUPPER_A(c) || isLOWER_A(c))
+# define isBLANK_A(c) ((c) == ' ' || (c) == '\t')
+# define isCNTRL_A(c) (FITS_IN_8_BITS(c) ?
((U8) (c) < ' ' || (c) == 127) : 0) +# define isDIGIT_A(c) ((c) >= '0' && (c) <= '9') +# define isGRAPH_A(c) (isWORDCHAR_A(c) || isPUNCT_A(c)) +# define isLOWER_A(c) ((c) >= 'a' && (c) <= 'z') +# define isPRINT_A(c) (((c) >= 32 && (c) < 127)) +# define isPSXSPC_A(c) (isSPACE_A(c) || (c) == '\v') +# define isPUNCT_A(c) (((c) >= 33 && (c) <= 47) || ((c) >= 58 && (c) <= 64) || ((c) >= 91 && (c) <= 96) || ((c) >= 123 && (c) <= 126)) +# define isSPACE_A(c) ((c) == ' ' || (c) == '\t' || (c) == '\n' || (c) =='\r' || (c) == '\f') +# define isUPPER_A(c) ((c) >= 'A' && (c) <= 'Z') +# define isWORDCHAR_A(c) (isALPHA_A(c) || isDIGIT_A(c) || (c) == '_') +# define isXDIGIT_A(c) (isDIGIT_A(c) || ((c) >= 'a' && (c) <= 'f') || ((c) >= 'A' && (c) <= 'F')) +# endif #endif /* Latin1 definitions */ -- 1.5.6.3 ```
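The "look-up and a mask" described in the commit message above can be sketched in isolation. This is only an illustration under stated assumptions, not the code in perl.h: `demo_charclass`, the `DEMO_CC_*` bits, `DEMO_FITS_IN_8_BITS` and the other `demo_*` names are hypothetical stand-ins for `PL_charclass`, the `_CC_*_A` bits and `FITS_IN_8_BITS()`; the table is filled at startup rather than pregenerated; and an ASCII platform is assumed, so there is no `NATIVE_TO_UNI` step.

```c
#include <assert.h>

/* Hypothetical class bits; stand-ins for the _CC_*_A bits named in the patch. */
#define DEMO_CC_DIGIT_A    (1u << 0)
#define DEMO_CC_ALPHA_A    (1u << 1)
#define DEMO_CC_WORDCHAR_A (1u << 2)

/* One bit map per byte value; a stand-in for PL_charclass. */
static unsigned char demo_charclass[256];

/* Fill the table at startup. (The patch uses a pregenerated table instead.) */
void demo_init_table(void)
{
    int i;
    for (i = 0; i < 256; i++)
        demo_charclass[i] = 0;
    for (i = '0'; i <= '9'; i++)
        demo_charclass[i] |= DEMO_CC_DIGIT_A | DEMO_CC_WORDCHAR_A;
    for (i = 'A'; i <= 'Z'; i++)
        demo_charclass[i] |= DEMO_CC_ALPHA_A | DEMO_CC_WORDCHAR_A;
    for (i = 'a'; i <= 'z'; i++)
        demo_charclass[i] |= DEMO_CC_ALPHA_A | DEMO_CC_WORDCHAR_A;
    demo_charclass['_'] |= DEMO_CC_WORDCHAR_A;
}

/* True only if c is in 0..255, so the table index cannot go out of bounds.
 * Plays the role that FITS_IN_8_BITS() plays in the patch. */
#define DEMO_FITS_IN_8_BITS(c) (((unsigned long) (c) & ~0xFFUL) == 0)

/* Range check, one table load, one mask. Note that c is evaluated twice. */
#define DEMO_IS_CLASS(c, bit) \
    (DEMO_FITS_IN_8_BITS(c) && (demo_charclass[(unsigned char) (c)] & (bit)) != 0)

#define demo_isDIGIT_A(c)    DEMO_IS_CLASS(c, DEMO_CC_DIGIT_A)
#define demo_isALPHA_A(c)    DEMO_IS_CLASS(c, DEMO_CC_ALPHA_A)
#define demo_isWORDCHAR_A(c) DEMO_IS_CLASS(c, DEMO_CC_WORDCHAR_A)

/* A few spot checks; call this once from main(). */
void demo_self_check(void)
{
    demo_init_table();
    assert(demo_isDIGIT_A('7'));
    assert(!demo_isDIGIT_A('x'));
    assert(demo_isWORDCHAR_A('_'));
    assert(!demo_isWORDCHAR_A(0xE9));   /* non-ASCII byte: no _A bits set */
    assert(!demo_isWORDCHAR_A(0x141));  /* low byte is 'A', but the guard rejects it */
}
```

The last check shows why the bounds guard matters: without it, a wide value whose low byte happens to be a word character would index the table at that low byte and be misclassified.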
p5pRT commented 14 years ago

From @khwilliamson

0017-handy.h-alphabetize-pod-entries.patch

```diff
From e2e3cde4385000b15c24ed4e82a7ce92d33eba81 Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Thu, 23 Sep 2010 21:12:51 -0600
Subject: [PATCH] handy.h: alphabetize pod entries

There are a number of macros missing from the documentation. This helps
me figure out which ones.
---
 handy.h | 16 ++++++++--------
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/handy.h b/handy.h
index 124e8ea..48ab4f6 100644
--- a/handy.h
+++ b/handy.h
@@ -454,26 +454,26 @@ character set.
 Returns a boolean indicating whether the C C<char> is an
 alphabetic character in the platform's native character set.

-=for apidoc Am|bool|isSPACE|char ch
-Returns a boolean indicating whether the C C<char> is a
-whitespace character in the platform's native character set.
-
 =for apidoc Am|bool|isDIGIT|char ch
 Returns a boolean indicating whether the C C<char> is a
 digit in the platform's native character set.

+=for apidoc Am|bool|isLOWER|char ch
+Returns a boolean indicating whether the C C<char> is a
+lowercase character in the platform's native character set.
+
 =for apidoc Am|bool|isOCTAL|char ch
 Returns a boolean indicating whether the C C<char> is an
 octal digit, [0-7] in the platform's native character set.

+=for apidoc Am|bool|isSPACE|char ch
+Returns a boolean indicating whether the C C<char> is a
+whitespace character in the platform's native character set.
+
 =for apidoc Am|bool|isUPPER|char ch
 Returns a boolean indicating whether the C C<char> is an
 uppercase character in the platform's native character set.

-=for apidoc Am|bool|isLOWER|char ch
-Returns a boolean indicating whether the C C<char> is a
-lowercase character in the platform's native character set.
-
 =head1 Character case changing

 =for apidoc Am|char|toUPPER|char ch
--
1.5.6.3
```
p5pRT commented 14 years ago

From @khwilliamson

0018-handy.h-Slightly-change-the-pod.patch

```diff
From 4108fa707341dc063579e1dd07ac0b9a39b49bbf Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Thu, 23 Sep 2010 21:21:20 -0600
Subject: [PATCH] handy.h: Slightly change the pod

---
 handy.h | 16 ++++++++--------
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/handy.h b/handy.h
index 48ab4f6..d663039 100644
--- a/handy.h
+++ b/handy.h
@@ -446,32 +446,32 @@ knows about all have 8-bit characters, so most of these functions will return
 true for more characters than on ASCII platforms.

 =for apidoc Am|bool|isALNUM|char ch
-Returns a boolean indicating whether the C C<char> is an
-alphanumeric character (including underscore) or digit in the platform's native
+Returns a boolean indicating whether the specified character is an
+alphanumeric character (including underscore) in the platform's native
 character set.

 =for apidoc Am|bool|isALPHA|char ch
-Returns a boolean indicating whether the C C<char> is an
+Returns a boolean indicating whether the specified character is an
 alphabetic character in the platform's native character set.

 =for apidoc Am|bool|isDIGIT|char ch
-Returns a boolean indicating whether the C C<char> is a
+Returns a boolean indicating whether the specified character is a
 digit in the platform's native character set.

 =for apidoc Am|bool|isLOWER|char ch
-Returns a boolean indicating whether the C C<char> is a
+Returns a boolean indicating whether the specified character is a
 lowercase character in the platform's native character set.

 =for apidoc Am|bool|isOCTAL|char ch
-Returns a boolean indicating whether the C C<char> is an
+Returns a boolean indicating whether the specified character is an
 octal digit, [0-7] in the platform's native character set.

 =for apidoc Am|bool|isSPACE|char ch
-Returns a boolean indicating whether the C C<char> is a
+Returns a boolean indicating whether the specified character is a
 whitespace character in the platform's native character set.

 =for apidoc Am|bool|isUPPER|char ch
-Returns a boolean indicating whether the C C<char> is an
+Returns a boolean indicating whether the specified character is an
 uppercase character in the platform's native character set.

 =head1 Character case changing
--
1.5.6.3
```
p5pRT commented 14 years ago

From @khwilliamson

0019-handy.h-Make-isWORDCHAR-primary-documentation.patch

```diff
From 1da543dc2d168e471019494f375e41bf554813c3 Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Thu, 23 Sep 2010 21:26:47 -0600
Subject: [PATCH] handy.h: Make isWORDCHAR() primary documentation

This macro is clearer as to intent over isALNUM, and isn't confusable
with isALNUMC. So document it primarily.
---
 handy.h | 13 ++++++++-----
 1 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/handy.h b/handy.h
index d663039..a93e1a0 100644
--- a/handy.h
+++ b/handy.h
@@ -445,11 +445,6 @@ platforms, they use the code page of the platform. The code pages that Perl
 knows about all have 8-bit characters, so most of these functions will return
 true for more characters than on ASCII platforms.

-=for apidoc Am|bool|isALNUM|char ch
-Returns a boolean indicating whether the specified character is an
-alphanumeric character (including underscore) in the platform's native
-character set.
-
 =for apidoc Am|bool|isALPHA|char ch
 Returns a boolean indicating whether the specified character is an
 alphabetic character in the platform's native character set.
@@ -474,6 +469,14 @@ whitespace character in the platform's native character set.
 Returns a boolean indicating whether the specified character is an
 uppercase character in the platform's native character set.

+=for apidoc Am|bool|isWORDCHAR|char ch
+Returns a boolean indicating whether the specified character is a
+character that is any of: alphabetic, numeric, or an underscore. This is the
+same as what C<\w> matches in a regular expression.
+C is a synonym provided for backward compatibility. Note that it
+does not have the standard C language meaning of alphanumeric, since it matches
+an underscore and the standard meaning does not.
+
 =head1 Character case changing

 =for apidoc Am|char|toUPPER|char ch
--
1.5.6.3
```
p5pRT commented 14 years ago

From @khwilliamson

0020-Add-mk_PL_charclass.pl.patch

```diff
From b79dd88fa7c5bbe3440b7e73927d51d12959c4c6 Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Thu, 23 Sep 2010 21:54:31 -0600
Subject: [PATCH] Add mk_PL_charclass.pl

I forgot to add this in an earlier commit
---
 Porting/mk_PL_charclass.pl | 200 ++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 200 insertions(+), 0 deletions(-)
 create mode 100644 Porting/mk_PL_charclass.pl

diff --git a/Porting/mk_PL_charclass.pl b/Porting/mk_PL_charclass.pl
new file mode 100644
index 0000000..a23d611
--- /dev/null
+++ b/Porting/mk_PL_charclass.pl
@@ -0,0 +1,200 @@
+#!perl -w
+use 5.012;
+use strict;
+use warnings;
+
+# This program outputs the 256 lines that form the guts of the PL_charclass
+# table. The output should be used to manually replace the table contents in
+# perl.h. Each line is a bit map of properties that the Unicode code point at
+# the corresponding position in the table array has. The first line
+# corresponds to code point U+0000, NULL, the last line to U=00FF. For an
+# application to see if the code point "i" has a particular property, it just
+# does
+# 'PL_charclass[i] & BIT'
+# The bit names are of the form '_CC_property_suffix', where 'CC' stands for
+# character class, and 'property' is the corresponding property, and 'suffix'
+# is one of '_A' to mean the property is true only if the corresponding code
+# point is ASCII, and '_L1' means that the range includes any Latin1
+# character (ISO-8859-1 including the C0 and C1 controls). A property without
+# these suffixes does not have different forms for both ranges.
+
+# The data in the table is pretty well set in stone, so that this program need
+# be run only when adding new properties to it.
+
+my @properties = qw(
+    ALNUMC_A
+    ALNUMC_L1
+    ALPHA_A
+    ALPHA_L1
+    BLANK_A
+    BLANK_L1
+    CHARNAME_CONT
+    CNTRL_A
+    CNTRL_L1
+    DIGIT_A
+    GRAPH_A
+    GRAPH_L1
+    IDFIRST_A
+    IDFIRST_L1
+    LOWER_A
+    LOWER_L1
+    OCTAL_A
+    PRINT_A
+    PRINT_L1
+    PSXSPC_A
+    PSXSPC_L1
+    PUNCT_A
+    PUNCT_L1
+    SPACE_A
+    SPACE_L1
+    UPPER_A
+    UPPER_L1
+    WORDCHAR_A
+    WORDCHAR_L1
+    XDIGIT_A
+);
+
+my @bits; # Bit map for each code point
+
+for my $ord (0..255) {
+    my $char = chr($ord);
+    utf8::upgrade($char); # Important to use Unicode semantics!
+    for my $property (@properties) {
+        my $name = $property;
+
+        # The property name that corresponds to this doesn't have a suffix.
+        # If is a latin1 version, no further checking is needed.
+        if (! ($name =~ s/_L1$//)) {
+
+            # Here, isn't an L1. It's either a special one or the suffix ends
+            # in _A. In the latter case, it's automatically false for
+            # non-ascii. The one current special is valid over the whole range.
+            next if $name =~ s/_A$// && $ord >= 128;
+
+        }
+        my $re;
+        if ($name eq 'PUNCT') {;
+
+            # Sadly, this is inconsistent: \pP and \pS for the ascii range,
+            # just \pP outside it.
+            $re = qr/\p{Punct}|[^\P{Symbol}\P{ASCII}]/;
+        } elsif ($name eq 'CHARNAME_CONT') {;
+            $re = qr/[-\w ():\xa0]/;
+        } elsif ($name eq 'SPACE') {;
+            $re = qr/\s/;
+        } elsif ($name eq 'IDFIRST') {
+            $re = qr/[_\p{Alpha}]/;
+        } elsif ($name eq 'PSXSPC') {
+            $re = qr/[\v\p{Space}]/;
+        } elsif ($name eq 'WORDCHAR') {
+            $re = qr/\w/;
+        } elsif ($name eq 'ALNUMC') {
+            # Like \w, but no underscore
+            $re = qr/[^_\W]/;
+        } elsif ($name eq 'OCTAL') {
+            $re = qr/[0-7]/;
+        } else { # The remainder have the same name and values as Unicode
+            $re = eval "qr/\\p{$name}/";
+            use Carp;
+            carp $@ if ! defined $re;
+        }
+        #print "$ord, $name $property, $re\n";
+        if ($char =~ $re) { # Add this property if matches
+            $bits[$ord] .= '|' if $bits[$ord];
+            $bits[$ord] .= "_CC_$property";
+        }
+    }
+    #print __LINE__, " $ord $char $bits[$ord]\n";
+}
+
+# Names of C0 controls
+my @C0 = qw (
+    NUL
+    SOH
+    STX
+    ETX
+    EOT
+    ENQ
+    ACK
+    BEL
+    BS
+    HT
+    LF
+    VT
+    FF
+    CR
+    SO
+    SI
+    DLE
+    DC1
+    DC2
+    DC3
+    DC4
+    NAK
+    SYN
+    ETB
+    CAN
+    EOM
+    SUB
+    ESC
+    FS
+    GS
+    RS
+    US
+    );
+
+# Names of C1 controls, plus the adjacent DEL
+my @C1 = qw(
+    DEL
+    PAD
+    HOP
+    BPH
+    NBH
+    IND
+    NEL
+    SSA
+    ESA
+    HTS
+    HTJ
+    VTS
+    PLD
+    PLU
+    RI
+    SS2
+    SS3
+    DCS
+    PU1
+    PU2
+    STS
+    CCH
+    MW
+    SPA
+    EPA
+    SOS
+    SGC
+    SCI
+    CSI
+    ST
+    OSC
+    PM
+    APC
+    );
+
+# Output the table using fairly short names for each char.
+for my $ord (0..255) {
+    my $name;
+    if ($ord < 32) { # A C0 control
+        $name = $C0[$ord];
+    } elsif ($ord > 32 && $ord < 127) { # Graphic
+        $name = "'" . chr($ord) . "'";
+    } elsif ($ord >= 127 && $ord <= 0x9f) {
+        $name = $C1[$ord - 127]; # A C1 control + DEL
+    } else { # SPACE, or, if Latin1, shorten the name */
+        use charnames();
+        $name = charnames::viacode($ord);
+        $name =~ s/LATIN CAPITAL LETTER //
+            || $name =~ s/LATIN SMALL LETTER (.*)/\L$1/;
+    }
+    printf "/* U+%02X %s */ %s,\n", $ord, $name, $bits[$ord];
+}
+
--
1.5.6.3
```
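The output stage of the script above (one commented table line per code point, with the matching bit names joined by `|`) can be sketched in C for illustration. This is a hedged sketch, not a port of the script: the real program decides membership with Perl regular expressions and covers all thirty properties, while `demo_props`, `demo_format_entry` and the three range tests here are hypothetical and cover only `DIGIT_A`, `UPPER_A` and `XDIGIT_A` on an ASCII platform. Emitting `0` for an entry with no bits set is also an assumption of this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* A property name and a test for whether a code point has that property. */
struct demo_prop {
    const char *name;
    int (*matches)(int ord);
};

static int demo_is_digit(int o)  { return o >= '0' && o <= '9'; }
static int demo_is_upper(int o)  { return o >= 'A' && o <= 'Z'; }
static int demo_is_xdigit(int o)
{
    return demo_is_digit(o) || (o >= 'a' && o <= 'f') || (o >= 'A' && o <= 'F');
}

/* Kept in alphabetical order, like @properties in the script. */
static const struct demo_prop demo_props[] = {
    { "DIGIT_A",  demo_is_digit  },
    { "UPPER_A",  demo_is_upper  },
    { "XDIGIT_A", demo_is_xdigit },
};

/* Write the table line for code point ord into buf, using the same layout
 * as the script's printf: a comment with the code point and a short label,
 * then the bit names joined by '|', then a trailing comma. */
void demo_format_entry(int ord, const char *label, char *buf, size_t size)
{
    char bits[128] = "";
    size_t i;

    for (i = 0; i < sizeof demo_props / sizeof demo_props[0]; i++) {
        if (!demo_props[i].matches(ord))
            continue;
        if (bits[0] != '\0')
            strcat(bits, "|");
        strcat(bits, "_CC_");
        strcat(bits, demo_props[i].name);
    }
    snprintf(buf, size, "/* U+%02X %s */ %s,", ord, label,
             bits[0] != '\0' ? bits : "0");
}
```

Calling `demo_format_entry(0x41, "'A'", line, sizeof line)` fills `line` with a row in the same shape as the generated table, which an application would then test with an expression of the form `PL_charclass[i] & BIT`, as the script's header comment describes.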
p5pRT commented 14 years ago

From @khwilliamson

0021-handy.h-Add-isFOO_L1-macros-using-table-lookup.patch

```diff
From bd555d41f89f2486647e79b475f07bfa388de623 Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Thu, 23 Sep 2010 23:14:58 -0600
Subject: [PATCH] handy.h: Add isFOO_L1() macros, using table lookup

This patch adds *_L1() macros for character class lookup, using table
lookup for O(1) performance. These force a Latin-1 interpretation on
ASCII platforms.

There were a couple existing macros that had the suffix U for Unicode
semantics. I thought that those names might be confusing, so settled on
L1 as the least bad name. The older names are kept as synonyms for
backward compatibility. The problem with those names is that these are
actually macros, not functions, and hence can be called with any int,
including any Unicode code point. The U suffix might be mistaken for
indicating they are more general purpose, whereas they are really only
valid for the latin1 subset of Unicode (including the EBCDIC isomorphs).
When called with something outside the latin1 range, they will return
false.

This patch necessitated rearranging a few things in the file. I added
documentation for several more macros, and intend to document the rest.
---
 handy.h | 143 +++++++++++++++++++++++++++++++++++++++++++++-----------------
 1 files changed, 104 insertions(+), 39 deletions(-)

diff --git a/handy.h b/handy.h
index a93e1a0..591e250 100644
--- a/handy.h
+++ b/handy.h
@@ -438,36 +438,66 @@ C).
 /*
 =head1 Character classes

-The functions in this section operate using the character set of the platform
-Perl is running on, and are unaffected by locale. For ASCII platforms, they
-will all return false for characters outside the ASCII range. For EBCDIC
-platforms, they use the code page of the platform. The code pages that Perl
-knows about all have 8-bit characters, so most of these functions will return
-true for more characters than on ASCII platforms.
+There are three variants for all the functions in this section. The base ones
+operate using the character set of the platform Perl is running on. The ones
+with an C<_A> suffix operate on the ASCII character set, and the ones with an
+C<_L1> suffix operate on the full Latin1 character set. All are unaffected by
+locale
+
+For ASCII platforms, the base function with no suffix and the one with the
+C<_A> suffix are identical. The function with the C<_L1> suffix imposes the
+Latin-1 character set onto the platform. That is, the code points that are
+ASCII are unaffected, since ASCII is a subset of Latin-1. But the non-ASCII
+code points are treated as if they are Latin-1 characters. For example,
+C will return true when called with the code point 0xA0, which is
+the Latin-1 NO-BREAK SPACE.
+
+For EBCDIC platforms, the base function with no suffix and the one with the
+C<_L1> suffix should be identical, since, as of this writing, the EBCDIC code
+pages that Perl knows about all are equivalent to Latin-1. The function that
+ends in an C<_A> suffix will not return true unless the specified character also
+has an ASCII equivalent.

 =for apidoc Am|bool|isALPHA|char ch
 Returns a boolean indicating whether the specified character is an
 alphabetic character in the platform's native character set.
+See the L for an explanation of variants
+C and C.
+
+=for apidoc Am|bool|isASCII|char ch
+Returns a boolean indicating whether the specified character is one of the 128
+characters in the ASCII character set. On non-ASCII platforms, it is if this
+character corresponds to an ASCII character. Variants C and
+C are identical to C.

 =for apidoc Am|bool|isDIGIT|char ch
 Returns a boolean indicating whether the specified character is a
 digit in the platform's native character set.
+Variants C and C are identical to C.

 =for apidoc Am|bool|isLOWER|char ch
 Returns a boolean indicating whether the specified character is a
 lowercase character in the platform's native character set.
+See the L for an explanation of variants
+C and C.

 =for apidoc Am|bool|isOCTAL|char ch
 Returns a boolean indicating whether the specified character is an
 octal digit, [0-7] in the platform's native character set.
+Variants C and C are identical to C.

 =for apidoc Am|bool|isSPACE|char ch
 Returns a boolean indicating whether the specified character is a
-whitespace character in the platform's native character set.
+whitespace character in the platform's native character set. This is the same
+as what C<\s> matches in a regular expression.
+See the L for an explanation of variants
+C and C.

 =for apidoc Am|bool|isUPPER|char ch
 Returns a boolean indicating whether the specified character is an
 uppercase character in the platform's native character set.
+See the L for an explanation of variants
+C and C.

 =for apidoc Am|bool|isWORDCHAR|char ch
 Returns a boolean indicating whether the specified character is a
@@ -476,6 +506,13 @@ same as what C<\w> matches in a regular expression.
 C is a synonym provided for backward compatibility. Note that it
 does not have the standard C language meaning of alphanumeric, since it matches
 an underscore and the standard meaning does not.
+See the L for an explanation of variants
+C and C.
+
+=for apidoc Am|bool|isXDIGIT|char ch
+Returns a boolean indicating whether the specified character is a hexadecimal
+digit, [0-9A-Fa-f]. Variants C and C are
+identical to C.

 =head1 Character case changing

@@ -489,11 +526,7 @@ character set, if possible; otherwise returns the input character itself.

 =cut

-NOTE: Since some of these are macros, there is no check in those that the
-parameter is a char or U8. This means that if called with a larger width
-parameter, casts can silently truncate and yield wrong results.
-
-Also note that these macros are repeated in Devel::PPPort, so should also be
+Note that these macros are repeated in Devel::PPPort, so should also be
 patched there. The file as of this writing is cpan/Devel-PPPort/parts/inc/misc

 */
@@ -524,7 +557,9 @@ patched there. The file as of this writing is cpan/Devel-PPPort/parts/inc/misc
 # define isCNTRL_A(c) cBOOL(FITS_IN_8_BITS(c) && (PL_charclass[(U8) NATIVE_TO_UNI(c)] & _CC_CNTRL_A))
 # define isDIGIT_A(c) cBOOL(FITS_IN_8_BITS(c) && (PL_charclass[(U8) NATIVE_TO_UNI(c)] & _CC_DIGIT_A))
 # define isGRAPH_A(c) cBOOL(FITS_IN_8_BITS(c) && (PL_charclass[(U8) NATIVE_TO_UNI(c)] & _CC_GRAPH_A))
+# define isIDFIRST_A(c) cBOOL(FITS_IN_8_BITS(c) && (PL_charclass[(U8) NATIVE_TO_UNI(c)] & _CC_IDFIRST_A))
 # define isLOWER_A(c) cBOOL(FITS_IN_8_BITS(c) && (PL_charclass[(U8) NATIVE_TO_UNI(c)] & _CC_LOWER_A))
+# define isOCTAL_A(c) cBOOL(FITS_IN_8_BITS(c) && (PL_charclass[(U8) NATIVE_TO_UNI(c)] & _CC_OCTAL_A))
 # define isPRINT_A(c) cBOOL(FITS_IN_8_BITS(c) && (PL_charclass[(U8) NATIVE_TO_UNI(c)] & _CC_PRINT_A))
 # define isPSXSPC_A(c) cBOOL(FITS_IN_8_BITS(c) && (PL_charclass[(U8) NATIVE_TO_UNI(c)] & _CC_PSXSPC_A))
 # define isPUNCT_A(c) cBOOL(FITS_IN_8_BITS(c) && (PL_charclass[(U8) NATIVE_TO_UNI(c)] & _CC_PUNCT_A))
@@ -533,6 +568,7 @@ patched there. The file as of this writing is cpan/Devel-PPPort/parts/inc/misc
 # define isWORDCHAR_A(c) cBOOL(FITS_IN_8_BITS(c) && (PL_charclass[(U8) NATIVE_TO_UNI(c)] & _CC_WORDCHAR_A))
 # define isXDIGIT_A(c) cBOOL(FITS_IN_8_BITS(c) && (PL_charclass[(U8) NATIVE_TO_UNI(c)] & _CC_XDIGIT_A))
 #else /* No perl.h. */
+# define isOCTAL_A(c) ((c) >= '0' && (c) <= '9')
 # ifdef EBCDIC
 # define isALNUMC_A(c) (isASCII(c) && isALNUMC(c))
 # define isALPHA_A(c) (isASCII(c) && isALPHA(c))
@@ -540,6 +576,7 @@ patched there. The file as of this writing is cpan/Devel-PPPort/parts/inc/misc
 # define isCNTRL_A(c) (isASCII(c) && isCNTRL(c))
 # define isDIGIT_A(c) (isASCII(c) && isDIGIT(c))
 # define isGRAPH_A(c) (isASCII(c) && isGRAPH(c))
+# define isIDFIRST_A(c) (isASCII(c) && isIDFIRST(c))
 # define isLOWER_A(c) (isASCII(c) && isLOWER(c))
 # define isPRINT_A(c) (isASCII(c) && isPRINT(c))
 # define isPSXSPC_A(c) (isASCII(c) && isPSXSPC(c))
@@ -548,13 +585,14 @@ patched there. The file as of this writing is cpan/Devel-PPPort/parts/inc/misc
 # define isUPPER_A(c) (isASCII(c) && isUPPER(c))
 # define isWORDCHAR_A(c) (isASCII(c) && isWORDCHAR(c))
 # define isXDIGIT_A(c) (isASCII(c) && isXDIGIT(c))
-# else /* ASCII */
+# else /* ASCII platform, no perl.h */
 # define isALNUMC_A(c) (isALPHA_A(c) || isDIGIT_A(c))
 # define isALPHA_A(c) (isUPPER_A(c) || isLOWER_A(c))
 # define isBLANK_A(c) ((c) == ' ' || (c) == '\t')
 # define isCNTRL_A(c) (FITS_IN_8_BITS(c) ? ((U8) (c) < ' ' || (c) == 127) : 0)
 # define isDIGIT_A(c) ((c) >= '0' && (c) <= '9')
 # define isGRAPH_A(c) (isWORDCHAR_A(c) || isPUNCT_A(c))
+# define isIDFIRST_A(c) (isALPHA_A(c) || (c) == '_')
 # define isLOWER_A(c) ((c) >= 'a' && (c) <= 'z')
 # define isPRINT_A(c) (((c) >= 32 && (c) < 127))
 # define isPSXSPC_A(c) (isSPACE_A(c) || (c) == '\v')
@@ -564,42 +602,62 @@ patched there. The file as of this writing is cpan/Devel-PPPort/parts/inc/misc
 # define isWORDCHAR_A(c) (isALPHA_A(c) || isDIGIT_A(c) || (c) == '_')
 # define isXDIGIT_A(c) (isDIGIT_A(c) || ((c) >= 'a' && (c) <= 'f') || ((c) >= 'A' && (c) <= 'F'))
 # endif
-#endif
+#endif /* ASCII range definitions */

 /* Latin1 definitions */
-/* ALPHAU includes Unicode semantics for latin1 characters. It has an extra
- * >= AA test to speed up ASCII-only tests at the expense of the others */
-/* XXX decide whether to document the ALPHAU, ALNUMU and isSPACE_L1 functions.
- * Most of these should be implemented as table lookup for speed */
-#define isALPHAU(c) (isALPHA_A(c) || (NATIVE_TO_UNI((U8) c) >= 0xAA \
- && ((NATIVE_TO_UNI((U8) c) >= 0xC0 \
- && NATIVE_TO_UNI((U8) c) != 0xD7 && NATIVE_TO_UNI((U8) c) != 0xF7) \
- || NATIVE_TO_UNI((U8) c) == 0xAA \
- || NATIVE_TO_UNI((U8) c) == 0xB5 \
- || NATIVE_TO_UNI((U8) c) == 0xBA)))
-#define isSPACE_L1(c) (isSPACE(c) \
- || (NATIVE_TO_UNI(c) == 0x85 || NATIVE_TO_UNI(c) == 0xA0))
-#define isWORDCHAR_L1(c) (isDIGIT(c) || isALPHAU(c) || (c) == '_')
-
-/* Same macro in non-EBCDIC and EBCDIC. Called macros may evaluate
- * differently between the two */
+#ifdef H_PERL
+# define isALNUMC_L1(c) cBOOL(FITS_IN_8_BITS(c) && (PL_charclass[(U8) NATIVE_TO_UNI(c)] & _CC_ALNUMC_L1))
+# define isALPHA_L1(c) cBOOL(FITS_IN_8_BITS(c) && (PL_charclass[(U8) NATIVE_TO_UNI(c)] & _CC_ALPHA_L1))
+# define isBLANK_L1(c) cBOOL(FITS_IN_8_BITS(c) && (PL_charclass[(U8) NATIVE_TO_UNI(c)] & _CC_BLANK_L1))
+/* continuation character for legal NAME in \N{NAME} */
+# define isCHARNAME_CONT(c) cBOOL(FITS_IN_8_BITS(c) && (PL_charclass[(U8) NATIVE_TO_UNI(c)] & _CC_CHARNAME_CONT))
+# define isCNTRL_L1(c) cBOOL(FITS_IN_8_BITS(c) && (PL_charclass[(U8) NATIVE_TO_UNI(c)] & _CC_CNTRL_L1))
+# define isGRAPH_L1(c) cBOOL(FITS_IN_8_BITS(c) && (PL_charclass[(U8) NATIVE_TO_UNI(c)] & _CC_GRAPH_L1))
+# define isIDFIRST_L1(c) cBOOL(FITS_IN_8_BITS(c) && (PL_charclass[(U8) NATIVE_TO_UNI(c)] & _CC_IDFIRST_L1))
+# define isLOWER_L1(c) cBOOL(FITS_IN_8_BITS(c) && (PL_charclass[(U8) NATIVE_TO_UNI(c)] & _CC_LOWER_L1))
+# define isPRINT_L1(c) cBOOL(FITS_IN_8_BITS(c) && (PL_charclass[(U8) NATIVE_TO_UNI(c)] & _CC_PRINT_L1))
+# define isPSXSPC_L1(c) cBOOL(FITS_IN_8_BITS(c) && (PL_charclass[(U8) NATIVE_TO_UNI(c)] & _CC_PSXSPC_L1))
+# define isPUNCT_L1(c) cBOOL(FITS_IN_8_BITS(c) && (PL_charclass[(U8) NATIVE_TO_UNI(c)] & _CC_PUNCT_L1))
+# define isSPACE_L1(c) cBOOL(FITS_IN_8_BITS(c) && (PL_charclass[(U8) NATIVE_TO_UNI(c)] & _CC_SPACE_L1))
+# define isUPPER_L1(c) cBOOL(FITS_IN_8_BITS(c) && (PL_charclass[(U8) NATIVE_TO_UNI(c)] & _CC_UPPER_L1))
+# define isWORDCHAR_L1(c) cBOOL(FITS_IN_8_BITS(c) && (PL_charclass[(U8) NATIVE_TO_UNI(c)] & _CC_WORDCHAR_L1))
+#else /* No access to perl.h. Only a few provided here, just in case needed
+ * for backwards compatibility */
+ /* ALPHAU includes Unicode semantics for latin1 characters. It has an extra
+ * >= AA test to speed up ASCII-only tests at the expense of the others */
+# define isALPHA_L1(c) (isALPHA(c) || (NATIVE_TO_UNI((U8) c) >= 0xAA \
+ && ((NATIVE_TO_UNI((U8) c) >= 0xC0 \
+ && NATIVE_TO_UNI((U8) c) != 0xD7 && NATIVE_TO_UNI((U8) c) != 0xF7) \
+ || NATIVE_TO_UNI((U8) c) == 0xAA \
+ || NATIVE_TO_UNI((U8) c) == 0xB5 \
+ || NATIVE_TO_UNI((U8) c) == 0xBA)))
+# define isCHARNAME_CONT(c) (isALNUM_L1(c) || (c) == ' ' || (c) == '-' || (c) == '(' || (c) == ')' || (c) == ':' || NATIVE_TO_UNI((U8) c) == 0xA0)
+#endif
+
+/* Macros for backwards compatibility and for completeness when the ASCII and
+ * Latin1 values are identical */
 #define isALNUM(c) isWORDCHAR(c)
 #define isALNUMU(c) isWORDCHAR_L1(c)
-/* continuation character for legal NAME in \N{NAME} */
-#define isCHARNAME_CONT(c) (isWORDCHAR_L1(c) || (c) == ' ' || (c) == '-' || (c) == '(' || (c) == ')' || (c) == ':' || NATIVE_TO_UNI((U8) c) == 0xA0)
-#define isIDFIRST(c) (isALPHA(c) || (c) == '_')
-#define isOCTAL_A(c) ((c) >= '0' && (c) <= '7')
-#define isOCTAL(c) isOCTAL_A(c)
-#define isWORDCHAR(c) (isALPHA(c) || isDIGIT(c) || (c) == '_')
-
+#define isALPHAU(c) isALPHA_L1(c)
+#define isDIGIT_L1(c) isDIGIT_A(c)
+#define isOCTAL(c) isOCTAL_A(c)
+#define isOCTAL_L1(c) isOCTAL_A(c)
+#define isXDIGIT_L1(c) isXDIGIT_A(c)
+
+/* Macros that differ between EBCDIC and ASCII. Where C89 defines a function,
+ * that is used in the EBCDIC form, because in EBCDIC we do not do locales:
+ * therefore can use native functions. For those where C89 doesn't define a
+ * function, use our function, assuming that the EBCDIC code page is isomorphic
+ * with Latin1, which the three currently recognized by Perl are. Some libc's
+ * have an isblank(), but it's not guaranteed. */
 #ifdef EBCDIC
- /* In EBCDIC we do not do locales: therefore can use native functions */
 # define isALNUMC(c) isalnum(c)
 # define isALPHA(c) isalpha(c)
 # define isBLANK(c) ((c) == ' ' || (c) == '\t' || NATIVE_TO_UNI(c) == 0xA0)
 # define isCNTRL(c) iscntrl(c)
 # define isDIGIT(c) isdigit(c)
 # define isGRAPH(c) isgraph(c)
+# define isIDFIRST(c) (isALPHA(c) || (c) == '_')
 # define isLOWER(c) islower(c)
 # define isPRINT(c) isprint(c)
 # define isPSXSPC(c) isspace(c)
@@ -607,6 +665,7 @@ patched there. The file as of this writing is cpan/Devel-PPPort/parts/inc/misc
 # define isSPACE(c) (isPSXSPC(c) && (c) != '\v')
 # define isUPPER(c) isupper(c)
 # define isXDIGIT(c) isxdigit(c)
+# define isWORDCHAR(c) (isalnum(c) || (c) == '_')
 # define toLOWER(c) tolower(c)
 # define toUPPER(c) toupper(c)
 #else /* Not EBCDIC: ASCII-only matching */
@@ -616,15 +675,21 @@ patched there. The file as of this writing is cpan/Devel-PPPort/parts/inc/misc
 # define isCNTRL(c) isCNTRL_A(c)
 # define isDIGIT(c) isDIGIT_A(c)
 # define isGRAPH(c) isGRAPH_A(c)
+# define isIDFIRST(c) isIDFIRST_A(c)
 # define isLOWER(c) isLOWER_A(c)
 # define isPRINT(c) isPRINT_A(c)
 # define isPSXSPC(c) isPSXSPC_A(c)
 # define isPUNCT(c) isPUNCT_A(c)
 # define isSPACE(c) isSPACE_A(c)
 # define isUPPER(c) isUPPER_A(c)
+# define isWORDCHAR(c) isWORDCHAR_A(c)
 # define isXDIGIT(c) isXDIGIT_A(c)

- /* ASCII casing. */
+ /* ASCII casing. These could also be written as
+ #define toLOWER(c) (isASCII(c) ? toLOWER_LATIN1(c) : (c))
+ #define toUPPER(c) (isASCII(c) ? toUPPER_LATIN1_MOD(c) : (c))
+ which uses table lookup and mask instead of subtraction. (This would
+ work because the _MOD does not apply in the ASCII range) */
 # define toLOWER(c) (isUPPER(c) ? (c) + ('a' - 'A') : (c))
 # define toUPPER(c) (isLOWER(c) ? (c) - ('a' - 'A') : (c))
 #endif
--
1.5.6.3
```
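The difference between the `_A` and `_L1` variants documented in this patch can be illustrated with one table that carries a separate bit for each range. This is a sketch under stated assumptions, not the perl.h table: `demo_l1_charclass`, `DEMO_CC_SPACE_A`, `DEMO_CC_SPACE_L1` and the `demo_*` macros are hypothetical names, and an ASCII platform is assumed. The set of whitespace characters is taken from the `isSPACE()` and `isSPACE_L1()` definitions shown earlier in this ticket. Note also that the no-perl.h fallback for `isOCTAL_A` in the patch as posted compares against `'9'`, while the pod in the same patch documents `[0-7]` and the definition it removes used `'7'`; the sketch uses `'7'`.

```c
#include <assert.h>

/* Hypothetical bits: one for the ASCII-only class, one for the Latin-1 class. */
#define DEMO_CC_SPACE_A  (1u << 0)
#define DEMO_CC_SPACE_L1 (1u << 1)

static unsigned char demo_l1_charclass[256];

void demo_l1_init(void)
{
    static const unsigned char ascii_space[] = { ' ', '\t', '\n', '\r', '\f' };
    unsigned i;

    for (i = 0; i < 256; i++)
        demo_l1_charclass[i] = 0;
    /* ASCII whitespace carries both bits, since ASCII is a subset of Latin-1. */
    for (i = 0; i < sizeof ascii_space; i++)
        demo_l1_charclass[ascii_space[i]] |= DEMO_CC_SPACE_A | DEMO_CC_SPACE_L1;
    /* The two non-ASCII Latin-1 whitespace characters carry only the _L1 bit. */
    demo_l1_charclass[0x85] |= DEMO_CC_SPACE_L1;  /* NEXT LINE */
    demo_l1_charclass[0xA0] |= DEMO_CC_SPACE_L1;  /* NO-BREAK SPACE */
}

/* Bounds guard, then table load and mask; c is evaluated twice. */
#define DEMO_L1_LOOKUP(c, bit) \
    ((((unsigned long) (c) & ~0xFFUL) == 0) \
     && (demo_l1_charclass[(unsigned char) (c)] & (bit)) != 0)

#define demo_isSPACE_A(c)  DEMO_L1_LOOKUP(c, DEMO_CC_SPACE_A)
#define demo_isSPACE_L1(c) DEMO_L1_LOOKUP(c, DEMO_CC_SPACE_L1)

/* Comparison-based octal test with the upper bound the pod documents. */
#define demo_isOCTAL_A(c) ((c) >= '0' && (c) <= '7')

/* A few spot checks; call this once from main(). */
void demo_l1_self_check(void)
{
    demo_l1_init();
    assert(demo_isSPACE_A(' ') && demo_isSPACE_L1(' '));
    assert(!demo_isSPACE_A(0xA0));    /* not ASCII */
    assert(demo_isSPACE_L1(0xA0));    /* Latin-1 NO-BREAK SPACE */
    assert(!demo_isSPACE_L1(0x2003)); /* outside Latin-1, so false */
    assert(demo_isOCTAL_A('7') && !demo_isOCTAL_A('8'));
}
```

The `0x2003` check mirrors the point made in the commit message: the `_L1` macros are only meaningful for the Latin-1 subset and return false for anything larger.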
p5pRT commented 14 years ago

From @tsee

Hi Karl,

karl williamson wrote:

> # New Ticket Created by karl williamson
> # Please include the string: [perl #78024]
> # in the subject line of all future correspondence about this issue.
> # <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=78024 >

I just applied your changes locally after some review (as far as I'm qualified to do that) and squashed the "add table" and "oops, forgot generating script" commits together.

The tests pass. I pushed the changes to camel as a branch "steffen/asciiTableLookup". The reason I didn't push to blead was that I'm not sure I'd put the big generated table in perl.h (at least not without a note that it's generated). I'd like to hear from others before I push this to blead.

Cheers,
Steffen

p5pRT commented 14 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 14 years ago

From @tsee

Hi again,

karl williamson wrote:

> # New Ticket Created by karl williamson
> # Please include the string: [perl #78024]
> # in the subject line of all future correspondence about this issue.
> # <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=78024 >

Just after writing the previous mail, I got some feedback via IRC from Nicholas. It seems he agrees on my sentiment that the new table shouldn't live in perl.h. We also agree that your general plan wrt. table lookups is a very good one.

What we'd like to see for consistency (not a must) is:

- generated table moved to its own header (mentioning that it's generated).
- The script isn't really in the "Porting" category. Apart from the fact that Porting/ is already a pretty wild place, I think it's for actual developer tools like the ones for validating things (check*.pl), preparing releases and such. Most certainly not (see next point) for stuff that's run by regen.pl. I'm saying this without any authority, mind you. Maybe the script should live in the top level?
- It would be convenient and reasonable to run it with each "regen.pl"

What do you think? I'd be willing to attempt the necessary munging of your changes, but they're your work, so I don't want to trample all over it without your consent.

Best regards,
Steffen

p5pRT commented 14 years ago

@cpansprout - Status changed from 'open' to 'resolved'

p5pRT commented 14 years ago

From zefram@fysh.org

karl williamson wrote:

> The attached series of commits changes the definitions of the character class macros in handy.h to use table lookup for all those that may require more than one comparison

Have you profiled this? Lookup tables aren't the straightforward win that they were in the 1970s. The comparison isn't just N comparison instructions against one table-lookup instruction any more. Now you must consider the cache space taken up by the table against the cache space taken up by the comparisons. L1 cache misses are expensive, and can easily wipe out the performance win from using fewer instructions. We might conceivably want to have both styles of implementation, and choose between them depending on the target machine's cache architecture.

-zefram
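One way to keep both styles available, as suggested above, is to hide them behind a single macro and pick one at build time. The following is only a sketch of that idea under stated assumptions: `DEMO_USE_TABLE`, `demo_isWORD` and the other `demo_*` names are hypothetical, nothing like this build switch appears in the patches, and an ASCII platform is assumed. It says nothing about which style is faster; that still has to be measured on the target machine.

```c
#include <assert.h>

/* Comparison style, in the spirit of the pre-patch ASCII definitions. */
static int demo_wordchar_cmp(int c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
        || (c >= '0' && c <= '9') || c == '_';
}

/* Table style: 256 bytes of data, one load per query. */
static unsigned char demo_word_table[256];

void demo_word_table_init(void)
{
    int i;
    for (i = 0; i < 256; i++)
        demo_word_table[i] = (unsigned char) demo_wordchar_cmp(i);
}

static int demo_wordchar_tbl(int c)
{
    return c >= 0 && c <= 255 && demo_word_table[c];
}

/* Hypothetical build-time switch between the two styles. */
#ifdef DEMO_USE_TABLE
#  define demo_isWORD(c) demo_wordchar_tbl(c)
#else
#  define demo_isWORD(c) demo_wordchar_cmp(c)
#endif

/* Returns 1 if both styles classify every tested input identically.
 * The range deliberately includes values outside 0..255. */
int demo_styles_agree(void)
{
    int c;
    demo_word_table_init();
    for (c = -1; c <= 0x17F; c++)
        if (!!demo_wordchar_cmp(c) != !!demo_wordchar_tbl(c))
            return 0;
    return 1;
}
```

An exhaustive agreement check like `demo_styles_agree()` is cheap for an 8-bit domain and would let either style be swapped in without changing behaviour, leaving profiling as the only open question.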