Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

PATCH: Add bounds checking to case changing array indices #10596

p5pRT closed 14 years ago

p5pRT commented 14 years ago

Migrated from rt.perl.org#77600 (status was 'resolved')

Searchable as RT77600$

p5pRT commented 14 years ago

From @khwilliamson

Any decent compiler should optimize this out at compile time if the index can't possibly exceed the array bounds, which are 256 long.

p5pRT commented 14 years ago

From @khwilliamson

0001-handy.h-Add-FITS_IN_8_BITS-macro.patch

```diff
From 7293a56c85089aab6e89eb220ab34c011eca3d21 Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Tue, 31 Aug 2010 19:34:50 -0600
Subject: [PATCH] handy.h: Add FITS_IN_8_BITS() macro

This macro is designed to be optimized out if the argument is
byte-length, but otherwise to be a bomb-proof way of making sure that
the argument occupies only 8 bits or fewer in whatever storage class it
is in.
---
 handy.h |   14 ++++++++++++++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/handy.h b/handy.h
index bbeb1ff..b46b844 100644
--- a/handy.h
+++ b/handy.h
@@ -482,6 +482,20 @@ patched there.  The file as of this writing is cpan/Devel-PPPort/parts/inc/misc
 
 */
 
+/* FITS_IN_8_BITS(c) returns true if c occupies no more than 8 bits.  It is
+ * designed to be hopefully bomb-proof, making sure that no bits of
+ * information are lost even on a 64-bit machine, but to get the compiler to
+ * optimize it out if possible.  This is because Configure makes sure that the
+ * machine has an 8-bit byte, so if c is stored in a byte, the sizeof()
+ * guarantees that this evaluates to a constant true at compile time.  The use
+ * of the mask instead of '< 256' keeps gcc from complaining that it is alway
+ * true, when c's storage class is a byte */
+#ifdef HAS_QUAD
+#  define FITS_IN_8_BITS(c) ((sizeof(c) == 1) || (((U64)(c) & 0xFF) == (U64)(c)))
+#else
+#  define FITS_IN_8_BITS(c) ((sizeof(c) == 1) || (((U32)(c) & 0xFF) == (U32)(c)))
+#endif
+
 #define isALNUM(c)     (isALPHA(c) || isDIGIT(c) || (c) == '_')
 #define isIDFIRST(c)   (isALPHA(c) || (c) == '_')
 #define isALPHA(c)     (isUPPER(c) || isLOWER(c))
-- 
1.5.6.3
```
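To see how the macro behaves outside of perl, here is a minimal standalone sketch. It is not part of the patch: the `U32`/`U64` typedefs are stand-ins built on `<stdint.h>` (perl derives its own from Configure), `HAS_QUAD` is simply assumed to be set, and the `fits_*` wrappers and `self_check` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: stand-ins for perl's own U32/U64 typedefs. */
typedef uint32_t U32;
typedef uint64_t U64;

/* Assumption: a 64-bit integer type is available. */
#define HAS_QUAD 1

/* Macro text copied from the patch above. */
#ifdef HAS_QUAD
#  define FITS_IN_8_BITS(c) ((sizeof(c) == 1) || (((U64)(c) & 0xFF) == (U64)(c)))
#else
#  define FITS_IN_8_BITS(c) ((sizeof(c) == 1) || (((U32)(c) & 0xFF) == (U32)(c)))
#endif

/* Hypothetical wrappers, one per argument type, so the effect of
 * sizeof(c) is visible. */
static int fits_uchar(unsigned char c) { return FITS_IN_8_BITS(c); }
static int fits_uint(unsigned int c)   { return FITS_IN_8_BITS(c); }
static int fits_int(int c)             { return FITS_IN_8_BITS(c); }

/* Exercises the cases discussed in the patch comment. */
void self_check(void)
{
    /* Byte-sized argument: sizeof(c) == 1, so the test is a constant. */
    assert(fits_uchar(0xE9));

    /* Wider argument whose value happens to fit in 8 bits. */
    assert(fits_uint(0xFF));

    /* Wider argument with a bit set above bit 7. */
    assert(!fits_uint(0x100));

    /* A negative int widens to a value with high bits set, so it is
     * rejected rather than silently truncated. */
    assert(!fits_int(-1));
}
```

Calling `self_check()` from a `main` exercises each case. Whether a given compiler actually folds the byte-sized case away at compile time depends on the compiler and its optimization level; the sketch only checks the results.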
p5pRT commented 14 years ago

From @khwilliamson

0002-handy.h-Add-bounds-checking-to-case-change-arrays.patch

```diff
From d17150d36cc1419aa753a857b8d13555fc6536f4 Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Tue, 31 Aug 2010 20:20:01 -0600
Subject: [PATCH] handy.h: Add bounds checking to case change arrays

This makes sure that the index into the arrays used to change between
lower and upper case will fit into their bounds; returning an error
character if not.  The check is likely to be optimized out if the index
is stored in 8 bits.
---
 handy.h |   20 +++++++++++++-------
 1 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/handy.h b/handy.h
index b46b844..a1d753d 100644
--- a/handy.h
+++ b/handy.h
@@ -528,9 +528,7 @@ patched there.  The file as of this writing is cpan/Devel-PPPort/parts/inc/misc
 #   define isPUNCT(c)  ispunct(c)
 #   define isXDIGIT(c) isxdigit(c)
 #   define toUPPER(c)  toupper(c)
-#   define toUPPER_LATIN1_MOD(c)    UNI_TO_NATIVE(PL_mod_latin1_uc[(U8) NATIVE_TO_UNI(c)])
 #   define toLOWER(c)  tolower(c)
-#   define toLOWER_LATIN1(c)        UNI_TO_NATIVE(PL_latin1_lc[(U8) NATIVE_TO_UNI(c)])
 #else
 #   define isUPPER(c)  ((c) >= 'A' && (c) <= 'Z')
 #   define isLOWER(c)  ((c) >= 'a' && (c) <= 'z')
@@ -542,12 +540,20 @@ patched there.  The file as of this writing is cpan/Devel-PPPort/parts/inc/misc
 #   define isPUNCT(c)  (((c) >= 33 && (c) <= 47) || ((c) >= 58 && (c) <= 64)  || ((c) >= 91 && (c) <= 96) || ((c) >= 123 && (c) <= 126))
 #   define isXDIGIT(c) (isDIGIT(c) || ((c) >= 'a' && (c) <= 'f') || ((c) >= 'A' && (c) <= 'F'))
 
-/* Use table lookup for speed */
-#   define toLOWER_LATIN1(c)        (PL_latin1_lc[(U8) c])
-/* Modified uc.  Is correct uc except for three non-ascii chars which are
- * all mapped to one of them, and these need special handling */
-#   define toUPPER_LATIN1_MOD(c)    (PL_mod_latin1_uc[(U8) c])
+    /* Use table lookup for speed; return error character for input
+     * out-of-range */
+#   define toLOWER_LATIN1(c)    (FITS_IN_8_BITS(c)                            \
+                                ? UNI_TO_NATIVE(PL_latin1_lc[                  \
+                                           NATIVE_TO_UNI( (U8) (c)) ])         \
+                                : UNICODE_REPLACEMENT)
+    /* Modified uc.  Is correct uc except for three non-ascii chars which are
+     * all mapped to one of them, and these need special handling; error
+     * character for input out-of-range */
+#   define toUPPER_LATIN1_MOD(c) (FITS_IN_8_BITS(c)                           \
+                                ? UNI_TO_NATIVE(PL_mod_latin1_uc[              \
+                                           NATIVE_TO_UNI( (U8) (c)) ])         \
+                                : UNICODE_REPLACEMENT)
 
 /* ASCII casing. */
 #   define toUPPER(c)  (isLOWER(c) ? (c) - ('a' - 'A') : (c))
-- 
1.5.6.3
```
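The difference between the old and new lookups can be sketched in a small standalone program. This is an illustration under stated assumptions, not the code in `handy.h`: `sketch_lc` is a hypothetical ASCII-only table standing in for `PL_latin1_lc`, `SKETCH_REPLACEMENT` is assumed to be U+FFFD in place of perl's `UNICODE_REPLACEMENT`, the `UNI_TO_NATIVE`/`NATIVE_TO_UNI` translations are omitted (treated as identity, as on an ASCII platform), and all `sketch_*` names are invented here.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: stand-ins for perl's integer typedefs. */
typedef uint8_t  U8;
typedef uint32_t U32;
typedef uint64_t U64;

/* Macro text from the first patch (HAS_QUAD branch). */
#define FITS_IN_8_BITS(c) ((sizeof(c) == 1) || (((U64)(c) & 0xFF) == (U64)(c)))

/* Assumption: U+FFFD as the error character. */
#define SKETCH_REPLACEMENT 0xFFFD

/* Hypothetical 256-entry lowercase table; only A-Z are mapped. */
static U8 sketch_lc[256];

void sketch_init(void)
{
    for (int i = 0; i < 256; i++) {
        sketch_lc[i] = (U8)((i >= 'A' && i <= 'Z') ? i + ('a' - 'A') : i);
    }
}

/* Old style: the (U8) cast keeps the index in bounds by truncating,
 * so an out-of-range input silently maps to an unrelated entry. */
#define SKETCH_TO_LOWER_UNCHECKED(c) ((U32)sketch_lc[(U8)(c)])

/* New style: reject anything that does not fit in 8 bits. */
#define SKETCH_TO_LOWER(c) (FITS_IN_8_BITS(c)               \
                            ? (U32)sketch_lc[(U8)(c)]       \
                            : (U32)SKETCH_REPLACEMENT)

U32 sketch_lower_unchecked(U32 c) { return SKETCH_TO_LOWER_UNCHECKED(c); }
U32 sketch_lower(U32 c)           { return SKETCH_TO_LOWER(c); }
```

After `sketch_init()`, both functions map `'A'` to `'a'`. For the input `0x141` the unchecked version truncates the index to `0x41` (`'A'`) and returns `'a'`, while the checked version returns the replacement character.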
p5pRT commented 14 years ago

@rgs - Status changed from 'new' to 'resolved'

p5pRT commented 14 years ago

From @rgarcia

On 1 September 2010 04:26, karl williamson perlbug-followup@perl.org wrote:

> Any decent compiler should optimize this out at compile time if the index can't possibly exceed the array bounds, which are 256 long.

Thanks, applied to bleadperl.

p5pRT commented 14 years ago

From @druud62

On 2010-09-01 04:26, karl williamson wrote:

> #define isALNUM(c) (isALPHA(c) || isDIGIT(c) || (c) == '_')

Why is the underscore in isALNUM?

    perl -wle 'print "_" =~ /[[:alnum:]]/ ? "in" : "out"'
    out

-- Ruud

p5pRT commented 14 years ago

From zefram@fysh.org

Dr.Ruud wrote:

> Why is the underscore in isALNUM?

For hysterical reasons, "alnum" in Perl's "isALNUM" does not mean "alphanumeric". It means "identifier character", as in \w.

-zefram
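The distinction can be shown side by side in C. The sketch below reuses the ASCII-range macro definitions visible in the patch context (`isUPPER`, `isLOWER`, `isALPHA`, `isALNUM`); the `isDIGIT` definition is an assumption, since the thread does not show it, and the two wrapper function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>

/* Definitions as shown in the patch context above. */
#define isUPPER(c) ((c) >= 'A' && (c) <= 'Z')
#define isLOWER(c) ((c) >= 'a' && (c) <= 'z')
#define isALPHA(c) (isUPPER(c) || isLOWER(c))
#define isALNUM(c) (isALPHA(c) || isDIGIT(c) || (c) == '_')

/* Assumption: isDIGIT is not shown in the thread; this is the obvious
 * ASCII-range definition. */
#define isDIGIT(c) ((c) >= '0' && (c) <= '9')

/* Perl-style "alnum": identifier characters, i.e. what \w matches
 * in the ASCII range. */
int perl_style_alnum(int c) { return isALNUM(c) ? 1 : 0; }

/* C library alnum in the default "C" locale: letters and digits only. */
int libc_alnum(int c) { return isalnum(c) ? 1 : 0; }
```

With these definitions `perl_style_alnum('_')` is 1 while `libc_alnum('_')` is 0, which matches the one-liner's result that `_` is outside `[[:alnum:]]`.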