Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/
Other
1.95k stars 554 forks source link

Fix const correctness for header files #15732

Closed p5pRT closed 7 years ago

p5pRT commented 7 years ago

Migrated from rt.perl.org#130169 (status was 'resolved')

Searchable as RT130169$

p5pRT commented 7 years ago

From @ppisar

Hello\,

when building a code that includes

#include "EXTERN.h #include "perl.h"

with GCC's -Wcast-qual option\, I get many warnings about discarding const qualifiers like this​:

$ printf '#include "EXTERN.h"\n#include "perl.h"\n' | gcc -Wcast-qual -I/usr/lib64/perl5/CORE -c -x c - In file included from /usr/lib64/perl5/CORE/hv.h​:629​:0\,   from /usr/lib64/perl5/CORE/perl.h​:3740\,   from \​:2​: /usr/lib64/perl5/CORE/hv_func.h​: In function ‘S_perl_hash_siphash_2_4’​: /usr/lib64/perl5/CORE/hv_func.h​:213​:17​: warning​: cast discards ‘const’ qualifier from pointer target type [-Wcast-qual]   U64TYPE k0 = ((U64TYPE*)seed)[0];   ^

This makes difficult to spot issues in non-perl code and people complain (https://bugzilla.redhat.com/show_bug.cgi?id=1242980).

Series of three patches is attached that tries to fix this issue​:

First patch fixes hash functions in hv_func.h. Because their input is already const and their output does not rely pointers to input data\, this patch seems simple and non-controversial.

Second patch touches utf8.h and utfebcdic.h. It changes dereferencing string bytes in is_UTF8-* check macros. While these are macros without argument types\, I think its reasonable to assume the input string does not change during the check\, so the change is desired. The only unusual thing is the changed code was originally generated by a script\, then hand-editted\, and then comes my patch. I did not try to locate the disfunctional (or deleted?) generator in regen/regcharclass.pl and ammend it.

The third and last patch touches utf8_hop* functions. This is the most controversal change as it changes their prototype. I cannot see a way how to get rid of the warnings there without changing prototype. I looked into glibc sources how char *strchr(const char *\, int) is solved. It isn't. It also produces many warnings. Here I doubt about helpfullness of the -Wcast-qual warnings.

-- Petr

p5pRT commented 7 years ago

From @ppisar

0001-Fix-const-correctness-in-hv_func.h.patch ```diff From ee8c4bf20643c07d04f6a5927096bf7b89fcd300 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Petr=20P=C3=ADsa=C5=99?= Date: Thu, 24 Nov 2016 16:34:09 +0100 Subject: [PATCH 1/3] Fix const correctness in hv_func.h MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Building an XS code with -Wcast-qual yielded warnings about discarding const qualifiers from pointer targets like: $ printf '#include "EXTERN.h"\n#include "perl.h"\n' | gcc -Wcast-qual -I/usr/lib64/perl5/CORE -c -x c - In file included from /usr/lib64/perl5/CORE/hv.h:629:0, from /usr/lib64/perl5/CORE/perl.h:3740, from :2: /usr/lib64/perl5/CORE/hv_func.h: In function ‘S_perl_hash_siphash_2_4’: /usr/lib64/perl5/CORE/hv_func.h:213:17: warning: cast discards ‘const’ qualifier from pointer target type [-Wcast-qual] U64TYPE k0 = ((U64TYPE*)seed)[0]; ^ Signed-off-by: Petr Písař --- hv_func.h | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/hv_func.h b/hv_func.h index 8866db9..57b1ed1 100644 --- a/hv_func.h +++ b/hv_func.h @@ -118,7 +118,7 @@ #if (BYTEORDER == 0x1234 || BYTEORDER == 0x12345678) && U32SIZE == 4 /* CPU endian matches murmurhash algorithm, so read 32-bit word directly */ - #define U8TO32_LE(ptr) (*((U32*)(ptr))) + #define U8TO32_LE(ptr) (*((const U32*)(ptr))) #elif BYTEORDER == 0x4321 || BYTEORDER == 0x87654321 /* TODO: Add additional cases below where a compiler provided bswap32 is available */ #if defined(__GNUC__) && (__GNUC__>4 || (__GNUC__==4 && __GNUC_MINOR__>=3)) @@ -210,8 +210,8 @@ S_perl_hash_siphash_2_4(const unsigned char * const seed, const unsigned char *i U64 v3 = UINT64_C(0x7465646279746573); U64 b; - U64 k0 = ((U64*)seed)[0]; - U64 k1 = ((U64*)seed)[1]; + U64 k0 = ((const U64*)seed)[0]; + U64 k1 = ((const U64*)seed)[1]; U64 m; const int left = inlen & 7; const U8 *end = in + inlen - left; @@ -269,7 +269,7 @@ S_perl_hash_siphash_2_4(const unsigned char * const seed, const unsigned char *i PERL_STATIC_INLINE U32 S_perl_hash_superfast(const unsigned char * const seed, const unsigned char *str, STRLEN len) { - U32 hash = *((U32*)seed) + (U32)len; + U32 hash = *((const U32*)seed) + (U32)len; U32 tmp; int rem= len & 3; len >>= 2; @@ -373,7 +373,7 @@ S_perl_hash_superfast(const unsigned char * const seed, const unsigned char *str /* now we create the hash function */ PERL_STATIC_INLINE U32 S_perl_hash_murmur3(const unsigned char * const seed, const unsigned char *ptr, STRLEN len) { - U32 h1 = *((U32*)seed); + U32 h1 = *((const U32*)seed); U32 k1; U32 carry = 0; @@ -467,7 +467,7 @@ S_perl_hash_murmur3(const unsigned char * const seed, const unsigned char *ptr, PERL_STATIC_INLINE U32 S_perl_hash_djb2(const unsigned char * const seed, const unsigned char *str, const STRLEN len) { const unsigned char * const end = (const unsigned char *)str + len; - U32 hash = *((U32*)seed) + (U32)len; + U32 hash = *((const U32*)seed) + (U32)len; while (str < end) { hash = ((hash << 5) + hash) + *str++; } @@ -477,7 +477,7 @@ S_perl_hash_djb2(const unsigned char * const seed, const unsigned char *str, con PERL_STATIC_INLINE U32 S_perl_hash_sdbm(const unsigned char * const seed, const unsigned char *str, const STRLEN len) { const unsigned char * const end = (const unsigned char *)str + len; - U32 hash = *((U32*)seed) + (U32)len; + U32 hash = *((const U32*)seed) + (U32)len; while (str < end) { hash = (hash << 6) + (hash << 16) - hash + *str++; } @@ -503,7 +503,7 @@ S_perl_hash_sdbm(const unsigned char * const seed, const unsigned char *str, con PERL_STATIC_INLINE U32 S_perl_hash_one_at_a_time(const unsigned char * const seed, const unsigned char *str, const STRLEN len) { const unsigned char * const end = (const unsigned char *)str + len; - U32 hash = *((U32*)seed) + (U32)len; + U32 hash = *((const U32*)seed) + (U32)len; while (str < end) { hash += *str++; hash += (hash << 10); @@ -518,7 +518,7 @@ S_perl_hash_one_at_a_time(const unsigned char * const seed, const unsigned char PERL_STATIC_INLINE U32 S_perl_hash_one_at_a_time_hard(const unsigned char * const seed, const unsigned char *str, const STRLEN len) { const unsigned char * const end = (const unsigned char *)str + len; - U32 hash = *((U32*)seed) + (U32)len; + U32 hash = *((const U32*)seed) + (U32)len; while (str < end) { hash += (hash << 10); @@ -553,7 +553,7 @@ S_perl_hash_one_at_a_time_hard(const unsigned char * const seed, const unsigned PERL_STATIC_INLINE U32 S_perl_hash_old_one_at_a_time(const unsigned char * const seed, const unsigned char *str, const STRLEN len) { const unsigned char * const end = (const unsigned char *)str + len; - U32 hash = *((U32*)seed); + U32 hash = *((const U32*)seed); while (str < end) { hash += *str++; hash += (hash << 10); @@ -581,7 +581,7 @@ S_perl_hash_murmur_hash_64a (const unsigned char * const seed, const unsigned ch { const U64 m = UINT64_C(0xc6a4a7935bd1e995); const int r = 47; - U64 h = *((U64*)seed) ^ len; + U64 h = *((const U64*)seed) ^ len; const U64 * data = (const U64 *)str; const U64 * end = data + (len/8); const unsigned char * data2; -- 2.7.4 ```
p5pRT commented 7 years ago

From @ppisar

0002-Fix-const-correctness-in-utf8.h.patch

p5pRT commented 7 years ago

From @ppisar

0003-Fix-const-correctnes-for-utf8_hop-functions.patch ```diff From e7771e5dd2682477edae807e0717f28f0b441679 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Petr=20P=C3=ADsa=C5=99?= Date: Thu, 24 Nov 2016 17:58:15 +0100 Subject: [PATCH 3/3] Fix const correctnes for utf8_hop functions MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit GCC -Wcast-qual option reports a const violation in utf8_hop functions. They take a pointer to constant data but returns pointer to non-constant data. This patch makes the return value a constant. This changes API. Another option would be to change remove const from the functions input. But that would loose optimization posibilities and it would utf8_hop functions would appear as they could modify the input and that is not true. Signed-off-by: Petr Písař --- embed.fnc | 8 ++++---- inline.h | 16 ++++++++-------- proto.h | 8 ++++---- 3 files changed, 16 insertions(+), 16 deletions(-) diff --git a/embed.fnc b/embed.fnc index 4743aed..6a7aa3c 100644 --- a/embed.fnc +++ b/embed.fnc @@ -1733,10 +1733,10 @@ Ap |U8* |utf16_to_utf8 |NN U8* p|NN U8 *d|I32 bytelen|NN I32 *newlen Ap |U8* |utf16_to_utf8_reversed|NN U8* p|NN U8 *d|I32 bytelen|NN I32 *newlen AdpPR |STRLEN |utf8_length |NN const U8* s|NN const U8 *e AipdPR |IV |utf8_distance |NN const U8 *a|NN const U8 *b -AipdPRn |U8* |utf8_hop |NN const U8 *s|SSize_t off -AipdPRn |U8* |utf8_hop_back|NN const U8 *s|SSize_t off|NN const U8 *start -AipdPRn |U8* |utf8_hop_forward|NN const U8 *s|SSize_t off|NN const U8 *end -AipdPRn |U8* |utf8_hop_safe |NN const U8 *s|SSize_t off|NN const U8 *start|NN const U8 *end +AipdPRn |const U8* |utf8_hop |NN const U8 *s|SSize_t off +AipdPRn |const U8* |utf8_hop_back|NN const U8 *s|SSize_t off|NN const U8 *start +AipdPRn |const U8* |utf8_hop_forward|NN const U8 *s|SSize_t off|NN const U8 *end +AipdPRn |const U8* |utf8_hop_safe |NN const U8 *s|SSize_t off|NN const U8 *start|NN const U8 *end ApMd |U8* |utf8_to_bytes |NN U8 *s|NN STRLEN *len Apd |int |bytes_cmp_utf8 |NN const U8 *b|STRLEN blen|NN const U8 *u \ |STRLEN ulen diff --git a/inline.h b/inline.h index 346dcdc..b00ece4 100644 --- a/inline.h +++ b/inline.h @@ -896,7 +896,7 @@ on the first byte of character or just after the last byte of a character. =cut */ -PERL_STATIC_INLINE U8 * +PERL_STATIC_INLINE const U8 * Perl_utf8_hop(const U8 *s, SSize_t off) { PERL_ARGS_ASSERT_UTF8_HOP; @@ -916,7 +916,7 @@ Perl_utf8_hop(const U8 *s, SSize_t off) s--; } } - return (U8 *)s; + return s; } /* @@ -936,7 +936,7 @@ Will not exceed this limit even if the string is not valid "UTF-8". =cut */ -PERL_STATIC_INLINE U8 * +PERL_STATIC_INLINE const U8 * Perl_utf8_hop_forward(const U8 *s, SSize_t off, const U8 *end) { PERL_ARGS_ASSERT_UTF8_HOP_FORWARD; @@ -951,11 +951,11 @@ Perl_utf8_hop_forward(const U8 *s, SSize_t off, const U8 *end) while (off--) { STRLEN skip = UTF8SKIP(s); if ((STRLEN)(end - s) <= skip) - return (U8 *)end; + return end; s += skip; } - return (U8 *)s; + return s; } /* @@ -975,7 +975,7 @@ Will not exceed this limit even if the string is not valid "UTF-8". =cut */ -PERL_STATIC_INLINE U8 * +PERL_STATIC_INLINE const U8 * Perl_utf8_hop_back(const U8 *s, SSize_t off, const U8 *start) { PERL_ARGS_ASSERT_UTF8_HOP_BACK; @@ -993,7 +993,7 @@ Perl_utf8_hop_back(const U8 *s, SSize_t off, const U8 *start) s--; } - return (U8 *)s; + return s; } /* @@ -1011,7 +1011,7 @@ Will not exceed those limits even if the string is not valid "UTF-8". =cut */ -PERL_STATIC_INLINE U8 * +PERL_STATIC_INLINE const U8 * Perl_utf8_hop_safe(const U8 *s, SSize_t off, const U8 *start, const U8 *end) { PERL_ARGS_ASSERT_UTF8_HOP_SAFE; diff --git a/proto.h b/proto.h index 5ff6bfe..f1aab46 100644 --- a/proto.h +++ b/proto.h @@ -3494,25 +3494,25 @@ PERL_STATIC_INLINE IV Perl_utf8_distance(pTHX_ const U8 *a, const U8 *b) #define PERL_ARGS_ASSERT_UTF8_DISTANCE \ assert(a); assert(b) -PERL_STATIC_INLINE U8* Perl_utf8_hop(const U8 *s, SSize_t off) +PERL_STATIC_INLINE const U8* Perl_utf8_hop(const U8 *s, SSize_t off) __attribute__warn_unused_result__ __attribute__pure__; #define PERL_ARGS_ASSERT_UTF8_HOP \ assert(s) -PERL_STATIC_INLINE U8* Perl_utf8_hop_back(const U8 *s, SSize_t off, const U8 *start) +PERL_STATIC_INLINE const U8* Perl_utf8_hop_back(const U8 *s, SSize_t off, const U8 *start) __attribute__warn_unused_result__ __attribute__pure__; #define PERL_ARGS_ASSERT_UTF8_HOP_BACK \ assert(s); assert(start) -PERL_STATIC_INLINE U8* Perl_utf8_hop_forward(const U8 *s, SSize_t off, const U8 *end) +PERL_STATIC_INLINE const U8* Perl_utf8_hop_forward(const U8 *s, SSize_t off, const U8 *end) __attribute__warn_unused_result__ __attribute__pure__; #define PERL_ARGS_ASSERT_UTF8_HOP_FORWARD \ assert(s); assert(end) -PERL_STATIC_INLINE U8* Perl_utf8_hop_safe(const U8 *s, SSize_t off, const U8 *start, const U8 *end) +PERL_STATIC_INLINE const U8* Perl_utf8_hop_safe(const U8 *s, SSize_t off, const U8 *start, const U8 *end) __attribute__warn_unused_result__ __attribute__pure__; #define PERL_ARGS_ASSERT_UTF8_HOP_SAFE \ -- 2.7.4 ```
p5pRT commented 7 years ago

From zefram@fysh.org

Petr Pisar wrote​:

                                              I cannot see a way how to

get rid of the warnings there without changing prototype. I looked into glibc sources how char *strchr(const char *\, int) is solved. It isn't. It also produces many warnings. Here I doubt about helpfullness of the -Wcast-qual warnings.

Indeed. We're not going to want to change prototypes here\, for the same reasons that the strchr() prototype is appropriate as it is. The code is correct\, and there's no reasonable way to avoid these warnings.

Your first two patches seem OK. We're happy with that kind of localised change to avoid warnings.

-zefram

p5pRT commented 7 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 7 years ago

From @ppisar

On Thu\, Nov 24\, 2016 at 01​:05​:58PM -0800\, Zefram via RT wrote​:

Indeed. We're not going to want to change prototypes here\, for the same reasons that the strchr() prototype is appropriate as it is. The code is correct\, and there's no reasonable way to avoid these warnings.

I searched Perl sources to see how utf8_hop is used and it's realy mix of all the ways.

Should I bother with compiler pragmata to silent the warning? GCC-specific code would be​:

  char *foo(const char *a) {   #if defined __GNUC__   #pragma GCC diagnostic push   #pragma GCC diagnostic ignored "-Wcast-qual"   #endif   return (char *)a;   #if defined __GNUC__   #pragma GCC diagnostic pop   #endif   }

I do not want to clutter the sources.

Your first two patches seem OK. We're happy with that kind of localised change to avoid warnings.

Thanks for reviewing the patches.

-- Petr

p5pRT commented 7 years ago

From @ppisar

On Fri\, Nov 25\, 2016 at 10​:14​:41AM -0500\, Andy Lester wrote​:

Should I bother with compiler pragmata to silent the warning? [...] I would be in favor of doing that if it was wrapped in a macro. It would be good to have it explicitly obvious that we are casting away const\, so that we can have the compiler flag the ones we don't mean to cast away.

Me either. I searched the sources again and I found there had already been GCC_DIAG_IGNORE and GCC_DIAG_RESTORE macros for that.

The attached patch that replaces the third patch from the original patch set uses them to silence the warnings in utf8_hop functions.

-- Petr

p5pRT commented 7 years ago

From @ppisar

0003-Silent-const-correctnes-warnings-in-utf8_hop-functio.patch ```diff From 4c74b8f309939863c82124e3ebe686f96cceb9cc Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Petr=20P=C3=ADsa=C5=99?= Date: Mon, 28 Nov 2016 13:06:24 +0100 Subject: [PATCH 3/3] Silent const correctnes warnings in utf8_hop functions MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit GCC -Wcast-qual option reports a const violation in utf8_hop functions. They take a pointer to constant data but returns pointer to non-constant data. It's impossible to fix this without changing their prototype. Therefore this patch asks a compiler to ignore the violations. Signed-off-by: Petr Písař --- inline.h | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/inline.h b/inline.h index 346dcdc..acd19e5 100644 --- a/inline.h +++ b/inline.h @@ -916,7 +916,9 @@ Perl_utf8_hop(const U8 *s, SSize_t off) s--; } } + GCC_DIAG_IGNORE(-Wcast-qual); return (U8 *)s; + GCC_DIAG_RESTORE; } /* @@ -950,12 +952,17 @@ Perl_utf8_hop_forward(const U8 *s, SSize_t off, const U8 *end) while (off--) { STRLEN skip = UTF8SKIP(s); - if ((STRLEN)(end - s) <= skip) + if ((STRLEN)(end - s) <= skip) { + GCC_DIAG_IGNORE(-Wcast-qual); return (U8 *)end; + GCC_DIAG_RESTORE; + } s += skip; } + GCC_DIAG_IGNORE(-Wcast-qual); return (U8 *)s; + GCC_DIAG_RESTORE; } /* @@ -993,7 +1000,9 @@ Perl_utf8_hop_back(const U8 *s, SSize_t off, const U8 *start) s--; } + GCC_DIAG_IGNORE(-Wcast-qual); return (U8 *)s; + GCC_DIAG_RESTORE; } /* -- 2.7.4 ```
p5pRT commented 7 years ago

From @tonycoz

On Mon\, 28 Nov 2016 04​:22​:12 -0800\, ppisar wrote​:

On Fri\, Nov 25\, 2016 at 10​:14​:41AM -0500\, Andy Lester wrote​:

Should I bother with compiler pragmata to silent the warning? [...] I would be in favor of doing that if it was wrapped in a macro. It would be good to have it explicitly obvious that we are casting away const\, so that we can have the compiler flag the ones we don't mean to cast away.

Me either. I searched the sources again and I found there had already been GCC_DIAG_IGNORE and GCC_DIAG_RESTORE macros for that.

The attached patch that replaces the third patch from the original patch set uses them to silence the warnings in utf8_hop functions.

Thanks\, applied the first two of the original patches and this new patch as 463ddf34c08f2c97199b1bb242da1f17494d4d1a\, 9f2eed981068e7abbcc52267863529bc59e9c8c0 and de97954862ed37d913c4b9a48758aba578a8cf0c.

Tony

p5pRT commented 7 years ago

@tonycoz - Status changed from 'open' to 'pending release'

p5pRT commented 7 years ago

From @khwilliamson

Thank you for filing this report. You have helped make Perl better.

With the release today of Perl 5.26.0\, this and 210 other issues have been resolved.

Perl 5.26.0 may be downloaded via​: https://metacpan.org/release/XSAWYERX/perl-5.26.0

If you find that the problem persists\, feel free to reopen this ticket.

p5pRT commented 7 years ago

@khwilliamson - Status changed from 'pending release' to 'resolved'