Open p5pRT opened 5 years ago
This is a bug report for perl from smylers@stripey.com\, generated with the help of perlbug 1.41 running under perl 5.31.1.
The overload module has an internal function OverloadedStringify which is useful but not documented or tested. So here are docs and tests.
The function is used by Moose\, among other things: https://grep.metacpan.org/search?q=OverloadedStringify
It used to be used internally (by Larry Wall\, in the initial implementation of overload in 4633a7c4\, Perl 5.002 beta 1) but hasn't been for 15 years. Maybe it could've been removed at that point\, but it wasn't\, and removing it now would break things.
So documenting it and adding tests for it seems more sensible.
I wasn't entirely sure where the best place in overload.t was to add the tests for overload::OverloadedStringify:
• Straight after the overload::Overloaded tests made sense\, but some of the tests after that are commented with what looks like their test numbers (search for ‘# 225’ and ‘# 230’)\, so I thought it better not to risk renumbering those.
But\, glancing at the verbose test output\, it seems that the test commented # 230 is already test 234\, so maybe that wouldn't've mattered.
• At the end of the file would seem another obvious place for new tests\, but there's a test with the comment ‘KEEP THIS TEST LAST’\, which I didn't want to to disobey. So I added the new tests just above there.
However\, there are several tests after the one with that comment\, so again maybe that wouldn't've mattered.
Flags: category=library severity=low Type=Patch PatchStatus=HasPatch module=overload
Site configuration information for perl 5.31.1:
Configured by smylers at Thu Jun 6 11:46:24 BST 2019.
Summary of my perl5 (revision 5 version 31 subversion 1) configuration: Local Commit: 6d4abcdaf9a79aa74fdfd074c75d8015725cfb72 Ancestor: fb55ce6b7596b9e94f941cf83eac5ff84f760ea2 Platform: osname=linux osvers=4.4.0-146-generic archname=x86_64-linux uname='linux fozzie 4.4.0-146-generic #172~14.04.1-ubuntu smp fri apr 5 16:51:29 utc 2019 x86_64 x86_64 x86_64 gnulinux ' config_args='-des -Dusedevel' hint=previous useposix=true d_sigaction=define useithreads=undef usemultiplicity=undef use64bitint=define use64bitall=define uselongdouble=undef usemymalloc=n default_inc_excludes_dot=define bincompat5005=undef Compiler: cc='cc' ccflags ='-fwrapv -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64' optimize='-O2' cppflags='-fwrapv -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include' ccversion='' gccversion='4.8.4' gccosandvers='' intsize=4 longsize=8 ptrsize=8 doublesize=8 byteorder=12345678 doublekind=3 d_longlong=define longlongsize=8 d_longdbl=define longdblsize=16 longdblkind=3 ivtype='long' ivsize=8 nvtype='double' nvsize=8 Off_t='off_t' lseeksize=8 alignbytes=8 prototype=define Linker and Libraries: ld='cc' ldflags =' -fstack-protector -L/usr/local/lib' libpth=/usr/local/lib /usr/lib/gcc/x86_64-linux-gnu/4.8/include-fixed /usr/include/x86_64-linux-gnu /usr/lib /lib/x86_64-linux-gnu /lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib /usr/local/lib /usr/lib/gcc/x86_64-linux-gnu/4.8/include-fixed /usr/include/x86_64-linux-gnu /usr/lib libs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc libc=libc-2.19.so so=so useshrplib=false libperl=libperl.a gnulibc_version='2.19' Dynamic Linking: dlsrc=dl_dlopen.xs dlext=so d_dlsymun=undef ccdlflags='-Wl\,-E' cccdlflags='-fPIC' lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector'
Locally applied patches: 44f58ffee6a6e924a381d9c1a64644811ec5323e f3ff56ad41170c41079f26ae34085c91ce33bb6a 6d4abcdaf9a79aa74fdfd074c75d8015725cfb72
@INC for perl 5.31.1: lib /usr/local/lib/perl5/site_perl/5.31.1/x86_64-linux /usr/local/lib/perl5/site_perl/5.31.1 /usr/local/lib/perl5/5.31.1/x86_64-linux /usr/local/lib/perl5/5.31.1
Environment for perl 5.31.1: HOME=/home/smylers LANG=en_GB.utf8 LANGUAGE=en_GB:en LC_COLLATE=C LD_LIBRARY_PATH=/home/smylers/pending/Perl/perl LOGDIR (unset) PATH=/home/smylers/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/X11R6/bin:/usr/games PERL_BADLANG (unset) PERL_CPANM_OPT=--sudo --prompt SHELL=/bin/bash
A minor comment: not sure if we have a policy on it\, but for me there should be a preference to have commit messages ASCII where possible.
All three of your commit messages have some utf8 in there\, and in no case does it look particularly necessary. (If you do change them\, you may also want to change the "party" of the third part\, to "part".)
Hugo
The RT System itself - Status changed from 'new' to 'open'
"Hugo van der Sanden via RT" \perlbug\-followup@​perl\.org writes:
A minor comment: not sure if we have a policy on it\, but for me there should be a preference to have commit messages ASCII where possible.
Why? This is 2019\, everything should be Unicode-capable by now\, and we've indeed been using non-ASCII characters in commit messages for a long time.
~/src/perl$ git log -P --grep='[^[:ascii:]]' --oneline |wc -l 2209
The earliest non-ASCII commit message is:
commit f31700dc8965c83194d956522e573c5d4fdb4611 Author: Gurusamy Sarathy \gsar@​cpan\.org Date: 1998-08-07 21:36:04 +0000
don't use © in Test.pm (suggested by M.J.T. Guy)
p4raw-id: //depot/maint-5.005/perl@1749
All three of your commit messages have some utf8 in there\, and in no case does it look particularly necessary.
A lot of the non-ASCII in our commit history not "particularly necessary": stuff like curly quotes (both from compiler messages and manually typed)\, bullets\, and ellipses.
- ilmari -- "I use RMS as a guide in the same way that a boat captain would use a lighthouse. It's good to know where it is\, but you generally don't want to find yourself in the same spot." - Tollef Fog Heen
On Thu\, 06 Jun 2019 08:41:23 -0700\, ilmari wrote:
"Hugo van der Sanden via RT" \perlbug\-followup@​perl\.org writes:
A minor comment: not sure if we have a policy on it\, but for me there should be a preference to have commit messages ASCII where possible.
Why? This is 2019\, everything should be Unicode-capable by now\, and we've indeed been using non-ASCII characters in commit messages for a long time.
Maybe it's just me\, but for the range of systems perl aspires to port to it seems unlikely they'll all be equally Unicode-capable. I noticed it because they showed up as garbage in my email client\, but maybe that was only because of the way RT wrapped the commits as attachments.
A lot of the non-ASCII in our commit history not "particularly necessary": stuff like curly quotes (both from compiler messages and manually typed)\, bullets\, and ellipses.
Yeah\, that's the sort of stuff. If I'm the only one that considers such things to reduce the quality of the commit history\, I'll have to live with it.
Hugo
On Thu\, Jun 06\, 2019 at 09:20:35AM -0700\, Hugo van der Sanden via RT wrote:
Maybe it's just me\, but for the range of systems perl aspires to port to it seems unlikely they'll all be equally Unicode-capable. I noticed it because they showed up as garbage in my email client\, but maybe that was only because of the way RT wrapped the commits as attachments.
The attachments\, as seen on p5p\, appear to have garbage in the commit messages. For example the bytes in patch 0002 just before the word "make" are
ef bf bd ef bf bd ef bf bd
which when decoded as utf8\, form these characters
"\x{fffd}\x{fffd}\x{fffd}"
So I think the patches probably need resubmitting.
-- All wight. I will give you one more chance. This time\, I want to hear no Wubens. No Weginalds. No Wudolf the wed-nosed weindeers. -- Life of Brian
Dave Mitchell writes:
the bytes in patch 0002 just before the word "make" are
ef bf bd ef bf bd ef bf bd
which when decoded as utf8\, form these characters
"\\x\{fffd\}\\x\{fffd\}\\x\{fffd\}"
So I think the patches probably need resubmitting.
Ascii-only patches attached.
(That was an opening double-quote mark when I typed it\, U+201C. It should've had the UTF-8 bytes E2 80 9C.)
Dagfinn Ilmari Mannsåker writes:
This is 2019\, everything should be Unicode-capable by now\,
I agree. Especially since some committers have non-Ascii characters in their names!
And as you show\, there are plenty of commit messages already successfully using non-Ascii characters. However\, what turned up on the mailing list was clearly unusable.
The attachments in my ‘sent’ folder are unmangled if I open them with Vim\, but possibly insufficiently labelled as to how to interpret them; I couldn't see anywhere in Zoho's webmail client to specify attachment headers. It may have been that if I'd sent them some other way they would have made it through fine.
Or maybe only folk with commit bits are free to use any characters in their messages\, and those of us submitting via RT need to use a restricted subset. I shall try to remember to watch my typing in future.
Hugo van der Sanden via RT writes:
All three of your commit messages have some utf8 in there\,
Thanks for spotting the issue\, Hugo (and the other typo).
Apologies for the distraction\, all.
Smylers
On Fri\, 07 Jun 2019 02:26:42 -0700\, smylers@stripey.com wrote:
Dave Mitchell writes:
the bytes in patch 0002 just before the word "make" are
ef bf bd ef bf bd ef bf bd
which when decoded as utf8\, form these characters
"\\x\{fffd\}\\x\{fffd\}\\x\{fffd\}"
So I think the patches probably need resubmitting.
Ascii-only patches attached.
(That was an opening double-quote mark when I typed it\, U+201C. It should've had the UTF-8 bytes E2 80 9C.)
Dagfinn Ilmari Mannsåker writes:
This is 2019\, everything should be Unicode-capable by now\,
I agree. Especially since some committers have non-Ascii characters in their names!
And as you show\, there are plenty of commit messages already successfully using non-Ascii characters. However\, what turned up on the mailing list was clearly unusable.
The attachments in my ‘sent’ folder are unmangled if I open them with Vim\, but possibly insufficiently labelled as to how to interpret them; I couldn't see anywhere in Zoho's webmail client to specify attachment headers. It may have been that if I'd sent them some other way they would have made it through fine.
Or maybe only folk with commit bits are free to use any characters in their messages\, and those of us submitting via RT need to use a restricted subset. I shall try to remember to watch my typing in future.
The (old version of) RT we're using seems to corrupt UTF-8 in attached patches received by email\, including in the copy of the email sent to the list.
This:
+L\
seems inaccurate - isn't the idea that it's always been provided (for useful versions of always) and we're now documenting and testing it.
Tony
From @tonycoz
The (old version of) RT we're using seems to corrupt UTF-8 in attached patches received by email, including in the copy of the email sent to the list.
Thanks for the explanation, Tony. Hopefully that's no longer a problem. On your more substantive matter:
This:
+L
has been upgraded from version 1.31 to 1.32. It now provides
+Cfor determining whether an object has overloaded
+stringification by any means.seems inaccurate - isn't the idea that it's always been provided (for useful > versions of always) and we're now documenting and testing it.
That's rather a philosophical point! In practice, yes, the function has always been there. Does that count as being “provided”?
I presume that in general P5P wants users to use documented functions (with documented behaviour) and not to start using an undocumented internal function they've somehow come across and happens to work for them?
And specifically that if a user did use such an internal function and then P5P removed it or changed it, that wouldn't count as breaking Perl's backwards- compatibility, because the user shouldn't've been using an internal function in the first place.
In this case arguably what we have is an public function that we forgot to document, rather than an internal function. But in general that's quite a hard distinction to make: possibly the defining feature of an internal function is it's one that isn't documented.
So it seemed safer to go for the wording above: if you only use publicly documented functions, then it effectively wasn't provided until now; adding this documentation is what provides the function.
However, I now see the disadvantage of that is that it incorrectly makes users think they need to upgrade to get this function. It would be nice to have wording which retrospectively blesses use of it in older versions of perl. Specifically that would enable its use in Cpan modules with minimum perl version requirements.
Does anybody know of a suitable form of words from a previous time when something like this has happened? If not, I'll have a go at coming up with some.
(And I'll try not to wait 3 years this time. Apologies for gap; I guess I went on holiday or something and then it slipped my mind — then this morning I encountered a comment in our code-base which links to this bug report.)
Smylers
On Mon, 8 Aug 2022 at 09:46, Smylers @.***> wrote:
From @tonycoz https://github.com/tonycoz
The (old version of) RT we're using seems to corrupt UTF-8 in attached patches received by email, including in the copy of the email sent to the list.
Thanks for the explanation, Tony. Hopefully that's no longer a problem. On your more substantive matter:
This:
+L has been upgraded from version 1.31 to 1.32. It now provides +C for determining whether an object has overloaded +stringification by any means.
seems inaccurate - isn't the idea that it's always been provided (for useful > versions of always) and we're now documenting and testing it.
That's rather a philosophical point! In practice, yes, the function has always been there. Does that count as being “provided”?
I presume that in general P5P wants users to use documented functions (with documented behaviour) and not to start using an undocumented internal function they've somehow come across and happens to work for them?
I personally consider some widely distributed modules to have stable apis even if we dont document them. For instance I would object if Data::Dumper::qquote() was removed.
And specifically that if a user did use such an internal function and then P5P removed it or changed it, that wouldn't count as breaking Perl's backwards- compatibility, because the user shouldn't've been using an internal function in the first place.
In theory, sure, in practice it's harder to say that. Why should we treat using an undocumented (but obviously stable) api any differently than an undocumented and unintentional behavior in perl any differently? And we often commit to preserving such behavior. In practice if people have been using it for a while and its going to cause trouble if we remove it we treat it as baked in
In this case arguably what we have is an public function that we forgot to document, rather than an internal function. But in general that's quite a hard distinction to make: possibly the defining feature of an internal function is it's one that isn't documented.
I think in practice it is one that is documented as internal. :-)
So it seemed safer to go for the wording above: if you only use publicly documented functions, then it effectively wasn't provided until now; adding this documentation is what provides the function
I have no opinion on the wording, I am just speaking to my view of the bigger picture as your comment reminded me of all the times I have used Data::Dumper::qquote() in production code even though it wasn't/isn't documented.
cheers, Yves
-- perl -Mre=debug -e "/just|another|perl|hacker/"
Migrated from rt.perl.org#134180 (status was 'open')
Searchable as RT134180$