Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/
Other
1.96k stars 556 forks source link

please correct perlfunc to tell when "sprintf @..." is useful #16107

Open p5pRT opened 7 years ago

p5pRT commented 7 years ago

Migrated from rt.perl.org#131875 (status was 'open')

Searchable as RT131875$

p5pRT commented 7 years ago

From perl-diddler@tlinx.org

Created by perl-diddler@tlinx.org

In perlfunc.pod\, it says​:

  Unlike "printf"\, "sprintf" does not do what you probably mean when   you pass it an array as your first argument. The array is given   scalar context\, and instead of using the 0th element of the array   as the format\, Perl will use the count of elements in the array as   the format\, which is almost never useful.

For a language that is describe as a DWIM language\, and where the sources uses "DWIM" to describe implemented behaviors\, easily over 50 times\, it seems odd or out of place to highlight a special case where this is not true while giving no examples of how it *is* useful.

It seems it would be more useful\, to know how the current behavior of using an array with "sprintf" can be useful. As it is\, it *seems* to be documenting a known deficiency in the language without giving a good reason\, why\, in a language designed to be DWIM\, where possible\, the "sprintf" command isn't corrected to Do What "You probably Mean when you pass it an array as [the] first argument".

Alternatively\, if there are no examples where the current behavior is _useful_\, perhaps this behavior could altered as the text suggests to do the behavior that the programer probably means? I believe grepping through CPAN was used in the 5.18 timeframe to check on the impact of some change that went in.

I tried searching through a 2.8G CPAN-author tree using zgrep and 'sprintf[\x20\x09]+@​' (using an actual space & tab instead of the hex values shown) and found no usages. If it isn't possible to document how the current behavior can be useful\, perhaps this behavior can be changed to "be useful" along the lines of perl's other DWIM behaviors?

Perl Info ``` Flags: category=docs severity=medium Summary of my perl5 (revision 5 version 24 subversion 0) configuration: Platform: osname=linux, osvers=4.10.1-isht-van, archname=x86_64-linux-multi-ld uname='linux ishtar 4.10.1-isht-van #1 smp preempt tue feb 28 18:57:48 pst 2017 x86_64 gnulinux ' config_args='' hint=recommended, useposix=true, d_sigaction=define useithreads=undef, usemultiplicity=define use64bitint=define, use64bitall=define, uselongdouble=define usemymalloc=n, bincompat5005=undef Compiler: cc='gcc', ccflags ='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2', optimize='-O2', cppflags='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong' ccversion='', gccversion='4.9.0', gccosandvers='' intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678, doublekind=3 d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16, longdblkind=3 ivtype='long', ivsize=8, nvtype='long double', nvsize=16, Off_t='off_t', lseeksize=8 alignbytes=16, prototype=define Linker and Libraries: ld='gcc', ldflags =' -fstack-protector-strong -L/usr/local/lib' libpth=/usr/lib64/gcc/x86_64-suse-linux/4.9/include-fixed /usr/lib /usr/local/lib /lib/../lib64 /usr/lib/../lib64 /lib /lib64 /usr/lib64 libs=-lpthread -lnsl -lgdbm -ldl -lm -lcrypt -lutil -lc -lgdbm_compat perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc libc=libc-2.19.so, so=so, useshrplib=true, libperl=libperl.so gnulibc_version='2.19' Dynamic Linking: dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E -Wl,-rpath,/home/perl/perl-5.24.0/usr/lib/x86_64-linux-multi-ld/CORE' cccdlflags='-fPIC', lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector-strong' Characteristics of this binary (from libperl): Compile-time options: HAS_TIMES MULTIPLICITY PERLIO_LAYERS PERL_COPY_ON_WRITE PERL_DONT_CREATE_GVSV PERL_HASH_FUNC_ONE_AT_A_TIME_HARD PERL_IMPLICIT_CONTEXT PERL_MALLOC_WRAP PERL_PRESERVE_IVUV USE_64_BIT_ALL USE_64_BIT_INT USE_LARGE_FILES USE_LOCALE USE_LOCALE_COLLATE USE_LOCALE_CTYPE USE_LOCALE_NUMERIC USE_LOCALE_TIME USE_LONG_DOUBLE USE_PERLIO USE_PERL_ATOF Built under linux Compiled at Apr 2 2017 17:08:11 %ENV: PERL5LIB="/home/perl/perl-5.24/lib" @INC: /home/perl/perl-5.24/lib /home/perl/perl-5.24.0/usr/lib/site/x86_64-linux-multi-ld /home/perl/perl-5.24.0/usr/lib/site /home/perl/perl-5.24.0/usr/lib/vendor/x86_64-linux-multi-ld /home/perl/perl-5.24.0/usr/lib/vendor /home/perl/perl-5.24.0/usr/lib/x86_64-linux-multi-ld /home/perl/perl-5.24.0/usr/lib . Environment for perl 5.24.0 HOME=/home/perl LANG (unset) LANGUAGE (unset) LC_COLLATE=C LC_CTYPE=en_US.UTF-8 LC_MESSAGES=C LC_MONETARY=C LC_NUMERIC=C LC_TIME=C LD_LIBRARY_PATH (unset) LOGDIR (unset) PATH=/home/perl/perl-5.24:/home/perl/perl-5.24/usr/bin:/sbin:/usr/local/bin:/usr/bin:/bin:/opt/kde3/bin:/usr/sbin PERL_BADLANG (unset) SHELL=/bin/bash ```
p5pRT commented 7 years ago

From @iabyn

On Wed\, Aug 09\, 2017 at 10​:53​:47PM -0700\, Linda Walsh wrote​:

In perlfunc.pod\, it says​:

 Unlike "printf"\, "sprintf" does not do what you probably mean when
 you pass it an array as your first argument\. The array is given
 scalar context\, and instead of using the 0th element of the array
 as the format\, Perl will use the count of elements in the array as
 the format\, which is almost never useful\.

For a language that is describe as a DWIM language\, and where the sources uses "DWIM" to describe implemented behaviors\, easily over 50 times\, it seems odd or out of place to highlight a special case where this is not true while giving no examples of how it *is* useful.

It seems it would be more useful\, to know how the current behavior of using an array with "sprintf" can be useful. As it is\, it *seems* to be documenting a known deficiency in the language without giving a good reason\, why\, in a language designed to be DWIM\, where possible\, the "sprintf" command isn't corrected to Do What "You probably Mean when you pass it an array as [the] first argument".

The prototype for sprintf is '$@​'. This was possibly a bad choice at the time\, but that's what we've inherited.

Our two main choices are​:

1) leave it as-is.

In this case\, 'sprintf @​a' gets compiled as 'sprintf scalar(@​a)\, ()' which likely doesn't DWIM. Hence the warning text in the pod. We can't add anything to the pod explaining why it would be useful\, as there are likely no circumstances where it would be useful.

Note that a couple of other ops have the '$@​' prototype too; in particular\, pack and join. So 'join @​a' doesn't DWIM either.

Arguably we could add a warning if the first arg of sprintf/pack/join is an array.

2) Change sprintf's prototype to '@​' so that it becomes a normal list operator. The problem with this is that it changes the first argument from being in scalar context to list context\, so for example in

  $x = sprintf my_fmt()\, @​args

my_fmt() is now called in list context\, and may start to return something different.

I tried searching through a 2.8G CPAN-author tree using zgrep and 'sprintf[\x20\x09]+@​' (using an actual space & tab instead of the hex values shown) and found no usages. If it isn't possible to document how the current behavior can be useful\, perhaps this behavior can be changed to "be useful" along the lines of perl's other DWIM behaviors?

These two searches

  http​://grep.cpan.me/?q=sprintf%5Cs*%40   http​://grep.cpan.me/?q=sprintf%5Cs*%5C%28%5Cs*%40

show about 50 CPAN distributions which are (probably incorrectly) doing

  sprintf @​array ...   sprintf(@​array ...

While a search for

  sprintf(foo(...

shows about 100 distributions. In some of those 'foo' is 'qq'\, but plenty are calling real functions\, and so could break if that function behaves differently in list context.

So I don't think changing the prototype is a viable option.

-- I thought I was wrong once\, but I was mistaken.

p5pRT commented 7 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 7 years ago

From perl-diddler@tlinx.org

Dave Mitchell via RT wrote​:

The prototype for sprintf is '$@​'. This was possibly a bad choice at the time\, but that's what we've inherited.

Our two main choices are​:

1) leave it as-is.

In this case\, 'sprintf @​a' gets compiled as 'sprintf scalar(@​a)\, ()' which likely doesn't DWIM. Hence the warning text in the pod. We can't add anything to the pod explaining why it would be useful\, as there are likely no circumstances where it would be useful.

Note that a couple of other ops have the '$@​' prototype too; in particular\, pack and join. So 'join @​a' doesn't DWIM either.


FWIW\, sprintf stick out more\, for me\, as it doesn't follow fprintf and printf. While pack and join also suffer from not being DWIM\, they haven't stuck out as much to me as they don't have parallel functions that operate differentlyl.

Arguably we could add a warning if the first arg of sprintf/pack/join is an array.


  While I'd prefer the functionality\, the warning would be a reasonable way of preventing

2) Change sprintf's prototype to '@​' so that it becomes a normal list operator. The problem with this is that it changes the first argument from being in scalar context to list context\, so for example in

$x = sprintf my\_fmt\(\)\, @​args

my_fmt() is now called in list context\, and may start to return something different.

I tried searching through a 2.8G CPAN-author tree using zgrep and 'sprintf[\x20\x09]+@​' (using an actual space & tab instead of the hex values shown) and found no usages. If it isn't possible to document how the current behavior can be useful\, perhaps this behavior can be changed to "be useful" along the lines of perl's other DWIM behaviors?

These two searches

http​://grep\.cpan\.me/?q=sprintf%5Cs\*%40
http​://grep\.cpan\.me/?q=sprintf%5Cs\*%5C%28%5Cs\*%40

  Never seen that "tool"/site before. Looks useful....

show about 50 CPAN distributions which are (probably incorrectly) doing

sprintf @​array \.\.\.
sprintf\(@​array \.\.\.

  I'd think it to be a good thing if an automated-type message could be sent to the authors of those uses to examine what might be a potential problem?

While a search for

sprintf\(foo\(\.\.\.

shows about 100 distributions. In some of those 'foo' is 'qq'\, but plenty are calling real functions\, and so could break if that function behaves differently in list context.

So I don't think changing the prototype is a viable option.


  On the surface\, I'd tend to agree\, however\, one question is -- is how sprintf operates now\, how we would *like* it to operate? I.e. Is this something a "use feature/use VERSION" might be called for in order to eliminate the wart in the future?

  The second issue obviates the need of considering the 1st case\, and that is what could a function return in that context that would be valid for sprintf?

  I.e. sprintf requires a format statement -- it will die at that point if given a reference. The only other option scalar func()... could return would be the number of elements in an array\, OR the recently changed output of SCALAR hash-ref (i.e. what was "4/14" would now be seen as "4" if I understand recent change notes(?)).

  The only way 'foo(...)' would return something different is if it were returning a reference for the SCALAR case OR expecting to return the the result of a SCALAR ARRAY or SCALAR HASH.

  If it was expecting to return a SCALAR HASH -- then recent changes for what is returned in that case would cause breakage already\, no?

  In the other two cases\, SCALAR ARRAY\, or a reference\, it's still the case that sprintf would generate an error.

  I.e. can you think of what a function might return that would be affected by context ($ or @​)\, that would be valid input for sprintf?

  If it is the case that no function can return valid input to sprintf if the function's output is 'mutable' (dependent on scalar or list context)\, then we still have a case where changing sprintf's prototype would not affect any existing\, *working*\, code. No?

 

p5pRT commented 7 years ago

From @iabyn

On Mon\, Aug 14\, 2017 at 11​:57​:31AM -0700\, L A Walsh wrote​:

I.e. can you think of what a function might return that would be affected by context ($ or @​)\, that would be valid input for sprintf?

As a trivial (if slightly contrived) example​:

my @​formats = qw(%s%s %s-%s a=%sb=%s); sub f {   wantarray ? @​formats : $formats[0]; }

print scalar f()\, "\n"; print f()\, "\n";

which outputs

  %s%s   %s%s%s-%sa=%sb=%s

-- You never really learn to swear until you learn to drive.

p5pRT commented 7 years ago

From perl-diddler@tlinx.org

Dave Mitchell via RT wrote​:

On Mon\, Aug 14\, 2017 at 11​:57​:31AM -0700\, L A Walsh wrote​:

I.e. can you think of what a function might return that would be affected by context ($ or @​)\, that would be valid input for sprintf?

As a trivial (if slightly contrived) example​:

my @​formats = qw(%s%s %s-%s a=%sb=%s); sub f { wantarray ? @​formats : $formats[0]; }


  Considering that we are talking about using 'f's output as a format to sprintf\, and not to any\, arbitrary function\, if you are going to contrive an example\, you might\, at least use sprintf.

  I.e. the non-scalar case as passed to the 1st argument of sprintf would never be called. I.e. it is the case that you would already have "faulty" code.

If you felt such faulty code was a possibility\, might not an equally contrived case show that the recent change of SCALAR $hash from its prior form of "NN/MM" to "NN" also could generate an error in some form?

Admittedly\, also a bit of a contrived example\, but many years ago Intel released a chip that had some simple math err in the chip. If someone had written code with the error in mind\, it would no longer work when it was fixed. Do you think Intel gave any thought to how people's programs\, relying on a quirk in the chip's language\, would develop errors if Intel fixed it?

However\, if you really believe there is a reasonable chance that someone would use such a feature -- then\, at least\, such usage could be used as an example in the perldoc as how the current behavior could be useful in perl code.

If such usage is considered a "non-pattern"\, poor\, or a 'bug'\, then it seems deprecating the old behavior (as it encourages obscure or broken code) and fixing the behavior in the future would be the most reasonable step.

However\, if it is desired to protect such code\, then fixing the behavior only in future versions (i.e use 'feature' or use 'PERLVERSION') would be the other option.

Either way\, it seems the current behavior is only lending it self to either buggy code (as your greps indicate in several cases) or obscure code that would be "more than a little" hard to maintain.

p5pRT commented 7 years ago

From @iabyn

On Mon\, Aug 14\, 2017 at 02​:28​:33PM -0700\, L A Walsh wrote​:

Dave Mitchell via RT wrote​:

On Mon\, Aug 14\, 2017 at 11​:57​:31AM -0700\, L A Walsh wrote​:

I.e. can you think of what a function might return that would be affected by context ($ or @​)\, that would be valid input for sprintf?

As a trivial (if slightly contrived) example​:

my @​formats = qw(%s%s %s-%s a=%sb=%s); sub f { wantarray ? @​formats : $formats[0]; } ---- Considering that we are talking about using 'f's output as a format to sprintf\, and not to any\, arbitrary function\, if you are going to contrive an example\, you might\, at least use sprintf.

I.e. the non-scalar case as passed to the 1st argument of sprintf would never be called. I.e. it is the case that you would already have "faulty" code.

I used a non-sprintf example purely to demonstrate that a function can can meaningfully return a valid format in scalar context\, while also capable of being used in list context.

Here's an example with sprintf​:

  my @​formats = qw(%s%s %s-%s a=%sb=%s);   sub formats {   wantarray ? @​formats : $formats[0];   }

  sub simple_output {   sprintf formats()\, @​_;   }

  sub complex_output {   my $selector = shift;   my ($format) = grep /\Q$selector/\, formats();   sprintf $format\, @​_;   }

  $x = simple_output("a"\, "b");   $y = complex_output("="\, "a"\, "b");

If you felt such faulty code was a possibility\,

I don't think my code examples are faulty.

might not an equally contrived case show that the recent change of SCALAR $hash from its prior form of "NN/MM" to "NN" also could generate an error in some form?

Any non-backwardly compatible change we make requires a value judgement of the benefits versus the risks. It's possible to reach different decisions for different circumstances.

Admittedly\, also a bit of a contrived example\, but many years ago Intel released a chip that had some simple math err in the chip. If someone had written code with the error in mind\, it would no longer work when it was fixed. Do you think Intel gave any thought to how people's programs\, relying on a quirk in the chip's language\, would develop errors if Intel fixed it?

The two are not alike. The perl case is documented behaviour\, which like many things in perl\, can produce surprising effects if the coder is not careful. The pentium issue was a bug. No one in their right mind would have written code that needed the bug to be present in order to function. At best\, code may have probed for the bug being present\, and if so\, worked around it.

However\, if you really believe there is a reasonable chance that someone would use such a feature -- then\, at least\, such usage could be used as an example in the perldoc as how the current behavior could be useful in perl code.

I'm not saying that the current behaviour is particularly useful\, merely that it's not a bug\, and that therefore we should consider carefully before changing its behaviour. As such there is no need to add an example to perldoc.

If such usage is considered a "non-pattern"\, poor\, or a 'bug'\, then it seems deprecating the old behavior (as it encourages obscure or broken code) and fixing the behavior in the future would be the most reasonable step.

However\, if it is desired to protect such code\, then fixing the behavior only in future versions (i.e use 'feature' or use 'PERLVERSION') would be the other option.

Either way\, it seems the current behavior is only lending it self to either buggy code (as your greps indicate in several cases) or obscure code that would be "more than a little" hard to maintain.

There is a third option​: to recognise that the first argument being in scalar context is documented behaviour and that some code may rely on it\, and leave it as is. But to recognise specifically that having an array as the first argument indicates a coder error\, and start to emit a warning in that case.

I am in favour of adding a warning. I am not in favour of changing the prototype of sprintf\, with or without the protection of a version or 'use feature' guard.

-- I thought I was wrong once\, but I was mistaken.

p5pRT commented 6 years ago

From zefram@fysh.org

There is nothing to change here. sprintf with an array as first argument is not useful\, but scalar context is useful and\, more importantly\, likely to have code relying on it. There's nothing more to say in the documentation\, and we can't change the behaviour for backcompat reasons. This ticket should be closed.

-zefram

p5pRT commented 6 years ago

From @iabyn

On Fri\, Dec 15\, 2017 at 07​:19​:54AM +0000\, Zefram wrote​:

There is nothing to change here. sprintf with an array as first argument is not useful\, but scalar context is useful and\, more importantly\, likely to have code relying on it. There's nothing more to say in the documentation\, and we can't change the behaviour for backcompat reasons. This ticket should be closed.

I still intend at some point to add a warning when the first arg of sprintf is an array\, as in

  sprintf @​foo ...

since that almost certainly represents a coding error.

-- You're only as old as you look.