Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/
Other
1.93k stars 549 forks source link

Warning on duplicate key in literal hash definition #14736

Open p5pRT opened 9 years ago

p5pRT commented 9 years ago

Migrated from rt.perl.org#125332 (status was 'open')

Searchable as RT125332$

p5pRT commented 9 years ago

From @epa

Created by @epa

  my %h = (a => 1\, a => 2);

The first value for key 'a' will immediately be overwritten by the second. Having them both does nothing useful and almost always indicates a typo. I suggest perl could warn in the case where

- a hash is initialized with a list given directly in the program text\,

- two of the keys of the hash (that is\, the even-numbered elements of the list)   are literal strings in the program text and are the same string.

This bug report is not suggesting a warning on code like

  my %h = (a => 1); my %g = (%h\, a => 2);

but only on the case of a single literal hash initializer which contains the same string twice.

(I suspect this warning would in fact find quite a lot of cut-and-paste bugs in existing code.)

Perl Info ``` Flags: category=core severity=wishlist Site configuration information for perl 5.10.1: Configured by Debian Project at Wed Mar 6 15:31:28 UTC 2013. Summary of my perl5 (revision 5 version 10 subversion 1) configuration: Platform: osname=linux, osvers=3.2.0-4-amd64, archname=x86_64-linux-gnu-thread-multi uname='linux madeleine 3.2.0-4-amd64 #1 smp debian 3.2.39-2 x86_64 gnulinux ' config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=x86_64-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.10 -Darchlib=/usr/lib/perl/5.10 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.10.1 -Dsitearch=/usr/local/lib/perl/5.10.1 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm -DDEBUGGING=-g -Doptimize=-O2 -Duseshrplib -Dlibperl=libperl.so.5.10.1 -Dd_dosuid -des' hint=recommended, useposix=true, d_sigaction=define useithreads=define, usemultiplicity=define useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef use64bitint=define, use64bitall=define, uselongdouble=undef usemymalloc=n, bincompat5005=undef Compiler: cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64', optimize='-O2 -g', cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include' ccversion='', gccversion='4.4.5', gccosandvers='' intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678 d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16 ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8 alignbytes=8, prototype=define Linker and Libraries: ld='cc', ldflags =' -fstack-protector -L/usr/local/lib' libpth=/usr/local/lib /lib /usr/lib /lib64 /usr/lib64 libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt perllibs=-ldl -lm -lpthread -lc -lcrypt libc=/lib/libc-2.11.3.so, so=so, useshrplib=true, libperl=libperl.so.5.10.1 gnulibc_version='2.11.3' Dynamic Linking: dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E' cccdlflags='-fPIC', lddlflags='-shared -O2 -g -L/usr/local/lib -fstack-protector' Locally applied patches: DEBPKG:debian/arm_thread_stress_timeout - http://bugs.debian.org/501970 Raise the timeout of ext/threads/shared/t/stress.t to accommodate slower build hosts DEBPKG:debian/cpan_config_path - Set location of CPAN::Config to /etc/perl as /usr may not be writable. DEBPKG:debian/cpan_definstalldirs - Provide a sensible INSTALLDIRS default for modules installed from CPAN. DEBPKG:debian/db_file_ver - http://bugs.debian.org/340047 Remove overly restrictive DB_File version check. DEBPKG:debian/doc_info - Replace generic man(1) instructions with Debian-specific information. DEBPKG:debian/enc2xs_inc - http://bugs.debian.org/290336 Tweak enc2xs to follow symlinks and ignore missing @INC directories. DEBPKG:debian/errno_ver - http://bugs.debian.org/343351 Remove Errno version check due to upgrade problems with long-running processes. DEBPKG:debian/extutils_hacks - Various debian-specific ExtUtils changes DEBPKG:debian/fakeroot - Postpone LD_LIBRARY_PATH evaluation to the binary targets. DEBPKG:debian/instmodsh_doc - Debian policy doesn't install .packlist files for core or vendor. DEBPKG:debian/ld_run_path - Remove standard libs from LD_RUN_PATH as per Debian policy. DEBPKG:debian/libnet_config_path - Set location of libnet.cfg to /etc/perl/Net as /usr may not be writable. DEBPKG:debian/m68k_thread_stress - http://bugs.debian.org/495826 Disable some threads tests on m68k for now due to missing TLS. DEBPKG:debian/mod_paths - Tweak @INC ordering for Debian DEBPKG:debian/module_build_man_extensions - http://bugs.debian.org/479460 Adjust Module::Build manual page extensions for the Debian Perl policy DEBPKG:debian/perl_synopsis - http://bugs.debian.org/278323 Rearrange perl.pod DEBPKG:debian/prune_libs - http://bugs.debian.org/128355 Prune the list of libraries wanted to what we actually need. DEBPKG:debian/use_gdbm - Explicitly link against -lgdbm_compat in ODBM_File/NDBM_File. DEBPKG:fixes/assorted_docs - http://bugs.debian.org/443733 [384f06a] Math::BigInt::CalcEmu documentation grammar fix DEBPKG:fixes/net_smtp_docs - http://bugs.debian.org/100195 [rt.cpan.org #36038] Document the Net::SMTP 'Port' option DEBPKG:fixes/processPL - http://bugs.debian.org/357264 [rt.cpan.org #17224] Always use PERLRUNINST when building perl modules. DEBPKG:debian/perlivp - http://bugs.debian.org/510895 Make perlivp skip include directories in /usr/local DEBPKG:fixes/pod2man-index-backslash - http://bugs.debian.org/521256 Escape backslashes in .IX entries DEBPKG:debian/disable-zlib-bundling - Disable zlib bundling in Compress::Raw::Zlib DEBPKG:fixes/kfreebsd_cppsymbols - http://bugs.debian.org/533098 [3b910a0] Add gcc predefined macros to $Config{cppsymbols} on GNU/kFreeBSD. DEBPKG:debian/cpanplus_definstalldirs - http://bugs.debian.org/533707 Configure CPANPLUS to use the site directories by default. DEBPKG:debian/cpanplus_config_path - Save local versions of CPANPLUS::Config::System into /etc/perl. DEBPKG:fixes/kfreebsd-filecopy-pipes - http://bugs.debian.org/537555 [16f708c] Fix File::Copy::copy with pipes on GNU/kFreeBSD DEBPKG:fixes/anon-tmpfile-dir - http://bugs.debian.org/528544 [perl #66452] Honor TMPDIR when open()ing an anonymous temporary file DEBPKG:fixes/abstract-sockets - http://bugs.debian.org/329291 [89904c0] Add support for Abstract namespace sockets. DEBPKG:fixes/hurd_cppsymbols - http://bugs.debian.org/544307 [eeb92b7] Add gcc predefined macros to $Config{cppsymbols} on GNU/Hurd. DEBPKG:fixes/autodie-flock - http://bugs.debian.org/543731 Allow for flock returning EAGAIN instead of EWOULDBLOCK on linux/parisc DEBPKG:fixes/archive-tar-instance-error - http://bugs.debian.org/539355 [rt.cpan.org #48879] Separate Archive::Tar instance error strings from each other DEBPKG:fixes/positive-gpos - http://bugs.debian.org/545234 [perl #69056] [c584a96] Fix \\G crash on first match DEBPKG:debian/devel-ppport-ia64-optim - http://bugs.debian.org/548943 Work around an ICE on ia64 DEBPKG:fixes/trie-logic-match - http://bugs.debian.org/552291 [perl #69973] [0abd0d7] Fix a DoS in Unicode processing [CVE-2009-3626] DEBPKG:fixes/hppa-thread-eagain - http://bugs.debian.org/554218 make the threads-shared test suite more robust, fixing failures on hppa DEBPKG:fixes/crash-on-undefined-destroy - http://bugs.debian.org/564074 [perl #71952] [1f15e67] Fix a NULL pointer dereference when looking for a DESTROY method DEBPKG:fixes/tainted-errno - http://bugs.debian.org/574129 [perl #61976] [be1cf43] fix an errno stringification bug in taint mode DEBPKG:fixes/safe-upgrade - http://bugs.debian.org/582978 Upgrade Safe.pm to 2.25, fixing CVE-2010-1974 DEBPKG:fixes/tell-crash - http://bugs.debian.org/578577 [f4817f3] Fix a tell() crash on bad arguments. DEBPKG:fixes/format-write-crash - http://bugs.debian.org/579537 [perl #22977] [421f30e] Fix a crash in format/write DEBPKG:fixes/arm-alignment - http://bugs.debian.org/289884 [f1c7503] Prevent gcc from optimizing the alignment test away on armel DEBPKG:fixes/fcgi-test - Fix a failure in CGI/t/fast.t when FCGI is installed DEBPKG:fixes/hurd-ccflags - http://bugs.debian.org/587901 Make hints/gnu.sh append to $ccflags rather than overriding them DEBPKG:debian/squelch-locale-warnings - http://bugs.debian.org/508764 Squelch locale warnings in Debian package maintainer scripts DEBPKG:fixes/lc-numeric-docs - http://bugs.debian.org/379329 [perl #78452] [903eb63] LC_NUMERIC documentation fixes DEBPKG:fixes/lc-numeric-sprintf - http://bugs.debian.org/601549 [perl #78632] [b3fd614] Fix sprintf not to ignore LC_NUMERIC with constants DEBPKG:fixes/concat-stack-corruption - http://bugs.debian.org/596105 [perl #78674] [e3393f5] Fix stack pointer corruption in pp_concat() with 'use encoding' DEBPKG:fixes/cgi-multiline-header - http://bugs.debian.org/606995 [CVE-2010-2761 CVE-2010-4410 CVE-2010-4411] CGI.pm MIME boundary and multiline header vulnerabilities DEBPKG:fixes/casing-taint-cve-2011-1487 - http://bugs.debian.org/622817 [perl #87336] fix unwanted taint laundering in lc(), uc() et al. DEBPKG:fixes/safe-reval-rdo-cve-2010-1447 - [PATCH] Wrap by default coderefs returned by rdo and reval DEBPKG:fixes/encode-heap-overflow - [PATCH] Fix decode_xs n-byte heap-overflow security bug in DEBPKG:fixes/digest_eval_hole - Close the eval \"require $module\" security hole in DEBPKG:fixes/unregister_signal_handler - [PATCH] main: Unregister signal handler before destroying my_perl DEBPKG:fixes/CVE-2012-5195 - avoid calling memset with a negative count DEBPKG:fixes/CVE-2012-5526 - [PATCH 1/4] CR escaping for P3P header DEBPKG:fixes/storable-security-warning - [PATCH] add a note about security concerns in Storable DEBPKG:fixes/maketext-code-execution - Fix misparsing of maketext strings. DEBPKG:fixes/CVE-2013-1667 - [PATCH] Prevent premature hsplit() calls, and only trigger REHASH DEBPKG:patchlevel - http://bugs.debian.org/567489 List packaged patches for 5.10.1-17squeeze5 in patchlevel.h @INC for perl 5.10.1: /home/ed/lib/perl/5.10.1 /home/ed/share/perl/5.10.1 /etc/perl /usr/local/lib/perl/5.10.1 /usr/local/share/perl/5.10.1 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.10 /usr/share/perl/5.10 /usr/local/lib/site_perl /usr/local/lib/perl/5.10.0 /usr/local/share/perl/5.10.0 . Environment for perl 5.10.1: HOME=/home/ed LANG=en_GB.UTF-8 LANGUAGE (unset) LC_ALL=en_GB.UTF-8 LD_LIBRARY_PATH=/home/ed/lib:/usr/local/lib LOGDIR (unset) PATH=/home/ed/bin:/bin:/usr/bin:/usr/local/bin:/usr/bin/X11:/usr/games:/sbin:/usr/sbin PERL5LIB=/home/ed/lib/perl/5.10.1:/home/ed/share/perl/5.10.1 PERL_BADLANG (unset) SHELL=/bin/bash ```
p5pRT commented 9 years ago

From zefram@fysh.org

Ed Avis wrote​:

- two of the keys of the hash (that is\, the even-numbered elements of the list)

That's difficult to detect\, due to list context.

  my %h = (a => foo()\, a => bar());

looks like it has clashing keys\, but it may not if foo() returns an even-length list.

-zefram

p5pRT commented 9 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 9 years ago

From @epa

Zefram \<zefram \ fysh.org> writes​:

my %h = \(a => foo\(\)\, a => bar\(\)\);

Yes\, I agree that this case would have to be excluded from the check. Only hash initializers where the number of elements is known at compile time would be statically checked in this way.

(Sidetrack​: it might have been a good idea for => to impose scalar context on its RHS\, but probably too late for that now.)

-- Ed Avis \eda@&#8203;waniasset\.com

p5pRT commented 9 years ago

From @leonerd

On Fri\, 5 Jun 2015 14​:00​:20 +0000 (UTC) Ed Avis \eda@&#8203;waniasset\.com wrote​:

(Sidetrack​: it might have been a good idea for => to impose scalar context on its RHS\, but probably too late for that now.)

Sidetrack^2​: I sometimes find it useful that => *doesn't* apply scalar context to RHS\, because I do things like

  Future->fail( "HTTP GET failed"\, http => $request\, $response );

That said\, if we're going to wish for alternate histories\, then maybe

  BARE => SCALAR   BARE ==> LIST

might be the spelling?

-- Paul "LeoNerd" Evans

leonerd@​leonerd.org.uk http​://www.leonerd.org.uk/ | https://metacpan.org/author/PEVANS

p5pRT commented 9 years ago

From zefram@fysh.org

Paul "LeoNerd" Evans wrote​:

Sidetrack^2​: I sometimes find it useful that => *doesn't* apply scalar context to RHS\, because I do things like

Future->fail( "HTTP GET failed"\, http => $request\, $response );

Would make no difference in this case\, because the RHS\, $request\, behaves the same regardless of context. Ed was clearly not proposing for => to alter the syntactic list structure.

-zefram

p5pRT commented 9 years ago

From @tux

On Fri\, 5 Jun 2015 17​:11​:26 +0100\, "Paul \"LeoNerd\" Evans" \leonerd@&#8203;leonerd\.org\.uk wrote​:

On Fri\, 5 Jun 2015 14​:00​:20 +0000 (UTC) Ed Avis \eda@&#8203;waniasset\.com wrote​:

(Sidetrack​: it might have been a good idea for => to impose scalar context on its RHS\, but probably too late for that now.)

Sidetrack^2​: I sometimes find it useful that => *doesn't* apply scalar context to RHS\, because I do things like

s/sometimes/very often/

my $string = join "\," => $a\, $b\, @​foo\, func ()\, "TAIL";

Future->fail ("HTTP GET failed"\, http => $request\, $response);

That said\, if we're going to wish for alternate histories\, then maybe

BARE => SCALAR BARE ==> LIST

might be the spelling?

NOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO!

-- H.Merijn Brand http​://tux.nl Perl Monger http​://amsterdam.pm.org/ using perl5.00307 .. 5.21 porting perl5 on HP-UX\, AIX\, and openSUSE http​://mirrors.develooper.com/hpux/ http​://www.test-smoke.org/ http​://qa.perl.org http​://www.goldmark.org/jeff/stupid-disclaimers/

p5pRT commented 9 years ago

From @leonerd

On Fri\, 5 Jun 2015 17​:23​:36 +0100 Zefram \zefram@&#8203;fysh\.org wrote​:

Paul "LeoNerd" Evans wrote​:

Sidetrack^2​: I sometimes find it useful that => *doesn't* apply scalar context to RHS\, because I do things like

Future->fail( "HTTP GET failed"\, http => $request\, $response );

Would make no difference in this case\, because the RHS\, $request\, behaves the same regardless of context. Ed was clearly not proposing for => to alter the syntactic list structure.

Huh. Ohyes\, of course.

In that case\, ignore me :)

-- Paul "LeoNerd" Evans

leonerd@​leonerd.org.uk http​://www.leonerd.org.uk/ | https://metacpan.org/author/PEVANS

p5pRT commented 9 years ago

From @leonerd

On Fri\, 5 Jun 2015 18​:32​:36 +0200 "H.Merijn Brand" \h\.m\.brand@&#8203;xs4all\.nl wrote​:

That said\, if we're going to wish for alternate histories\, then maybe

BARE => SCALAR BARE ==> LIST

might be the spelling?

NOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO!

What; you wanted to reserve ==> for the pipeline operator?

  @​items ==> grep { ... } ==> map { ... } ==> my @​result;

perl6-style?

-- Paul "LeoNerd" Evans

leonerd@​leonerd.org.uk http​://www.leonerd.org.uk/ | https://metacpan.org/author/PEVANS

p5pRT commented 9 years ago

From @tux

On Fri\, 5 Jun 2015 17​:46​:13 +0100\, "Paul \"LeoNerd\" Evans" \leonerd@&#8203;leonerd\.org\.uk wrote​:

On Fri\, 5 Jun 2015 18​:32​:36 +0200 "H.Merijn Brand" \h\.m\.brand@&#8203;xs4all\.nl wrote​:

That said\, if we're going to wish for alternate histories\, then maybe

BARE => SCALAR BARE ==> LIST

might be the spelling?

NOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO!

What; you wanted to reserve ==> for the pipeline operator?

Joke\, right?

I do not want => to enforce scalar on the right hand side what ==> might be used for in the future is open to discussion

@​items ==> grep { ... } ==> map { ... } ==> my @​result;

perl6-style?

I'm also "doing" perl6 on an almost daily basis now\, but perl5 still wins for me. And the rumor goes that *perl5* has weird syntax experience and not everything is consistent :)

-- H.Merijn Brand http​://tux.nl Perl Monger http​://amsterdam.pm.org/ using perl5.00307 .. 5.21 porting perl5 on HP-UX\, AIX\, and openSUSE http​://mirrors.develooper.com/hpux/ http​://www.test-smoke.org/ http​://qa.perl.org http​://www.goldmark.org/jeff/stupid-disclaimers/

p5pRT commented 9 years ago

From moregan@stresscafe.com

my %h = (a => 1\, a => 2);

The first value for key 'a' will immediately be overwritten by the second. Having them both does nothing useful and almost always indicates a typo.

A lot of work has been done on this by Kevin Ryde in Perl​::Critic​::Policy​::ValuesAndExpressions​::ProhibitDuplicateHashKeys : http​://search.cpan.org/~kryde/Perl-Critic-Pulp-90/

Mike

p5pRT commented 9 years ago

From @iabyn

On Fri\, Jun 05\, 2015 at 06​:28​:00AM -0700\, Ed Avis wrote​:

my %h = \(a => 1\, a => 2\);

The first value for key 'a' will immediately be overwritten by the second. Having them both does nothing useful and almost always indicates a typo. I suggest perl could warn in the case where

- a hash is initialized with a list given directly in the program text\,

The only way I can think to detect this is\, at compile time where an assignment is seen with a list of constants on the RHS\, to create a temporary hash\, then scan the keys of the RHS adding them to the temp hash\, checking for duplicates\, then throw the tmp hash away.

Which seems moderately expensive.

Alternatively\, perhaps we should enhance the HV API to provide a hash copy function\, then stuff like

  %h1 = %h2;

could be done directly rather than pushing all %h2's keys and values onto the stack?

Then with

  %h = (constkey\, constval\, ....);

the (constkey\, constval\, ...) could be constant-folded to a single hash\, with the runtime code doing a direct hash copy. The warning would then be easily generated during the constant folding.

-- Music lesson​: a symbiotic relationship whereby a pupil's embellishments concerning the amount of practice performed since the last lesson are rewarded with embellishments from the teacher concerning the pupil's progress over the corresponding period.

p5pRT commented 9 years ago

From @epa

I don't think the compile-time cost of checking for duplicate hash keys will be significant. Not unless someone is generating huge Perl programs with hashes containing millions of entries - in which case there are certainly more efficient ways to do it anyway.

Constant-folding hashes does sound like a good idea\, more so if it would let duplicate keys be detected straightforwardly. Indeed\, if there were some way to declare a hash as read-only\, so that its contents come from the initial declaration and assignment and can never afterwards be changed\, a lot of hash lookups could be constant-folded too. But that is another sidetrack...

-- Ed Avis \eda@&#8203;waniasset\.com

p5pRT commented 9 years ago

From @demerphq

On 8 June 2015 at 11​:23\, Ed Avis \eda@&#8203;waniasset\.com wrote​:

I don't think the compile-time cost of checking for duplicate hash keys will be significant. Not unless someone is generating huge Perl programs with hashes containing millions of entries - in which case there are certainly more efficient ways to do it anyway.

Constant-folding hashes does sound like a good idea\, more so if it would let duplicate keys be detected straightforwardly. Indeed\, if there were some way to declare a hash as read-only\, so that its contents come from the initial declaration and assignment and can never afterwards be changed\, a lot of hash lookups could be constant-folded too. But that is another sidetrack...

Just curious\, what should happen with code like this​:

my %hash= ( a=> 1\, b=>2 );

my %other_hash= ( a=> 0\, %hash );

would you expect a warning here? FWIW\, personally I wouldn't\, as this is a fairly standard way to set up defaults in initialization.

cheers\, Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 9 years ago

From @epa

demerphq \<demerphq \ gmail.com> writes​:

Just curious\, what should happen with code like this​:

my %hash= ( a=> 1\, b=>2 );

my %other_hash= ( a=> 0\, %hash );

As I mentioned in the original bug report\, I do not suggest making any warning for this case. Only in cases where the two duplicate keys are literally present in the program text as part of the same hash assignment.

-- Ed Avis \eda@&#8203;waniasset\.com

p5pRT commented 9 years ago

From @epa

Thanks for the mention of Perl​::Critic​::Pulp. I do use it. But nobody runs their program through perlcritic every single edit-test cycle. Although particularly heavyweight\, controversial or specialized warnings are certainly better in perlcritic\, I believe that checking for duplicate literal hash keys in the program is lightweight enough and universally useful enough to merit a warning in perl itself.

p5pRT commented 9 years ago

From zefram@fysh.org

Ed Avis via RT wrote​:

I believe that checking for duplicate literal hash keys in the program is lightweight enough and universally useful enough to merit a warning in perl itself.

There's a middle path that you seem to have overlooked​: a compile-time warning implemented by a module. One could implement it quite easily by hooking the checking for the aassign op. That's very close to where one would naturally implement it in the core.

-zefram

p5pRT commented 9 years ago

From @demerphq

On 8 June 2015 at 13​:34\, Ed Avis \eda@&#8203;waniasset\.com wrote​:

demerphq \<demerphq \ gmail.com> writes​:

Just curious\, what should happen with code like this​:

my %hash= ( a=> 1\, b=>2 );

my %other_hash= ( a=> 0\, %hash );

As I mentioned in the original bug report\, I do not suggest making any warning for this case. Only in cases where the two duplicate keys are literally present in the program text as part of the same hash assignment.

Sorry I missed that point. Thanks for the explanation. This sounds like a useful improvement to me\, assuming it is doable.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 9 years ago

From @timbunce

On Mon\, Jun 08\, 2015 at 04​:54​:27AM -0700\, Ed Avis via RT wrote​:

Thanks for the mention of Perl​::Critic​::Pulp. I do use it. But nobody runs their program through perlcritic every single edit-test cycle.

[Just a data point] At $work we run perlcritic in a git commit hook via https://metacpan.org/pod/App::GitHooks::Plugin::PerlCritic That works very well for us.

Tim.

p5pRT commented 9 years ago

From @timbunce

On Mon\, Jun 08\, 2015 at 09​:14​:41AM +0100\, Dave Mitchell wrote​:

On Fri\, Jun 05\, 2015 at 06​:28​:00AM -0700\, Ed Avis wrote​:

my %h = \(a => 1\, a => 2\);

The first value for key 'a' will immediately be overwritten by the second. Having them both does nothing useful and almost always indicates a typo. I suggest perl could warn in the case where

- a hash is initialized with a list given directly in the program text\,

The only way I can think to detect this is\, at compile time where an assignment is seen with a list of constants on the RHS\, to create a temporary hash\, then scan the keys of the RHS adding them to the temp hash\, checking for duplicates\, then throw the tmp hash away.

Couldn't we just compare the number of elements in the list with the number of elements that end up in the hash?

Tim.

p5pRT commented 9 years ago

From @epa

Tim Bunce \<Tim.Bunce \ pobox.com> writes​:

Couldn't we just compare the number of elements in the list with the number of elements that end up in the hash?

That's certainly a start - it doesn't let you report exactly which hash key was duplicated - but a lot better than nothing.

-- Ed Avis \eda@&#8203;waniasset\.com

p5pRT commented 9 years ago

From @tux

On Mon\, 8 Jun 2015 06​:58​:10 -0600\, Tim Bunce \Tim\.Bunce@&#8203;pobox\.com wrote​:

On Mon\, Jun 08\, 2015 at 04​:54​:27AM -0700\, Ed Avis via RT wrote​:

Thanks for the mention of Perl​::Critic​::Pulp. I do use it. But nobody runs their program through perlcritic every single edit-test cycle.

[Just a data point] At $work we run perlcritic in a git commit hook via https://metacpan.org/pod/App::GitHooks::Plugin::PerlCritic That works very well for us.

I think perlcritic is valuable when manually run every 6 month or so

There are too many edge cases broken to use it in commit hooks. A few examples I just noticed in the past hour​:

@​_{slice} != @​_


  local %_;   @​hdr and @​_{@​hdr} = @​$r;

results in

  [4 - Variables​::RequireLocalizedPunctuationVars]   Magic variable "@​_" should be assigned as "local"   :@​hdr and @​_{@​hdr} = @​$r;

grep { ... } @​foo and action != void context


  grep { !defined } @​_ and croak ($self->SetDiag (1004));

results in

  [3 - BuiltinFunctions​::ProhibitVoidGrep]   "grep" used in void context   :grep { !defined } @​_ and croak ($self->SetDiag (1004));

my ($var) inside sub { } is not the same variable


  $csv->callbacks (after_parse => sub {   my ($csv\, $r) = @​_;

results in

  [3 - Variables​::ProhibitReusedNames]   Reused variable name in lexical scope​: $csv   :my ($csv\, $r) = @​_;

-- H.Merijn Brand http​://tux.nl Perl Monger http​://amsterdam.pm.org/ using perl5.00307 .. 5.21 porting perl5 on HP-UX\, AIX\, and openSUSE http​://mirrors.develooper.com/hpux/ http​://www.test-smoke.org/ http​://qa.perl.org http​://www.goldmark.org/jeff/stupid-disclaimers/

p5pRT commented 9 years ago

From @epa

Zefram \<zefram \ fysh.org> writes​:

checking for duplicate literal hash keys

There's a middle path that you seem to have overlooked​: a compile-time warning implemented by a module.

Would that be able to warn on C\<%h = (a => 1\, a => 2)> but keep quiet on C\<%h = (a => 1); %g = (%h\, a => 2)>?

-- Ed Avis \eda@&#8203;waniasset\.com

p5pRT commented 9 years ago

From zefram@fysh.org

Ed Avis wrote​:

Zefram \<zefram \ fysh.org> writes​:

There's a middle path that you seem to have overlooked​: a compile-time warning implemented by a module.

Would that be able to warn on C\<%h = (a => 1\, a => 2)> but keep quiet on C\<%h = (a => 1); %g = (%h\, a => 2)>?

Yes. It would actually be quite difficult to link the two statements in that latter example at compile time. There's nothing offensive about (%h\, a=>2) in isolation\, and %h doesn't have its clashing value when that list is compiled.

-zefram

p5pRT commented 9 years ago

From @rcaputo

On Jun 8\, 2015\, at 10​:50\, Zefram \zefram@&#8203;fysh\.org wrote​:

Ed Avis wrote​:

Zefram \<zefram \ fysh.org> writes​:

There's a middle path that you seem to have overlooked​: a compile-time warning implemented by a module.

Would that be able to warn on C\<%h = (a => 1\, a => 2)> but keep quiet on C\<%h = (a => 1); %g = (%h\, a => 2)>?

Yes. It would actually be quite difficult to link the two statements in that latter example at compile time. There's nothing offensive about (%h\, a=>2) in isolation\, and %h doesn't have its clashing value when that list is compiled.

All the questions about edge cases imply that people expect this to be a lexically scoped feature not tightly coupled to compile time.

Tying %h to a class that raises an exception on duplicates would work in all the exceptional cases. To limit the feature's scope\, untie the hash after initialization.

-- Rocco Caputo \rcaputo@&#8203;pobox\.com

p5pRT commented 9 years ago

From zefram@fysh.org

Rocco Caputo wrote​:

All the questions about edge cases imply that people expect this to be a lexically scoped feature not tightly coupled to compile time.

No\, the questions about edge cases imply that it's a semantically tricky area with many edges and more than one route to implementation. Most requested versions of the warning have been semantically compile-time checks\, and most discussion about implementation has been based on implementing it at compile time. I think everyone expects that it would be under lexically-scoped control\, but that's unrelated to all the other things you mention.

Tying %h to a class that raises an exception on duplicates would work in all the exceptional cases. To limit the feature's scope\, untie the hash after initialization.

That sounds messy\, and would interfere with ordinary use of the hash.

-zefram

p5pRT commented 9 years ago

From @epa

Rocco Caputo \<rcaputo \ pobox.com> writes​:

All the questions about edge cases imply that people expect this to be a lexically scoped feature not tightly coupled to compile time.

Speaking for myself\, that is not the intention. I envisage a strictly compile-time check of duplicate hash keys in a literal hash assignment. The check would not need to be lexically scoped because nobody would ever want to disable it - at least I have not imagined a case where they would.

-- Ed Avis \eda@&#8203;waniasset\.com

p5pRT commented 9 years ago

From @jkeenan

On Mon Jun 08 08​:23​:28 2015\, eda@​waniasset.com wrote​:

Rocco Caputo \<rcaputo \ pobox.com> writes​:

All the questions about edge cases imply that people expect this to be a lexically scoped feature not tightly coupled to compile time.

Speaking for myself\, that is not the intention. I envisage a strictly compile-time check of duplicate hash keys in a literal hash assignment. The check would not need to be lexically scoped because nobody would ever want to disable it - at least I have not imagined a case where they would.

My sense of the discussion so far in this thread is that\, in order to minimize side effects\, the number of cases in which the warning would actually be generated would have to be strictly limited to the simplest cases.

If that is so\, then I doubt the benefit of this warning would outweigh the cost in human work-hours incurred in​:

* Implementing the warning in the core; I note that no patch was submitted.

* More importantly\, the future maintenance cost for the Porters.

How much hand-holding do we have to do? -- James E Keenan (jkeenan@​cpan.org)

p5pRT commented 9 years ago

From @epa

James E Keenan - you are right\, the warning would be limited to the simplest cases (which are also those which are completely uncontroversial\, IMHO). I suggest that despite the simple-mindedness of the check it would actually catch a fair few bugs in practice. This is because a programmer will often cut and paste to add a new entry to a configuration hash\, and forget to change the hash key. This is just my experience from my own code and reading code written by others.

Although the check is conceptually simple to explain (to reiterate​: when the two duplicate keys are literally present in the source code in the same hash assignment\, and when the determination of odd and even positions is also obvious at compile time)\, I did not have much idea how difficult it would be to implement. If straightforward\, it is an easy win; otherwise it may not be worth the development effort (which in any case is decided by whoever does the work\, unless others are interested in funding this proposal).

Should the bug be closed as WONTFIX or WONTBOTHER\, that is fine of course.

If the perl5-porters would prefer not to receive feature requests which aren't backed by patches\, I will stop sending them.

p5pRT commented 9 years ago

From @ikegami

On Fri\, Jun 5\, 2015 at 12​:23 PM\, Zefram \zefram@&#8203;fysh\.org wrote​:

Paul "LeoNerd" Evans wrote​:

Sidetrack^2​: I sometimes find it useful that => *doesn't* apply scalar context to RHS\, because I do things like

Future->fail( "HTTP GET failed"\, http => $request\, $response );

Would make no difference in this case

Better example​: system(cat => @​files)

p5pRT commented 9 years ago

From @epa

Things like

  system(cat => @​files)

are interesting but outside the scope of this feature\, since it isn't a hash assignment; there isn't the same literal key appearing twice in the sequence; and the number of elements isn't known at compile time.

I would only check fixed lists of k=>v pairs\, or (as a slightly enhanced version of the check) lists which contain only k=>v pairs and other hashes as

  %h = (%i\, a => 1\, b => 2\, %j\, a => 3)

since in this case the odd-even of each position is known at compile time\, even if the total number of elements isn't.

p5pRT commented 9 years ago

From @rcaputo

On Jun 11\, 2015\, at 11​:03\, Ed Avis via RT \perlbug\-followup@&#8203;perl\.org wrote​:

Things like

system(cat => @​files)

are interesting but outside the scope of this feature\, since it isn't a hash assignment; there isn't the same literal key appearing twice in the sequence; and the number of elements isn't known at compile time.

If it were a runtime feature of list assignment to a hash\, it would work in all cases\, and the very specific case you'd like to happen at compile time could as an optimization.

-- Rocco Caputo \rcaputo@&#8203;pobox\.com

p5pRT commented 9 years ago

From @epa

I suggest a runtime warning wouldn't really work because (unless I am mistaken) it would not be able to distinguish between

  %c = (a => 1);   %h = (a => 1\, a => 2); # pointless   %h = (%c\, a => 2); # normal idiom for overriding some entries

In both cases the RHS of the assignment is the list ('a'\,1\,'a'\,2) at run time. You really do need to look at the literal program text to see the duplicate keys - hence\, it must be a compile time check.

-- Ed Avis \eda@&#8203;waniasset\.com

p5pRT commented 9 years ago

From @rcaputo

On Jun 11\, 2015\, at 16​:02\, Ed Avis \eda@&#8203;waniasset\.com wrote​:

I suggest a runtime warning wouldn't really work because (unless I am mistaken) it would not be able to distinguish between

%c = (a => 1); %h = (a => 1\, a => 2); # pointless %h = (%c\, a => 2); # normal idiom for overriding some entries

In both cases the RHS of the assignment is the list ('a'\,1\,'a'\,2) at run time. You really do need to look at the literal program text to see the duplicate keys - hence\, it must be a compile time check.

If the feature were lexically scoped\, or scope-able\, one could disable the warning where the duplicates were deliberate. Or\, I suppose\, enable it where a problem was anticipated. That's why I mentioned lexical scoping before.

-- Rocco Caputo \rcaputo@&#8203;pobox\.com

p5pRT commented 9 years ago

From @iabyn

On Thu\, Jun 11\, 2015 at 02​:33​:41AM -0700\, Ed Avis via RT wrote​:

Although the check is conceptually simple to explain (to reiterate​: when the two duplicate keys are literally present in the source code in the same hash assignment\, and when the determination of odd and even positions is also obvious at compile time)\, I did not have much idea how difficult it would be to implement. If straightforward\, it is an easy win; otherwise it may not be worth the development effort (which in any case is decided by whoever does the work\, unless others are interested in funding this proposal).

I think the best approach to implementing this is to optimise assigning a constant list to a hash by constant folding the list into a constant hash. Then the warning becomes trivial to implement as part of creating the constant hash.

Should the bug be closed as WONTFIX or WONTBOTHER\, that is fine of course.

I think the ticket should be left open. I'll add constant folding hashes (and maybe arrays) to my list of things to to.

If the perl5-porters would prefer not to receive feature requests which aren't backed by patches\, I will stop sending them.

Speaking purely for myself\, I'm happy for people to suggest features that they themselves are not in a position to implement.

-- You never really learn to swear until you learn to drive.

p5pRT commented 9 years ago

From zefram@fysh.org

Dave Mitchell wrote​:

I think the best approach to implementing this is to optimise assigning a constant list to a hash by constant folding the list into a constant hash.

That would miss cases such as

  %h = (a => $a\, a => $b);

which I think Ed wants to be covered.

-zefram

p5pRT commented 9 years ago

From @kentfredric

On 6 June 2015 at 01​:28\, Ed Avis \perlbug\-followup@&#8203;perl\.org wrote​:

my %h = \(a => 1\, a => 2\);

The first value for key 'a' will immediately be overwritten by the second. Having them both does nothing useful and almost always indicates a typo. I suggest perl could warn in the case where

- a hash is initialized with a list given directly in the program text\,

- two of the keys of the hash (that is\, the even-numbered elements of the list) are literal strings in the program text and are the same string.

Question​:

What would happen here​:

my %h = ( a => 1\, @​list\, a => 2\, @​list );

Given @​list can be an odd sized list\, which may in fact expand as​:

my %h = ( a => 1\, b => a\, 2 => b );

( Apologies if it was covered elsewhere\, I didn't see this exact concern mentioned )

-- Kent

*KENTNL* - https://metacpan.org/author/KENTNL

p5pRT commented 9 years ago

From @epa

my %h = ( a => 1\, @​list\, a => 2\, @​list );

Sadly that would have to be excluded from the check since the odd-even-ness of each element cannot be determined at compile time.

There is an interesting issue here\, and perhaps a case for adding a warning when an x=>y pair ends up 'misaligned' so that x is at an odd position in a hash assignment\, but it is out of scope for this feature request.

Only cases where the number of elements in the list is known at compile time - or\, for a slightly more powerful check\, the odd-even-ness of each element is known at compile time - would be checkable.

p5pRT commented 9 years ago

From @rgs

On 12 June 2015 at 15​:29\, Ed Avis via RT \perlbug\-followup@&#8203;perl\.org wrote​:

my %h = ( a => 1\, @​list\, a => 2\, @​list );

Sadly that would have to be excluded from the check since the odd-even-ness of each element cannot be determined at compile time.

There is an interesting issue here\, and perhaps a case for adding a warning when an x=>y pair ends up 'misaligned' so that x is at an odd position in a hash assignment\, but it is out of scope for this feature request.

Only cases where the number of elements in the list is known at compile time - or\, for a slightly more powerful check\, the odd-even-ness of each element is known at compile time - would be checkable.

I was just thinking about %x = (   a => 1\,   b => 2\,   ($need_some_more_b ? (b => 3) : ())\, ); where the trick is to use a boolean flag to change the default value of b.

p5pRT commented 9 years ago

From @kentfredric

On 13 June 2015 at 01​:29\, Ed Avis via RT \perlbug\-followup@&#8203;perl\.org wrote​:

Sadly that would have to be excluded from the check since the odd-even-ness of each element cannot be determined at compile time.

Can I read that to interpret such that​:

( key1 => value\, key2 => value\, ARRAY_OR_SUBCALL\, .* ) will process key1 and key2\, but ignore everything that comes after LIST_OR_SUB?

That is

1​: for hashes that are entirely static declarations\, full key analysis can work 2​: for hashes with part that /could/ be returning a list of values\, ( either @​ or x_() )\, only the "keys" prior to the indeterminate part would be analysed\, and all parts after the indeterminate part ignored. ( Because anything that comes after an @​ or & can't be statically determined to be a key or a value ).

-- Kent

*KENTNL* - https://metacpan.org/author/KENTNL

p5pRT commented 9 years ago

From zefram@fysh.org

Rafael Garcia-Suarez wrote​:

I was just thinking about %x = ( a => 1\, b => 2\, ($need_some_more_b ? (b => 3) : ())\, ); where the trick is to use a boolean flag to change the default value of b.

That seems a perverse way of changing the value ascribed to b.

-zefram

p5pRT commented 9 years ago

From @epa

for hashes with part that /could/ be returning a list of values\, ( either @​ or x_() )\, only the "keys" prior to the indeterminate part would be analysed\, and all parts after the indeterminate part ignored.

That's one option and the one I envisaged. Then the warning is always sound.

I suppose an alternative is just to assume that any subroutine call or array included in the hash initializer will return an even number of elements (since after all\, what crazy person would mix => and odd-sized arrays in a hash initializer expr?) and warn on that basis.

That makes the warning technically unsound - since you *could* concoct an example program which triggers it while not having a duplicate hash key - but perhaps more useful in practice.

p5pRT commented 9 years ago

From @rjbs

* Kent Fredric \kentfredric@&#8203;gmail\.com [2015-06-12T09​:59​:12]

On 13 June 2015 at 01​:29\, Ed Avis via RT \perlbug\-followup@&#8203;perl\.org wrote​:

Sadly that would have to be excluded from the check since the odd-even-ness of each element cannot be determined at compile time.

Can I read that to interpret such that​:

( key1 => value\, key2 => value\, ARRAY_OR_SUBCALL\, .* ) will process key1 and key2\, but ignore everything that comes after LIST_OR_SUB?

This is getting out of hand.

If this is going ot happen\, it needs to be straightforward and have no false positives\, which is possible.

* scalar constants count as one thing * $scalar and *glob variables and \references count as one thing * hash variables count as zero things * everything else causes the check to be aborted

If the sum of things is even\, no warning. If the sum of things is odd\, warning.

-- rjbs

p5pRT commented 9 years ago

From @ikegami

On Thu\, Jun 11\, 2015 at 3​:58 PM\, Rocco Caputo \rcaputo@&#8203;pobox\.com wrote​:

If it were a runtime feature of list assignment to a hash\, it would work in all cases\, and the very specific case you'd like to happen at compile time could as an optimization.

Which leads to the reason I wouldn't like it​:

%a = (b => @​c);

would be different than

@​d =(b => @​c); %a = @​d;

p5pRT commented 9 years ago

From @epa

Ed Avis \<eda \ waniasset.com> writes​:

(Sidetrack​: it might have been a good idea for => to impose scalar context on its RHS\, but probably too late for that now.)

Just to mention an interesting bug that arose because programmers weren't aware that the RHS of => is actually list context​:

  http​://blog.gerv.net/2014/10/new

-- Ed Avis \eda@&#8203;waniasset\.com

p5pRT commented 9 years ago

From @abigail

On Mon\, Aug 03\, 2015 at 12​:12​:08PM +0000\, Ed Avis wrote​:

Ed Avis \<eda \ waniasset.com> writes​:

(Sidetrack​: it might have been a good idea for => to impose scalar context on its RHS\, but probably too late for that now.)

Just to mention an interesting bug that arose because programmers weren't aware that the RHS of => is actually list context​:

http&#8203;://blog\.gerv\.net/2014/10/new

Well\, it's a very good thing the RHS of => is in list context\, if it were in scalar context\,

  %h = (a => 1\, b => 2);

would be a one element hash (with key 'a'\, and value 2).

(This is also directed to the early subthread which suggests that the RHS of => should be in scalar context).

In the blog post\, the code would have failed if => provided scalar context as well. The code makes *two* incorrect assumptions\, not just one​:

  - => has a higher precedence than \,   - => provides scalar context to its arguments\, where \, doesn't.

Just changing => so it provides scalar context on its RHS only helps someone who wrote​:

  %h = (key => foo());

with foo() returning a multi-element list in list context. (Which is also the case for a new operator ==> which only differs from => by the context it provides to its RHS).

On top of changing the RHS behavior (or introducing a new operator) you'd also need another level of precedence.

But I've yet to encounter someone who says "well\, I'm being baffled by context all the time\, but 24 levels of precedence is a piece of cake -- I wish there were more".

Abigail

p5pRT commented 9 years ago

From @epa

Abigail \<abigail \ abigail.be> writes​:

http​://blog.gerv.net/2014/10/new

The code makes *two* incorrect assumptions\, not just one​:

- => has a higher precedence than \, - => provides scalar context to its arguments\, where \, doesn't.

Those assumptions are both wrong but I think they are the mental model of many Perl programmers when writing hash assignments with =>. At least\, to the extent that the programmer has muddled through with a vague idea of the semantics. I too imagined that the => created a "pair" and by so doing had higher precedence than \, somehow; and any example or tutorial uses only scalars or single-item-returning functions on the RHS\, so you imagine that the RHS of => will always be a scalar.

The authors of Bugzilla must be in the top 20% of Perl programmers\, and they got it wrong too.

But I do agree that it is not an easy thing to solve from where Perl is now. It would be interesting to try tweaking => to bind more tightly than \, and see what proportion of CPAN code compiles differently. TBH\, I doubt that 25 levels of precedence is any more confusing than 24 :=-(

-- Ed Avis \eda@&#8203;waniasset\.com