Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/
Other
1.96k stars 555 forks source link

[PATCH] Allow push/pop/keys/etc to act on references #10777

Closed p5pRT closed 14 years ago

p5pRT commented 14 years ago

Migrated from rt.perl.org#78656 (status was 'resolved')

Searchable as RT78656$

p5pRT commented 14 years ago

From @xdg

This patch updates all functions that operate directly on array or hash containers to also accept references to arrays or hashes​:

  push $arrayref\, @​stuff;   unshift $arrayref\, @​stuff;   pop $arrayref;   shift $arrayref;   splice $arrayref\, 0\, 2;   keys $hashref; # or $arrayref   values $hashref # or $arrayref   ($k\,$v) = each $hashref # or $arrayref

Because this patch is a substantial new feature and will likely merit some discussion\, I have organized this commit message as follows​:

* Summary of benefits * Summary of the patch * Rationale

_Summary of benefits_

As I see it\, the primary benefit is visually de-cluttering common operations in situations where the context (acting on a "container") is clear.

As a side effect of the implementation of this change\, calling keys/values directly on a reference gives roughly a 75%+ performance boost compared to Perl 5.13.6.

  keys $hashref; # much faster than "keys %$hashref"

(Prior to 5.12\, keys %$hashref was similarly fast\, but the optimization was lost when keys/values added support for arrays)

I suspect that\, if accepted\, similar optimizations might be possible for push/pop/etc ops.

_Summary of the patch_

For push/pop/shift/unshift/splice ops\, I added a new 'ck_push' check routine. If the child op is not an array or array dereference\, the child op is wrapped in an array dereference.

For keys/values/each\, I added three new opcodes to handle the case where the argument is not a hash or array (or explicit dereference of them). All three ops are handled by one function that dereferences the argument (including any magic) and then dispatches to the corresponding hash or array implementation of the original function. If the argument is not a hash or array\, a runtime error is thrown.

For keys/values/each\, when overloaded dereferencing is present\, the overloaded dereference is used instead of dereferencing the underlying reftype. Warnings are issued about assumptions made in the following three ambiguous cases​:

  (a) If both %{} and @​{} overloading exists\, %{} is used   (b) If %{} overloading exists on a blessed arrayref\, %{} is used   (c) If @​{} overloading exists on a blessed hashref\, @​{} is used

The new warnings have been added to perldiag.pod with an explanation.

For all affected built-ins\, the prototype has been changed to use the new 'single-term' prototype '+'. E.g. for push\, the prototype is "+@​".

Documentation and tests are included.

_Rationale_

Perl is a language that give great respect to the idea of context. I use that term below in the linguistic sense in which the pattern of usage guides interpretation (rather than in the programming sense of "scalar" or "list" context).

There are a number of built-in functions that act directly on "containers" (i.e. hashes and arrays) and that pass their first argument by reference rather than flattening the container into a list. E.g. because of the linguistic context of a function like "push" in "push @​array\, @​list"\, perl passes @​array by reference.

In various venues and forums\, I've heard people express the desire for perl do the "obvious thing" when providing a reference directly to such functions that act on containers. E.g. "push $arrayref\, @​list". The linguistic context provided by "push" is exactly the same -- it is clear that the goal is to act on the container itself.

In contrast\, one can't reach the same conclusion about\, for example\, "foo($_) for $arrayref". The "for" doesn't give any clear linguistic context for whether $arrayref is intended to be flattened or used as a single element list aliased to $_.

Therefore\, this patch allows perl to use a reference to a container directly for built in functions only where the linguistic context *already* allows perl to pass the container by reference instead of flattening it.

In discussing this patch or variations of it on email and IRC\, I've heard lots of praise for the idea. The justifications I've seen are various combinations of "easier to read"\, "less to type" and/or "less annoying".

Those who dislike the idea of this patch generally seem to be of the opinion that dereferencing "should be explicit" or that the patch "isn't necessary" (which may just be a variation of the first objection).

To the first objection\, I say that this is a matter of style and reasonable people already disagree on the "best" Perl style. This patch delivers a desired feature to those who do want it without interfering with the ability of others to explicitly dereference everything as a matter of personal (or corporate) style.

I dismiss the second objection out of hand. By its very nature as a community-driven project\, nothing in the development of Perl is "necessary" and the very definition of "necessary" will vary from person to person. In this case\, the work is desired and it is already done.

As mentioned in the summary\, the changes to keys/values/each to accept references happened to lead to a significant performance improvement\, recovering some of what was lost when 5.12 changed the implementation to allow keys/values/each to act on arrays in the first place.

The changes to push/pop/etc did not require the same degree of changes (i.e. I didn't add new opcodes for them) and I kept the patch as simple as possible. However\, if the patch is accepted\, I suspect that the same technique that allows optimization of "keys $hashref" could be explored later to optimize "push $arrayref\, @​list" over "push @​$arrayref\, @​list".


MANIFEST | 1 + doop.c | 5 +- embed.h | 4 + op.c | 85 +++++++++++-- opcode.h | 21 +++- opnames.h | 5 +- pod/perldelta.pod | 26 ++++ pod/perldiag.pod | 16 +++ pod/perlfunc.pod | 75 +++++++++-- pod/perlsub.pod | 10 +- pp.c | 81 ++++++++++++- pp.sym | 4 + proto.h | 9 ++ regen/opcode.pl | 12 ++- t/op/cproto.t | 16 ++-- t/op/push.t | 39 ++++++- t/op/smartkve.t | 361 +++++++++++++++++++++++++++++++++++++++++++++++++++++ t/op/splice.t | 7 +- t/op/unshift.t | 36 +++++- 19 files changed\, 757 insertions(+)\, 56 deletions(-) create mode 100644 t/op/smartkve.t

Inline Patch ```diff diff --git a/MANIFEST b/MANIFEST index c19bb4b..4118b43 100644 --- a/MANIFEST +++ b/MANIFEST @@ -4692,6 +4692,7 @@ t/op/runlevel.t See if die() works from perl_call_*() t/op/setpgrpstack.t See if setpgrp works t/op/sigdispatch.t See if signals are always dispatched t/op/sleep.t See if sleep works +t/op/smartkve.t See if smart deref for keys/values/each works t/op/smartmatch.t See if the ~~ operator works t/op/sort.t See if sort works t/op/splice.t See if splice works diff --git a/doop.c b/doop.c index 35efba6..550e6fb 100644 --- a/doop.c +++ b/doop.c @@ -1436,8 +1436,9 @@ Perl_do_kv(pTHX) register HE *entry; const I32 gimme = GIMME_V; const I32 dokv = (PL_op->op_type == OP_RV2HV || PL_op->op_type == OP_PADHV); - const I32 dokeys = dokv || (PL_op->op_type == OP_KEYS); - const I32 dovalues = dokv || (PL_op->op_type == OP_VALUES); + /* op_type is OP_RKEYS/OP_RVALUES if pp_rkeys delegated to here */ + const I32 dokeys = dokv || (PL_op->op_type == OP_KEYS || PL_op->op_type == OP_RKEYS); + const I32 dovalues = dokv || (PL_op->op_type == OP_VALUES || PL_op->op_type == OP_RVALUES); if (!hv) { if (PL_op->op_flags & OPf_MOD || LVRET) { /* lvalue */ diff --git a/embed.h b/embed.h index 134c349..31cd119 100644 --- a/embed.h +++ b/embed.h @@ -938,6 +938,7 @@ #define ck_method(a) Perl_ck_method(aTHX_ a) #define ck_null(a) Perl_ck_null(aTHX_ a) #define ck_open(a) Perl_ck_open(aTHX_ a) +#define ck_push(a) Perl_ck_push(aTHX_ a) #define ck_readline(a) Perl_ck_readline(aTHX_ a) #define ck_repeat(a) Perl_ck_repeat(aTHX_ a) #define ck_require(a) Perl_ck_require(aTHX_ a) @@ -1333,6 +1334,7 @@ #define pp_rand() Perl_pp_rand(aTHX) #define pp_range() Perl_pp_range(aTHX) #define pp_rcatline() Perl_pp_rcatline(aTHX) +#define pp_reach() Perl_pp_reach(aTHX) #define pp_read() Perl_pp_read(aTHX) #define pp_readdir() Perl_pp_readdir(aTHX) #define pp_readline() Perl_pp_readline(aTHX) @@ -1353,12 +1355,14 @@ #define pp_rewinddir() Perl_pp_rewinddir(aTHX) #define pp_right_shift() Perl_pp_right_shift(aTHX) #define pp_rindex() Perl_pp_rindex(aTHX) +#define pp_rkeys() Perl_pp_rkeys(aTHX) #define pp_rmdir() Perl_pp_rmdir(aTHX) #define pp_rv2av() Perl_pp_rv2av(aTHX) #define pp_rv2cv() Perl_pp_rv2cv(aTHX) #define pp_rv2gv() Perl_pp_rv2gv(aTHX) #define pp_rv2hv() Perl_pp_rv2hv(aTHX) #define pp_rv2sv() Perl_pp_rv2sv(aTHX) +#define pp_rvalues() Perl_pp_rvalues(aTHX) #define pp_sassign() Perl_pp_sassign(aTHX) #define pp_say() Perl_pp_say(aTHX) #define pp_scalar() Perl_pp_scalar(aTHX) diff --git a/op.c b/op.c index ce9c220..290f11a 100644 --- a/op.c +++ b/op.c @@ -310,6 +310,12 @@ Perl_Slab_Free(pTHX_ void *op) #define RETURN_UNLIMITED_NUMBER (PERL_INT_MAX / 2) +#define CHANGE_TYPE(o,type) \ + STMT_START { \ + o->op_type = (OPCODE)type; \ + o->op_ppaddr = PL_ppaddr[type]; \ + } STMT_END + STATIC const char* S_gv_ename(pTHX_ GV *gv) { @@ -8259,7 +8265,7 @@ Perl_ck_shift(pTHX_ OP *o) return newUNOP(type, 0, scalar(argop)); #endif } - return scalar(modkids(ck_fun(o), type)); + return scalar(modkids(ck_push(o), type)); } OP * @@ -9125,30 +9131,81 @@ Perl_ck_substr(pTHX_ OP *o) } OP * -Perl_ck_each(pTHX_ OP *o) +Perl_ck_push(pTHX_ OP *o) { dVAR; OP *kid = o->op_flags & OPf_KIDS ? cLISTOPo->op_first : NULL; + OP *cursor = NULL; + OP *proxy = NULL; - PERL_ARGS_ASSERT_CK_EACH; + PERL_ARGS_ASSERT_CK_PUSH; + /* If 1st kid is pushmark (e.g. push, unshift, splice), we need 2nd kid */ if (kid) { - if (kid->op_type == OP_PADAV || kid->op_type == OP_RV2AV) { - const unsigned new_type = o->op_type == OP_EACH ? OP_AEACH - : o->op_type == OP_KEYS ? OP_AKEYS : OP_AVALUES; - o->op_type = new_type; - o->op_ppaddr = PL_ppaddr[new_type]; - } - else if (!(kid->op_type == OP_PADHV || kid->op_type == OP_RV2HV - || (kid->op_type == OP_CONST && kid->op_private & OPpCONST_BARE) - )) { - bad_type(1, "hash or array", PL_op_desc[o->op_type], kid); - return o; + cursor = kid->op_type == OP_PUSHMARK ? kid->op_sibling : kid; + } + + /* If not array or array deref, wrap it with an array deref. + * For OP_CONST, we only wrap arrayrefs */ + if (cursor) { + if ( ( cursor->op_type != OP_PADAV + && cursor->op_type != OP_RV2AV + && cursor->op_type != OP_CONST + ) + || + ( cursor->op_type == OP_CONST + && SvROK(cSVOPx_sv(cursor)) + && SvTYPE(SvRV(cSVOPx_sv(cursor))) == SVt_PVAV + ) + ) { + proxy = newAVREF(cursor); + if ( cursor == kid ) { + cLISTOPx(o)->op_first = proxy; + } + else { + cLISTOPx(kid)->op_sibling = proxy; + } + cLISTOPx(proxy)->op_sibling = cLISTOPx(cursor)->op_sibling; + cLISTOPx(cursor)->op_sibling = NULL; } } return ck_fun(o); } +OP * +Perl_ck_each(pTHX_ OP *o) +{ + dVAR; + OP *kid = o->op_flags & OPf_KIDS ? cUNOPo->op_first : NULL; + const unsigned orig_type = o->op_type; + const unsigned array_type = orig_type == OP_EACH ? OP_AEACH + : orig_type == OP_KEYS ? OP_AKEYS : OP_AVALUES; + const unsigned ref_type = orig_type == OP_EACH ? OP_REACH + : orig_type == OP_KEYS ? OP_RKEYS : OP_RVALUES; + + PERL_ARGS_ASSERT_CK_EACH; + + if (kid) { + switch (kid->op_type) { + case OP_PADHV: + case OP_RV2HV: + break; + case OP_PADAV: + case OP_RV2AV: + CHANGE_TYPE(o, array_type); + break; + case OP_CONST: + if (kid->op_private == OPpCONST_BARE) + /* we let ck_fun treat as hash */ + break; + default: + CHANGE_TYPE(o, ref_type); + } + } + /* if treating as a reference, defer additional checks to runtime */ + return o->op_type == ref_type ? o : ck_fun(o); +} + /* caller is supposed to assign the return to the container of the rep_op var */ STATIC OP * diff --git a/opcode.h b/opcode.h index b670675..c7a304d 100644 --- a/opcode.h +++ b/opcode.h @@ -399,6 +399,9 @@ EXTCONST char* const PL_op_name[] = { "lock", "once", "custom", + "reach", + "rkeys", + "rvalues", }; #endif @@ -772,6 +775,9 @@ EXTCONST char* const PL_op_desc[] = { "lock", "once", "unknown custom operator", + "each on reference", + "keys on reference", + "values on reference", }; #endif @@ -1159,6 +1165,9 @@ EXT Perl_ppaddr_t PL_ppaddr[] /* or perlvars.h */ Perl_pp_lock, Perl_pp_once, Perl_unimplemented_op, /* Perl_pp_custom */ + Perl_pp_rkeys, /* Perl_pp_reach */ + Perl_pp_rkeys, + Perl_pp_rkeys, /* Perl_pp_rvalues */ } #endif #ifdef PERL_PPADDR_INITED @@ -1327,11 +1336,11 @@ EXT Perl_check_t PL_check[] /* or perlvars.h */ Perl_ck_null, /* lslice */ Perl_ck_fun, /* anonlist */ Perl_ck_fun, /* anonhash */ - Perl_ck_fun, /* splice */ - Perl_ck_fun, /* push */ + Perl_ck_push, /* splice */ + Perl_ck_push, /* push */ Perl_ck_shift, /* pop */ Perl_ck_shift, /* shift */ - Perl_ck_fun, /* unshift */ + Perl_ck_push, /* unshift */ Perl_ck_sort, /* sort */ Perl_ck_fun, /* reverse */ Perl_ck_grep, /* grepstart */ @@ -1543,6 +1552,9 @@ EXT Perl_check_t PL_check[] /* or perlvars.h */ Perl_ck_rfun, /* lock */ Perl_ck_null, /* once */ Perl_ck_null, /* custom */ + Perl_ck_each, /* reach */ + Perl_ck_each, /* rkeys */ + Perl_ck_each, /* rvalues */ } #endif #ifdef PERL_CHECK_INITED @@ -1921,6 +1933,9 @@ EXTCONST U32 PL_opargs[] = { 0x00007b04, /* lock */ 0x00000300, /* once */ 0x00000000, /* custom */ + 0x00001b00, /* reach */ + 0x00001b08, /* rkeys */ + 0x00001b08, /* rvalues */ }; #endif diff --git a/opnames.h b/opnames.h index 07626d4..26c3ba1 100644 --- a/opnames.h +++ b/opnames.h @@ -381,10 +381,13 @@ typedef enum opcode { OP_LOCK = 363, OP_ONCE = 364, OP_CUSTOM = 365, + OP_REACH = 366, + OP_RKEYS = 367, + OP_RVALUES = 368, OP_max } opcode; -#define MAXO 366 +#define MAXO 369 #define OP_phoney_INPUT_ONLY -1 #define OP_phoney_OUTPUT_ONLY -2 diff --git a/pod/perldelta.pod b/pod/perldelta.pod index a516465..b0ee885 100644 --- a/pod/perldelta.pod +++ b/pod/perldelta.pod @@ -59,6 +59,32 @@ till the end of the lexical scope: See L for details. +=head2 Array and hash container functions accept references + +All built-in functions that operate directly on array or hash +containers to also accept references to arrays or hashes: + + push $arrayref, @stuff; + unshift $arrayref, @stuff; + pop $arrayref; + shift $arrayref; + splice $arrayref, 0, 2; + keys $hashref; # or $arrayref + values $hashref # or $arrayref + ($k,$v) = each $hashref # or $arrayref + +Calling C or C directly on a reference gives a substantial +performance improvement over explicit dereferencing. + +For C, C, C, when overloaded dereferencing is +present, the overloaded dereference is used instead of dereferencing the +underlying reftype. Warnings are issued about assumptions made in the +following three ambiguous cases: + + (a) If both %{} and @{} overloading exists, %{} is used + (b) If %{} overloading exists on a blessed arrayref, %{} is used + (c) If @{} overloading exists on a blessed hashref, @{} is used + =head1 Security XXX Any security-related notices go here. In particular, any security diff --git a/pod/perldiag.pod b/pod/perldiag.pod index 7bbccdd..3f467af 100644 --- a/pod/perldiag.pod +++ b/pod/perldiag.pod @@ -76,6 +76,17 @@ on the operator (e.g. C) or declare the subroutine to be an object method (see L or L). +=item Ambiguous overloaded argument to %s resolved as %s + +(W ambiguous) You called C, C or C on an object that had +overloading of C<%{}> or C<@{}> or both. In such a case, the object is +dereferenced according to its overloading, not its underlying reference type. +The warning is issued when C<%{}> overloading exists on a blessed arrayref, +when C<@{}> overloading exists on a blessed hashref, or when both overloadings +are defined (in which case C<%{}> is used). You can force the interpretation +of the object by explictly dereferencing it as an array or hash instead of +passing the object itself to C, C or C. + =item Ambiguous range in transliteration operator (F) You wrote something like C which doesn't mean anything at @@ -4520,6 +4531,11 @@ certain type. Arrays must be @NAME or C<@{EXPR}>. Hashes must be %NAME or C<%{EXPR}>. No implicit dereferencing is allowed--use the {EXPR} forms as an explicit dereference. See L. +=item Type of argument to %s must be hashref or arrayref + +(F) You called C, C or C with an argument that was +expected to be a reference to a hash or a reference to an array. + =item umask not implemented (F) Your machine doesn't implement the umask function and you tried to diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod index 22d23b3..f0ae577 100644 --- a/pod/perlfunc.pod +++ b/pod/perlfunc.pod @@ -1452,10 +1452,10 @@ convert a core file into an executable. That's why you should now invoke it as C, if you don't want to be warned against a possible typo. -=item each HASH +=item each HASH (or HASHREF) X X -=item each ARRAY +=item each ARRAY (or ARRAYREF) X When called in list context, returns a 2-element list consisting of the key @@ -1494,6 +1494,16 @@ but in a different order: print "$key=$value\n"; } +When given a reference to a hash or array, the argument will be +dereferenced automatically. + + while (($key,$value) = each $hashref) { ... } + +If the reference is a blessed object that overrides either C<%{}> or +C<@{}>, the override will be used instead of dereferencing the underlying +variable type. If both overrides are provided, C<%{}> will be the default. +If this is not desired, you must dereference the argument yourself. + See also C, C and C. =item eof FILEHANDLE @@ -2602,10 +2612,10 @@ separated by the value of EXPR, and returns that new string. Example: Beware that unlike C, C doesn't take a pattern as its first argument. Compare L. -=item keys HASH +=item keys HASH (or HASHREF) X X -=item keys ARRAY +=item keys ARRAY (or ARRAYREF) Returns a list consisting of all the keys of the named hash, or the indices of an array. (In scalar context, returns the number of keys or indices.) @@ -2662,6 +2672,17 @@ C in this way (but you needn't worry about doing this by accident, as trying has no effect). C in an lvalue context is a syntax error. +When given a reference to a hash or array, the argument will be +dereferenced automatically. + + for (keys $hashref) { ... } + for (keys $obj->get_arrayref) { ... } + +If the reference is a blessed object that overrides either C<%{}> or +C<@{}>, the override will be used instead of dereferencing the underlying +variable type. If both overrides are provided, C<%{}> will be the default. +If this is not desired, you must dereference the argument yourself. + See also C, C and C. =item kill SIGNAL, LIST @@ -4306,7 +4327,7 @@ On systems that support a close-on-exec flag on files, that flag is set on all newly opened file descriptors whose Cs are I than the current value of $^F (by default 2 for C). See L. -=item pop ARRAY +=item pop ARRAY (or ARRAYREF) X X =item pop @@ -4318,6 +4339,9 @@ Returns the undefined value if the array is empty, although this may also happen at other times. If ARRAY is omitted, pops the C<@ARGV> array in the main program, but the C<@_> array in subroutines, just like C. +If given a reference to an array, the argument will be dereferenced +automatically. + =item pos SCALAR X X @@ -4403,7 +4427,7 @@ C) or if its arguments cannot be adequately expressed by a prototype does not really behave like a Perl function. Otherwise, the string describing the equivalent prototype is returned. -=item push ARRAY,LIST +=item push ARRAY (or ARRAYREF),LIST X X Treats ARRAY as a stack, and pushes the values of LIST @@ -4417,6 +4441,9 @@ LIST. Has the same effect as but is more efficient. Returns the number of elements in the array following the completed C. +If given a reference to an array, the argument will be dereferenced +automatically. + =item q/STRING/ =item qq/STRING/ @@ -5309,7 +5336,7 @@ An example disabling Nagle's algorithm on a socket: use Socket qw(IPPROTO_TCP TCP_NODELAY); setsockopt($socket, IPPROTO_TCP, TCP_NODELAY, 1); -=item shift ARRAY +=item shift ARRAY (or ARRAYREF) X =item shift @@ -5322,6 +5349,9 @@ C<@ARGV> array outside a subroutine and also within the lexical scopes established by the C, C, C, C, C and C constructs. +If given a reference to an array, the argument will be dereferenced +automatically. + See also C, C, and C. C and C do the same thing to the left end of an array that C and C do to the right end. @@ -5653,14 +5683,14 @@ eliminate any Cs from the input list. @result = sort { $a <=> $b } grep { $_ == $_ } @input; -=item splice ARRAY,OFFSET,LENGTH,LIST +=item splice ARRAY (or ARRAYREF),OFFSET,LENGTH,LIST X -=item splice ARRAY,OFFSET,LENGTH +=item splice ARRAY (or ARRAYREF),OFFSET,LENGTH -=item splice ARRAY,OFFSET +=item splice ARRAY (or ARRAYREF),OFFSET -=item splice ARRAY +=item splice ARRAY (or ARRAYREF) Removes the elements designated by OFFSET and LENGTH from an array, and replaces them with the elements of LIST, if any. In list context, @@ -5675,6 +5705,9 @@ If both OFFSET and LENGTH are omitted, removes everything. If OFFSET is past the end of the array, Perl issues a warning, and splices at the end of the array. +If given a reference to an array, the argument will be dereferenced +automatically. + The following equivalences hold (assuming C<< $[ == 0 and $#a >= $i >> ) push(@a,$x,$y) splice(@a,@a,0,$x,$y) @@ -7162,7 +7195,7 @@ X Breaks the binding between a variable and a package. (See C.) Has no effect if the variable is not tied. -=item unshift ARRAY,LIST +=item unshift ARRAY (or ARRAYREF),LIST X Does the opposite of a C. Or the opposite of a C, @@ -7175,6 +7208,9 @@ Note the LIST is prepended whole, not one element at a time, so the prepended elements stay in the same order. Use C to do the reverse. +If given a reference to an array, the argument will be dereferenced +automatically. + =item use Module VERSION LIST X X X @@ -7340,10 +7376,10 @@ files. On systems that don't support futimes(2), passing filehandles raises an exception. Filehandles must be passed as globs or glob references to be recognized; barewords are considered filenames. -=item values HASH +=item values HASH (or HASHREF) X -=item values ARRAY +=item values ARRAY (or ARRAYREF) Returns a list consisting of all the values of the named hash, or the values of an array. (In a scalar context, returns the number of values.) @@ -7371,6 +7407,17 @@ modify the contents of the hash: for (values %hash) { s/foo/bar/g } # modifies %hash values for (@hash{keys %hash}) { s/foo/bar/g } # same +When given a reference to a hash or array, the argument will be +dereferenced automatically. + + for (values $hashref) { ... } + for (values $obj->get_arrayref) { ... } + +If the reference is a blessed object that overrides either C<%{}> or +C<@{}>, the override will be used instead of dereferencing the underlying +variable type. If both overrides are provided, C<%{}> will be the default. +If this is not desired, you must dereference the argument yourself. + See also C, C, and C. =item vec EXPR,OFFSET,BITS diff --git a/pod/perlsub.pod b/pod/perlsub.pod index c16db28..cfa4ad4 100644 --- a/pod/perlsub.pod +++ b/pod/perlsub.pod @@ -1053,7 +1053,7 @@ X X Perl supports a very limited kind of compile-time argument checking using function prototyping. If you declare - sub mypush (\@@) + sub mypush (+@) then C takes arguments exactly like C does. The function declaration must be visible at compile time. The prototype @@ -1083,9 +1083,9 @@ corresponding built-in. sub mysyswrite ($$$;$) mysyswrite $buf, 0, length($buf) - $off, $off sub myreverse (@) myreverse $a, $b, $c sub myjoin ($@) myjoin ":", $a, $b, $c - sub mypop (\@) mypop @array - sub mysplice (\@$$@) mysplice @array, 0, 2, @pushme - sub mykeys (\%) mykeys %{$hashref} + sub mypop (+) mypop @array + sub mysplice (+$$@) mysplice @array, 0, 2, @pushme + sub mykeys (+) mykeys %{$hashref} sub myopen (*;$) myopen HANDLE, $name sub mypipe (**) mypipe READHANDLE, WRITEHANDLE sub mygrep (&@) mygrep { /foo/ } $a, $b, $c @@ -1141,7 +1141,7 @@ C<\[@%]> when given a literal array or hash variable, but will otherwise force scalar context on the argument. This is useful for functions which should accept either a literal array or an array reference as the argument: - sub smartpush (+@) { + sub mypush (+@) { my $aref = shift; die "Not an array or arrayref" unless ref $aref eq 'ARRAY'; push @$aref, @_; diff --git a/pp.c b/pp.c index c99d697..da85dbe 100644 --- a/pp.c +++ b/pp.c @@ -429,7 +429,19 @@ PP(pp_prototype) goto set; } if (code == -KEY_keys || code == -KEY_values || code == -KEY_each) { - ret = newSVpvs_flags("\\[@%]", SVs_TEMP); + ret = newSVpvs_flags("+", SVs_TEMP); + goto set; + } + if (code == -KEY_push || code == -KEY_unshift) { + ret = newSVpvs_flags("+@", SVs_TEMP); + goto set; + } + if (code == -KEY_pop || code == -KEY_shift) { + ret = newSVpvs_flags(";+", SVs_TEMP); + goto set; + } + if (code == -KEY_splice) { + ret = newSVpvs_flags("+;$$@", SVs_TEMP); goto set; } if (code == -KEY_tied || code == -KEY_untie) { @@ -4625,6 +4637,71 @@ PP(pp_aslice) RETURN; } +/* Smart dereferencing for keys, values and each */ +PP(pp_rkeys) +{ + dVAR; + dSP; + dPOPss; + + if (!SvOK(sv)) + RETURN; + + if (SvROK(sv)) { + SvGETMAGIC(sv); + if (SvAMAGIC(sv)) { + /* N.B.: AMG macros return sv if no overloading is found */ + SV *maybe_hv = AMG_CALLun_var(sv,to_hv_amg); + SV *maybe_av = AMG_CALLun_var(sv,to_av_amg); + if ( maybe_hv != sv && maybe_av != sv ) { + Perl_ck_warner(aTHX_ packWARN(WARN_AMBIGUOUS), + Perl_form(aTHX_ "Ambiguous overloaded argument to %s resolved as %{}", + PL_op_desc[PL_op->op_type] + ) + ); + sv = maybe_hv; + } + else if ( maybe_av != sv ) { + if ( SvTYPE(SvRV(sv)) == SVt_PVHV ) { + /* @{} overload, but underlying reftype is HV */ + Perl_ck_warner(aTHX_ packWARN(WARN_AMBIGUOUS), + Perl_form(aTHX_ "Ambiguous overloaded argument to %s resolved as @{}", + PL_op_desc[PL_op->op_type] + ) + ); + } + sv = maybe_av; + } + else if ( maybe_hv != sv ) { + if ( SvTYPE(SvRV(sv)) == SVt_PVAV ) { + /* %{} overload, but underlying reftype is AV */ + Perl_ck_warner(aTHX_ packWARN(WARN_AMBIGUOUS), + Perl_form(aTHX_ "Ambiguous overloaded argument to %s resolved as %{}", + PL_op_desc[PL_op->op_type] + ) + ); + } + sv = maybe_hv; + } + } + sv = SvRV(sv); + } + + if ( SvTYPE(sv) != SVt_PVHV && SvTYPE(sv) != SVt_PVAV ) { + DIE(Perl_form(aTHX_ "Type of argument to %s must be hashref or arrayref", + PL_op_desc[PL_op->op_type] )); + } + + /* Delegate to correct function for op type */ + PUSHs(sv); + if (PL_op->op_type == OP_RKEYS || PL_op->op_type == OP_RVALUES) { + return (SvTYPE(sv) == SVt_PVHV) ? Perl_do_kv(aTHX) : Perl_pp_akeys(aTHX); + } + else { + return (SvTYPE(sv) == SVt_PVHV) ? Perl_pp_each(aTHX) : Perl_pp_aeach(aTHX); + } +} + PP(pp_aeach) { dVAR; @@ -4670,7 +4747,7 @@ PP(pp_akeys) EXTEND(SP, n + 1); - if (PL_op->op_type == OP_AKEYS) { + if (PL_op->op_type == OP_AKEYS || PL_op->op_type == OP_RKEYS) { n += i; for (; i <= n; i++) { mPUSHi(i); diff --git a/pp.sym b/pp.sym index 611550e..095ee2e 100644 --- a/pp.sym +++ b/pp.sym @@ -30,6 +30,7 @@ Perl_ck_match Perl_ck_method Perl_ck_null Perl_ck_open +Perl_ck_push Perl_ck_readline Perl_ck_repeat Perl_ck_require @@ -409,5 +410,8 @@ Perl_pp_getlogin Perl_pp_syscall Perl_pp_lock Perl_pp_once +Perl_pp_reach +Perl_pp_rkeys +Perl_pp_rvalues # ex: set ro: diff --git a/proto.h b/proto.h index 0027180..80660a3 100644 --- a/proto.h +++ b/proto.h @@ -407,6 +407,12 @@ PERL_CALLCONV OP * Perl_ck_open(pTHX_ OP *o) #define PERL_ARGS_ASSERT_CK_OPEN \ assert(o) +PERL_CALLCONV OP * Perl_ck_push(pTHX_ OP *o) + __attribute__warn_unused_result__ + __attribute__nonnull__(pTHX_1); +#define PERL_ARGS_ASSERT_CK_PUSH \ + assert(o) + PERL_CALLCONV OP * Perl_ck_readline(pTHX_ OP *o) __attribute__warn_unused_result__ __attribute__nonnull__(pTHX_1); @@ -3073,6 +3079,7 @@ PERL_CALLCONV OP * Perl_pp_quotemeta(pTHX); PERL_CALLCONV OP * Perl_pp_rand(pTHX); PERL_CALLCONV OP * Perl_pp_range(pTHX); PERL_CALLCONV OP * Perl_pp_rcatline(pTHX); +PERL_CALLCONV OP * Perl_pp_reach(pTHX); PERL_CALLCONV OP * Perl_pp_read(pTHX); PERL_CALLCONV OP * Perl_pp_readdir(pTHX); PERL_CALLCONV OP * Perl_pp_readline(pTHX); @@ -3093,12 +3100,14 @@ PERL_CALLCONV OP * Perl_pp_reverse(pTHX); PERL_CALLCONV OP * Perl_pp_rewinddir(pTHX); PERL_CALLCONV OP * Perl_pp_right_shift(pTHX); PERL_CALLCONV OP * Perl_pp_rindex(pTHX); +PERL_CALLCONV OP * Perl_pp_rkeys(pTHX); PERL_CALLCONV OP * Perl_pp_rmdir(pTHX); PERL_CALLCONV OP * Perl_pp_rv2av(pTHX); PERL_CALLCONV OP * Perl_pp_rv2cv(pTHX); PERL_CALLCONV OP * Perl_pp_rv2gv(pTHX); PERL_CALLCONV OP * Perl_pp_rv2hv(pTHX); PERL_CALLCONV OP * Perl_pp_rv2sv(pTHX); +PERL_CALLCONV OP * Perl_pp_rvalues(pTHX); PERL_CALLCONV OP * Perl_pp_sassign(pTHX); PERL_CALLCONV OP * Perl_pp_say(pTHX); PERL_CALLCONV OP * Perl_pp_scalar(pTHX); diff --git a/regen/opcode.pl b/regen/opcode.pl index d1a47d5..9369c2e 100755 --- a/regen/opcode.pl +++ b/regen/opcode.pl @@ -105,6 +105,7 @@ my @raw_alias = ( Perl_pp_bit_or => ['bit_xor'], Perl_pp_rv2av => ['rv2hv'], Perl_pp_akeys => ['avalues'], + Perl_pp_rkeys => [qw(rvalues reach)], ); while (my ($func, $names) = splice @raw_alias, 0, 2) { @@ -808,11 +809,11 @@ lslice list slice ck_null 2 H L L anonlist anonymous list ([]) ck_fun ms@ L anonhash anonymous hash ({}) ck_fun ms@ L -splice splice ck_fun m@ A S? S? L -push push ck_fun imsT@ A L +splice splice ck_push m@ A S? S? L +push push ck_push imsT@ A L pop pop ck_shift s% A? shift shift ck_shift s% A? -unshift unshift ck_fun imsT@ A L +unshift unshift ck_push imsT@ A L sort sort ck_sort dm@ C? L reverse reverse ck_fun mt@ L @@ -1099,3 +1100,8 @@ lock lock ck_rfun s% R once once ck_null | custom unknown custom operator ck_null 0 + +# For smart dereference for each/keys/values +reach each on reference ck_each % S +rkeys keys on reference ck_each t% S +rvalues values on reference ck_each t% S diff --git a/t/op/cproto.t b/t/op/cproto.t index 3e3c0de..b1a4944 100644 --- a/t/op/cproto.t +++ b/t/op/cproto.t @@ -57,7 +57,7 @@ delete undef die (@) do undef dump () -each (\[@%]) +each (+) else undef elsif undef endgrent () @@ -120,7 +120,7 @@ index ($$;$) int (_) ioctl (*$$) join ($@) -keys (\[@%]) +keys (+) kill (@) last undef lc (_) @@ -156,12 +156,12 @@ our undef pack ($@) package undef pipe (**) -pop (;\@) +pop (;+) pos undef print undef printf undef prototype undef -push (\@@) +push (+@) q undef qq undef qr undef @@ -204,7 +204,7 @@ setprotoent ($) setpwent () setservent ($) setsockopt (*$$$) -shift (;\@) +shift (;+) shmctl ($$$) shmget ($$$) shmread ($$$$) @@ -215,7 +215,7 @@ sleep (;$) socket (*$$$) socketpair (**$$$) sort undef -splice (\@;$$@) +splice (+;$$@) split undef sprintf ($@) sqrt (_) @@ -247,12 +247,12 @@ undef undef unless undef unlink (@) unpack ($;$) -unshift (\@@) +unshift (+@) untie (\[$@%*]) until undef use undef utime (@) -values (\[@%]) +values (+) vec ($$$) wait () waitpid ($$) diff --git a/t/op/push.t b/t/op/push.t index 2024706..e903a97 100644 --- a/t/op/push.t +++ b/t/op/push.t @@ -14,7 +14,7 @@ -4, 4 5 6 7, 0 1 2 3 EOF -print "1..", 4 + @tests, "\n"; +print "1..", 11 + 2*@tests, "\n"; die "blech" unless @tests; @x = (1,2,3); @@ -35,18 +35,46 @@ if (join(':',@x) eq '1:2:3:1:2:3:4:3') {print "ok 3\n";} else {print "not ok 3\n } if (join(':',@x) eq '1:2:3:1:2:3:4') {print "ok 4\n";} else {print "not ok 4\n";} -$test = 5; +# test for push/pop on arrayref +push(\@x,5); +if (join(':',@x) eq '1:2:3:1:2:3:4:5') {print "ok 5\n";} else {print "not ok 5\n";} +pop(\@x); +if (join(':',@x) eq '1:2:3:1:2:3:4') {print "ok 6\n";} else {print "not ok 6\n";} + +# test autovivification +push @$undef1, 1, 2, 3; +if (join(':',@$undef1) eq '1:2:3') {print "ok 7\n";} else {print "not ok 7\n";} +push $undef2, 1, 2, 3; +if (join(':',@$undef2) eq '1:2:3') {print "ok 8\n";} else {print "not ok 8\n";} + +# test constant +use constant CONST_ARRAYREF => [qw/a b c/]; +push CONST_ARRAYREF(), qw/d e f/; +if (join(':',@{CONST_ARRAYREF()}) eq 'a:b:c:d:e:f') {print "ok 9\n";} else {print "not ok 9\n";} + +# test implicit dereference errors +eval "push 42, 0, 1, 2, 3"; +if ( $@ && $@ =~ /must be array/ ) {print "ok 10\n"} else {print "not ok 10 # \$\@ = $@\n"} + +$hashref = { }; +eval { push $hashref, 0, 1, 2, 3 }; +if ( $@ && $@ =~ /Not an ARRAY reference/ ) {print "ok 11\n"} else {print "not ok 11 # \$\@ = $@\n"} + +$test = 12; foreach $line (@tests) { ($list,$get,$leave) = split(/,\t*/,$line); ($pos, $len, @list) = split(' ',$list); @get = split(' ',$get); @leave = split(' ',$leave); @x = (0,1,2,3,4,5,6,7); + $y = [0,1,2,3,4,5,6,7]; if (defined $len) { @got = splice(@x, $pos, $len, @list); + @got2 = splice(@$y, $pos, $len, @list); } else { @got = splice(@x, $pos); + @got2 = splice(@$y, $pos); } if (join(':',@got) eq join(':',@get) && join(':',@x) eq join(':',@leave)) { @@ -55,6 +83,13 @@ foreach $line (@tests) { else { print "not ok ",$test++," got: @got == @get left: @x == @leave\n"; } + if (join(':',@got2) eq join(':',@get) && + join(':',@$y) eq join(':',@leave)) { + print "ok ",$test++,"\n"; + } + else { + print "not ok ",$test++," got (arrayref): @got2 == @get left: @$y == @leave\n"; + } } 1; # this file is require'd by lib/tie-stdpush.t diff --git a/t/op/smartkve.t b/t/op/smartkve.t new file mode 100644 index 0000000..4cb19f5 --- /dev/null +++ b/t/op/smartkve.t @@ -0,0 +1,361 @@ +#!./perl + +BEGIN { + chdir 't' if -d 't'; + @INC = '../lib'; + require './test.pl'; +} +use strict; +use warnings; +no warnings 'deprecated'; +use vars qw($data $array $values $hash); + +plan 'no_plan'; + +sub j { join(":",@_) } + +BEGIN { # in BEGIN for "use constant ..." later + $array = [ qw(pi e i) ]; + $values = [ 3.14, 2.72, -1 ]; + $hash = { pi => 3.14, e => 2.72, i => -1 } ; + $data = { + hash => { %$hash }, + array => [ @$array ], + }; +} + +package Foo; +sub new { + my $self = { + hash => {%{$main::hash} }, + array => [@{$main::array}] + }; + bless $self, shift; +} +sub hash { no overloading; $_[0]->{hash} }; +sub array { no overloading; $_[0]->{array} }; + +package Foo::Overload::Array; +sub new { return bless [ qw/foo bar/ ], shift } +use overload '@{}' => sub { $main::array }, fallback => 1; + +package Foo::Overload::Hash; +sub new { return bless { qw/foo bar/ }, shift } +use overload '%{}' => sub { $main::hash }, fallback => 1; + +package Foo::Overload::Both; +sub new { return bless { qw/foo bar/ }, shift } +use overload '%{}' => sub { $main::hash }, + '@{}' => sub { $main::array }, fallback => 1; + +package Foo::Overload::HashOnArray; +sub new { return bless [ qw/foo bar/ ], shift } +use overload '%{}' => sub { $main::hash }, fallback => 1; + +package Foo::Overload::ArrayOnHash; +sub new { return bless { qw/foo bar/ }, shift } +use overload '@{}' => sub { $main::array }, fallback => 1; + +package main; + +use constant CONST_HASH => { %$hash }; +use constant CONST_ARRAY => [ @$array ]; + +my %a_hash = %$hash; +my @an_array = @$array; +sub hash_sub { return \%a_hash; } +sub array_sub { return \@an_array; } + +my $obj = Foo->new; + +my ($empty, $h_expect, $a_expect, @tmp, @tmp2, $k, $v); + +# Keys -- void + +keys $hash; pass('Void: keys $hash;'); +keys $data->{hash}; pass('Void: keys $data->{hash};'); +keys CONST_HASH; pass('Void: keys CONST_HASH;'); +keys CONST_HASH(); pass('Void: keys CONST_HASH();'); +keys hash_sub(); pass('Void: keys hash_sub();'); +keys hash_sub; pass('Void: keys hash_sub;'); +keys $obj->hash; pass('Void: keys $obj->hash;'); +keys $array; pass('Void: keys $array;'); +keys $data->{array}; pass('Void: keys $data->{array};'); +keys CONST_ARRAY; pass('Void: keys CONST_ARRAY;'); +keys CONST_ARRAY(); pass('Void: keys CONST_ARRAY();'); +keys array_sub; pass('Void: keys array_sub;'); +keys array_sub(); pass('Void: keys array_sub();'); +keys $obj->array; pass('Void: keys $obj->array;'); + +# Keys -- scalar + +is(keys $hash ,3, 'Scalar: keys $hash'); +is(keys $data->{hash} ,3, 'Scalar: keys $data->{hash}'); +is(keys CONST_HASH ,3, 'Scalar: keys CONST_HASH'); +is(keys CONST_HASH() ,3, 'Scalar: keys CONST_HASH()'); +is(keys hash_sub ,3, 'Scalar: keys hash_sub'); +is(keys hash_sub() ,3, 'Scalar: keys hash_sub()'); +is(keys $obj->hash ,3, 'Scalar: keys $obj->hash'); +is(keys $array ,3, 'Scalar: keys $array'); +is(keys $data->{array} ,3, 'Scalar: keys $data->{array}'); +is(keys CONST_ARRAY ,3, 'Scalar: keys CONST_ARRAY'); +is(keys CONST_ARRAY() ,3, 'Scalar: keys CONST_ARRAY()'); +is(keys array_sub ,3, 'Scalar: keys array_sub'); +is(keys array_sub() ,3, 'Scalar: keys array_sub()'); +is(keys $obj->array ,3, 'Scalar: keys $obj->array'); + +# Keys -- list + +$h_expect = j(keys %$hash); +$a_expect = j(keys @$array); + +is(j(keys $hash) ,$h_expect, 'List: keys $hash'); +is(j(keys $data->{hash}) ,$h_expect, 'List: keys $data->{hash}'); +is(j(keys CONST_HASH) ,$h_expect, 'List: keys CONST_HASH'); +is(j(keys CONST_HASH()) ,$h_expect, 'List: keys CONST_HASH()'); +is(j(keys hash_sub) ,$h_expect, 'List: keys hash_sub'); +is(j(keys hash_sub()) ,$h_expect, 'List: keys hash_sub()'); +is(j(keys $obj->hash) ,$h_expect, 'List: keys $obj->hash'); +is(j(keys $array) ,$a_expect, 'List: keys $array'); +is(j(keys $data->{array}) ,$a_expect, 'List: keys $data->{array}'); +is(j(keys CONST_ARRAY) ,$a_expect, 'List: keys CONST_ARRAY'); +is(j(keys CONST_ARRAY()) ,$a_expect, 'List: keys CONST_ARRAY()'); +is(j(keys array_sub) ,$a_expect, 'List: keys array_sub'); +is(j(keys array_sub()) ,$a_expect, 'List: keys array_sub()'); +is(j(keys $obj->array) ,$a_expect, 'List: keys $obj->array'); + +# Keys -- undef + +undef $empty; +is(j(keys undef), '', 'Undef: keys undef is empty list'); +is(j(keys $empty), '', 'Undef: keys $empty is empty list'); +is($empty, undef, 'Undef: $empty is not vivified'); + +# Keys -- vivification +is(j(keys $empty->{hash}), '', 'Vivify: keys $empty->{hash}'); +ok(defined $empty , 'Vivify: $empty is HASHREF'); +ok(!defined $empty->{hash} , 'Vivify: $empty->{hash} is undef'); + +# Keys -- errors +eval "keys 3"; +ok($@ =~ qr/Type of argument to keys on reference must be hashref or arrayref/, + 'Errors: keys CONSTANT throws error' +); + +eval "keys qr/foo/"; +ok($@ =~ qr/Type of argument to keys on reference must be hashref or arrayref/, + 'Errors: keys qr/foo/ throws error' +); + +eval "keys $hash qw/fo bar/"; +ok($@ =~ qr/syntax error/, + 'Errors: keys $hash, @stuff throws error' +) or print "# Got: $@"; + +# Values -- void + +values $hash; pass('Void: values $hash;'); +values $data->{hash}; pass('Void: values $data->{hash};'); +values CONST_HASH; pass('Void: values CONST_HASH;'); +values CONST_HASH(); pass('Void: values CONST_HASH();'); +values hash_sub(); pass('Void: values hash_sub();'); +values hash_sub; pass('Void: values hash_sub;'); +values $obj->hash; pass('Void: values $obj->hash;'); +values $array; pass('Void: values $array;'); +values $data->{array}; pass('Void: values $data->{array};'); +values CONST_ARRAY; pass('Void: values CONST_ARRAY;'); +values CONST_ARRAY(); pass('Void: values CONST_ARRAY();'); +values array_sub; pass('Void: values array_sub;'); +values array_sub(); pass('Void: values array_sub();'); +values $obj->array; pass('Void: values $obj->array;'); + +# Values -- scalar + +is(values $hash ,3, 'Scalar: values $hash'); +is(values $data->{hash} ,3, 'Scalar: values $data->{hash}'); +is(values CONST_HASH ,3, 'Scalar: values CONST_HASH'); +is(values CONST_HASH() ,3, 'Scalar: values CONST_HASH()'); +is(values hash_sub ,3, 'Scalar: values hash_sub'); +is(values hash_sub() ,3, 'Scalar: values hash_sub()'); +is(values $obj->hash ,3, 'Scalar: values $obj->hash'); +is(values $array ,3, 'Scalar: values $array'); +is(values $data->{array} ,3, 'Scalar: values $data->{array}'); +is(values CONST_ARRAY ,3, 'Scalar: values CONST_ARRAY'); +is(values CONST_ARRAY() ,3, 'Scalar: values CONST_ARRAY()'); +is(values array_sub ,3, 'Scalar: values array_sub'); +is(values array_sub() ,3, 'Scalar: values array_sub()'); +is(values $obj->array ,3, 'Scalar: values $obj->array'); + +# Values -- list + +$h_expect = j(values %$hash); +$a_expect = j(values @$array); + +is(j(values $hash) ,$h_expect, 'List: values $hash'); +is(j(values $data->{hash}) ,$h_expect, 'List: values $data->{hash}'); +is(j(values CONST_HASH) ,$h_expect, 'List: values CONST_HASH'); +is(j(values CONST_HASH()) ,$h_expect, 'List: values CONST_HASH()'); +is(j(values hash_sub) ,$h_expect, 'List: values hash_sub'); +is(j(values hash_sub()) ,$h_expect, 'List: values hash_sub()'); +is(j(values $obj->hash) ,$h_expect, 'List: values $obj->hash'); +is(j(values $array) ,$a_expect, 'List: values $array'); +is(j(values $data->{array}) ,$a_expect, 'List: values $data->{array}'); +is(j(values CONST_ARRAY) ,$a_expect, 'List: values CONST_ARRAY'); +is(j(values CONST_ARRAY()) ,$a_expect, 'List: values CONST_ARRAY()'); +is(j(values array_sub) ,$a_expect, 'List: values array_sub'); +is(j(values array_sub()) ,$a_expect, 'List: values array_sub()'); +is(j(values $obj->array) ,$a_expect, 'List: values $obj->array'); + +# Values -- undef + +undef $empty; +is(j(values undef), '', 'Undef: values undef is empty list'); +is(j(values $empty), '', 'Undef: values $empty is empty list'); +is($empty, undef, 'Undef: $empty is not vivified'); + +# Values -- vivification +is(j(values $empty->{hash}), '', 'Vivify: values $empty->{hash}'); +ok(defined $empty , 'Vivify: $empty is HASHREF'); +ok(!defined $empty->{hash} , 'Vivify: $empty->{hash} is undef'); + +# Values -- errors +eval "values 3"; +ok($@ =~ qr/Type of argument to values on reference must be hashref or arrayref/, + 'Errors: values CONSTANT throws error' +); + +eval "values qr/foo/"; +ok($@ =~ qr/Type of argument to values on reference must be hashref or arrayref/, + 'Errors: values qr/foo/ throws error' +); + +eval "values $hash qw/fo bar/"; +ok($@ =~ qr/syntax error/, + 'Errors: values $hash, @stuff throws error' +) or print "# Got: $@"; + +# Each -- void + +each $hash; pass('Void: each $hash'); +each $data->{hash}; pass('Void: each $data->{hash}'); +each CONST_HASH; pass('Void: each CONST_HASH'); +each CONST_HASH(); pass('Void: each CONST_HASH()'); +each hash_sub(); pass('Void: each hash_sub()'); +each hash_sub; pass('Void: each hash_sub'); +each $obj->hash; pass('Void: each $obj->hash'); +each $array; pass('Void: each $array'); +each $data->{array}; pass('Void: each $data->{array}'); +each CONST_ARRAY; pass('Void: each CONST_ARRAY'); +each CONST_ARRAY(); pass('Void: each CONST_ARRAY()'); +each array_sub; pass('Void: each array_sub'); +each array_sub(); pass('Void: each array_sub()'); +each $obj->array; pass('Void: each $obj->array'); + +# Reset iterators + +keys $hash; +keys $data->{hash}; +keys CONST_HASH; +keys CONST_HASH(); +keys hash_sub(); +keys hash_sub; +keys $obj->hash; +keys $array; +keys $data->{array}; +keys CONST_ARRAY; +keys CONST_ARRAY(); +keys array_sub; +keys array_sub(); +keys $obj->array; + +# Each -- scalar + +@tmp=(); while(defined( $k = each $hash)) {push @tmp,$k}; is(j(@tmp),j(keys $hash), 'Scalar: each $hash'); +@tmp=(); while(defined( $k = each $data->{hash})){push @tmp,$k}; is(j(@tmp),j(keys $data->{hash}), 'Scalar: each $data->{hash}'); +@tmp=(); while(defined( $k = each CONST_HASH)){push @tmp,$k}; is(j(@tmp),j(keys CONST_HASH), 'Scalar: each CONST_HASH'); +@tmp=(); while(defined( $k = each CONST_HASH())){push @tmp,$k}; is(j(@tmp),j(keys CONST_HASH()), 'Scalar: each CONST_HASH()'); +@tmp=(); while(defined( $k = each hash_sub())){push @tmp,$k}; is(j(@tmp),j(keys hash_sub()), 'Scalar: each hash_sub()'); +@tmp=(); while(defined( $k = each hash_sub)){push @tmp,$k}; is(j(@tmp),j(keys hash_sub), 'Scalar: each hash_sub'); +@tmp=(); while(defined( $k = each $obj->hash)){push @tmp,$k}; is(j(@tmp),j(keys $obj->hash), 'Scalar: each $obj->hash'); +@tmp=(); while(defined( $k = each $array)){push @tmp,$k}; is(j(@tmp),j(keys $array), 'Scalar: each $array'); +@tmp=(); while(defined( $k = each $data->{array})){push @tmp,$k}; is(j(@tmp),j(keys $data->{array}), 'Scalar: each $data->{array}'); +@tmp=(); while(defined( $k = each CONST_ARRAY)){push @tmp,$k}; is(j(@tmp),j(keys CONST_ARRAY), 'Scalar: each CONST_ARRAY'); +@tmp=(); while(defined( $k = each CONST_ARRAY())){push @tmp,$k}; is(j(@tmp),j(keys CONST_ARRAY()), 'Scalar: each CONST_ARRAY()'); +@tmp=(); while(defined( $k = each array_sub)){push @tmp,$k}; is(j(@tmp),j(keys array_sub), 'Scalar: each array_sub'); +@tmp=(); while(defined( $k = each array_sub())){push @tmp,$k}; is(j(@tmp),j(keys array_sub()), 'Scalar: each array_sub()'); +@tmp=(); while(defined( $k = each $obj->array)){push @tmp,$k}; is(j(@tmp),j(keys $obj->array), 'Scalar: each $obj->array'); + +# Each -- list + +@tmp=@tmp2=(); while(($k,$v) = each $hash) {push @tmp,$k; push @tmp2,$v}; is(j(@tmp,@tmp2),j(keys $hash, values $hash), 'List: each $hash'); +@tmp=@tmp2=(); while(($k,$v) = each $data->{hash}){push @tmp,$k; push @tmp2,$v}; is(j(@tmp,@tmp2),j(keys $data->{hash}, values $data->{hash}), 'List: each $data->{hash}'); +@tmp=@tmp2=(); while(($k,$v) = each CONST_HASH){push @tmp,$k; push @tmp2,$v}; is(j(@tmp,@tmp2),j(keys CONST_HASH, values CONST_HASH), 'List: each CONST_HASH'); +@tmp=@tmp2=(); while(($k,$v) = each CONST_HASH()){push @tmp,$k; push @tmp2,$v}; is(j(@tmp,@tmp2),j(keys CONST_HASH(), values CONST_HASH()), 'List: each CONST_HASH()'); +@tmp=@tmp2=(); while(($k,$v) = each hash_sub()){push @tmp,$k; push @tmp2,$v}; is(j(@tmp,@tmp2),j(keys hash_sub(), values hash_sub()), 'List: each hash_sub()'); +@tmp=@tmp2=(); while(($k,$v) = each hash_sub){push @tmp,$k; push @tmp2,$v}; is(j(@tmp,@tmp2),j(keys hash_sub, values hash_sub), 'List: each hash_sub'); +@tmp=@tmp2=(); while(($k,$v) = each $obj->hash){push @tmp,$k; push @tmp2,$v}; is(j(@tmp,@tmp2),j(keys $obj->hash, values $obj->hash), 'List: each $obj->hash'); +@tmp=@tmp2=(); while(($k,$v) = each $array){push @tmp,$k; push @tmp2,$v}; is(j(@tmp,@tmp2),j(keys $array, values $array), 'List: each $array'); +@tmp=@tmp2=(); while(($k,$v) = each $data->{array}){push @tmp,$k; push @tmp2,$v}; is(j(@tmp,@tmp2),j(keys $data->{array}, values $data->{array}), 'List: each $data->{array}'); +@tmp=@tmp2=(); while(($k,$v) = each CONST_ARRAY){push @tmp,$k; push @tmp2,$v}; is(j(@tmp,@tmp2),j(keys CONST_ARRAY, values CONST_ARRAY), 'List: each CONST_ARRAY'); +@tmp=@tmp2=(); while(($k,$v) = each CONST_ARRAY()){push @tmp,$k; push @tmp2,$v}; is(j(@tmp,@tmp2),j(keys CONST_ARRAY(), values CONST_ARRAY()), 'List: each CONST_ARRAY()'); +@tmp=@tmp2=(); while(($k,$v) = each array_sub){push @tmp,$k; push @tmp2,$v}; is(j(@tmp,@tmp2),j(keys array_sub, values array_sub), 'List: each array_sub'); +@tmp=@tmp2=(); while(($k,$v) = each array_sub()){push @tmp,$k; push @tmp2,$v}; is(j(@tmp,@tmp2),j(keys array_sub(), values array_sub()), 'List: each array_sub()'); +@tmp=@tmp2=(); while(($k,$v) = each $obj->array){push @tmp,$k; push @tmp2,$v}; is(j(@tmp,@tmp2),j(keys $obj->array, values $obj->array), 'List: each $obj->array'); + +# Each -- undef + +undef $empty; +is(j(@{[each undef]}), '', 'Undef: each undef is empty list'); +is(j(@{[each $empty]}), '', 'Undef: each $empty is empty list'); +is($empty, undef, 'Undef: $empty is not vivified'); + +# Values -- vivification +is(j(@{[each $empty->{hash}]}), '', 'Vivify: each $empty->{hash} is empty list'); +ok(defined $empty , 'Vivify: $empty is HASHREF'); +ok(!defined $empty->{hash} , 'Vivify: $empty->{hash} is undef'); + +# Values -- errors +eval "each 3"; +ok($@ =~ qr/Type of argument to each on reference must be hashref or arrayref/, + 'Errors: each CONSTANT throws error' +); + +eval "each qr/foo/"; +ok($@ =~ qr/Type of argument to each on reference must be hashref or arrayref/, + 'Errors: each qr/foo/ throws error' +); + +eval "each $hash qw/foo bar/"; +ok($@ =~ qr/syntax error/, + 'Errors: each $hash, @stuff throws error' +) or print "# Got: $@"; + +# Overloaded objects +my $over_a = Foo::Overload::Array->new; +my $over_h = Foo::Overload::Hash->new; +my $over_b = Foo::Overload::Both->new; +my $over_h_a = Foo::Overload::HashOnArray->new; +my $over_a_h = Foo::Overload::ArrayOnHash->new; + +my $re_warn_array = qr/Ambiguous overloaded argument to keys on reference resolved as \@\{\}/; +my $re_warn_hash = qr/Ambiguous overloaded argument to keys on reference resolved as \%\{\}/; + +{ + my $warn = ''; + local $SIG{__WARN__} = sub { $warn = shift }; + + is(j(keys $over_a), j(keys @$array), "Overload: array dereference"); + is($warn, '', "no warning issued"); $warn = ''; + + is(j(keys $over_h), j(keys %$hash), "Overload: hash dereference"); + is($warn, '', "no warning issued"); $warn = ''; + + is(j(keys $over_b), j(keys %$hash), "Overload: ambiguous dereference (both) resolves to hash"); + like($warn, $re_warn_hash, "warning correct"); $warn = ''; + + is(j(keys $over_h_a), j(keys %$hash), "Overload: ambiguous dereference resolves to hash"); + like($warn, $re_warn_hash, "warning correct"); $warn = ''; + + is(j(keys $over_a_h), j(keys @$array), "Overload: ambiguous dereference resolves to array"); + like($warn, $re_warn_array, "warning correct"); $warn = ''; +} diff --git a/t/op/splice.t b/t/op/splice.t index 93718a1..07a3e67 100644 --- a/t/op/splice.t +++ b/t/op/splice.t @@ -1,6 +1,6 @@ #!./perl -print "1..20\n"; +print "1..21\n"; @a = (1..10); @@ -92,3 +92,8 @@ splice @Foo::ISA, 0, 0, 'Bar'; print "not " if !Foo->isa('Bar'); print "ok 20\n"; + +# Test vivification +splice( $new_arrayref, 0, 0, 1, 2, 3 ); +print "not " unless j(@$new_arrayref) eq j(1,2,3); +print "ok 21\n"; diff --git a/t/op/unshift.t b/t/op/unshift.t index 9659ee4..475b3e7 100644 --- a/t/op/unshift.t +++ b/t/op/unshift.t @@ -4,64 +4,98 @@ BEGIN { require "test.pl"; } -plan(18); +plan(36); @array = (1, 2, 3); +$aref = [1, 2, 3]; { no warnings 'syntax'; $count3 = unshift (@array); + $count3r = unshift ($aref); } is(join(' ',@array), '1 2 3', 'unshift null'); cmp_ok($count3, '==', 3, 'unshift count == 3'); +is(join(' ',@$aref), '1 2 3', 'unshift null (ref)'); +cmp_ok($count3r, '==', 3, 'unshift count == 3 (ref)'); + $count3_2 = unshift (@array, ()); is(join(' ',@array), '1 2 3', 'unshift null empty'); cmp_ok($count3_2, '==', 3, 'unshift count == 3 again'); +$count3_2r = unshift ($aref, ()); +is(join(' ',@$aref), '1 2 3', 'unshift null empty (ref)'); +cmp_ok($count3_2r, '==', 3, 'unshift count == 3 again (ref)'); $count4 = unshift (@array, 0); is(join(' ',@array), '0 1 2 3', 'unshift singleton list'); cmp_ok($count4, '==', 4, 'unshift count == 4'); +$count4r = unshift ($aref, 0); +is(join(' ',@$aref), '0 1 2 3', 'unshift singleton list (ref)'); +cmp_ok($count4r, '==', 4, 'unshift count == 4 (ref)'); $count7 = unshift (@array, 3, 2, 1); is(join(' ',@array), '3 2 1 0 1 2 3', 'unshift list'); cmp_ok($count7, '==', 7, 'unshift count == 7'); +$count7r = unshift ($aref, 3, 2, 1); +is(join(' ',@$aref), '3 2 1 0 1 2 3', 'unshift list (ref)'); +cmp_ok($count7r, '==', 7, 'unshift count == 7 (ref)'); @list = (5, 4); $count9 = unshift (@array, @list); is(join(' ',@array), '5 4 3 2 1 0 1 2 3', 'unshift array'); cmp_ok($count9, '==', 9, 'unshift count == 9'); +$count9r = unshift ($aref, @list); +is(join(' ',@$aref), '5 4 3 2 1 0 1 2 3', 'unshift array (ref)'); +cmp_ok($count9r, '==', 9, 'unshift count == 9 (ref)'); + @list = (7); @list2 = (6); $count11 = unshift (@array, @list, @list2); is(join(' ',@array), '7 6 5 4 3 2 1 0 1 2 3', 'unshift arrays'); cmp_ok($count11, '==', 11, 'unshift count == 11'); +$count11r = unshift ($aref, @list, @list2); +is(join(' ',@$aref), '7 6 5 4 3 2 1 0 1 2 3', 'unshift arrays (ref)'); +cmp_ok($count11r, '==', 11, 'unshift count == 11 (ref)'); # ignoring counts @alpha = ('y', 'z'); +$alpharef = ['y', 'z']; { no warnings 'syntax'; unshift (@alpha); + unshift ($alpharef); } is(join(' ',@alpha), 'y z', 'void unshift null'); +is(join(' ',@$alpharef), 'y z', 'void unshift null (ref)'); unshift (@alpha, ()); is(join(' ',@alpha), 'y z', 'void unshift null empty'); +unshift ($alpharef, ()); +is(join(' ',@$alpharef), 'y z', 'void unshift null empty (ref)'); unshift (@alpha, 'x'); is(join(' ',@alpha), 'x y z', 'void unshift singleton list'); +unshift ($alpharef, 'x'); +is(join(' ',@$alpharef), 'x y z', 'void unshift singleton list (ref)'); unshift (@alpha, 'u', 'v', 'w'); is(join(' ',@alpha), 'u v w x y z', 'void unshift list'); +unshift ($alpharef, 'u', 'v', 'w'); +is(join(' ',@$alpharef), 'u v w x y z', 'void unshift list (ref)'); @bet = ('s', 't'); unshift (@alpha, @bet); is(join(' ',@alpha), 's t u v w x y z', 'void unshift array'); +unshift ($alpharef, @bet); +is(join(' ',@$alpharef), 's t u v w x y z', 'void unshift array (ref)'); @bet = ('q'); @gimel = ('r'); unshift (@alpha, @bet, @gimel); is(join(' ',@alpha), 'q r s t u v w x y z', 'void unshift arrays'); +unshift ($alpharef, @bet, @gimel); +is(join(' ',@$alpharef), 'q r s t u v w x y z', 'void unshift arrays (ref)'); -- 1.7.1 ```
p5pRT commented 14 years ago

From @sciurius

David Golden (via RT) \perlbug\-followup@&#8203;perl\.org writes​:

This patch updates all functions that operate directly on array or hash containers to also accept references to arrays or hashes​:

+1

-- Johan

p5pRT commented 14 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 14 years ago

From @nwc10

On Wed\, Oct 27\, 2010 at 03​:42​:44PM -0700\, David Golden wrote​:

(Prior to 5.12\, keys %$hashref was similarly fast\, but the optimization was lost when keys/values added support for arrays)

This is not correct.

*Your blog* is not the right place to report bugs (or performance regressions)

Report your benchmark here\, and I will demonstrate that this was not the cause. The code changes for keys/values on arrays was entirely a a compile- time action. The runtime code for keys/arrays never changed.)

Your science is duff. Duff science annoys me.

(I believe that you will find that the cause actually relates to assignment and lexical variables\, specifically fafafbaf705adfdf and then 0f907b96d618c97c)

I suspect that\, if accepted\, similar optimizations might be possible for push/pop/etc ops.

Which would continue to increase the complexity of the core codebase.

As mentioned in the summary\, the changes to keys/values/each to accept references happened to lead to a significant performance improvement\, recovering some of what was lost when 5.12 changed the implementation to allow keys/values/each to act on arrays in the first place.

This is false.

*Also*\, your changes bring no benefit unless code is rewritten to use them. Any code written to use them is incompatible with existing perl installations\, and can't help widely deployed code for many years to come.

The speed gain mainly comes from removing 1 op from the *benchmark*'s hot path. The cost is complexity in the core codebase\, which increases the maintainability burden\, and likely slightly slows down some other part of perl.

A more general optimisation *for existing code* would be to change the peephole optimiser to recognise this structure​:

$ perl -MO=Concise -e 'my $a; print keys %$a' a \<@​> leave[1 ref] vKP/REFC ->(end) 1 \<0> enter ->2 2 \<;> nextstate(main 1 -e​:1) v ->3 3 \<0> padsv[$a​:1\,2] vM/LVINTRO ->4 4 \<;> nextstate(main 2 -e​:1) v ->5 9 \<@​> print vK ->a 5 \<0> pushmark s ->6 8 \<1> keys[t3] lK/1 ->9 7 \<1> rv2hv[t2] lKRM/1 ->8 6 \<0> padsv[$a​:1\,2] sM/DREFHV ->7 -e syntax OK

and replace it with something close to this one​:

$ perl -MO=Concise -e 'my %a; print keys %a' 9 \<@​> leave[1 ref] vKP/REFC ->(end) 1 \<0> enter ->2 2 \<;> nextstate(main 1 -e​:1) v ->3 3 \<0> padhv[%a​:1\,2] vM/LVINTRO ->4 4 \<;> nextstate(main 2 -e​:1) v ->5 8 \<@​> print vK ->9 5 \<0> pushmark s ->6 7 \<1> keys[t2] lK/1 ->8 6 \<0> padhv[%a​:1\,2] lRM ->7 -e syntax OK

ie remove the rv2hv OP\, and replace the keys op with one that will initially implement the dereference of rv2hv.

We should be discussing the merits/demerits of this functionality solely on syntax\, not on speed.

Nicholas Clark

p5pRT commented 14 years ago

From ambrus@math.bme.hu

On Thu\, Oct 28\, 2010 at 12​:42 AM\, David Golden \perlbug\-followup@&#8203;perl\.org wrote​:

This patch updates all functions that operate directly on array or hash containers to also accept references to arrays or hashes​:

push $arrayref\, @​stuff; keys $hashref;

Do I understand it correctly that you can write any expression returning a reference in place of $arrayref? Is it right that there's no ambiguity between this and the traditional syntax because an array/hash expression ('@​foo' or '%foo') or an array/hash dereference expression ('@​$foo' or '@​{foo()}' or '%$foo' or '%{foo()}') can never return a reference in scalar context\, for it returns either a plain number or a '3/8' style string? Do you just assume nobody will want to tie a has %foo so it returns a reference to some other hash or array\, and then try to use keys(%foo) as a shortcut for keys(@​%foo) ?

Ambrus

p5pRT commented 14 years ago

From @tux

On Wed\, 27 Oct 2010 15​:42​:44 -0700\, David Golden (via RT) \perlbug\-followup@&#8203;perl\.org wrote​:

This patch updates all functions that operate directly on array or hash containers to also accept references to arrays or hashes​:

push $arrayref\, @​stuff; unshift $arrayref\, @​stuff; pop $arrayref; shift $arrayref; splice $arrayref\, 0\, 2; keys $hashref; # or $arrayref values $hashref # or $arrayref ($k\,$v) = each $hashref # or $arrayref

Because this patch is a substantial new feature and will likely merit some discussion\, I have organized this commit message as follows​:

WOW! YES! A killing new feature for 5.14 :)

I personally wanted to see if this

  my %hash;   push @​{$hash{new}{entry}}\, 42;

could now be written as

  my %hash;   push $has{new}{entry}\, 42;

as that kind of autovivivication would be where I would use this new feature most. So I checked out blead\, applied the patch and tested​:

Test Summary Report


op/stash.t (Wstat​: 0 Tests​: 46 Failed​: 0)   TODO passed​: 42\, 46 porting/manifest.t (Wstat​: 0 Tests​: 9919 Failed​: 3)   Failed tests​: 9914\, 9918-9919 ../ext/XS-APItest/t/caller.t (Wstat​: 0 Tests​: 24 Failed​: 0)   TODO passed​: 22-23 Files=1951\, Tests=403036\, 249 wallclock secs (13.12 usr 62.58 sys + 447.95 cusr 35.43 csys = 559.08 CPU) Result​: FAIL

Before patch​:

$ perl -MData​::Dumper -wle'my%h;push$h{new}{entry}\,42;print Dumper\%h' Type of arg 1 to push must be array (not hash element) at -e line 1\, near "42;" Execution of -e aborted due to compilation errors.

After patch​:

$ ./perl -Ilib -MData​::Dumper -wle'my%h;push$h{new}{entry}\,42;print Dumper\%h' $VAR1 = {   'new' => {   'entry' => [   42   ]   }   };

# # ####### ##### ## # # # # # ## # # # # ##   # # ##### ##### ##   # # #   # # # # ##   # ####### ##### ##

-- H.Merijn Brand http​://tux.nl Perl Monger http​://amsterdam.pm.org/ using 5.00307 through 5.12 and porting perl5.13.x on HP-UX 10.20\, 11.00\, 11.11\, 11.23 and 11.31\, OpenSuSE 10.1\, 11.0 .. 11.3 and AIX 5.2 and 5.3. http​://mirrors.develooper.com/hpux/ http​://www.test-smoke.org/ http​://qa.perl.org http​://www.goldmark.org/jeff/stupid-disclaimers/

p5pRT commented 14 years ago

From @nwc10

On Thu\, Oct 28\, 2010 at 10​:54​:07AM +0200\, H.Merijn Brand wrote​:

WOW! YES! A killing new feature for 5.14 :)

I thought we already had a roughly 5% general speedup over 5.12. (Although other things may have eaten away at that)

Nicholas Clark

p5pRT commented 14 years ago

From @tux

On Thu\, 28 Oct 2010 10​:54​:07 +0200\, "H.Merijn Brand" \h\.m\.brand@&#8203;xs4all\.nl wrote​:

On Wed\, 27 Oct 2010 15​:42​:44 -0700\, David Golden (via RT) \perlbug\-followup@&#8203;perl\.org wrote​:

This patch updates all functions that operate directly on array or hash containers to also accept references to arrays or hashes​:

push $arrayref\, @​stuff; unshift $arrayref\, @​stuff; pop $arrayref; shift $arrayref; splice $arrayref\, 0\, 2; keys $hashref; # or $arrayref values $hashref # or $arrayref ($k\,$v) = each $hashref # or $arrayref

Because this patch is a substantial new feature and will likely merit some discussion\, I have organized this commit message as follows​:

WOW! YES! A killing new feature for 5.14 :)

I personally wanted to see if this

my %hash; push @​{$hash{new}{entry}}\, 42;

could now be written as

my %hash; push $has{new}{entry}\, 42;

as that kind of autovivivication would be where I would use this new feature most. So I checked out blead\, applied the patch and tested​:

--8\<---

Inline Patch ```diff --- t/op/push.t 2010-10-28 11:00:35.284705589 +0200 +++ t/op/push.t 2010-10-28 11:00:17.872705573 +0200 @@ -14,7 +14,7 @@ -4, 4 5 6 7, 0 1 2 3 EOF -print "1..", 11 + 2*@tests, "\n"; +print "1..", 12 + 2*@tests, "\n"; die "blech" unless @tests; @x = (1,2,3); @@ -92,4 +92,16 @@ foreach $line (@tests) { } } +my %hash; +push $hash{new}{entry}, 42; +if ( exists $hash{new} && exists $hash{new}{entry} + && ref $hash{new}{entry} eq "ARRAY" + && $hash{new}{entry}[0] == 42) { + print "ok ",$test++,"\n"; +} +else { + print "not ok ",$test++," \$hash{new}{entry} != [ 42 ]\n"; +} + + 1; # this file is require'd by lib/tie-stdpush.t -->8--- -- ```

H.Merijn Brand http​://tux.nl Perl Monger http​://amsterdam.pm.org/ using 5.00307 through 5.12 and porting perl5.13.x on HP-UX 10.20\, 11.00\, 11.11\, 11.23 and 11.31\, OpenSuSE 10.1\, 11.0 .. 11.3 and AIX 5.2 and 5.3. http​://mirrors.develooper.com/hpux/ http​://www.test-smoke.org/ http​://qa.perl.org http​://www.goldmark.org/jeff/stupid-disclaimers/

p5pRT commented 14 years ago

From @demerphq

On 28 October 2010 10​:35\, Nicholas Clark \nick@&#8203;ccl4\.org wrote​:

On Wed\, Oct 27\, 2010 at 03​:42​:44PM -0700\, David Golden wrote​:

(Prior to 5.12\, keys %$hashref was similarly fast\, but the optimization was lost when keys/values added support for arrays)

Only tenusously on subject..... But I did write a patch\, which never got applied\, which made

if (%$hash) {}

as fast as

if (keys %$hash) {}

That is\, in place changing the former to the latter when used in boolean context.

cheers\, Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 14 years ago

From @tux

On Thu\, 28 Oct 2010 09​:59​:00 +0100\, Nicholas Clark \nick@&#8203;ccl4\.org wrote​:

On Thu\, Oct 28\, 2010 at 10​:54​:07AM +0200\, H.Merijn Brand wrote​:

WOW! YES! A killing new feature for 5.14 :)

I thought we already had a roughly 5% general speedup over 5.12. (Although other things may have eaten away at that)

The more reasons to get people to install 5.14 the better.

-- H.Merijn Brand http​://tux.nl Perl Monger http​://amsterdam.pm.org/ using 5.00307 through 5.12 and porting perl5.13.x on HP-UX 10.20\, 11.00\, 11.11\, 11.23 and 11.31\, OpenSuSE 10.1\, 11.0 .. 11.3 and AIX 5.2 and 5.3. http​://mirrors.develooper.com/hpux/ http​://www.test-smoke.org/ http​://qa.perl.org http​://www.goldmark.org/jeff/stupid-disclaimers/

p5pRT commented 14 years ago

From @nwc10

On Thu\, Oct 28\, 2010 at 11​:04​:39AM +0200\, demerphq wrote​:

On 28 October 2010 10​:35\, Nicholas Clark \nick@&#8203;ccl4\.org wrote​:

On Wed\, Oct 27\, 2010 at 03​:42​:44PM -0700\, David Golden wrote​:

(Prior to 5.12\, keys %$hashref was similarly fast\, but the optimization was lost when keys/values added support for arrays)

Only tenusously on subject..... But I did write a patch\, which never got applied\, which made

if (%$hash) {}

as fast as

if (keys %$hash) {}

That is\, in place changing the former to the latter when used in boolean context.

IIRC it did get applied some time before 5.12 shipped. There's a boolkeys op​:

$ ~/Sandpit/5120-g/bin/perl5.12.0 -MO=Concise -e '1 if %hash' 7 \<@​> leave[1 ref] vKP/REFC ->(end) 1 \<0> enter ->2 2 \<;> nextstate(main 1 -e​:1) v​:{ ->3 - \<1> null vK/1 ->7 6 \<|> and(other->7) vK/1 ->7 5 \<1> boolkeys sK/1 ->6 4 \<1> rv2hv[t1] sKRM/1 ->5 3 \<$> gv(*hash) s ->4 - \<0> ex-const v ->-

which didn't use to exist​:

$ perl -MO=Concise -e '1 if %hash' 6 \<@​> leave[1 ref] vKP/REFC ->(end) 1 \<0> enter ->2 2 \<;> nextstate(main 1 -e​:1) v ->3 - \<1> null vK/1 ->6 5 \<|> and(other->6) vK/1 ->6 4 \<1> rv2hv[t2] sK/1 ->5 3 \<#> gv[*hash] s ->4 - \<0> ex-const v ->- -e syntax OK

I don't know if it ever got a "thanks applied" message.

Anyway\, if it didn't\, a belated "thanks" for it\, as it made the syntactically simplest way of expressing "has this hash got any keys" also be the fastest.

Nicholas Clark

p5pRT commented 14 years ago

From ambrus@math.bme.hu

On Thu\, Oct 28\, 2010 at 11​:12 AM\, Nicholas Clark \nick@&#8203;ccl4\.org wrote​:

On Thu\, Oct 28\, 2010 at 11​:04​:39AM +0200\, demerphq wrote​:

Only tenusously on subject..... But I did write a patch\, which never got applied\, which made

if (%$hash) {}

as fast as

if (keys %$hash) {}

That is\, in place changing the former to the latter when used in boolean context.

IIRC it did get applied some time before 5.12 shipped. There's a boolkeys op​:

perl5120delta mentions this (section Selected Performance Enhancements)​:

"if (%foo)" has been optimized to be faster than "if (keys %foo)".

That's the same\, isn't it?

Ambrus

p5pRT commented 14 years ago

From @nwc10

On Wed\, Oct 27\, 2010 at 03​:42​:44PM -0700\, David Golden wrote​:

# New Ticket Created by David Golden # Please include the string​: [perl #78656] # in the subject line of all future correspondence about this issue. # \<URL​: http​://rt.perl.org/rt3/Ticket/Display.html?id=78656 >

This patch updates all functions that operate directly on array or hash containers to also accept references to arrays or hashes​:

push $arrayref\, @​stuff; unshift $arrayref\, @​stuff; pop $arrayref; shift $arrayref; splice $arrayref\, 0\, 2; keys $hashref; # or $arrayref values $hashref # or $arrayref ($k\,$v) = each $hashref # or $arrayref

Because this patch is a substantial new feature and will likely merit some discussion\, I have organized this commit message as follows​:

* Summary of benefits * Summary of the patch * Rationale

_Summary of benefits_

As I see it\, the primary benefit is visually de-cluttering common operations in situations where the context (acting on a "container") is clear.

I think Merijn's advertising copy is better than yours. :-)

Was​:

  push @​{$hash->{$key}}\, $data2;

Now​:

  push $hash->{$key}\, $data2;

Am I right in thinking that all the new behaviour used to be syntax errors? (Specifically\, compile time errors)

For keys/values/each\, I added three new opcodes to handle the case where the argument is not a hash or array (or explicit dereference of them). All three ops are handled by one function that dereferences the argument (including any magic) and then dispatches to the corresponding hash or array implementation of the original function. If the argument is not a hash or array\, a runtime error is thrown.

The changes to push/pop/etc did not require the same degree of changes (i.e. I didn't add new opcodes for them) and I kept the patch as simple as possible. However\, if the patch is accepted\, I suspect that the same technique that allows optimization of "keys $hashref" could be explored later to optimize "push $arrayref\, @​list" over "push @​$arrayref\, @​list".

It's not obvious where the "speed"/space trade off lies.

For the lesser-used ops*\, on a modern CPU with a finite instruction cache and reasonable branch prediction\, at some point it will become better to less code\, achieved by having more branches\, because that code will more likely already be hot ready in the CPU.

For the array each/keys/values\, the code implementing them was "completely" different (ie an if/else with two large blocks)\, so it seemed to make more sense to split the two.

Nicholas Clark

* problem - collecting stats on what is used more\, and what is used less

p5pRT commented 14 years ago

From @xdg

On Thu\, Oct 28\, 2010 at 5​:21 AM\, Nicholas Clark \nick@&#8203;ccl4\.org wrote​:

As I see it\, the primary benefit is visually de-cluttering common operations in situations where the context (acting on a "container") is clear.

I think Merijn's advertising copy is better than yours. :-)

Was​:

  push @​{$hash->{$key}}\, $data2;

Now​:

  push $hash->{$key}\, $data2;

Agreed. Tux++

Am I right in thinking that all the new behaviour used to be syntax errors? (Specifically\, compile time errors)

Yes. Previously\, these functions were effectively prototyped to require a variable or an explicit dereference of the correct type. The new behavior acts comparable to an explicit dereference\, i.e. it gives a run-time error if the referent is not of the appropriate type.

The changes to push/pop/etc did not require the same degree of changes (i.e. I didn't add new opcodes for them) and I kept the patch as simple as possible.  However\, if the patch is accepted\, I suspect that the same technique that allows optimization of "keys $hashref" could be explored later to optimize "push $arrayref\, @​list" over "push @​$arrayref\, @​list".

It's not obvious where the "speed"/space trade off lies.

I agree. I used words like "explored" because it wasn't an obvious win and doing the major surgery needed wasn't necessary to provide this new feature.

-- David

p5pRT commented 14 years ago

From @obra

On Wed\, Oct 27\, 2010 at 03​:42​:44PM -0700\, David Golden wrote​:

# New Ticket Created by David Golden # Please include the string​: [perl #78656] # in the subject line of all future correspondence about this issue. # \<URL​: http​://rt.perl.org/rt3/Ticket/Display.html?id=78656 >

This patch updates all functions that operate directly on array or hash containers to also accept references to arrays or hashes​:

push $arrayref\, @​stuff; unshift $arrayref\, @​stuff; pop $arrayref; shift $arrayref; splice $arrayref\, 0\, 2; keys $hashref; # or $arrayref values $hashref # or $arrayref ($k\,$v) = each $hashref # or $arrayref

+1 - Unless someone comes up with a partticularly damning problem with this\, I'd love to see it get in for the next blead release.

p5pRT commented 14 years ago

From @xdg

On Thu\, Oct 28\, 2010 at 4​:37 AM\, Zsbán Ambrus \ambrus@&#8203;math\.bme\.hu wrote​:

On Thu\, Oct 28\, 2010 at 12​:42 AM\, David Golden \perlbug\-followup@&#8203;perl\.org wrote​:

This patch updates all functions that operate directly on array or hash containers to also accept references to arrays or hashes​:

 push $arrayref\, @​stuff;  keys $hashref;

Do I understand it correctly that you can write any expression returning a reference in place of $arrayref?

Correct. You can do "keys gives_a_hashref()"

 Is it right that there's no ambiguity between this and the traditional syntax because an array/hash expression ('@​foo' or '%foo') or an array/hash dereference expression ('@​$foo' or '@​{foo()}' or '%$foo' or '%{foo()}') can never return a reference in scalar context\, for it returns either a plain number or a '3/8' style string?  Do you just assume nobody will want to tie a has %foo so it returns a reference to some other hash or array\, and then try to use keys(%foo) as a shortcut for keys(@​%foo) ?

I'm not sure I understand what you're asking with relation to ties. If you're asking about overloading\, it takes that into account and warns on ambiguity.

For example\, if you have "$obj = bless {}\, $class" where $class overloads %{}\, then if you do "keys $obj"\, the overloading will be used to get the hash to act on\, just as if you did %$obj directly. If you have an ambiguous case like "$obj = bless []\, $class" with $class providing %{} overloading\, then "keys $obj" will assume you want the %{} overloading and not a dereference of the underlying type\, and will issue a warning about it. However\, "$obj = bless \(my $scalar)\, $class" with %{} overloading will allow "keys $obj" without a warning.

-- David

p5pRT commented 14 years ago

From gr@univie.ac.at

On 28.10.2010\, at 14​:42\, Jesse Vincent wrote​:

On Wed\, Oct 27\, 2010 at 03​:42​:44PM -0700\, David Golden wrote​:

This patch updates all functions that operate directly on array or hash containers to also accept references to arrays or hashes​:

push $arrayref\, @​stuff; unshift $arrayref\, @​stuff; pop $arrayref; shift $arrayref; splice $arrayref\, 0\, 2; keys $hashref; # or $arrayref values $hashref # or $arrayref ($k\,$v) = each $hashref # or $arrayref

+1 - Unless someone comes up with a partticularly damning problem with this\, I'd love to see it get in for the next blead release.

The only parts that come to mind are​:

- overloading - tie()ing - autobox

But I can't think of any problem in particular with those - just something to consider.

Marcel

p5pRT commented 14 years ago

From @xdg

On Thu\, Oct 28\, 2010 at 8​:49 AM\, Marcel Grünauer \gr@&#8203;univie\.ac\.at wrote​:

The only parts that come to mind are​:

- overloading - tie()ing - autobox

But I can't think of any problem in particular with those - just something to consider.

I've already described how overloading is handled in an earlier reply.

I don't see that this patch changes what happens with ties at all. If you pass a reference to a tied array or hash\, it gets dereferenced to the array or hash and then it follows the same code path that an explicit dereference would have followed (which calls the tied functions as usual).

I can't speak to autobox\, as that's black magic to me\, but my limited understanding is that it hooks method calls\, which is unrelated to this patch.

-- David

p5pRT commented 14 years ago

From @demerphq

On 28 October 2010 11​:12\, Nicholas Clark \nick@&#8203;ccl4\.org wrote​:

On Thu\, Oct 28\, 2010 at 11​:04​:39AM +0200\, demerphq wrote​:

On 28 October 2010 10​:35\, Nicholas Clark \nick@&#8203;ccl4\.org wrote​:

On Wed\, Oct 27\, 2010 at 03​:42​:44PM -0700\, David Golden wrote​:

(Prior to 5.12\, keys %$hashref was similarly fast\, but the optimization was lost when keys/values added support for arrays)

Only tenusously on subject..... But I did write a patch\, which never got applied\, which made

if (%$hash) {}

as fast as

if (keys %$hash) {}

That is\, in place changing the former to the latter when used in boolean context.

IIRC it did get applied some time before 5.12 shipped. There's a boolkeys op​:

$  ~/Sandpit/5120-g/bin/perl5.12.0 -MO=Concise -e '1 if %hash' 7  \<@​> leave[1 ref] vKP/REFC ->(end) 1     \<0> enter ->2 2     \<;> nextstate(main 1 -e​:1) v​:{ ->3 -     \<1> null vK/1 ->7 6        \<|> and(other->7) vK/1 ->7 5           \<1> boolkeys sK/1 ->6 4              \<1> rv2hv[t1] sKRM/1 ->5 3                 \<$> gv(*hash) s ->4 -           \<0> ex-const v ->-

which didn't use to exist​:

$ perl -MO=Concise -e '1 if %hash' 6  \<@​> leave[1 ref] vKP/REFC ->(end) 1     \<0> enter ->2 2     \<;> nextstate(main 1 -e​:1) v ->3 -     \<1> null vK/1 ->6 5        \<|> and(other->6) vK/1 ->6 4           \<1> rv2hv[t2] sK/1 ->5 3              \<#> gv[*hash] s ->4 -           \<0> ex-const v ->- -e syntax OK

I don't know if it ever got a "thanks applied" message.

Anyway\, if it didn't\, a belated "thanks" for it\, as it made the syntactically simplest way of expressing "has this hash got any keys" also be the fastest.

Oh. Cool!

:-)

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 14 years ago

From @demerphq

On 28 October 2010 11​:18\, Zsbán Ambrus \ambrus@&#8203;math\.bme\.hu wrote​:

On Thu\, Oct 28\, 2010 at 11​:12 AM\, Nicholas Clark \nick@&#8203;ccl4\.org wrote​:

On Thu\, Oct 28\, 2010 at 11​:04​:39AM +0200\, demerphq wrote​:

Only tenusously on subject..... But I did write a patch\, which never got applied\, which made

if (%$hash) {}

as fast as

if (keys %$hash) {}

That is\, in place changing the former to the latter when used in boolean context.

IIRC it did get applied some time before 5.12 shipped. There's a boolkeys op​:

perl5120delta mentions this (section Selected Performance Enhancements)​:

"if (%foo)" has been optimized to be faster than "if (keys %foo)".

That's the same\, isn't it?

I guess so. If its my patch then it should be "as fast as"\, if its not\, well then thanks to whomever improved on the original idea. :-)

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 14 years ago

From @demerphq

On 28 October 2010 18​:46\, demerphq \demerphq@&#8203;gmail\.com wrote​:

On 28 October 2010 11​:18\, Zsbán Ambrus \ambrus@&#8203;math\.bme\.hu wrote​:

On Thu\, Oct 28\, 2010 at 11​:12 AM\, Nicholas Clark \nick@&#8203;ccl4\.org wrote​:

On Thu\, Oct 28\, 2010 at 11​:04​:39AM +0200\, demerphq wrote​:

Only tenusously on subject..... But I did write a patch\, which never got applied\, which made

if (%$hash) {}

as fast as

if (keys %$hash) {}

That is\, in place changing the former to the latter when used in boolean context.

IIRC it did get applied some time before 5.12 shipped. There's a boolkeys op​:

perl5120delta mentions this (section Selected Performance Enhancements)​:

"if (%foo)" has been optimized to be faster than "if (keys %foo)".

That's the same\, isn't it?

I guess so. If its my patch then it should be "as fast as"\, if its not\, well then thanks to whomever improved on the original idea. :-)

Er. I need a very long vacation I think. It was my patch\, and it did speed things up. And Nicholas applied it\, and I even knew it.

/me goes back to sleep

yves -- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 14 years ago

From j.imrie@virginmedia.com

On 28/10/2010 13​:42\, Jesse Vincent wrote​:

On Wed\, Oct 27\, 2010 at 03​:42​:44PM -0700\, David Golden wrote​:

# New Ticket Created by David Golden # Please include the string​: [perl #78656] # in the subject line of all future correspondence about this issue. #\<URL​: http​://rt.perl.org/rt3/Ticket/Display.html?id=78656>

This patch updates all functions that operate directly on array or hash containers to also accept references to arrays or hashes​:

push $arrayref\, @​stuff; unshift $arrayref\, @​stuff; pop $arrayref; shift $arrayref; splice $arrayref\, 0\, 2; keys $hashref; # or $arrayref values $hashref # or $arrayref ($k\,$v) = each $hashref # or $arrayref

+1 - Unless someone comes up with a partticularly damning problem with this\, I'd love to see it get in for the next blead release.

My damming problem with this is you have pushed a compile time syntax error\, that I can test with perl -c\, being that push et all expect the next none white space character to be a '@​' to a run time check. that I have to now run the code on in order to check I am getting the correct reference passed in.

If Perl was strongly typed this would not be a problem. But it's not so it is. \

John

p5pRT commented 14 years ago

From @xdg

On Thu\, Oct 28\, 2010 at 3​:40 PM\, John \j\.imrie@&#8203;virginmedia\.com wrote​:

My damming problem with this is you have pushed a compile time syntax error\, that I can test with perl -c\, being that push et all expect the next none white space character to be a '@​' to a run time check. that I have to now run the code on in order to check I am getting the correct reference passed in.

If Perl was strongly typed this would not be a problem. But it's not so it is. \

Well\, "perl -c" wouldn't have helped you with "push @​{ $hashref }\, @​list" anyway. :-)

-- David

p5pRT commented 14 years ago

From @rurban

2010/10/28 David Golden \xdaveg@&#8203;gmail\.com​: --- a/opnames.h +++ b/opnames.h @​@​ -381\,10 +381\,13 @​@​ typedef enum opcode {   OP_LOCK = 363\,   OP_ONCE = 364\,   OP_CUSTOM = 365\, + OP_REACH = 366\, + OP_RKEYS = 367\, + OP_RVALUES = 368\,   OP_max } opcode;

Shouldn't custom always stay the last? Keeping the existing number is not needed for plc's as customs are a placeholder for custom ops. ;) -- Reini Urban http​://phpwiki.org/           http​://murbreak.at/

p5pRT commented 14 years ago

From @xdg

On Thu\, Oct 28\, 2010 at 8​:27 PM\, Reini Urban \rurban@&#8203;x\-ray\.at wrote​:

2010/10/28 David Golden \xdaveg@&#8203;gmail\.com​: --- a/opnames.h +++ b/opnames.h @​@​ -381\,10 +381\,13 @​@​ typedef enum opcode {       OP_LOCK          = 363\,       OP_ONCE          = 364\,       OP_CUSTOM        = 365\, +       OP_REACH         = 366\, +       OP_RKEYS         = 367\, +       OP_RVALUES       = 368\,       OP_max  } opcode;

Shouldn't custom always stay the last? Keeping the existing number is not needed for plc's as customs are a placeholder for custom ops. ;)

If you look right under __END__ in regen/opcode.pl it says this​:

  # New ops always go at the end   # The restriction on having custom as the last op has been removed

I just followed instructions.

-- David

p5pRT commented 14 years ago

From @hvds

David Golden (via RT) \perlbug\-followup@&#8203;perl\.org wrote​: :_Rationale_ [...] :Those who dislike the idea of this patch generally seem to be of the :opinion that dereferencing "should be explicit" or that the patch :"isn't necessary" (which may just be a variation of the first :objection). : :To the first objection\, I say that this is a matter of style and :reasonable people already disagree on the "best" Perl style. This :patch delivers a desired feature to those who do want it without :interfering with the ability of others to explicitly dereference :everything as a matter of personal (or corporate) style. : :I dismiss the second objection out of hand. By its very nature as a :community-driven project\, nothing in the development of Perl is :"necessary" and the very definition of "necessary" will vary from :person to person. In this case\, the work is desired and it is already :done.

I am for this patch\, but I think you treat these objections more lightly than you should. I can imagine many patches I would be strongly against whose proponents would offer the above arguments because they have no better case to make.

Maybe there are other points that could be added\, but I can think of at least​: - this breaks no existing code that was free of syntax errors; - it does what people would expect\, given the new constructs\, even without   reading the docs; - it does what people *have* expected\, and are tripped up by.

A change to the syntax affects anyone that reads or writes\, teaches or learns perl. TIMTOWTDI doesn't mean we are required to support *every* possible way to do it.

:--- a/t/op/push.t :+++ b/t/op/push.t [...] : foreach $line (@​tests) { : ($list\,$get\,$leave) = split(/\,\t*/\,$line); : ($pos\, $len\, @​list) = split(' '\,$list); : @​get = split(' '\,$get); : @​leave = split(' '\,$leave); : @​x = (0\,1\,2\,3\,4\,5\,6\,7); :+ $y = [0\,1\,2\,3\,4\,5\,6\,7]; : if (defined $len) { : @​got = splice(@​x\, $pos\, $len\, @​list); :+ @​got2 = splice(@​$y\, $pos\, $len\, @​list); : } : else { : @​got = splice(@​x\, $pos); :+ @​got2 = splice(@​$y\, $pos); : } : if (join('​:'\,@​got) eq join('​:'\,@​get) && : join('​:'\,@​x) eq join('​:'\,@​leave)) {

These look like they were intended to test C\< splice($y\, ...) > instead of\, or as well as\, C\< @​$y >.

I'd like to see some additional tests for which the newly unfettered array argument is returned by one of various subroutines. In particular\, I'd have liked to examine those to understand what would happen in a case such as​:

  my($a\, $b) = ([1]\, [2]);   sub twoargs { return +($a\, $b) }   push twoargs()\, 3;

My first guess would be C\< $a := [1\, [2]\, 3] >\, but I can imagine quite a few other possibilities.

Hugo

p5pRT commented 14 years ago

From @xdg

On Fri\, Oct 29\, 2010 at 4​:03 AM\, \hv@&#8203;crypt\.org wrote​:

These look like they were intended to test C\< splice($y\, ...) > instead of\, or as well as\, C\< @​$y >.

Thanks. I've fixed those.

I'd like to see some additional tests for which the newly unfettered array argument is returned by one of various subroutines. In particular\, I'd have liked to examine those to understand what would happen in a case such as​:

 my($a\, $b) = ([1]\, [2]);  sub twoargs { return +($a\, $b) }  push twoargs()\, 3;

My first guess would be C\< $a := [1\, [2]\, 3] >\, but I can imagine quite a few other possibilities.

That should do the same thing that "push @​{ twoargs() }\, 3;" would do\, but I'll add a test for that\, too.

-- David

p5pRT commented 14 years ago

From @xdg

On Fri\, Oct 29\, 2010 at 7​:30 AM\, David Golden \xdaveg@&#8203;gmail\.com wrote​:

 my($a\, $b) = ([1]\, [2]);  sub twoargs { return +($a\, $b) }  push twoargs()\, 3;

My first guess would be C\< $a := [1\, [2]\, 3] >\, but I can imagine quite a few other possibilities.

That should do the same thing that "push @​{ twoargs() }\, 3;" would do\, but I'll add a test for that\, too.

I've confirmed it with a test. $a is [1] and $b is [2\,3] (because in scalar context\, the list returns the last item\, which is then the target of push)

I've added the test\, the fix to splice tests and some other minor fixes (compiling under threads -- thank you rafl and avar)\, rebased it all to recent blead and published a private branch for ongoing review​:

  git​://github.com/dagolden/perl.git -- branch is "private/push-pop-keys-etc-on-refs"

Browseable​: http​://github.com/dagolden/perl/tree/private/push-pop-keys-etc-on-refs

-- David

p5pRT commented 14 years ago

From @xdg

On Sun\, Oct 31\, 2010 at 8​:26 AM\, Zsbán Ambrus \ambrus@&#8203;math\.bme\.hu wrote​:

None of this is any stranger than any existing perl syntax quirks. None of this can break existing code\, not even most obfus.  The only implication of all this is that you have to be careful when you document the new syntax in the manual.

I get it now. Thank you. I'll double-check the documentation and try to avoid implying exact equivalence.

-- David

p5pRT commented 14 years ago

From @cpansprout

On Wed Oct 27 15​:42​:44 2010\, dagolden@​cpan.org wrote​:

This patch updates all functions that operate directly on array or hash containers to also accept references to arrays or hashes​:

Please don’t do it!

It makes it harder for module authors to maintain backward compatibility.

If I have a function that returns a hash ref\, I am already unable to change it to\, say\, a blessed scalar with %{} overloading\, because then this would break​:

  *hash = get_hashref();

With your proposed change\, there is another limitation​: I cannot change a function that returns a hash ref to return a blesesd hash ref with @​{} overloading. I cannot even countermand that by adding %{} overloading as well\, as there is still a warning. So the only choice is to add a new function\, resulting in API bloat.

In Perl\, the type of a scalar (and a reference is a scalar\, of course) is generally determined by the operators\, not the operands. Every exception to this is a design flaw and causes problems now and then. (See also bugs 1804\, 77496\, 77502\, 77508\, 77688\, 71686\, 77684\, 77810\, 20661\, 45133\, 77812\, 77492\, 57706 and 77926.) So\, can we please stop adding more of these cases?

(In case someone brings it up\, I never use ~~ or when() for precisely these reasons. And I hope no one using my modules ever uses them....)

p5pRT commented 14 years ago

From @cpansprout

On Thu Oct 28 09​:50​:54 2010\, demerphq wrote​:

/me goes back to sleep

yves

Wait! could you look at bugs 68564\, 70998 and 78356 first? Or am I too late? :-)

p5pRT commented 14 years ago

From [Unknown Contact. See original ticket]

On Thu Oct 28 09​:50​:54 2010\, demerphq wrote​:

/me goes back to sleep

yves

Wait! could you look at bugs 68564\, 70998 and 78356 first? Or am I too late? :-)

p5pRT commented 14 years ago

From ambrus@math.bme.hu

On Sun\, Oct 31\, 2010 at 9​:17 PM\, Father Chrysostomos via RT \perlbug\-followup@&#8203;perl\.org wrote​:

Please don’t do it!

With your proposed change\, there is another limitation​: I cannot change a function that returns a hash ref to return a blesesd hash ref with @​{} overloading. I cannot even countermand that by adding %{} overloading as well\, as there is still a warning. So the only choice is to add a new function\, resulting in API bloat.

This does seem like a good argument to me. However\, what if the new syntax always used values as hash references in keys\, values\, each; and always as array references in push\, pop\, shift\, unshift\, splice? Your complaint doesn't work in that case\, does it?

Ambrus

p5pRT commented 14 years ago

From @xdg

On Sun\, Oct 31\, 2010 at 4​:17 PM\, Father Chrysostomos via RT \perlbug\-followup@&#8203;perl\.org wrote​:

With your proposed change\, there is another limitation​: I cannot change a function that returns a hash ref to return a blesesd hash ref with @​{} overloading. I cannot even countermand that by adding %{} overloading as well\, as there is still a warning. So the only choice is to add a new function\, resulting in API bloat.

If your API claims to return a hash reference and starts returning an object instead\, you've just changed your API and there are all the usual consequences. What if I was validating that your return value was indeed a hashref\, e.g. C\<ref get_hashref() eq 'HASH'>? That breaks\, too\, and has nothing to do with this patch.

My opinion is that objects should be treated as if they were opaque. Dealing directly with the underlying representation from outside the class is breaking encapsulation and if your change to internal representation causes warnings for someone breaking encapsulation\, you should feel no responsibility. The warnings in this patch exist to tell naughty people that they're being naughty\, so they blame perl (if not themselves) and not you.

The proper thing to do with objects is to provide "as_hash" or "as_array" (or whatever) methods to provide an unblessed representation and probably a deep copy\, too\, so they can't fiddle your object state outside the API.

-- David

p5pRT commented 14 years ago

From @xdg

On Thu\, Oct 28\, 2010 at 8​:42 AM\, Jesse Vincent \jesse@&#8203;fsck\.com wrote​:

  push $arrayref\, @​stuff;   unshift $arrayref\, @​stuff;   pop $arrayref;   shift $arrayref;   splice $arrayref\, 0\, 2;   keys $hashref;           # or $arrayref   values $hashref          # or $arrayref   ($k\,$v) = each $hashref  # or $arrayref

+1 - Unless someone comes up with a partticularly damning problem with this\, I'd love to see it get in for the next blead release.

After applying some fixes from rafl and avar\, and clarifying some documentation from comments in this thread\, the patch has been applied as commit cba5a3b05660d6a40525beb667a389a690900298.

Thank you to everyone involved throughout my work on this over the past several weeks. I couldn't have done it without your feedback and help.

-- David

p5pRT commented 14 years ago

From @xdg

Applied (with fixes) as commit cba5a3b05660d6a40525beb667a389a690900298

p5pRT commented 14 years ago

@xdg - Status changed from 'open' to 'resolved'

p5pRT commented 14 years ago

From @cpansprout

On Sun Oct 31 17​:03​:05 2010\, b_jonas wrote​:

On Sun\, Oct 31\, 2010 at 9​:17 PM\, Father Chrysostomos via RT \perlbug\-followup@&#8203;perl\.org wrote​:

Please don’t do it!

With your proposed change\, there is another limitation​: I cannot change a function that returns a hash ref to return a blesesd hash ref with @​{} overloading. I cannot even countermand that by adding %{} overloading as well\, as there is still a warning. So the only choice is to add a new function\, resulting in API bloat.

This does seem like a good argument to me. However\, what if the new syntax always used values as hash references in keys\, values\, each; and always as array references in push\, pop\, shift\, unshift\, splice? Your complaint doesn't work in that case\, does it?

No. That would be perfect.

p5pRT commented 14 years ago

From @cpansprout

On Sun Oct 31 18​:12​:54 2010\, dagolden@​cpan.org wrote​:

On Sun\, Oct 31\, 2010 at 4​:17 PM\, Father Chrysostomos via RT \perlbug\-followup@&#8203;perl\.org wrote​:

With your proposed change\, there is another limitation​: I cannot change a function that returns a hash ref to return a blesesd hash ref with @​{} overloading. I cannot even countermand that by adding %{} overloading as well\, as there is still a warning. So the only choice is to add a new function\, resulting in API bloat.

If your API claims to return a hash reference and starts returning an object instead\, you've just changed your API and there are all the usual consequences. What if I was validating that your return value was indeed a hashref\, e.g. C\<ref get_hashref() eq 'HASH'>?

Any such code is asking for trouble anyway\, precisely because objects *can* be (used as) hash refs.

(I always have to employ workarounds when\, for instance\, modules try to reject an object because a string is expected\, when objects are strings anyway. This is Perl after all.)

My opinion is that objects should be treated as if they were opaque. Dealing directly with the underlying representation from outside the class is breaking encapsulation

Unless it is documented as part of the public interface.

p5pRT commented 14 years ago

From @demerphq

On 1 November 2010 00​:18\, Father Chrysostomos via RT \perlbug\-comment@&#8203;perl\.org wrote​:

On Thu Oct 28 09​:50​:54 2010\, demerphq wrote​:

/me goes back to sleep

yves

Wait! could you look at bugs 68564\, 70998 and 78356 first? Or am I too late? :-)

68564​: Your patch doesn't *feel* right to me\, like its addressing the symptom and not the cause. I need to poke deeper later. 70998​: Similar story. As far as i remember

/\xab|\xa9/

should be handled by the trie optimisation\, and then converted into a charclass (ANYOF). So the question is why doesnt this problem also occur with

/[\xab\xa9]/

? Im not saying the patch is wrong\, just it doesnt clarify this aspect of things.

78356​: Ihis is an interesting case. IMO there are a few things at play here. First is that we go through a phase during optimisation where we join consequtive EXACT nodes together. The original code used to produce one EXACT node per character\, later it was realized that merging them made a big difference. However it seems the case of NOTHING EXACT doesnt get reduced (properly) to EXACT alone.

IIRC that is what this output is​:   | 13| tail~ BRANCH (9) -> TAIL   | | tsdy~ NOTHING (3) -> PSEUDO   | | ~ EXACT \ (4) -> EXACT   | | ~ attach to TAIL (12) offset to 8   | | tsdy~ EXACT \ (7) -> EXACT   | | ~ attach to TAIL (12) offset to 5   | | tsdy~ EXACT \ (10) -> EXACT   | | ~ attach to TAIL (12) offset to 2

Next\, it appears that the jump trie code isnt working for the case of NOTHING. which is a separate problem.

We see this in action here​:

Final program​:   1​: SBOL (2)   2​: TRIE-EXACT\<S​:1/7 W​:3 L​:0/3 C​:6/6>[bz] (13)   \<> (4)   4​: EXACT \ (13)   \ (13)   \ (13)   13​: EOS (14)   14​: END (0) anchored ""$ at 3 anchored(SBOL) minlen 3 r->extflags​: ANCH_SBOL Matching REx "\A(?​:(?​:)foo|bar|zot)\z" against "foo"   Setting an EVAL scope\, savestack=3 regmatch start   0 \<> \ | 1​:SBOL(2)   0 \<> \ | 2​:TRIE-EXACT\<S​:1/7 W​:3 L​:0/3 C​:6/6>[bz](13)   matched empty string...   0 \<> \ | 13​:EOS(14)   failed...

Where we expect to execute node 4 after the "matched empty string"\, yet we did not. For some reason we didnt "jump" but execute a "normal" trie.

Ill try to dig further later on when I have more time. I hope this is helpful to you.

cheers\, Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 14 years ago

From zefram@fysh.org

Nicholas Clark wrote​:

ie remove the rv2hv OP\, and replace the keys op with one that will initially implement the dereference of rv2hv.

+1. It's a sufficiently common pattern to warrant this optimisation.

-zefram

p5pRT commented 14 years ago

From zefram@fysh.org

David Golden wrote​:

push $arrayref\, @​stuff; unshift $arrayref\, @​stuff; pop $arrayref; shift $arrayref; splice $arrayref\, 0\, 2;

For these I have no strong opinion. I find the implicit dereference mildly distasteful\, but there's no real problem. Provided\, of course\, that we will never want push\, splice\, et al to operate on scalars. The equivalent operations at the character level all exist under different names (.=\, substr)\, so it doesn't seem likely to arise.

keys $hashref; # or $arrayref values $hashref # or $arrayref ($k\,$v) = each $hashref # or $arrayref

I am opposed to these\, because they operate on both arrays and hashes. Currently there is some mild syntactic overloading\, in that keys(@​$x) and keys(%$x) use the same keyword while being distinct operators. Your patch turns that compile-time syntactic overloading into a runtime semantic ambiguity\, in that keys($x) doesn't decide whether it's doing array-keys or hash-keys until it actually executes.

keys(@​$x) always does an array-keys operation\, even if $x references a hash blessed into a class doing @​{} overloading\, or an array blessed into a class doing %{} overloading. keys($x) would (under your patch) do hash-keys on such refs. So replacing keys(@​$x) with keys($x)\, the shortcut that you're encouraging\, is incorrect if $x might reference something other than a plain array. If $x is an input from a caller\, and the requirement on $x has previously been merely that it can be used as an array ref in Perl operations\, changing the code in this way will break it. It'll break it in an obscure situation that won't be in the test suite.

(Sure\, you can check that a supplied `array ref' actually does reference an unblessed array. Some of *my* code does. But most people don't bother.)

This is another form of the general issue with Perl polymorphism. Perl is slanted heavily towards values being polymorphic\, behaving as whatever type the operator requires. This means that operators have to know what operation they're performing. It doesn't get along at all well with polymorphic operators\, that behave differently according to the type of their operands. We have a few such operators\, such as the bitwise ops\, and they're a nightmare. You drew a parallel with the context system\, but that's a faulty analogy​: context is determined from lexically surrounding code or the dynamically surrounding sub call\, it does not depend on the operands supplied.

So\, in summary\, I am opposed to this polymorphism of keys\, values\, and each\, because it encourages the writing of incorrect code. The correct place for these polymorphic operators\, if you really want to use them\, is in a module on CPAN. You don't need them to be in the core for them to compile to single ops.

-zefram

p5pRT commented 14 years ago

From @cpansprout

On Tue Nov 02 13​:27​:14 2010\, zefram@​fysh.org wrote​:

David Golden wrote​:

keys $hashref; # or $arrayref values $hashref # or $arrayref ($k\,$v) = each $hashref # or $arrayref

I am opposed to these\, because they operate on both arrays and hashes. Currently there is some mild syntactic overloading\, in that keys(@​$x) and keys(%$x) use the same keyword while being distinct operators. Your patch turns that compile-time syntactic overloading into a runtime semantic ambiguity\, in that keys($x) doesn't decide whether it's doing array-keys or hash-keys until it actually executes.

keys(@​$x) always does an array-keys operation\, even if $x references a hash blessed into a class doing @​{} overloading\, or an array blessed into a class doing %{} overloading. keys($x) would (under your patch) do hash-keys on such refs. So replacing keys(@​$x) with keys($x)\, the shortcut that you're encouraging\, is incorrect if $x might reference something other than a plain array. If $x is an input from a caller\, and the requirement on $x has previously been merely that it can be used as an array ref in Perl operations\, changing the code in this way will break it. It'll break it in an obscure situation that won't be in the test suite.

Phew! At least I’m not the only one. You explained it more clearly than I could.

(Sure\, you can check that a supplied `array ref' actually does reference an unblessed array. Some of *my* code does.

So does some of mine\, but none of my *new* code\, as I realised a while ago that such an approach is fundamentally flawed. There’s no reason to reject arrays just because they happen to be blessed.

But most people don't bother.)

This is another form of the general issue with Perl polymorphism. Perl is slanted heavily towards values being polymorphic\, behaving as whatever type the operator requires. This means that operators have to know what operation they're performing. It doesn't get along at all well with polymorphic operators\, that behave differently according to the type of their operands. We have a few such operators\, such as the bitwise ops\, and they're a nightmare. You drew a parallel with the context system\, but that's a faulty analogy​: context is determined from lexically surrounding code or the dynamically surrounding sub call\, it does not depend on the operands supplied.

So\, in summary\, I am opposed to this polymorphism of keys\, values\, and each\, because it encourages the writing of incorrect code. The correct place for these polymorphic operators\, if you really want to use them\, is in a module on CPAN. You don't need them to be in the core for them to compile to single ops.

Or we would follow Zsbán Ambrus’ suggestion and make those three do array operations *only* when the @​ is explicit\, and do hash operations otherwise.

If I go ahead and do that\, will anyone object strongly?

In perl 5.12 there are at least two different sets of rules for dereferencing. Now there are at least three. The two in perl 5.12 can be unified without breaking backward compatibility. But these new cases cannot be reconciled them with existing rules. So there are just more layers of inconsistency.

p5pRT commented 14 years ago

From zefram@fysh.org

Father Chrysostomos via RT wrote​:

Or we would follow Zsb??n Ambrus??? suggestion and make those three do array operations *only* when the @​ is explicit\, and do hash operations otherwise.

That possibility occurred to me some time after writing my mail. It's neat in many respects\, with just one niggle​: it doesn't correspond to the behaviour you can get from a user-defined function using the (+) prototype. Actually\, if you go strictly through the edge cases that I considered\, xdg's keys behaviour doesn't entirely correspond to its (+) prototype anyway. To maintain correspondence with prototype behaviour you'd have to introduce at least one more prototype feature\, which lets the called function know which kind of dereferencement occurred in parameter processing.

It may be saner\, even under xdg's keys behaviour\, to give up on perfect matching with prototypes. With cv_set_call_checker() we have at least exposed the layer in which the beyond-prototype magic occurs; it is (and always was) at least possible for modules to implement the kind of non-prototype parameter magic that the core functions have. (In fact\, on a tangent from the File​::stat work we discussed in 2009-03\, I'm planning to use op check magic to implement a drop-in stat() replacement\, properly replicating CORE​::stat's parameter processing but returning complex objects.)

-zefram

p5pRT commented 14 years ago

From @cpansprout

On Sun Nov 07 12​:57​:32 2010\, zefram@​fysh.org wrote​:

Father Chrysostomos via RT wrote​:

Or we would follow Zsb??n Ambrus??? suggestion and make those three do array operations *only* when the @​ is explicit\, and do hash operations otherwise.

That possibility occurred to me some time after writing my mail. It's neat in many respects\, with just one niggle​: it doesn't correspond to the behaviour you can get from a user-defined function using the (+) prototype. Actually\, if you go strictly through the edge cases that I considered\, xdg's keys behaviour doesn't entirely correspond to its (+) prototype anyway. To maintain correspondence with prototype behaviour you'd have to introduce at least one more prototype feature\, which lets the called function know which kind of dereferencement occurred in parameter processing.

My proposed change to (+) (or (\[+])) would eliminate that problem\, wouldn’t it?

p5pRT commented 14 years ago

From zefram@fysh.org

Father Chrysostomos via RT wrote​:

My proposed change to (+) (or (\[+])) would eliminate that problem\, wouldn???t it?

No\, it would change it to a different form\, in that an argument value that can be dereferenced as both scalar and array would be ambiguous between keys(@​a) and keys($b). Do you do array-keys on @​$arg\, or hash-keys on %$$arg?

-zefram

p5pRT commented 14 years ago

From @xdg

On Sun\, Nov 7\, 2010 at 3​:57 PM\, Zefram \zefram@&#8203;fysh\.org wrote​:

Father Chrysostomos via RT wrote​:

Or we would follow Zsb??n Ambrus??? suggestion and make those three do array operations *only* when the @​ is explicit\, and do hash operations otherwise.

That possibility occurred to me some time after writing my mail. It's neat in many respects\, with just one niggle​: it doesn't correspond to the behaviour you can get from a user-defined function using the (+) prototype.  Actually\, if you go strictly through the edge cases that I considered\, xdg's keys behaviour doesn't entirely correspond to its (+) prototype anyway.  To maintain correspondence with prototype behaviour you'd have to introduce at least one more prototype feature\, which lets the called function know which kind of dereferencement occurred in parameter processing.

Not surprisingly\, I disagree. It eliminates the possibility of the expert user knowingly acting on arrayrefs to avoid the bug that occurs in an edge case when the non-expert user doesn't properly specify an API or validate inputs.

If you're suggesting that only in the ambiguous cases {reftype HASH or overloaded %{} exists) it should always resolve as a hash\, then I still disagree but not as violently. E.g.

  sub mykeys(+) {   my $item = shift;   if ( ! blessed $ref && reftype $item eq 'ARRAY' ) {   return (scalar @​$item ? (0 .. $#$item) : ());   }   else {   return keys %$item;   }   }

In such a case\, documenting that it only interprets an argument as an arrayref for pure\, unblessed arrayrefs\, and that any blessed object is interpreted as a hash provides unambiguous behavior for end users. If you aren't *sure* it's a pure arrayref\, you must explicitly dereference it or else you could be surprised. Given that in the documentation\, it's up to users to act responsibly.

My approach in the patch was for more dwimmery -- let someone pass an object with @​{} overloading if that wasn't ambiguous and otherwise do something "reasonable" in the ambiguous edge cases and warn about it.

What I describe above in the code sample reduces the dwimmery and gives more consistent behavior\, which maybe some prefer. But I certainly see no reason to prohibit "keys \@​array" since there is no possibility of ambiguity of interpretation.

-- David

p5pRT commented 14 years ago

From @cpansprout

On Sun Nov 07 16​:13​:02 2010\, xdaveg@​gmail.com wrote​:

On Sun\, Nov 7\, 2010 at 3​:57 PM\, Zefram \zefram@&#8203;fysh\.org wrote​:

Father Chrysostomos via RT wrote​:

Or we would follow Zsb??n Ambrus??? suggestion and make those three do array operations *only* when the @​ is explicit\, and do hash operations otherwise.

That possibility occurred to me some time after writing my mail. It's neat in many respects\, with just one niggle​: it doesn't correspond to the behaviour you can get from a user-defined function using the (+) prototype.  Actually\, if you go strictly through the edge cases that I considered\, xdg's keys behaviour doesn't entirely correspond to its (+) prototype anyway.  To maintain correspondence with prototype behaviour you'd have to introduce at least one more prototype feature\, which lets the called function know which kind of dereferencement occurred in parameter processing.

Not surprisingly\, I disagree.

Not being one to argue\, I am willing to let you have your way. I was hoping that expostulation could resolve this\, but I see no point in further discussion. What you suggest below sounds nice\, and I would appreciate it if you could make it work like that\, but I won’t push for it.

It eliminates the possibility of the expert user knowingly acting on arrayrefs to avoid the bug that occurs in an edge case when the non-expert user doesn't properly specify an API or validate inputs.

If you're suggesting that only in the ambiguous cases {reftype HASH or overloaded %{} exists) it should always resolve as a hash\, then I still disagree but not as violently. E.g.

sub mykeys(+) { my $item = shift; if ( ! blessed $ref && reftype $item eq 'ARRAY' ) { return (scalar @​$item ? (0 .. $#$item) : ()); } else { return keys %$item; } }

In such a case\, documenting that it only interprets an argument as an arrayref for pure\, unblessed arrayrefs\, and that any blessed object is interpreted as a hash provides unambiguous behavior for end users. If you aren't *sure* it's a pure arrayref\, you must explicitly dereference it or else you could be surprised. Given that in the documentation\, it's up to users to act responsibly.

My approach in the patch was for more dwimmery -- let someone pass an object with @​{} overloading if that wasn't ambiguous and otherwise do something "reasonable" in the ambiguous edge cases and warn about it.

What I describe above in the code sample reduces the dwimmery and gives more consistent behavior\, which maybe some prefer. But I certainly see no reason to prohibit "keys \@​array" since there is no possibility of ambiguity of interpretation.

-- David

p5pRT commented 13 years ago

From @ap

* David Golden \dagolden@&#8203;cpan\.org [2010-11-08 05​:10]​:

If an API claims to take an arrayref\, then it should validate the input as being either an arrayref or as something that overloads @​{}. It can (and should) then safely call "keys @​$array_ref" because it know that it can be dereferenced as an array. (Or it can not validate and still call "keys @​$array_ref" and let an exception be thrown\, but that's an implementation choice.)

However\, if an API claims to return an arrayref\, then it should return an arrayref. If it suddenly changes to return a scalar blessed into a class that overloads @​{}\, then it will break my code if I'm validating that "ref $return eq 'ARRAY'" consistent with the API claim. That is completely different from an API that claims to return a value that can be dereferenced as an array. In that case\, I can't validate using ref()\, I need to validate that it meets that claim. (Hello Params​::Util​::_ARRAYLIKE!)

Thus\, my objection to the original example was that it is equivalent to saying if I break my API claim (by giving an object instead of a raw data structure)\, then downstream bugs can happen\, which has everything to do with the API change and nothing to do with a change to keys().

That makes no sense whatsoever to me. Why are you concerned with such a low level detail? Are you saying that it behooves me and every other CPAN author in the world to put “something that behaves as an” in front of every “array ref”\, “hash ref” etc.? Why? In what way does your code become more functional when you make it longer just so you can reject inputs that the caller or callee might have had good reason to hand you? (Cf. Carter’s compass.) This is Perl\, not a systems language.

I agree with the charge that operators in Perl should be monomorphic. We don’t want to become Javascript (which has both polymorphic values and polymorphic operators\, and they’re working on repairing it now). The only sane options are to be Perl (polymorphic values\, monomorphic ops) or Python (the reverse).

I refused to use smart match in 5.10.0 for this reason. The non-commutative smart match in 5.10.1 is much saner but I’ll still only use it with literals on the right side. (Thankfully that’s enough to make it a huge help.)

I don’t mind having `each`\, `keys` and `values` operate on ref scalars\, but please have them use a fixed set of rules to decide what to do\, that will not change based on the value passed in. I care not *which* set of rules you pick\, just that it be fixed.

Regards\, -- Aristotle Pagaltzis // \<http​://plasmasturm.org/>

p5pRT commented 13 years ago

From ambrus@math.bme.hu

On Sun\, Nov 21\, 2010 at 4​:05 AM\, Aristotle Pagaltzis \pagaltzis@&#8203;gmx\.de wrote​:

I agree with the charge that operators in Perl should be monomorphic. We don’t want to become Javascript (which has both polymorphic values and polymorphic operators\, and they’re working on repairing it now). The only sane options are to be Perl (polymorphic values\, monomorphic ops) or Python (the reverse).

I refused to use smart match in 5.10.0 for this reason. The non-commutative smart match in 5.10.1 is much saner but I’ll still only use it with literals on the right side. (Thankfully that’s enough to make it a huge help.)

So true. Thanks for explaining why I don't like smart match more clearly than I could have.

Ambrus