This patch updates all functions that operate directly on array or hash containers to also accept references to arrays or hashes:
    push $arrayref, @stuff;
    unshift $arrayref, @stuff;
    pop $arrayref;
    shift $arrayref;
    splice $arrayref, 0, 2;
    keys $hashref;              # or $arrayref
    values $hashref             # or $arrayref
    ($k,$v) = each $hashref     # or $arrayref
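For comparison, here is a minimal sketch of the same operations written with explicit dereferences, which is the spelling the patch aims to shorten. The variable names and data are made up for illustration. The proposed forms appear only in comments, since whether they compile depends on the perl in use.

```perl
use strict;
use warnings;

# Illustrative data only; none of these names come from the patch.
my $arrayref = [1, 2, 3];
my $hashref  = { a => 1, b => 2 };

push    @$arrayref, 4, 5;              # proposed: push $arrayref, 4, 5;
unshift @$arrayref, 0;                 # proposed: unshift $arrayref, 0;
my $last  = pop   @$arrayref;          # proposed: pop $arrayref;
my $first = shift @$arrayref;          # proposed: shift $arrayref;
my @cut   = splice @$arrayref, 0, 2;   # proposed: splice $arrayref, 0, 2;

my @keys   = sort keys   %$hashref;    # proposed: keys $hashref;
my @values = sort values %$hashref;    # proposed: values $hashref;

print "last=$last first=$first cut=@cut rest=@$arrayref\n";
print "keys=@keys values=@values\n";
```

The explicit forms behave the same way whether or not the patch is applied, which is why they are used for the runnable part of the sketch.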
Because this patch is a substantial new feature and will likely merit some discussion, I have organized this commit message as follows:
  * Summary of benefits
  * Summary of the patch
  * Rationale
_Summary of benefits_
As I see it, the primary benefit is visually de-cluttering common operations in situations where the context (acting on a "container") is clear.
As a side effect of the implementation of this change, calling keys/values directly on a reference gives roughly a 75%+ performance boost compared to Perl 5.13.6.
    keys $hashref; # much faster than "keys %$hashref"
(Prior to 5.12, keys %$hashref was similarly fast, but the optimization was lost when keys/values added support for arrays)
I suspect that, if accepted, similar optimizations might be possible for push/pop/etc ops.
_Summary of the patch_
For push/pop/shift/unshift/splice ops, I added a new 'ck_push' check routine. If the child op is not an array or array dereference, the child op is wrapped in an array dereference.
For keys/values/each, I added three new opcodes to handle the case where the argument is not a hash or array (or explicit dereference of them). All three ops are handled by one function that dereferences the argument (including any magic) and then dispatches to the corresponding hash or array implementation of the original function. If the argument is not a hash or array, a runtime error is thrown.
For keys/values/each, when overloaded dereferencing is present, the overloaded dereference is used instead of dereferencing the underlying reftype. Warnings are issued about assumptions made in the following three ambiguous cases:
  (a) If both %{} and @{} overloading exists, %{} is used
  (b) If %{} overloading exists on a blessed arrayref, %{} is used
  (c) If @{} overloading exists on a blessed hashref, @{} is used
The new warnings have been added to perldiag.pod with an explanation.
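To make case (b) concrete, here is a small sketch of a blessed arrayref whose class overloads %{}. The class name HashView and its layout are hypothetical. The sketch uses the explicit %$obj spelling, which goes through the same overload that the patch says "keys $obj" would pick; it does not exercise the patch's new warnings.

```perl
use strict;
use warnings;

package HashView;   # hypothetical class, for illustration only

# %{} dereferences are redirected to the hash stored in slot 0.
# @{} is not overloaded, so $_[0]->[0] reaches the real array.
use overload '%{}' => sub { $_[0]->[0] }, fallback => 1;

sub new {
    my ($class, %data) = @_;
    my %copy = %data;
    return bless [ \%copy ], $class;   # a blessed ARRAY reference
}

package main;

my $obj  = HashView->new(x => 1, y => 2);
my @keys = sort keys %$obj;            # uses the %{} overload

print "underlying type: ", Scalar::Util::reftype($obj) // 'n/a', "\n"
    if eval { require Scalar::Util; 1 };
print "keys: @keys\n";
```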
For all affected built-ins, the prototype has been changed to use the new 'single-term' prototype '+'. E.g. for push, the prototype is "+@".
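As a rough illustration of what a "+@" prototype means for a caller, here is a user-defined sub carrying the same prototype. The sub name my_push is hypothetical, and the sketch assumes perl 5.14 or later, where the '+' prototype character is available to user code.

```perl
use strict;
use warnings;
use 5.014;   # assumption: '+' in prototypes needs 5.14 or later

# Hypothetical helper. It is declared before use so the prototype applies.
sub my_push(+@) {
    my ($aref, @items) = @_;
    push @$aref, @items;
    return scalar @$aref;
}

my @array = (1, 2);
my $ref   = [10];

my_push @array, 3, 4;   # literal array: arrives as a reference to @array
my_push $ref,   20;     # scalar holding a reference: arrives unchanged

print "@array | @$ref\n";
```

With a literal array the '+' slot behaves like a reference prototype, and with any other expression it imposes scalar context, so both calls hand the sub an array reference.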
Documentation and tests are included.
_Rationale_
Perl is a language that gives great respect to the idea of context. I use that term below in the linguistic sense in which the pattern of usage guides interpretation (rather than in the programming sense of "scalar" or "list" context).
There are a number of built-in functions that act directly on "containers" (i.e. hashes and arrays) and that pass their first argument by reference rather than flattening the container into a list. E.g. because of the linguistic context of a function like "push" in "push @array, @list", perl passes @array by reference.
In various venues and forums, I've heard people express the desire for perl to do the "obvious thing" when providing a reference directly to such functions that act on containers. E.g. "push $arrayref, @list". The linguistic context provided by "push" is exactly the same -- it is clear that the goal is to act on the container itself.
In contrast, one can't reach the same conclusion about, for example, "foo($_) for $arrayref". The "for" doesn't give any clear linguistic context for whether $arrayref is intended to be flattened or used as a single element list aliased to $_.
Therefore, this patch allows perl to use a reference to a container directly for built in functions only where the linguistic context *already* allows perl to pass the container by reference instead of flattening it.
In discussing this patch or variations of it on email and IRC, I've heard lots of praise for the idea. The justifications I've seen are various combinations of "easier to read", "less to type" and/or "less annoying".
Those who dislike the idea of this patch generally seem to be of the opinion that dereferencing "should be explicit" or that the patch "isn't necessary" (which may just be a variation of the first objection).
To the first objection, I say that this is a matter of style and reasonable people already disagree on the "best" Perl style. This patch delivers a desired feature to those who do want it without interfering with the ability of others to explicitly dereference everything as a matter of personal (or corporate) style.
I dismiss the second objection out of hand. By its very nature as a community-driven project, nothing in the development of Perl is "necessary" and the very definition of "necessary" will vary from person to person. In this case, the work is desired and it is already done.
As mentioned in the summary, the changes to keys/values/each to accept references happened to lead to a significant performance improvement, recovering some of what was lost when 5.12 changed the implementation to allow keys/values/each to act on arrays in the first place.
The changes to push/pop/etc did not require the same degree of changes (i.e. I didn't add new opcodes for them) and I kept the patch as simple as possible. However, if the patch is accepted, I suspect that the same technique that allows optimization of "keys $hashref" could be explored later to optimize "push $arrayref, @list" over "push @$arrayref, @list".
 MANIFEST          |    1 +
 doop.c            |    5 +-
 embed.h           |    4 +
 op.c              |   85 +++++++++++--
 opcode.h          |   21 +++-
 opnames.h         |    5 +-
 pod/perldelta.pod |   26 ++++
 pod/perldiag.pod  |   16 +++
 pod/perlfunc.pod  |   75 +++++++++--
 pod/perlsub.pod   |   10 +-
 pp.c              |   81 ++++++++++++-
 pp.sym            |    4 +
 proto.h           |    9 ++
 regen/opcode.pl   |   12 ++-
 t/op/cproto.t     |   16 ++--
 t/op/push.t       |   39 ++++++-
 t/op/smartkve.t   |  361 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 t/op/splice.t     |    7 +-
 t/op/unshift.t    |   36 +++++-
 19 files changed, 757 insertions(+), 56 deletions(-)
 create mode 100644 t/op/smartkve.t
David Golden (via RT) <perlbug-followup@perl.org> writes:
This patch updates all functions that operate directly on array or hash containers to also accept references to arrays or hashes:
+1
-- Johan
The RT System itself - Status changed from 'new' to 'open'
On Wed\, Oct 27\, 2010 at 03:42:44PM -0700\, David Golden wrote:
(Prior to 5.12\, keys %$hashref was similarly fast\, but the optimization was lost when keys/values added support for arrays)
This is not correct.
*Your blog* is not the right place to report bugs (or performance regressions)
Report your benchmark here, and I will demonstrate that this was not the cause. (The code changes for keys/values on arrays were entirely a compile-time action. The runtime code for keys/arrays never changed.)
Your science is duff. Duff science annoys me.
(I believe that you will find that the cause actually relates to assignment and lexical variables, specifically fafafbaf705adfdf and then 0f907b96d618c97c)
I suspect that\, if accepted\, similar optimizations might be possible for push/pop/etc ops.
Which would continue to increase the complexity of the core codebase.
As mentioned in the summary\, the changes to keys/values/each to accept references happened to lead to a significant performance improvement\, recovering some of what was lost when 5.12 changed the implementation to allow keys/values/each to act on arrays in the first place.
This is false.
*Also*, your changes bring no benefit unless code is rewritten to use them. Any code written to use them is incompatible with existing perl installations, and can't help widely deployed code for many years to come.
The speed gain mainly comes from removing 1 op from the *benchmark*'s hot path. The cost is complexity in the core codebase, which increases the maintainability burden, and likely slightly slows down some other part of perl.
A more general optimisation *for existing code* would be to change the peephole optimiser to recognise this structure:
$ perl -MO=Concise -e 'my $a; print keys %$a'
a  <@> leave[1 ref] vKP/REFC ->(end)
1     <0> enter ->2
2     <;> nextstate(main 1 -e:1) v ->3
3     <0> padsv[$a:1,2] vM/LVINTRO ->4
4     <;> nextstate(main 2 -e:1) v ->5
9     <@> print vK ->a
5        <0> pushmark s ->6
8        <1> keys[t3] lK/1 ->9
7           <1> rv2hv[t2] lKRM/1 ->8
6              <0> padsv[$a:1,2] sM/DREFHV ->7
-e syntax OK
and replace it with something close to this one:
$ perl -MO=Concise -e 'my %a; print keys %a'
9  <@> leave[1 ref] vKP/REFC ->(end)
1     <0> enter ->2
2     <;> nextstate(main 1 -e:1) v ->3
3     <0> padhv[%a:1,2] vM/LVINTRO ->4
4     <;> nextstate(main 2 -e:1) v ->5
8     <@> print vK ->9
5        <0> pushmark s ->6
7        <1> keys[t2] lK/1 ->8
6           <0> padhv[%a:1,2] lRM ->7
-e syntax OK
i.e. remove the rv2hv OP, and replace the keys op with one that first performs the dereference that rv2hv would have done.
We should be discussing the merits/demerits of this functionality solely on syntax\, not on speed.
Nicholas Clark
On Thu\, Oct 28\, 2010 at 12:42 AM\, David Golden \perlbug\-followup@​perl\.org wrote:
This patch updates all functions that operate directly on array or hash containers to also accept references to arrays or hashes:
    push $arrayref, @stuff;
    keys $hashref;
Do I understand it correctly that you can write any expression returning a reference in place of $arrayref? Is it right that there's no ambiguity between this and the traditional syntax because an array/hash expression ('@foo' or '%foo') or an array/hash dereference expression ('@$foo' or '@{foo()}' or '%$foo' or '%{foo()}') can never return a reference in scalar context, for it returns either a plain number or a '3/8' style string? Do you just assume nobody will want to tie a hash %foo so it returns a reference to some other hash or array, and then try to use keys(%foo) as a shortcut for keys(@%foo) ?
Ambrus
On Wed\, 27 Oct 2010 15:42:44 -0700\, David Golden (via RT) \perlbug\-followup@​perl\.org wrote:
This patch updates all functions that operate directly on array or hash containers to also accept references to arrays or hashes:
push $arrayref\, @stuff; unshift $arrayref\, @stuff; pop $arrayref; shift $arrayref; splice $arrayref\, 0\, 2; keys $hashref; # or $arrayref values $hashref # or $arrayref ($k\,$v) = each $hashref # or $arrayref
Because this patch is a substantial new feature and will likely merit some discussion\, I have organized this commit message as follows:
WOW! YES! A killing new feature for 5.14 :)
I personally wanted to see if this
    my %hash;
    push @{$hash{new}{entry}}, 42;
could now be written as
    my %hash;
    push $hash{new}{entry}, 42;
as that kind of autovivification would be where I would use this new feature most. So I checked out blead, applied the patch and tested:
Test Summary Report
op/stash.t                   (Wstat: 0 Tests: 46 Failed: 0)
  TODO passed:   42, 46
porting/manifest.t           (Wstat: 0 Tests: 9919 Failed: 3)
  Failed tests:  9914, 9918-9919
../ext/XS-APItest/t/caller.t (Wstat: 0 Tests: 24 Failed: 0)
  TODO passed:   22-23
Files=1951, Tests=403036, 249 wallclock secs (13.12 usr 62.58 sys + 447.95 cusr 35.43 csys = 559.08 CPU)
Result: FAIL
Before patch:
$ perl -MData::Dumper -wle'my%h;push$h{new}{entry},42;print Dumper\%h'
Type of arg 1 to push must be array (not hash element) at -e line 1, near "42;"
Execution of -e aborted due to compilation errors.
After patch:
$ ./perl -Ilib -MData::Dumper -wle'my%h;push$h{new}{entry},42;print Dumper\%h'
$VAR1 = {
          'new' => {
                     'entry' => [
                                  42
                                ]
                   }
        };
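The autovivification being tested above can also be seen with the explicit-dereference spelling, which should run regardless of whether the patch is applied. This is only a sketch of the existing behaviour, using the same hash layout as the one-liner.

```perl
use strict;
use warnings;

my %hash;

# Dereferencing an undefined slot as an array in an lvalue context
# creates the intermediate hash and the array on demand.
push @{ $hash{new}{entry} }, 42;

print ref($hash{new}), " ", ref($hash{new}{entry}), " ",
      "@{ $hash{new}{entry} }", "\n";
```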
-- H.Merijn Brand http://tux.nl Perl Monger http://amsterdam.pm.org/ using 5.00307 through 5.12 and porting perl5.13.x on HP-UX 10.20\, 11.00\, 11.11\, 11.23 and 11.31\, OpenSuSE 10.1\, 11.0 .. 11.3 and AIX 5.2 and 5.3. http://mirrors.develooper.com/hpux/ http://www.test-smoke.org/ http://qa.perl.org http://www.goldmark.org/jeff/stupid-disclaimers/
On Thu\, Oct 28\, 2010 at 10:54:07AM +0200\, H.Merijn Brand wrote:
WOW! YES! A killing new feature for 5.14 :)
I thought we already had a roughly 5% general speedup over 5.12. (Although other things may have eaten away at that)
Nicholas Clark
On 28 October 2010 10:35\, Nicholas Clark \nick@​ccl4\.org wrote:
On Wed\, Oct 27\, 2010 at 03:42:44PM -0700\, David Golden wrote:
(Prior to 5.12\, keys %$hashref was similarly fast\, but the optimization was lost when keys/values added support for arrays)
Only tenuously on subject... But I did write a patch, which never got applied, which made
if (%$hash) {}
as fast as
if (keys %$hash) {}
That is, changing the former to the latter in place when used in boolean context.
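For readers unfamiliar with the two spellings, here is a tiny sketch showing that they agree as boolean tests. It says nothing about their relative speed, which is the subject of the patch mentioned above. The variable names are illustrative.

```perl
use strict;
use warnings;

my @results;
for my $h ({ a => 1 }, {}) {
    my $by_hash = %$h       ? 1 : 0;   # hash in boolean context
    my $by_keys = keys(%$h) ? 1 : 0;   # key count in boolean context
    push @results, "$by_hash$by_keys";
}

print "@results\n";
```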
cheers\, Yves
-- perl -Mre=debug -e "/just|another|perl|hacker/"
On Thu\, 28 Oct 2010 09:59:00 +0100\, Nicholas Clark \nick@​ccl4\.org wrote:
On Thu\, Oct 28\, 2010 at 10:54:07AM +0200\, H.Merijn Brand wrote:
WOW! YES! A killing new feature for 5.14 :)
I thought we already had a roughly 5% general speedup over 5.12. (Although other things may have eaten away at that)
The more reasons to get people to install 5.14 the better.
-- H.Merijn Brand
On Thu\, Oct 28\, 2010 at 11:04:39AM +0200\, demerphq wrote:
On 28 October 2010 10:35\, Nicholas Clark \nick@​ccl4\.org wrote:
On Wed\, Oct 27\, 2010 at 03:42:44PM -0700\, David Golden wrote:
(Prior to 5.12\, keys %$hashref was similarly fast\, but the optimization was lost when keys/values added support for arrays)
Only tenusously on subject..... But I did write a patch\, which never got applied\, which made
if (%$hash) {}
as fast as
if (keys %$hash) {}
That is\, in place changing the former to the latter when used in boolean context.
IIRC it did get applied some time before 5.12 shipped. There's a boolkeys op:
$ ~/Sandpit/5120-g/bin/perl5.12.0 -MO=Concise -e '1 if %hash'
7  <@> leave[1 ref] vKP/REFC ->(end)
1     <0> enter ->2
2     <;> nextstate(main 1 -e:1) v:{ ->3
-     <1> null vK/1 ->7
6        <|> and(other->7) vK/1 ->7
5           <1> boolkeys sK/1 ->6
4              <1> rv2hv[t1] sKRM/1 ->5
3                 <$> gv(*hash) s ->4
-           <0> ex-const v ->-
which didn't use to exist:
$ perl -MO=Concise -e '1 if %hash'
6  <@> leave[1 ref] vKP/REFC ->(end)
1     <0> enter ->2
2     <;> nextstate(main 1 -e:1) v ->3
-     <1> null vK/1 ->6
5        <|> and(other->6) vK/1 ->6
4           <1> rv2hv[t2] sK/1 ->5
3              <#> gv[*hash] s ->4
-           <0> ex-const v ->-
-e syntax OK
I don't know if it ever got a "thanks applied" message.
Anyway, if it didn't, a belated "thanks" for it, as it made the syntactically simplest way of expressing "has this hash got any keys" also be the fastest.
Nicholas Clark
On Thu\, Oct 28\, 2010 at 11:12 AM\, Nicholas Clark \nick@​ccl4\.org wrote:
On Thu\, Oct 28\, 2010 at 11:04:39AM +0200\, demerphq wrote:
Only tenusously on subject..... But I did write a patch\, which never got applied\, which made
if (%$hash) {}
as fast as
if (keys %$hash) {}
That is\, in place changing the former to the latter when used in boolean context.
IIRC it did get applied some time before 5.12 shipped. There's a boolkeys op:
perl5120delta mentions this (section Selected Performance Enhancements):
"if (%foo)" has been optimized to be faster than "if (keys %foo)".
That's the same\, isn't it?
Ambrus
On Wed\, Oct 27\, 2010 at 03:42:44PM -0700\, David Golden wrote:
# New Ticket Created by David Golden
# Please include the string: [perl #78656]
# in the subject line of all future correspondence about this issue.
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=78656 >
This patch updates all functions that operate directly on array or hash containers to also accept references to arrays or hashes:
push $arrayref\, @stuff; unshift $arrayref\, @stuff; pop $arrayref; shift $arrayref; splice $arrayref\, 0\, 2; keys $hashref; # or $arrayref values $hashref # or $arrayref ($k\,$v) = each $hashref # or $arrayref
Because this patch is a substantial new feature and will likely merit some discussion\, I have organized this commit message as follows:
* Summary of benefits * Summary of the patch * Rationale
_Summary of benefits_
As I see it\, the primary benefit is visually de-cluttering common operations in situations where the context (acting on a "container") is clear.
I think Merijn's advertising copy is better than yours. :-)
Was:
    push @{$hash->{$key}}, $data2;
Now:
    push $hash->{$key}, $data2;
Am I right in thinking that all the new behaviour used to be syntax errors? (Specifically\, compile time errors)
For keys/values/each\, I added three new opcodes to handle the case where the argument is not a hash or array (or explicit dereference of them). All three ops are handled by one function that dereferences the argument (including any magic) and then dispatches to the corresponding hash or array implementation of the original function. If the argument is not a hash or array\, a runtime error is thrown.
The changes to push/pop/etc did not require the same degree of changes (i.e. I didn't add new opcodes for them) and I kept the patch as simple as possible. However\, if the patch is accepted\, I suspect that the same technique that allows optimization of "keys $hashref" could be explored later to optimize "push $arrayref\, @list" over "push @$arrayref\, @list".
It's not obvious where the "speed"/space trade off lies.
For the lesser-used ops*, on a modern CPU with a finite instruction cache and reasonable branch prediction, at some point it becomes better to have less code, achieved by having more branches, because that code is more likely to already be hot in the CPU's cache.
For the array each/keys/values, the code implementing them was "completely" different (i.e. an if/else with two large blocks), so it seemed to make more sense to split the two.
Nicholas Clark
* problem - collecting stats on what is used more, and what is used less
On Thu\, Oct 28\, 2010 at 5:21 AM\, Nicholas Clark \nick@​ccl4\.org wrote:
As I see it\, the primary benefit is visually de-cluttering common operations in situations where the context (acting on a "container") is clear.
I think Merijn's advertising copy is better than yours. :-)
Was:
push @{$hash->{$key}}\, $data2;
Now:
push $hash->{$key}\, $data2;
Agreed. Tux++
Am I right in thinking that all the new behaviour used to be syntax errors? (Specifically\, compile time errors)
Yes. Previously, these functions were effectively prototyped to require a variable or an explicit dereference of the correct type. The new behavior is comparable to an explicit dereference, i.e. it gives a run-time error if the referent is not of the appropriate type.
The changes to push/pop/etc did not require the same degree of changes (i.e. I didn't add new opcodes for them) and I kept the patch as simple as possible. However\, if the patch is accepted\, I suspect that the same technique that allows optimization of "keys $hashref" could be explored later to optimize "push $arrayref\, @list" over "push @$arrayref\, @list".
It's not obvious where the "speed"/space trade off lies.
I agree. I used words like "explored" because it wasn't an obvious win and doing the major surgery needed wasn't necessary to provide this new feature.
-- David
On Wed\, Oct 27\, 2010 at 03:42:44PM -0700\, David Golden wrote:
This patch updates all functions that operate directly on array or hash containers to also accept references to arrays or hashes:
push $arrayref\, @stuff; unshift $arrayref\, @stuff; pop $arrayref; shift $arrayref; splice $arrayref\, 0\, 2; keys $hashref; # or $arrayref values $hashref # or $arrayref ($k\,$v) = each $hashref # or $arrayref
+1 - Unless someone comes up with a particularly damning problem with this, I'd love to see it get in for the next blead release.
On Thu\, Oct 28\, 2010 at 4:37 AM\, Zsbán Ambrus \ambrus@​math\.bme\.hu wrote:
On Thu\, Oct 28\, 2010 at 12:42 AM\, David Golden \perlbug\-followup@​perl\.org wrote:
This patch updates all functions that operate directly on array or hash containers to also accept references to arrays or hashes:
push $arrayref\, @stuff; keys $hashref;
Do I understand it correctly that you can write any expression returning a reference in place of $arrayref?
Correct. You can do "keys gives_a_hashref()"
Is it right that there's no ambiguity between this and the traditional syntax because an array/hash expression ('@foo' or '%foo') or an array/hash dereference expression ('@$foo' or '@{foo()}' or '%$foo' or '%{foo()}') can never return a reference in scalar context\, for it returns either a plain number or a '3/8' style string? Do you just assume nobody will want to tie a has %foo so it returns a reference to some other hash or array\, and then try to use keys(%foo) as a shortcut for keys(@%foo) ?
I'm not sure I understand what you're asking with relation to ties. If you're asking about overloading, it takes that into account and warns on ambiguity.
For example, if you have "$obj = bless {}, $class" where $class overloads %{}, then if you do "keys $obj", the overloading will be used to get the hash to act on, just as if you did %$obj directly. If you have an ambiguous case like "$obj = bless [], $class" with $class providing %{} overloading, then "keys $obj" will assume you want the %{} overloading and not a dereference of the underlying type, and will issue a warning about it. However, "$obj = bless \(my $scalar), $class" with %{} overloading will allow "keys $obj" without a warning.
-- David
On 28.10.2010\, at 14:42\, Jesse Vincent wrote:
On Wed\, Oct 27\, 2010 at 03:42:44PM -0700\, David Golden wrote:
This patch updates all functions that operate directly on array or hash containers to also accept references to arrays or hashes:
push $arrayref\, @stuff; unshift $arrayref\, @stuff; pop $arrayref; shift $arrayref; splice $arrayref\, 0\, 2; keys $hashref; # or $arrayref values $hashref # or $arrayref ($k\,$v) = each $hashref # or $arrayref
+1 - Unless someone comes up with a partticularly damning problem with this\, I'd love to see it get in for the next blead release.
The only parts that come to mind are:
- overloading
- tie()ing
- autobox
But I can't think of any problem in particular with those - just something to consider.
Marcel
On Thu\, Oct 28\, 2010 at 8:49 AM\, Marcel Grünauer \gr@​univie\.ac\.at wrote:
The only parts that come to mind are:
- overloading - tie()ing - autobox
But I can't think of any problem in particular with those - just something to consider.
I've already described how overloading is handled in an earlier reply.
I don't see that this patch changes what happens with ties at all. If you pass a reference to a tied array or hash, it gets dereferenced to the array or hash and then it follows the same code path that an explicit dereference would have followed (which calls the tied functions as usual).
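Here is a small sketch of the code path that an explicit dereference already follows for a tied array. CountingArray is a hypothetical class built on the core Tie::StdArray. It only demonstrates that pushing through a reference reaches the tied PUSH method; it does not exercise the patch itself.

```perl
use strict;
use warnings;
use Tie::Array;   # core module that also provides Tie::StdArray

package CountingArray;   # hypothetical class, for illustration only
our @ISA    = ('Tie::StdArray');
our $pushes = 0;

# Count every PUSH, then defer to the standard implementation.
sub PUSH {
    my $self = shift;
    $pushes++;
    return $self->SUPER::PUSH(@_);
}

package main;

tie my @array, 'CountingArray';
my $ref = \@array;

push @array, 1;      # direct push on the tied array
push @$ref,  2, 3;   # push through a reference: same tied method

print "pushes=$CountingArray::pushes contents=@array\n";
```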
I can't speak to autobox, as that's black magic to me, but my limited understanding is that it hooks method calls, which is unrelated to this patch.
-- David
On 28 October 2010 11:12\, Nicholas Clark \nick@​ccl4\.org wrote:
On Thu\, Oct 28\, 2010 at 11:04:39AM +0200\, demerphq wrote:
On 28 October 2010 10:35\, Nicholas Clark \nick@​ccl4\.org wrote:
On Wed\, Oct 27\, 2010 at 03:42:44PM -0700\, David Golden wrote:
(Prior to 5.12\, keys %$hashref was similarly fast\, but the optimization was lost when keys/values added support for arrays)
Only tenusously on subject..... But I did write a patch\, which never got applied\, which made
if (%$hash) {}
as fast as
if (keys %$hash) {}
That is\, in place changing the former to the latter when used in boolean context.
IIRC it did get applied some time before 5.12 shipped. There's a boolkeys op:
$ ~/Sandpit/5120-g/bin/perl5.12.0 -MO=Concise -e '1 if %hash' 7 \<@> leave[1 ref] vKP/REFC ->(end) 1 \<0> enter ->2 2 \<;> nextstate(main 1 -e:1) v:{ ->3 - \<1> null vK/1 ->7 6 \<|> and(other->7) vK/1 ->7 5 \<1> boolkeys sK/1 ->6 4 \<1> rv2hv[t1] sKRM/1 ->5 3 \<$> gv(*hash) s ->4 - \<0> ex-const v ->-
which didn't use to exist:
$ perl -MO=Concise -e '1 if %hash' 6 \<@> leave[1 ref] vKP/REFC ->(end) 1 \<0> enter ->2 2 \<;> nextstate(main 1 -e:1) v ->3 - \<1> null vK/1 ->6 5 \<|> and(other->6) vK/1 ->6 4 \<1> rv2hv[t2] sK/1 ->5 3 \<#> gv[*hash] s ->4 - \<0> ex-const v ->- -e syntax OK
I don't know if it ever got a "thanks applied" message.
Anyway\, if it didn't\, a belated "thanks" for it\, as it made the syntactically simplest way of expressing "has this hash got any keys" also be the fastest.
Oh. Cool!
:-)
Yves
-- perl -Mre=debug -e "/just|another|perl|hacker/"
On 28 October 2010 11:18\, Zsbán Ambrus \ambrus@​math\.bme\.hu wrote:
On Thu\, Oct 28\, 2010 at 11:12 AM\, Nicholas Clark \nick@​ccl4\.org wrote:
On Thu\, Oct 28\, 2010 at 11:04:39AM +0200\, demerphq wrote:
Only tenusously on subject..... But I did write a patch\, which never got applied\, which made
if (%$hash) {}
as fast as
if (keys %$hash) {}
That is\, in place changing the former to the latter when used in boolean context.
IIRC it did get applied some time before 5.12 shipped. There's a boolkeys op:
perl5120delta mentions this (section Selected Performance Enhancements):
"if (%foo)" has been optimized to be faster than "if (keys %foo)".
That's the same\, isn't it?
I guess so. If it's my patch then it should be "as fast as"; if it's not, well then thanks to whoever improved on the original idea. :-)
Yves
-- perl -Mre=debug -e "/just|another|perl|hacker/"
On 28 October 2010 18:46\, demerphq \demerphq@​gmail\.com wrote:
On 28 October 2010 11:18\, Zsbán Ambrus \ambrus@​math\.bme\.hu wrote:
On Thu\, Oct 28\, 2010 at 11:12 AM\, Nicholas Clark \nick@​ccl4\.org wrote:
On Thu\, Oct 28\, 2010 at 11:04:39AM +0200\, demerphq wrote:
Only tenusously on subject..... But I did write a patch\, which never got applied\, which made
if (%$hash) {}
as fast as
if (keys %$hash) {}
That is\, in place changing the former to the latter when used in boolean context.
IIRC it did get applied some time before 5.12 shipped. There's a boolkeys op:
perl5120delta mentions this (section Selected Performance Enhancements):
"if (%foo)" has been optimized to be faster than "if (keys %foo)".
That's the same\, isn't it?
I guess so. If its my patch then it should be "as fast as"\, if its not\, well then thanks to whomever improved on the original idea. :-)
Er. I need a very long vacation I think. It was my patch, and it did speed things up. And Nicholas applied it, and I even knew it.
/me goes back to sleep
yves -- perl -Mre=debug -e "/just|another|perl|hacker/"
On 28/10/2010 13:42\, Jesse Vincent wrote:
On Wed\, Oct 27\, 2010 at 03:42:44PM -0700\, David Golden wrote:
This patch updates all functions that operate directly on array or hash containers to also accept references to arrays or hashes:
push $arrayref\, @stuff; unshift $arrayref\, @stuff; pop $arrayref; shift $arrayref; splice $arrayref\, 0\, 2; keys $hashref; # or $arrayref values $hashref # or $arrayref ($k\,$v) = each $hashref # or $arrayref
+1 - Unless someone comes up with a partticularly damning problem with this\, I'd love to see it get in for the next blead release.
My damning problem with this is that you have turned a compile-time syntax error, which I can test for with perl -c (since push et al. expect the next non-whitespace character to be a '@'), into a run-time check. I now have to run the code in order to check that I am passing in the correct kind of reference.
If Perl were strongly typed this would not be a problem. But it's not, so it is.
John
On Thu\, Oct 28\, 2010 at 3:40 PM\, John \j\.imrie@​virginmedia\.com wrote:
My damming problem with this is you have pushed a compile time syntax error\, that I can test with perl -c\, being that push et all expect the next none white space character to be a '@' to a run time check. that I have to now run the code on in order to check I am getting the correct reference passed in.
If Perl was strongly typed this would not be a problem. But it's not so it is. \
Well, "perl -c" wouldn't have helped you with "push @{ $hashref }, @list" anyway. :-)
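To illustrate the point, here is a sketch of the explicit-dereference case. The code compiles cleanly, and the mismatch between a hash reference and an array dereference only surfaces when the statement runs. The variable names are made up.

```perl
use strict;
use warnings;

my $hashref = { a => 1 };
my @list    = (1, 2, 3);

# This passes "perl -c"; the wrong reference type is only caught at
# run time, when the dereference is attempted.
my $ok    = eval { push @{ $hashref }, @list; 1 };
my $error = $@;

print $ok ? "no error\n" : "run-time error: $error";
```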
-- David
2010/10/28 David Golden <xdaveg@gmail.com>:
--- a/opnames.h
+++ b/opnames.h
@@ -381,10 +381,13 @@ typedef enum opcode {
     OP_LOCK     = 363,
     OP_ONCE     = 364,
     OP_CUSTOM   = 365,
+    OP_REACH    = 366,
+    OP_RKEYS    = 367,
+    OP_RVALUES  = 368,
     OP_max
 } opcode;
Shouldn't custom always stay the last? Keeping the existing number is not needed for plc's as customs are a placeholder for custom ops. ;)
-- Reini Urban http://phpwiki.org/ http://murbreak.at/
On Thu\, Oct 28\, 2010 at 8:27 PM\, Reini Urban \rurban@​x\-ray\.at wrote:
2010/10/28 David Golden \xdaveg@​gmail\.com: --- a/opnames.h +++ b/opnames.h @@ -381\,10 +381\,13 @@ typedef enum opcode { OP_LOCK = 363\, OP_ONCE = 364\, OP_CUSTOM = 365\, + OP_REACH = 366\, + OP_RKEYS = 367\, + OP_RVALUES = 368\, OP_max } opcode;
Shouldn't custom always stay the last? Keeping the existing number is not needed for plc's as customs are a placeholder for custom ops. ;)
If you look right under __END__ in regen/opcode.pl it says this:
    # New ops always go at the end
    # The restriction on having custom as the last op has been removed
I just followed instructions.
-- David
David Golden (via RT) <perlbug-followup@perl.org> wrote:
:_Rationale_
[...]
:Those who dislike the idea of this patch generally seem to be of the
:opinion that dereferencing "should be explicit" or that the patch
:"isn't necessary" (which may just be a variation of the first
:objection).
:
:To the first objection, I say that this is a matter of style and
:reasonable people already disagree on the "best" Perl style. This
:patch delivers a desired feature to those who do want it without
:interfering with the ability of others to explicitly dereference
:everything as a matter of personal (or corporate) style.
:
:I dismiss the second objection out of hand. By its very nature as a
:community-driven project, nothing in the development of Perl is
:"necessary" and the very definition of "necessary" will vary from
:person to person. In this case, the work is desired and it is already
:done.
I am for this patch\, but I think you treat these objections more lightly than you should. I can imagine many patches I would be strongly against whose proponents would offer the above arguments because they have no better case to make.
Maybe there are other points that could be added\, but I can think of at least: - this breaks no existing code that was free of syntax errors; - it does what people would expect\, given the new constructs\, even without reading the docs; - it does what people *have* expected\, and are tripped up by.
A change to the syntax affects anyone that reads or writes\, teaches or learns perl. TIMTOWTDI doesn't mean we are required to support *every* possible way to do it.
:--- a/t/op/push.t
:+++ b/t/op/push.t
[...]
: foreach $line (@tests) {
:     ($list,$get,$leave) = split(/,\t*/,$line);
:     ($pos, $len, @list) = split(' ',$list);
:     @get = split(' ',$get);
:     @leave = split(' ',$leave);
:     @x = (0,1,2,3,4,5,6,7);
:+    $y = [0,1,2,3,4,5,6,7];
:     if (defined $len) {
:         @got = splice(@x, $pos, $len, @list);
:+        @got2 = splice(@$y, $pos, $len, @list);
:     }
:     else {
:         @got = splice(@x, $pos);
:+        @got2 = splice(@$y, $pos);
:     }
:     if (join(':',@got) eq join(':',@get) &&
:         join(':',@x) eq join(':',@leave)) {
These look like they were intended to test C< splice($y, ...) > instead of, or as well as, C< @$y >.
I'd like to see some additional tests for which the newly unfettered array argument is returned by one of various subroutines. In particular, I'd have liked to examine those to understand what would happen in a case such as:
my($a, $b) = ([1], [2]);
sub twoargs { return +($a, $b) }
push twoargs(), 3;
My first guess would be C< $a := [1, [2], 3] >, but I can imagine quite a few other possibilities.
Hugo
On Fri, Oct 29, 2010 at 4:03 AM, <hv@crypt.org> wrote:
These look like they were intended to test C< splice($y, ...) > instead of, or as well as, C< @$y >.
Thanks. I've fixed those.
I'd like to see some additional tests for which the newly unfettered array argument is returned by one of various subroutines. In particular, I'd have liked to examine those to understand what would happen in a case such as:
my($a, $b) = ([1], [2]);
sub twoargs { return +($a, $b) }
push twoargs(), 3;
My first guess would be C< $a := [1, [2], 3] >, but I can imagine quite a few other possibilities.
That should do the same thing that "push @{ twoargs() }, 3;" would do, but I'll add a test for that, too.
-- David
On Fri, Oct 29, 2010 at 7:30 AM, David Golden <xdaveg@gmail.com> wrote:
my($a, $b) = ([1], [2]);
sub twoargs { return +($a, $b) }
push twoargs(), 3;
My first guess would be C< $a := [1, [2], 3] >, but I can imagine quite a few other possibilities.
That should do the same thing that "push @{ twoargs() }, 3;" would do, but I'll add a test for that, too.
I've confirmed it with a test: $a is [1] and $b is [2,3], because in scalar context the list returns its last item, which then becomes the target of push.
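The result above can be reproduced with the explicit-dereference form that David says the new syntax should match. This is only a minimal sketch; `$first` and `$second` stand in for `$a` and `$b` and are renamed solely to stay clear of the special sort variables.

```perl
use strict;
use warnings;

# Stand-ins for $a and $b in the example from the thread.
my ($first, $second) = ([1], [2]);

# Called in scalar context, the comma operator yields its last operand,
# so this sub hands back $second.
sub twoargs { return ($first, $second) }

# The expression inside @{ ... } is evaluated in scalar context.
push @{ twoargs() }, 3;

print "first:  [@$first]\n";     # first:  [1]
print "second: [@$second]\n";    # second: [2 3]
```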
I've added the test\, the fix to splice tests and some other minor fixes (compiling under threads -- thank you rafl and avar)\, rebased it all to recent blead and published a private branch for ongoing review:
git://github.com/dagolden/perl.git -- branch is "private/push-pop-keys-etc-on-refs"
Browseable: http://github.com/dagolden/perl/tree/private/push-pop-keys-etc-on-refs
-- David
On Sun, Oct 31, 2010 at 8:26 AM, Zsbán Ambrus <ambrus@math.bme.hu> wrote:
None of this is any stranger than any existing perl syntax quirks. None of this can break existing code, not even most obfus. The only implication of all this is that you have to be careful when you document the new syntax in the manual.
I get it now. Thank you. I'll double-check the documentation and try to avoid implying exact equivalence.
-- David
On Wed Oct 27 15:42:44 2010, dagolden@cpan.org wrote:
This patch updates all functions that operate directly on array or hash containers to also accept references to arrays or hashes:
Please don’t do it!
It makes it harder for module authors to maintain backward compatibility.
If I have a function that returns a hash ref, I am already unable to change it to, say, a blessed scalar with %{} overloading, because then this would break:
*hash = get_hashref();
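A rough sketch of the breakage described here, under stated assumptions: `My::HashLike` is a hypothetical class invented for illustration, and `get_hashref` is a stand-in for whatever API is being changed. Glob assignment picks the slot to fill from the underlying type of the referent, so a blessed scalar lands in the scalar slot regardless of its %{} overload.

```perl
use strict;
use warnings;

{
    package My::HashLike;    # hypothetical: a blessed scalar that overloads %{}
    use overload '%{}' => sub { return +{ colour => 'red' } }, fallback => 1;
    sub new {
        my $class   = shift;
        my $payload = 'payload';
        return bless \$payload, $class;
    }
}

sub get_hashref { return +{ colour => 'red' } }    # stand-in for the original API

our ( %hash, %from_object, $from_object );

# Assigning a hash ref to a glob aliases %hash to the referenced hash.
*hash = get_hashref();
my $plain_count = keys %hash;

# The object is a scalar ref underneath, so the same assignment fills the
# scalar slot; the %{} overload is not consulted and the hash stays empty.
*from_object = My::HashLike->new;
my $object_count = keys %from_object;

print "plain hash ref:    $plain_count key(s)\n";
print "overloaded object: $object_count key(s)\n";
```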
With your proposed change, there is another limitation: I cannot change a function that returns a hash ref to return a blessed hash ref with @{} overloading. I cannot even countermand that by adding %{} overloading as well, as there is still a warning. So the only choice is to add a new function, resulting in API bloat.
In Perl, the type of a scalar (and a reference is a scalar, of course) is generally determined by the operators, not the operands. Every exception to this is a design flaw and causes problems now and then. (See also bugs 1804, 77496, 77502, 77508, 77688, 71686, 77684, 77810, 20661, 45133, 77812, 77492, 57706 and 77926.) So, can we please stop adding more of these cases?
(In case someone brings it up, I never use ~~ or when() for precisely these reasons. And I hope no one using my modules ever uses them....)
On Thu Oct 28 09:50:54 2010, demerphq wrote:
/me goes back to sleep
yves
Wait! could you look at bugs 68564, 70998 and 78356 first? Or am I too late? :-)
On Sun, Oct 31, 2010 at 9:17 PM, Father Chrysostomos via RT <perlbug-followup@perl.org> wrote:
Please don’t do it!
With your proposed change, there is another limitation: I cannot change a function that returns a hash ref to return a blessed hash ref with @{} overloading. I cannot even countermand that by adding %{} overloading as well, as there is still a warning. So the only choice is to add a new function, resulting in API bloat.
This does seem like a good argument to me. However, what if the new syntax always used values as hash references in keys, values, each; and always as array references in push, pop, shift, unshift, splice? Your complaint doesn't work in that case, does it?
Ambrus
On Sun, Oct 31, 2010 at 4:17 PM, Father Chrysostomos via RT <perlbug-followup@perl.org> wrote:
With your proposed change, there is another limitation: I cannot change a function that returns a hash ref to return a blessed hash ref with @{} overloading. I cannot even countermand that by adding %{} overloading as well, as there is still a warning. So the only choice is to add a new function, resulting in API bloat.
If your API claims to return a hash reference and starts returning an object instead, you've just changed your API and there are all the usual consequences. What if I was validating that your return value was indeed a hashref, e.g. C<ref get_hashref() eq 'HASH'>? That breaks, too, and has nothing to do with this patch.
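To illustrate the validation point, here is a small sketch comparing `ref` with `Scalar::Util`'s `reftype` and `blessed` on a plain hash ref and on a blessed one. The class name `My::Record` is made up for the example.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed reftype);

my $plain  = { name => 'plain' };
my $object = bless { name => 'object' }, 'My::Record';    # hypothetical class

for my $value ( $plain, $object ) {
    printf "%-7s ref=%-11s reftype=%-5s blessed=%s\n",
        $value->{name},
        ref($value),        # class name once the hash is blessed
        reftype($value),    # underlying type, blessed or not
        ( defined( blessed($value) ) ? 'yes' : 'no' );
}
```

A caller that checks `ref(...) eq 'HASH'` accepts the first value and rejects the second, even though both can still be dereferenced as hashes.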
My opinion is that objects should be treated as if they were opaque. Dealing directly with the underlying representation from outside the class is breaking encapsulation and if your change to internal representation causes warnings for someone breaking encapsulation, you should feel no responsibility. The warnings in this patch exist to tell naughty people that they're being naughty, so they blame perl (if not themselves) and not you.
The proper thing to do with objects is to provide "as_hash" or "as_array" (or whatever) methods to provide an unblessed representation and probably a deep copy, too, so they can't fiddle your object state outside the API.
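One way an "as_hash" method along these lines might look, as a sketch only: `My::Config` and its `settings` field are hypothetical, and `Storable::dclone` is used here as one convenient way to get a deep copy.

```perl
use strict;
use warnings;
use Storable ();

{
    package My::Config;    # hypothetical class for illustration

    sub new {
        my ( $class, %args ) = @_;
        return bless { settings => {%args} }, $class;
    }

    sub get {
        my ( $self, $key ) = @_;
        return $self->{settings}{$key};
    }

    # Hand out an unblessed deep copy so callers cannot reach object state.
    sub as_hash {
        my ($self) = @_;
        return Storable::dclone( $self->{settings} );
    }
}

my $config = My::Config->new( depth => 3, paths => ['lib'] );
my $copy   = $config->as_hash;

# Changes to the copy, even to nested data, do not leak back.
$copy->{depth} = 99;
push @{ $copy->{paths} }, 'blib';

print "depth in object: ", $config->get('depth'), "\n";
print "paths in object: ", scalar @{ $config->get('paths') }, "\n";
```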
-- David
On Thu, Oct 28, 2010 at 8:42 AM, Jesse Vincent <jesse@fsck.com> wrote:
push $arrayref, @stuff;
unshift $arrayref, @stuff;
pop $arrayref;
shift $arrayref;
splice $arrayref, 0, 2;
keys $hashref; # or $arrayref
values $hashref # or $arrayref
($k,$v) = each $hashref # or $arrayref
+1 - Unless someone comes up with a particularly damning problem with this, I'd love to see it get in for the next blead release.
After applying some fixes from rafl and avar, and clarifying some documentation from comments in this thread, the patch has been applied as commit cba5a3b05660d6a40525beb667a389a690900298.
Thank you to everyone involved throughout my work on this over the past several weeks. I couldn't have done it without your feedback and help.
-- David
Applied (with fixes) as commit cba5a3b05660d6a40525beb667a389a690900298
@xdg - Status changed from 'open' to 'resolved'
On Sun Oct 31 17:03:05 2010, b_jonas wrote:
On Sun, Oct 31, 2010 at 9:17 PM, Father Chrysostomos via RT <perlbug-followup@perl.org> wrote:
Please don’t do it!
With your proposed change, there is another limitation: I cannot change a function that returns a hash ref to return a blessed hash ref with @{} overloading. I cannot even countermand that by adding %{} overloading as well, as there is still a warning. So the only choice is to add a new function, resulting in API bloat.
This does seem like a good argument to me. However, what if the new syntax always used values as hash references in keys, values, each; and always as array references in push, pop, shift, unshift, splice? Your complaint doesn't work in that case, does it?
No. That would be perfect.
On Sun Oct 31 18:12:54 2010, dagolden@cpan.org wrote:
On Sun, Oct 31, 2010 at 4:17 PM, Father Chrysostomos via RT <perlbug-followup@perl.org> wrote:
With your proposed change, there is another limitation: I cannot change a function that returns a hash ref to return a blessed hash ref with @{} overloading. I cannot even countermand that by adding %{} overloading as well, as there is still a warning. So the only choice is to add a new function, resulting in API bloat.
If your API claims to return a hash reference and starts returning an object instead, you've just changed your API and there are all the usual consequences. What if I was validating that your return value was indeed a hashref, e.g. C<ref get_hashref() eq 'HASH'>?
Any such code is asking for trouble anyway, precisely because objects *can* be (used as) hash refs.
(I always have to employ workarounds when, for instance, modules try to reject an object because a string is expected, when objects are strings anyway. This is Perl after all.)
My opinion is that objects should be treated as if they were opaque. Dealing directly with the underlying representation from outside the class is breaking encapsulation
Unless it is documented as part of the public interface.
On 1 November 2010 00:18, Father Chrysostomos via RT <perlbug-comment@perl.org> wrote:
On Thu Oct 28 09:50:54 2010, demerphq wrote:
/me goes back to sleep
yves
Wait! could you look at bugs 68564, 70998 and 78356 first? Or am I too late? :-)
68564: Your patch doesn't *feel* right to me, like it's addressing the symptom and not the cause. I need to poke deeper later.
70998: Similar story. As far as I remember
/\xab|\xa9/
should be handled by the trie optimisation, and then converted into a charclass (ANYOF). So the question is why doesn't this problem also occur with
/[\xab\xa9]/
? I'm not saying the patch is wrong, just that it doesn't clarify this aspect of things.
78356: This is an interesting case. IMO there are a few things at play here. First is that we go through a phase during optimisation where we join consecutive EXACT nodes together. The original code used to produce one EXACT node per character; later it was realized that merging them made a big difference. However, it seems the case of NOTHING EXACT doesn't get reduced (properly) to EXACT alone.
IIRC that is what this output is:
| 13| tail~ BRANCH (9) -> TAIL
| | tsdy~ NOTHING (3) -> PSEUDO
| | ~ EXACT \
Next, it appears that the jump trie code isn't working for the case of NOTHING, which is a separate problem.
We see this in action here:
Final program:
1: SBOL (2)
2: TRIE-EXACT<S:1/7 W:3 L:0/3 C:6/6>[bz] (13)
<> (4)
4: EXACT \
Where we expect to execute node 4 after the "matched empty string", yet we did not. For some reason we didn't "jump" but executed a "normal" trie.
I'll try to dig further later on when I have more time. I hope this is helpful to you.
cheers, Yves
-- perl -Mre=debug -e "/just|another|perl|hacker/"
Nicholas Clark wrote:
ie remove the rv2hv OP\, and replace the keys op with one that will initially implement the dereference of rv2hv.
+1. It's a sufficiently common pattern to warrant this optimisation.
-zefram
David Golden wrote:
push $arrayref, @stuff;
unshift $arrayref, @stuff;
pop $arrayref;
shift $arrayref;
splice $arrayref, 0, 2;
For these I have no strong opinion. I find the implicit dereference mildly distasteful, but there's no real problem. Provided, of course, that we will never want push, splice, et al to operate on scalars. The equivalent operations at the character level all exist under different names (.=, substr), so it doesn't seem likely to arise.
keys $hashref; # or $arrayref
values $hashref # or $arrayref
($k,$v) = each $hashref # or $arrayref
I am opposed to these, because they operate on both arrays and hashes. Currently there is some mild syntactic overloading, in that keys(@$x) and keys(%$x) use the same keyword while being distinct operators. Your patch turns that compile-time syntactic overloading into a runtime semantic ambiguity, in that keys($x) doesn't decide whether it's doing array-keys or hash-keys until it actually executes.
keys(@$x) always does an array-keys operation, even if $x references a hash blessed into a class doing @{} overloading, or an array blessed into a class doing %{} overloading. keys($x) would (under your patch) do hash-keys on such refs. So replacing keys(@$x) with keys($x), the shortcut that you're encouraging, is incorrect if $x might reference something other than a plain array. If $x is an input from a caller, and the requirement on $x has previously been merely that it can be used as an array ref in Perl operations, changing the code in this way will break it. It'll break it in an obscure situation that won't be in the test suite.
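The first half of this argument, that an explicit sigil fixes the operation at compile time, can be seen in a short sketch. `My::Dual` is a hypothetical hash-based class that overloads @{}; only the explicit forms are shown, so the example does not rely on the patch.

```perl
use strict;
use warnings;

{
    package My::Dual;    # hypothetical: a blessed hash that also acts as an array

    use overload '@{}' => sub { $_[0]{items} }, fallback => 1;

    sub new {
        my $class = shift;
        return bless { items => [qw(a b c)] }, $class;
    }
}

my $x = My::Dual->new;

my @array_keys = keys @$x;    # array-keys, reached through the @{} overload
my @hash_keys  = keys %$x;    # hash-keys on the underlying hash

print "keys \@\$x: @array_keys\n";    # keys @$x: 0 1 2
print "keys \%\$x: @hash_keys\n";     # keys %$x: items
```

With the sigil written out, the reader and the compiler both know which operation is meant; a bare `keys($x)` would have to choose at run time.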
(Sure, you can check that a supplied `array ref' actually does reference an unblessed array. Some of *my* code does. But most people don't bother.)
This is another form of the general issue with Perl polymorphism. Perl is slanted heavily towards values being polymorphic, behaving as whatever type the operator requires. This means that operators have to know what operation they're performing. It doesn't get along at all well with polymorphic operators, that behave differently according to the type of their operands. We have a few such operators, such as the bitwise ops, and they're a nightmare. You drew a parallel with the context system, but that's a faulty analogy: context is determined from lexically surrounding code or the dynamically surrounding sub call, it does not depend on the operands supplied.
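The bitwise operators mentioned here make a compact illustration. The sketch below assumes the `bitwise` feature is not enabled, so `|` chooses between numeric and string behaviour based on its operands.

```perl
use strict;
use warnings;

# Assumes the "bitwise" feature is NOT in effect, so | inspects its operands.
my ( $n1, $n2 ) = ( 12,   3 );
my ( $s1, $s2 ) = ( "12", "3" );

my $numeric = $n1 | $n2;    # numeric OR: 12 | 3 is 15
my $stringy = $s1 | $s2;    # character-wise OR of "12" and "3" is "32"

print "numbers: $numeric\n";
print "strings: $stringy\n";
```

The same source text, `$x | $y`, does two different things depending on how the values were produced, which is the kind of operand-driven behaviour being objected to.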
So, in summary, I am opposed to this polymorphism of keys, values, and each, because it encourages the writing of incorrect code. The correct place for these polymorphic operators, if you really want to use them, is in a module on CPAN. You don't need them to be in the core for them to compile to single ops.
-zefram
On Tue Nov 02 13:27:14 2010\, zefram@fysh.org wrote:
David Golden wrote:
keys $hashref; # or $arrayref values $hashref # or $arrayref ($k\,$v) = each $hashref # or $arrayref
I am opposed to these\, because they operate on both arrays and hashes. Currently there is some mild syntactic overloading\, in that keys(@$x) and keys(%$x) use the same keyword while being distinct operators. Your patch turns that compile-time syntactic overloading into a runtime semantic ambiguity\, in that keys($x) doesn't decide whether it's doing array-keys or hash-keys until it actually executes.
keys(@$x) always does an array-keys operation\, even if $x references a hash blessed into a class doing @{} overloading\, or an array blessed into a class doing %{} overloading. keys($x) would (under your patch) do hash-keys on such refs. So replacing keys(@$x) with keys($x)\, the shortcut that you're encouraging\, is incorrect if $x might reference something other than a plain array. If $x is an input from a caller\, and the requirement on $x has previously been merely that it can be used as an array ref in Perl operations\, changing the code in this way will break it. It'll break it in an obscure situation that won't be in the test suite.
Phew! At least I’m not the only one. You explained it more clearly than I could.
(Sure\, you can check that a supplied `array ref' actually does reference an unblessed array. Some of *my* code does.
So does some of mine\, but none of my *new* code\, as I realised a while ago that such an approach is fundamentally flawed. There’s no reason to reject arrays just because they happen to be blessed.
But most people don't bother.)
This is another form of the general issue with Perl polymorphism. Perl is slanted heavily towards values being polymorphic\, behaving as whatever type the operator requires. This means that operators have to know what operation they're performing. It doesn't get along at all well with polymorphic operators\, that behave differently according to the type of their operands. We have a few such operators\, such as the bitwise ops\, and they're a nightmare. You drew a parallel with the context system\, but that's a faulty analogy: context is determined from lexically surrounding code or the dynamically surrounding sub call\, it does not depend on the operands supplied.
So\, in summary\, I am opposed to this polymorphism of keys\, values\, and each\, because it encourages the writing of incorrect code. The correct place for these polymorphic operators\, if you really want to use them\, is in a module on CPAN. You don't need them to be in the core for them to compile to single ops.
Or we would follow Zsbán Ambrus’ suggestion and make those three do array operations *only* when the @ is explicit, and do hash operations otherwise.
If I go ahead and do that, will anyone object strongly?
In perl 5.12 there are at least two different sets of rules for dereferencing. Now there are at least three. The two in perl 5.12 can be unified without breaking backward compatibility. But these new cases cannot be reconciled with the existing rules. So there are just more layers of inconsistency.
Father Chrysostomos via RT wrote:
Or we would follow Zsbán Ambrus’ suggestion and make those three do array operations *only* when the @ is explicit, and do hash operations otherwise.
That possibility occurred to me some time after writing my mail. It's neat in many respects\, with just one niggle: it doesn't correspond to the behaviour you can get from a user-defined function using the (+) prototype. Actually\, if you go strictly through the edge cases that I considered\, xdg's keys behaviour doesn't entirely correspond to its (+) prototype anyway. To maintain correspondence with prototype behaviour you'd have to introduce at least one more prototype feature\, which lets the called function know which kind of dereferencement occurred in parameter processing.
It may be saner\, even under xdg's keys behaviour\, to give up on perfect matching with prototypes. With cv_set_call_checker() we have at least exposed the layer in which the beyond-prototype magic occurs; it is (and always was) at least possible for modules to implement the kind of non-prototype parameter magic that the core functions have. (In fact\, on a tangent from the File::stat work we discussed in 2009-03\, I'm planning to use op check magic to implement a drop-in stat() replacement\, properly replicating CORE::stat's parameter processing but returning complex objects.)
-zefram
On Sun Nov 07 12:57:32 2010\, zefram@fysh.org wrote:
Father Chrysostomos via RT wrote:
Or we would follow Zsbán Ambrus’ suggestion and make those three do array operations *only* when the @ is explicit, and do hash operations otherwise.
That possibility occurred to me some time after writing my mail. It's neat in many respects\, with just one niggle: it doesn't correspond to the behaviour you can get from a user-defined function using the (+) prototype. Actually\, if you go strictly through the edge cases that I considered\, xdg's keys behaviour doesn't entirely correspond to its (+) prototype anyway. To maintain correspondence with prototype behaviour you'd have to introduce at least one more prototype feature\, which lets the called function know which kind of dereferencement occurred in parameter processing.
My proposed change to (+) (or (\[+])) would eliminate that problem\, wouldn’t it?
Father Chrysostomos via RT wrote:
My proposed change to (+) (or (\[+])) would eliminate that problem, wouldn’t it?
No\, it would change it to a different form\, in that an argument value that can be dereferenced as both scalar and array would be ambiguous between keys(@a) and keys($b). Do you do array-keys on @$arg\, or hash-keys on %$$arg?
-zefram
On Sun, Nov 7, 2010 at 3:57 PM, Zefram <zefram@fysh.org> wrote:
Father Chrysostomos via RT wrote:
Or we would follow Zsbán Ambrus’ suggestion and make those three do array operations *only* when the @ is explicit, and do hash operations otherwise.
That possibility occurred to me some time after writing my mail. It's neat in many respects\, with just one niggle: it doesn't correspond to the behaviour you can get from a user-defined function using the (+) prototype. Actually\, if you go strictly through the edge cases that I considered\, xdg's keys behaviour doesn't entirely correspond to its (+) prototype anyway. To maintain correspondence with prototype behaviour you'd have to introduce at least one more prototype feature\, which lets the called function know which kind of dereferencement occurred in parameter processing.
Not surprisingly\, I disagree. It eliminates the possibility of the expert user knowingly acting on arrayrefs to avoid the bug that occurs in an edge case when the non-expert user doesn't properly specify an API or validate inputs.
If you're suggesting that only in the ambiguous cases (reftype HASH or overloaded %{} exists) it should always resolve as a hash, then I still disagree but not as violently. E.g.
sub mykeys(+) {
    my $item = shift;
    if ( ! blessed $item && reftype $item eq 'ARRAY' ) {
        return (scalar @$item ? (0 .. $#$item) : ());
    }
    else {
        return keys %$item;
    }
}
In such a case\, documenting that it only interprets an argument as an arrayref for pure\, unblessed arrayrefs\, and that any blessed object is interpreted as a hash provides unambiguous behavior for end users. If you aren't *sure* it's a pure arrayref\, you must explicitly dereference it or else you could be surprised. Given that in the documentation\, it's up to users to act responsibly.
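For reference, a runnable rendering of the sample above might look like the following. It is a sketch, not the patch's implementation: it assumes `blessed` and `reftype` come from `Scalar::Util`, and the test data is invented.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed reftype);

# Declared before its first call so that the (+) prototype takes effect.
sub mykeys(+) {
    my $item = shift;
    if ( !blessed($item) && reftype($item) eq 'ARRAY' ) {
        # Pure, unblessed array ref: return its indices.
        return ( scalar @$item ? ( 0 .. $#$item ) : () );
    }
    # Anything else, including every blessed object, is treated as a hash.
    return keys %$item;
}

my @array = qw(a b c);
my %hash  = ( one => 1 );

my @from_array    = mykeys(@array);           # (+) passes \@array
my @from_arrayref = mykeys( [qw(x y)] );      # plain array ref
my @from_hash     = mykeys(%hash);            # (+) passes \%hash

print "array:     @from_array\n";
print "array ref: @from_arrayref\n";
print "hash:      @from_hash\n";
```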
My approach in the patch was for more dwimmery -- let someone pass an object with @{} overloading if that wasn't ambiguous and otherwise do something "reasonable" in the ambiguous edge cases and warn about it.
What I describe above in the code sample reduces the dwimmery and gives more consistent behavior\, which maybe some prefer. But I certainly see no reason to prohibit "keys \@array" since there is no possibility of ambiguity of interpretation.
-- David
On Sun Nov 07 16:13:02 2010, xdaveg@gmail.com wrote:
On Sun, Nov 7, 2010 at 3:57 PM, Zefram <zefram@fysh.org> wrote:
Father Chrysostomos via RT wrote:
Or we would follow Zsbán Ambrus’ suggestion and make those three do array operations *only* when the @ is explicit, and do hash operations otherwise.
That possibility occurred to me some time after writing my mail. It's neat in many respects\, with just one niggle: it doesn't correspond to the behaviour you can get from a user-defined function using the (+) prototype. Actually\, if you go strictly through the edge cases that I considered\, xdg's keys behaviour doesn't entirely correspond to its (+) prototype anyway. To maintain correspondence with prototype behaviour you'd have to introduce at least one more prototype feature\, which lets the called function know which kind of dereferencement occurred in parameter processing.
Not surprisingly\, I disagree.
Not being one to argue\, I am willing to let you have your way. I was hoping that expostulation could resolve this\, but I see no point in further discussion. What you suggest below sounds nice\, and I would appreciate it if you could make it work like that\, but I won’t push for it.
It eliminates the possibility of the expert user knowingly acting on arrayrefs to avoid the bug that occurs in an edge case when the non-expert user doesn't properly specify an API or validate inputs.
If you're suggesting that only in the ambiguous cases (reftype HASH or overloaded %{} exists) it should always resolve as a hash, then I still disagree but not as violently. E.g.
sub mykeys(+) {
    my $item = shift;
    if ( ! blessed $item && reftype $item eq 'ARRAY' ) {
        return (scalar @$item ? (0 .. $#$item) : ());
    }
    else {
        return keys %$item;
    }
}
In such a case\, documenting that it only interprets an argument as an arrayref for pure\, unblessed arrayrefs\, and that any blessed object is interpreted as a hash provides unambiguous behavior for end users. If you aren't *sure* it's a pure arrayref\, you must explicitly dereference it or else you could be surprised. Given that in the documentation\, it's up to users to act responsibly.
My approach in the patch was for more dwimmery -- let someone pass an object with @{} overloading if that wasn't ambiguous and otherwise do something "reasonable" in the ambiguous edge cases and warn about it.
What I describe above in the code sample reduces the dwimmery and gives more consistent behavior\, which maybe some prefer. But I certainly see no reason to prohibit "keys \@array" since there is no possibility of ambiguity of interpretation.
-- David
* David Golden <dagolden@cpan.org> [2010-11-08 05:10]:
If an API claims to take an arrayref, then it should validate the input as being either an arrayref or as something that overloads @{}. It can (and should) then safely call "keys @$array_ref" because it knows that it can be dereferenced as an array. (Or it can not validate and still call "keys @$array_ref" and let an exception be thrown, but that's an implementation choice.)
However\, if an API claims to return an arrayref\, then it should return an arrayref. If it suddenly changes to return a scalar blessed into a class that overloads @{}\, then it will break my code if I'm validating that "ref $return eq 'ARRAY'" consistent with the API claim. That is completely different from an API that claims to return a value that can be dereferenced as an array. In that case\, I can't validate using ref()\, I need to validate that it meets that claim. (Hello Params::Util::_ARRAYLIKE!)
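The two kinds of validation contrasted here can be sketched as follows. `is_arraylike` is a hypothetical helper written for this example (it is not the Params::Util function), and `My::List` is an invented class.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed reftype);
use overload ();

# Hypothetical helper: true for arrays (blessed or not) and for objects
# that overload @{}.
sub is_arraylike {
    my ($value) = @_;
    return 0 unless ref $value;
    return 1 if reftype($value) eq 'ARRAY';
    return 1 if blessed($value) && overload::Method( $value, '@{}' );
    return 0;
}

{
    package My::List;    # hypothetical hash-based class that acts as an array
    use overload '@{}' => sub { $_[0]{items} }, fallback => 1;
    sub new {
        my $class = shift;
        return bless { items => [@_] }, $class;
    }
}

my $list = My::List->new(qw(a b));

for my $case ( [ 'plain array ref', [ 1, 2 ] ], [ 'overloaded object', $list ], [ 'hash ref', {} ] ) {
    my ( $label, $value ) = @$case;
    printf "%-18s %s\n", $label, ( is_arraylike($value) ? 'array-like' : 'not array-like' );
}

# The stricter check accepts only the raw data structure.
my $strict_ok = ref($list) eq 'ARRAY' ? 1 : 0;
print "ref() eq 'ARRAY' for the object: $strict_ok\n";
```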
Thus\, my objection to the original example was that it is equivalent to saying if I break my API claim (by giving an object instead of a raw data structure)\, then downstream bugs can happen\, which has everything to do with the API change and nothing to do with a change to keys().
That makes no sense whatsoever to me. Why are you concerned with such a low level detail? Are you saying that it behooves me and every other CPAN author in the world to put “something that behaves as an” in front of every “array ref”\, “hash ref” etc.? Why? In what way does your code become more functional when you make it longer just so you can reject inputs that the caller or callee might have had good reason to hand you? (Cf. Carter’s compass.) This is Perl\, not a systems language.
I agree with the charge that operators in Perl should be monomorphic. We don’t want to become Javascript (which has both polymorphic values and polymorphic operators\, and they’re working on repairing it now). The only sane options are to be Perl (polymorphic values\, monomorphic ops) or Python (the reverse).
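A tiny sketch of what "polymorphic values, monomorphic ops" means in practice: the same scalar is treated as a number or a string purely according to the operator applied to it.

```perl
use strict;
use warnings;

my $value = "10";

my $sum    = $value + 5;                        # + always adds numbers: 15
my $joined = $value . 5;                        # . always concatenates: "105"
my $num_eq = ( $value == 10.0 )   ? 1 : 0;      # == always compares numerically: 1
my $str_eq = ( $value eq "10.0" ) ? 1 : 0;      # eq always compares as strings: 0

print "sum=$sum joined=$joined num_eq=$num_eq str_eq=$str_eq\n";
```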
I refused to use smart match in 5.10.0 for this reason. The non-commutative smart match in 5.10.1 is much saner but I’ll still only use it with literals on the right side. (Thankfully that’s enough to make it a huge help.)
I don’t mind having `each`\, `keys` and `values` operate on ref scalars\, but please have them use a fixed set of rules to decide what to do\, that will not change based on the value passed in. I care not *which* set of rules you pick\, just that it be fixed.
Regards, -- Aristotle Pagaltzis // <http://plasmasturm.org/>
On Sun, Nov 21, 2010 at 4:05 AM, Aristotle Pagaltzis <pagaltzis@gmx.de> wrote:
I agree with the charge that operators in Perl should be monomorphic. We don’t want to become Javascript (which has both polymorphic values and polymorphic operators\, and they’re working on repairing it now). The only sane options are to be Perl (polymorphic values\, monomorphic ops) or Python (the reverse).
I refused to use smart match in 5.10.0 for this reason. The non-commutative smart match in 5.10.1 is much saner but I’ll still only use it with literals on the right side. (Thankfully that’s enough to make it a huge help.)
So true. Thanks for explaining why I don't like smart match more clearly than I could have.
Ambrus
Migrated from rt.perl.org#78656 (status was 'resolved')
Searchable as RT78656$