Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/
Other
1.93k stars 552 forks source link

`isa` is confused about the representation type of objects #18531

Open leonerd opened 3 years ago

leonerd commented 3 years ago

Perl 5.32 added the isa operator.

isa is never true of any value that is not a blessed object:

$ perl5.32.0 -M5.032 -Mexperimental=isa
say +([] isa ARRAY ? "ISA" : "not")

not

It is true for any class that appears in the @ISA tree of any blessed object:

my $obj = bless {}, "MyClass";
@MyClass::ISA = qw(A::Base::Class);
say +($obj isa MyClass ? "ISA" : "not");
say +($obj isa A::Base::Class ? "ISA" : "not");

ISA
ISA

These are all fine.

However, a quirk of implementation means that any object also appears to be isa to a class of the same string that reftype() would return for the underlying container type of the object.

$ perl5.32.0 -M5.032 -Mexperimental=isa
my $obj = bless {}, "MyClass";
say +($obj isa HASH ? "ISA" : "not");

ISA

This shouldn't be the case. It ought to be the case that $obj isa CLASS is only true if CLASS is somewhere in the method resolution order of $obj, and thus subs in that package are candidates for the method call operator $obj->METHOD.

leonerd commented 3 years ago

Annoyingly though, the ->isa method behaves this way too:

$ perl5.32.0 -M5.032 -Mexperimental=isa
my $obj = bless {}, "MyClass";
say +($obj->isa("HASH") ? "ISA" : "not");

ISA
haarg commented 3 years ago

While this is unfortunate, it would require divorcing the operator from the isa method somehow. I don't think that's a good idea.

The documentation for isa ought to be updated to mention its relationship with the isa method.

leonerd commented 3 years ago

@haarg Yes - it may be that equivalence to the ->isa method is deemed more important than sensible answers equivalent to method dispatch. It is unfortunate that ->isa has always worked this way, but of course far too late to change it now.

In that case, if we think we're happy with the isa operator as it stands, what is the policy on removing the experimental tag from it?

leonerd commented 3 years ago

Also discussed in #18585 which is a near-duplicate of this issue.

leonerd commented 3 years ago

18585 (now closed as a duplicate of this one) presented basically three options, plus the default do-nothing:

  1. Keep current behaviour, but add docs to point it out
  2. warn/croak on attempts to check isa to any of the special reftype names that aren't actual classes (ARRAY, HASH, etc..)
  3. Don't include those reftype names when implementing the default behaviour
  4. Rename the isa operator entirely, thus giving it a new name whose sensible behaviour can differ from ->isa and not confuse people

A variant of option 1, being 1a, would silently return false for those special reftype names, rather than warn or croak.

The difference between 1a and 2 being that in the presence of an overloaded ->isa method on the object, option 1a has already returned false before the method is even invoked, whereas under option 2 the object's overloaded method runs and gets to see it as we only changed the default mechanism.

hvds commented 3 years ago

In #18585, @leonerd wrote:

Ugh, except option 1 is also bad in the (admittedly "really you shouldn't be doing this" case of):

my $obj = bless [], "HASH";
$obj->isa("HASH")    # is true
$obj->isa("ARRAY")   # is true

$obj isa HASH    # is true
$obj isa ARRAY   # is true

If we warn (or croak) about either of the final two lines we have to about both.

Where does "have to" come from here, from the implementation or from the semantics? I think it would be perfectly reasonable, semantically, for this to yield true for HASH and false for ARRAY.

All this comes from the annoying dual-purpose use of ref(), which yields the package name of blessed objects but a stringified allcaps representation of the reftype for unblessed objects. I submit that has always been a mistake. :( But of course too late to fix that now...

I guess this is the aspect of implementation that makes it so, but is the implementation of isa forced to use ref as its sole source of information, and if so why? Elsewhere we already have the concepts of blessed(), reftype() etc.

I think we should have something that will take a class name and tell you if an object has the class in its inheritance hierarchy (for real, not by special-casing a specific list of names); it would make far more sense for that something to be called isa than anything else, so I think it is worth at least exploring what the costs would be of making it so, and whether there's anything we can do to mitigate those.

For example, could we remotely consider a deprecation cycle that would aim towards changing both the method and the keyword, pointing users at reftype() when they want the underlying type, and perhaps at a new Scalar::Util::broken_isa if they specifically want the original behaviour?

leonerd commented 3 years ago

In #18585, @leonerd wrote:

If we warn (or croak) about either of the final two lines we have to about both.

Where does "have to" come from here, from the implementation or from the semantics? I think it would be perfectly reasonable, semantically, for this to yield true for HASH and false for ARRAY.

That comes from the fact that in option 1, we are pre-filtering the possible $class values early on; we are forbidding (by either warn, croak, or return false) the concept of testing isa HASH or similar names.

This is how it differs from option 2, wherein we alter the behaviour of the underlying testing function, to not report that it matches when it only matched because of the reftype string name.

I guess this is the aspect of implementation that makes it so, but is the implementation of isa forced to use ref as its sole source of information, and if so why? Elsewhere we already have the concepts of blessed(), reftype() etc.

That was where I was aiming at with option 2. Option 2 says we use a different source of truth.

I think we should have something that will take a class name and tell you if an object has the class in its inheritance hierarchy (for real, not by special-casing a specific list of names); it would make far more sense for that something to be called isa than anything else, so I think it is worth at least exploring what the costs would be of making it so, and whether there's anything we can do to mitigate those.

Yes; this is option 2, which is the one I like.

For example, could we remotely consider a deprecation cycle that would aim towards changing both the method and the keyword, pointing users at reftype() when they want the underlying type, and perhaps at a new Scalar::Util::broken_isa if they specifically want the original behaviour?

That could be another avenue, yes. In practice I wonder how much it would break though.

I think if we were to consider that one, and aiming people at using reftype, we'd best make sure that is provided at the true language level as a true operator, rather than having to be imported from Scalar::Util in its current implementation. So such an option would be somewhat dependent on getting those things into core.

Grinnz commented 3 years ago

I don't feel that users have any expectations that something named "isa" would consider an object "HASH" just because its underlying reftype is. So option 2 would be my choice, barring any performance considerations.

hvds commented 3 years ago

That comes from the fact that in option 1, we are pre-filtering the possible $class values early on; we are forbidding (by either warn, croak, or return false) the concept of testing isa HASH or similar names.

This is how it differs from option 2, wherein we alter the behaviour of the underlying testing function, to not report that it matches when it only matched because of the reftype string name.

Thanks for the clarification; I guess I was confused by the wording "those reftype names" in option 2, which I read as referring to a set of strings rather than their provenance. I think recasting option 2 in terms of what it does rather than what it doesn't do might help avoid such confusion - based on what I think that would say, I'd swing to 2 too.

I think if we were to consider that one, and aiming people at using reftype, we'd best make sure that is provided at the true language level as a true operator, rather than having to be imported from Scalar::Util in its current implementation. So such an option would be somewhat dependent on getting those things into core.

While I'd be in favour of making reftype and blessed keywords, I don't see that as being required for this - the main argument I could see for needing it would be for one-liners, but I'd expect the need for reftype in one-liners to be exceedingly rare.

yuki-kimoto commented 3 years ago

Is it possible to be a compile error if ARRAY or HASH is used by the isa operator?

isa operator right operand is bareword ARRAY, HASH, not string like "ARRAY", "HASH".

so I ask this question.

leonerd commented 3 years ago

isa operator right operand is bareword ARRAY, HASH, not string like "ARRAY", "HASH".

In the above examples it has been. It doesn't have to be - this is just as valid:

my $classname = "ARRAY";

if($obj isa $classname) { ... }
yuki-kimoto commented 3 years ago

use Foo; # ok use "Foo"; # not ok.

on the other hand,

$obj isa ARRAY # ok $obj isa "ARRAY" #ok

Can I understand this way?

I thought the right side was a bare word.

leonerd commented 3 years ago

I thought the right side was a bare word.

It can be. It doesn't have to be. I made it cope with either; so it has a certain symmetry with ->new methods which can be invoked on a bareword or any kind of expression (of which a string literal is one):

my $obj = SomeClass->new;
ok($obj isa SomeClass);

my $obj = "A::Class::String"->new;
ok($obj isa "A::Class::String");

my $cls = "Dynamic::Class";
my $obj = $cls->new;
ok($obj isa $cls);

This wouldn't work for use because that happens at compiletime, too early for any assignments to have taken place:

my $module = "Some::Module";
use $module;   # at BEGIN time $module is not yet defined
salva commented 3 years ago

While this is unfortunate, it would require divorcing the operator from the isa method somehow. I don't think that's a good idea.

One reason why the isa operator must rely on the isa method is that the later may be redefined. For instance, some class A wanting to impersonate some other class B without actually deriving from it could do as follows:

package A;

sub isa {
  my ($self, $class) = @_;
  return 1 if $class eq "B";
  $self->SUPER::isa($class)
}
yuki-kimoto commented 3 years ago

@leonerd

ok.