facebook / hhvm

A virtual machine for executing programs written in Hack.
https://hhvm.com
Other
18.18k stars 2.99k forks source link

classname<T> and private covariant type permits type violation #7216

Open acrylic-origami opened 8 years ago

acrylic-origami commented 8 years ago

HHVM Version

$ hhvm --version
HipHop VM 3.13.1 (rel)

Standalone code, or other way to reproduce the problem

Define a class covariant on one of its type parameters and assert the constructor is consistent:

<?hh // strict
<<__ConsistentConstruct>>
abstract class A<+T> {
    abstract public function __construct(T $v);
    abstract public function foo(): int;
}

Then extend that class and constrain that same type parameter to a derived type, and act on that type in some methods:

<?hh // strict
class Base {}
class Derived extends Base {
    public function __construct(public int $foo) {}
}
class B<+T as Derived> extends A<T> {
    public function __construct(private T $v) {}
    public function foo(): int {
        return $this->v->foo; // we assume here T is a subtype of Derived
    }
}

Cast the classname of the extended class to the classname of the base class parameterized with a supertype of the extended class's constraint. Invoke a method that acts on the type parameter in the extended class.

<?hh // strict
class C {
    public static function foo(): void {
        self::bar(B::class);
    }
    public static function bar(classname<A<Base>> $v): void {
        echo (new $v(new Base()))->foo(); // `B::foo` attempts to summon an int from thin air! 
                                          // It's not very effective.
    }
}

This is where B needs to be covariant on T: the typechecker doesn't allow this cast if it isn't.

Expected result

Some sort of variance violation? I don't know which is the more illegal step: allowing private properties of a covariant type, or the classname cast.

Actual result

No errors! from the typechecker, but

Notice: Undefined property: Base::$foo in B.php on line 10

Catchable fatal error: Value returned from method B::foo() must be of type int, null given in B.php on line 10

from executing C::foo().

jwatzman commented 8 years ago

Hey @dlreeves this is interesting -- can you make sure Andrew Kennedy sees this, seems like this sort of thing he's been looking into recently. And maybe also tell him to sign up for GitHub and put the email address he commits with on the account and use our internal tool to add him as a contributor to HHVM so I can cc him properly :-P

acrylic-origami commented 8 years ago

I've stared at this a bit longer, and I think allowing +T-typed arguments in constructors is itself problematic, because functionality in B::__construct could still assume that/those argument[s] are constrained by Derived. Then new $v(new Base()) in C::foo alone would violate that assumption.

From my understanding, usually the combined fact of the constructor only being executed exactly once at instantiation, and the calling scope knowing exactly what class is being instantiated, implies that the constructor can't modify type. classname<T> allowing a cast before instantiation strips the constructor of its sacred first-type-related-thing-to-execute property, which lets the constructor mishandle types.

andrewjkennedy commented 8 years ago

Another good catch! Even without classname, there is a problem with the semantics of "private" (meaning accessible from the code of the class but at any instantiation). Consider the code below, which breaks the type system. (Scala has a notion of "object-private" to avoid the typehole. See http://www.scala-lang.org/files/archive/spec/2.11/04-basic-declarations-and-definitions.html#variance-annotations. I'm considering doing something similar in Hack.)

<?hh // strict
class Box<+T> {
  // OK, we've got a private field whose type involves the covariant T
  public function __construct(private T $elem) {
  }

  // As usual, a (safe) getter method
  public function get(): T { return $this->elem; }

  // Private gives us access to arbitrary instances of Box, even in static
  // methods. Note the use of covariant subtyping to put a string in
  // a Box<mixed>
  public static function updateAsString(Box<mixed> $x, string $s) : void {
    $x->elem = $s;
  }

  // We can now use this method to overwrite an integer with a string
  // but return it as an integer
  public static function morphIntToString(int $i) : int {
    $x = new Box($i);
    Box::updateAsString($x, 'hey you turned me into a string');
    return $x->get();
  }

  // Actually do it
  public static function useBox(): void {
    $i = Box::morphIntToString(23);
    echo('this should be an integer: ' . $i);
  }

}
andrewjkennedy commented 8 years ago

The issue with object-private above is orthogonal.

Your analysis is good. One small note: B itself does not have to be covariant. It's enough to have a constraint on the parameter and covariance in the superclass.

What if __ConsistentConstruct required the constructor signatures and constraints on type parameters occurring in the signatures to be preserved down the hierarchy? That way we wouldn't be able to even write A<Base>.

acrylic-origami commented 8 years ago

I think what you suggest with __ConsistentConstruct will work completely! Here's my thoughts on some of the weirder cases that it still covers:

  1. The absence of static classes in Hack disallows the constructor to violate type with something like:

    public function __construct(T $v) {
     StaticClass<T>::v = $v;
    }

    To borrow C#'s behaviour in this case: if this were possible, because generic methods retain their original parameterization, the new $v(new Base()); call in C::bar(classname<A<Base>> $v) actually modifies StaticClass<Derived>. Not a problem in Hack!

  2. Constraints can only use exactly one type to bound another: consequently, we can't make a type that is constrained on both Derived and T, so the other methods of manufacturing types, like type constants and generics via angle bracket syntax, pose no problems.
  3. There is no way for functionality in C::foo to act on members associated with the Base type, simply because there are no concrete objects to act upon. The static members of whatever classname<B<Derived>> $v is cannot be typed with generic types.

Still, in C::bar, $v's constructor is really a (function(Derived): void). It being passed a Base sends chills down my spine, but if provably no implementation couldn't be implemented as a (function(Base): void), I suppose that's good enough? I'm curious if there any implementation difficulties stemming from the invalidity in the general case.