Clarify Inference

mindplay-dk commented 6 years ago

Inference is mentioned, but not formally described in the RFC at this time.

To what extent will we support inference (for example, from literals only, or also from variables)?

orolyn commented 6 years ago

This section does rub me the wrong way, because it opens the possibility of creating a reified type whose details the developer does not know.

class Foo<T>
{
    private $var;

    public function __construct(T $var)
    {
        $this->var = $var;
    }

    public function getVar(): T
    {
        return $this->var;
    }

    public function doSomething(T $something)
    {
    }
}

function bar($var): Foo<?????>
{
    $object = new Foo($var);
    ...
    ...
    return $object;
}

$foo = bar(/* Some arbitrary value */);
...
// Somewhere further down the line.
$foo->doSomething(/* What type is allowed? */);

morrisonlevi commented 6 years ago

In my mind inference is dependent on a bunch of stuff from our optimizer making it into our core language, instead of an extension. I believe this is the goal for 8.0, but I've heard something like that before and it didn't happen...

Anyway, the reason it needs the optimizer is that without it the inference would be very brittle.

Inference should happen on inputs only; that's my opinion anyway. So we can infer type parameters for classes if the type parameters are used as constructor parameters, and we can infer type parameters for functions based on function parameters. So your example with bar wouldn't work, as the return type can't be inferred from the parameters. However, new Foo("arg1"); would work and be inferred as Foo<string>.

orolyn commented 6 years ago

That squashes all concerns with inference that come to mind.

natebrunette commented 6 years ago

I agree. I think what you'd want to do here is this instead:

function bar<T>(T $var): Foo<T>
{
    $object = new Foo($var);
    // ...
    return $object;
}

Then you can do:

bar('foo');

And get a Foo<string> back.
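
The idea of inferring a type argument from inputs can be emulated in current PHP, as a rough sketch only. PHP has no generics, so the typeArg property below is a hypothetical stand-in for the reified T, derived here from the runtime value with get_debug_type(). For a literal such as 'foo' this coincides with what static inference from inputs would give. The exact-match check in doSomething is a simplification that ignores subtyping.

```php
<?php
// Emulation only: PHP has no generics, so the "type argument" is a string.
final class Foo
{
    /** Hypothetical stand-in for the reified type argument T. */
    public string $typeArg;

    private mixed $var;

    public function __construct(mixed $var)
    {
        // Infer T from the constructor input, e.g. "string" for 'foo'.
        $this->typeArg = get_debug_type($var);
        $this->var = $var;
    }

    public function getVar(): mixed
    {
        return $this->var;
    }

    public function doSomething(mixed $something): void
    {
        // Exact-match check; a real design would also accept subtypes.
        if (get_debug_type($something) !== $this->typeArg) {
            throw new TypeError(
                "Expected {$this->typeArg}, got " . get_debug_type($something)
            );
        }
    }
}

function bar(mixed $var): Foo
{
    return new Foo($var);
}

$foo = bar('foo');        // plays the role of Foo<string>
$foo->doSomething('baz'); // accepted: the argument is a string
```

Because the type argument is fixed by the input to bar, the caller can tell which values doSomething accepts, which is the concern raised earlier in the thread.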

arnaud-lb commented 2 years ago

@morrisonlevi

Anyway, the reason it needs the optimizer is that without it the inference would be very brittle.

Trying to paraphrase you, to make sure I fully understand your point: If the basis for inference is the runtime type of inputs, the resulting types may be unpredictable.

I've tried to make an example where this might be a problem:

interface A {}
interface B extends A {}
interface C extends A {}

class Collection<T> implements \Iterator<int, T> {}

class Box<T> { function __construct(T $e) {} }

$collection = new Collection<A>();

foreach ($collection as $elem) {
    $box = new Box($elem); // may be a Box<A>, Box<B>, Box<C>, or any Box<sub-type of A>
}

The unpredictability of the type of $box here would be a problem later when interacting with other Box objects or other generic types inferred from one of these Box objects.

So the alternative would be to use statically known type information to infer types. In the example above, we know statically that $elem is an A, so every $box is a Box<A>. The behavior is determined by the syntax and semantics rather than by runtime data, which is more stable, predictable, and easier to reason about.

Did I get your point correctly?

I agree that using runtime types for inference would be very brittle. Using the optimizer's static analysis capabilities may be a solution. This would impose new constraints on the optimizer, though. I guess that currently it is possible to change or improve the optimizer without impacting end users beyond performance. If it were used for type inference, any change in the optimizer could be a breaking change.
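
The difference between the two bases for inference can be sketched in valid PHP. This is an emulation under stated assumptions: the typeArg string and the optional $declaredType parameter are hypothetical stand-ins for a reified type argument and for statically known type information, and B and C are concrete classes here (not interfaces, as in the example above) so that they can be instantiated.

```php
<?php
// Emulation only: PHP has no generics, so the "type argument" is a string.
interface A {}
final class B implements A {}
final class C implements A {}

final class Box
{
    public string $typeArg;

    // $declaredType stands in for statically known type information.
    // When it is null, fall back to the runtime class of the value.
    public function __construct(object $e, ?string $declaredType = null)
    {
        $this->typeArg = $declaredType ?? get_debug_type($e);
    }
}

$elements = [new B(), new C()]; // each element is statically "an A"

// Runtime-based inference: the type argument depends on the data.
$runtimeInferred = array_map(
    fn (A $e) => (new Box($e))->typeArg,
    $elements
);

// Static inference: every box gets the declared element type.
$staticInferred = array_map(
    fn (A $e) => (new Box($e, A::class))->typeArg,
    $elements
);
```

With runtime-based inference the two boxes end up with different type arguments (B and C), while the static variant gives A for both, regardless of what the collection happens to contain.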

arnaud-lb commented 1 year ago

Thinking more about this, inferring the type of class constructor arguments at compile time seems difficult, if not impossible, because we cannot rely on type information about symbols declared in other compilation units.

In the following example, the optimizer cannot rely on the return type of get_value because it can change independently of the current compilation unit:

function f() {
    new Box(get_value()); 
}

There are two solutions I can think of right now, but these don't seem practicable:

We could eagerly load everything at compile time and invalidate opcache when dependent compilation units change. Unfortunately, there are cases that cannot be eagerly loaded (include, conditional declarations, variable function calls, variable new, etc.).

Alternatively we could infer the types lazily at runtime, based on static type signatures, but this could be too expensive.

Unless there is a good solution to this problem, type parameter inference on classes doesn't seem practicable, so we may have to specify them explicitly:

new Box<Foo>(get_value());

For functions it may be ok to infer the type parameters based on the runtime values of arguments, so we would be able to omit the type parameters when calling functions.
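
As a rough illustration of the explicit form, the sketch below emulates new Box<Foo>(get_value()) in current PHP by passing the type argument as a class-name string and checking it at construction. The typeArg parameter, the classes Foo and Bar, and the body of get_value are assumptions made for the example, not part of the RFC.

```php
<?php
class Foo {}
class Bar {}

final class Box
{
    public function __construct(
        public string $typeArg, // explicit stand-in for the T in Box<T>
        public object $value
    ) {
        // The check needs no knowledge of where $value came from.
        if (!$value instanceof $typeArg) {
            throw new TypeError("Value is not an instance of {$typeArg}");
        }
    }
}

// Hypothetical body; in the thread this function is declared in
// another compilation unit and may change independently.
function get_value(): object
{
    return new Foo();
}

$box = new Box(Foo::class, get_value()); // emulates new Box<Foo>(get_value())
```

Since the type argument is written at the call site, nothing has to be known about get_value at compile time; a mismatch only surfaces as a TypeError when the box is constructed.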

Girgias commented 1 year ago

I haven't thought that hard about generics, but when I did, I reached more or less the same basic conclusion: doing type inference is nearly impossible. As far as I understand it, the only way to do it is to have complete knowledge of the use cases.

But even then, due to the nature of OOP, it may be impossible to have inferred reified generics (AFAIK Java's generics are erased at runtime). The reason I think this is that it is difficult to pin down which class the generic type should be reified to. Let's take the previous hierarchy with some concrete implementations:

class A {}
class B extends A {}
class C extends A {}

$objects = [new C, new B, new A, new stdClass];

class Collection<T> implements \Iterator<int, T> {}

$collection = new Collection();

foreach ($objects as $elem) {
    $collection->add($elem);
}

Trying to infer what the collection's T type should resolve to here seems impossible. Taking the first type added may be too restrictive if we wanted a collection of Bs. Accepting any class type that is a supertype may be too lax, as we would get a collection of As. And finding the "smallest common supertype" that covers all the classes would, in this case, resolve to object because of the stdClass.

So I do think class types will need to be specified explicitly.
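
The "smallest common supertype" strategy mentioned above can be tried out in current PHP. The helper below is a hypothetical sketch: it only considers class inheritance (interfaces are ignored) and falls back to object when the objects share no ancestor class.

```php
<?php
class A {}
class B extends A {}
class C extends A {}

// Hypothetical helper: returns the nearest class shared by all objects,
// or 'object' when their class hierarchies have nothing in common.
function smallestCommonSupertype(array $objects): string
{
    $common = null;
    foreach ($objects as $object) {
        // The object's own class followed by its ancestors, nearest first.
        $lineage = array_merge(
            [get_class($object)],
            array_values(class_parents($object))
        );
        // Keep only the classes shared with every object seen so far.
        $common = $common === null
            ? $lineage
            : array_values(array_intersect($common, $lineage));
    }

    return $common[0] ?? 'object';
}

$withoutStdClass = smallestCommonSupertype([new C, new B, new A]);
$withStdClass = smallestCommonSupertype([new C, new B, new A, new stdClass]);
```

For the first list the helper settles on A. Adding a single stdClass widens the result to object, which shows how easily this strategy loses any useful constraint.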

PHPGenerics / php-generics-rfc

Clarify Inference #8