Draco-lang / Language-suggestions

Collecting ideas for a new .NET language that could replace C#
75 stars 5 forks source link

[WIP] Type inference #42

Open LPeter1997 opened 2 years ago

LPeter1997 commented 2 years ago

Goal of the document

This document describes the type-inference the language allows for and the general principle we would want to follow to make the type inference practical and predictable.

General principle

The general rule that the rules should follow are:

Type inference should not generalize further than what the user does/explicitly states.

This means, that we should not search further for a common base-type or generalize as far as possible while making the code compile, but only involve types, which the user involves specifically. We believe this makes the inference algorithm way more predictable and local to the inferred element.

Inference rules

The following subsections will describe the inference (informally) for different elements with some samples. Hopefully the examples and explanations will make the rules clear.

Return type

Functions with inline expressions should allow return-type inference and the inferred return type should be whatever the determined expression type is.

func f1(x: int) = x; // OK, inferred to be (int) -> int
func f2() = Console.WriteLine(""); // OK, inferred to be () -> unit
func f3() { // ERROR: functions with a body need explicit return type,
            // assumed to be unit otherwise
    return 1;
}

Assignment, variables

A bit formally: If a variable - without explicit type specification - is assigned values v1: T1, v2: T2, ..., vn: Tn, then the variable is inferred to have the type Tm, where Tm is one of T1, ..., Tn and for all 1 <= i <= n: Tm is a supertype of Ti. If there is no such Tm, there is a type error.

Informally, if a variable is assigned multiple values, then the variable is inferred to the type of the assigned value, that's the base (or supertype) of all other values. If none of the types are the base type of other (for example, all are different derived types of some not mentioned base type, or they are just plain incompatible) then there is a type error.

Let's use the following C# class hierarchy for the examples:

interface ICommonType {}
class Base {}
class Derived1 : Base {}
class SubDerived : Derived1 {}
class Derived2 : Base, ICommonType {}
class Derived3 : Base, ICommonType {}

Trivially, if there's a single assignment, the variable is inferred to be the type of that expression:

{
    var x;
    x = Derived2();
    var y = Derived1();
} // x: Derived2, y: Derived1

When there is a common base type assigned, that will be the inferred type:

{
    var x;
    x = Derived1();
    x = Base();
    x = Derived2();
} // x: Base

Not mentioning a common base type results in a type error:

{
    var x;
    x = Derived1();
    x = Derived2();
} // TYPE ERROR: Could not infer type of 'x'

// Correct:
{
    var x: Base;
    x = Derived1();
    x = Derived2();
}

There is no generalization, the inferred type will always be the most general mentioned type:

{
    var x;
    x = Derived1();
    x = SubDerived();
} // x: Derived1

Interfaces are also fine base types:

{
    var x: ICommonType = Derived2();
    var y;
    y = Derived2();
    y = Derived3(); // This would be an error, if there were no more assignments
    y = x; // This allows for generalizing to the interface
} // y: ICommonType

Literal types

The type of literals is always a tricky question: 12 could be int, uint, float32, float64, or any other integral type. C# greedily infers this to be int, meaning that we have to use suffixes, if we want inference to properly kick in:

var a = 1; // int
var b = 1U; // uint
var c = 1.0; // double
var d = 1.0f; // float

A system that does not greedily do this is not much more complex, but removes significant noise from the literals. There are languages that are smarter about this, so we could also try to imitate that.

An integer literal could put a constraint on the inferred type, that it has to be an integral type that is able to hold the specified value. If usage can uniquely determine the type, that's great. If not, then we can use some default, like int (which is a decent default, given that C# uses it for literals already).

Branches

Since if-else returns a value, we need to resolve ambiguities there too. Still using the hierarchy defined above, the following should be correct:

var x = if (true) Derived1() else Base(); // x: Base

But the following should not be correct, because a common base was never mentioned:

var y = if (true) Derived1() else Derived2(); // ERROR

Specifying the common base type from the outside should resolve this:

var x: Base = if (true) Derived1() else Derived2(); // Ok

func f(): Base = if (true) Derived1() else Derived2(); // Ok

Which means that type annotations always get priority, when doing inference.

Generics

TODO

WhiteBlackGoose commented 2 years ago

For generics it would be nice that type inferer could propagate inference of type arguments knowing the constraints.

Example:

// cannot infer
ConstrainedMethod(new Heh());

// works
ConstrainedMethod<Heh, int>(new Heh());

static U ConstrainedMethod<T, U>(T a) where T : Quack<U>
{
    return a.Hhh();
}

public interface Quack<T>
{
    T Hhh();
}

public class Heh : Quack<int>
{
    public int Hhh() => 5;
}

So the idea here is that

  1. We can infer T by provided argument. It is surely Heh.
  2. We don't know U - no arguments with this type were provided.
  3. Look at the constraints - T is constrained to IQuack<U>
  4. T is Heh, Heh implements one IQuack<int>, hence U should be int.
333fred commented 2 years ago

If you do want to infer from constraints, you will have to be very careful of performance and recursion issues (two of the main reasons C# doesn't infer from constraints). For example, CRTP is about to become a lot more common in .NET, and you want to make sure you're not going to SO on inferring constraints for it. You'll also want to think about ambiguity scenarios, which can happen particularly with extensions. You can't overload on constraints in .NET (another reason C# doesn't support inferring from them), but you can observe methods that differ by constraints via extensions.

WhiteBlackGoose commented 2 years ago

Regarding return type inference. It has become more or less clear that having the same syntax for unit return and inferred type is not a good idea, so I suggest explicit syntax for return type inference, which is a colon + discard symbol.

A few examples explain better:

func sqr(a: int32) = a * a;  // expected "unit", actual is "int32"

func sqr(a: int32): _ = a * a; // inferred to be int32

func sqr(a: int32): _ {
    print("$a is passed in");
    a * a
}

public func sqr(a: int32): _ = a * a; // public API must be explicitly typed

public func sqr(a: int32): int32 = a * a; // works, no inference

So we disallow return type inference for public api, but it's fine for the rest.

WhiteBlackGoose commented 2 years ago

The discard syntax seems also universal for inferring from target types. For example,

val a : List<int> = List<_>() // inferred to be int, because the target type is List<int>
val a = List<_>() // cannot be inferred
func something(arg: List<string>): _ = arg[0];

something(List<_>()) // inferred to string
func something(arg: Map<string, int>): _ = 0;

something(Map<_, _>); // inferred to be string and int respectively
func something<T>(arg: (T: IList<int>)): _ = arg.Sum();

something(List<_>()); // this one is a bit more complex. T is constrained to IList<int>, 
// and List<T> implements IList<T>, that means, that it can only be int.
func something<TList, T>(arg: (TList : IList<T>)): _ = 0

something(List<_>()); // cannot infer that
func something<T>(arg: T): _ = arg

val result = something<_>(5); // inferred from passed arguments
func something<T: new>(): _ = new T()

val result = something<_>(); // cannot be inferred
func something<T: new>(): _ = new T()

val result : int32 = something<_>(); // inferred to int32