dotnet / roslyn

The Roslyn .NET compiler provides C# and Visual Basic languages with rich code analysis APIs.
https://docs.microsoft.com/dotnet/csharp/roslyn-sdk/
MIT License
19.06k stars 4.04k forks source link

Proposal: Immutable Objects #7626

Closed alrz closed 7 years ago

alrz commented 8 years ago

Immutable Objects

This proposal (inspired by C++ const qualifier) can be considered as an extension to #159 utilizing #7561. I want to take that motivation further and suggest to enforce immutability through function purity and immutable variables. This would be yet another level of compile-time checking beside of #5032 — preventing mutation on variables, rather than null values.

Immutable variables

Immutable variables/fields/params are a superset of readonly variables/fields/params (#115), therefore, any immutable variable/etc is also readonly; so there is no need to explicitly declare them as readonly.

An immutable variable can be declared as follow:

immutable foo = ... ;
immutable Foo foo = ...;

This is consistent with from and let syntaxes which permit an optional type rather than var keyword.

This feature enables us to guarantee an immutable view of a mutable type, e.g. Array or ValueTuple to avoid yet another immutable variant of these types, like ImmutableArray or ImmutableTuple which is kinda nice.

// assuming mutable tuples 

// regular tuple
var tuple = (new Foo(), new Bar());

// readonly variable
let tuple = (new Foo(), new Bar());

// immutable tuple object
immutable tuple = (new Foo(), new Bar());

immutable int[] arr = {1,2};

Note that generated constructor and properties' getters of records and anonymous types are implicitly pure.

Pure methods

Original thread: #7561.

Therefore, marking a method or constructor as pure is not a breaking change if it is already pure.

With this feature we can, for example, define something like ImmutableList<T> on top of an existing List<T> with pure methods:

// assuming
public const List<T> Append(T item) {
    return new List<T>( ... )
}

// can only call pure methods
immutable List<T> list = new List<T>( ... );

// infers immutable List<T>
var newList = list.Append(item); 

If a type didn't provide such methods, an extension pure method can be used as long as ctor is pure.

We would use the existing PureAttribute with actual purity enforcement; this could make already annotated code considered to be pure which consequently lessens the work to extending existing code to be compatible with this feature.

Immutable types

Original thread: #159.

Following C#'s rule for abstract classes and methods, immutable types require every field to be declared as immutable and constructors and methods have to be pure.

immutable class Foo {
    immutable Bar bar;
    const Foo Baz() { ... }
}

Note that Bar is not essentially an immutable type, but bar is an immutable variable.

immutable class Foo { ... }

 // infers immutable Foo
var foo = new Foo();

There is no immutable generic constraint since you would be able to use immutable parameters.

HaloFour commented 8 years ago

So this proposes to solve the self-mutating readonly value type problem by making it a compile-time error to call said methods due to their impurity?

immutable Point point = Point.Empty;
point.Offset(2, 2);  // compiler error, Point.Offset is not pure
alrz commented 8 years ago

@HaloFour Yes. If Point is tend to be used in an immutable context, functions like Offset should be defined as pure and return a new object. (I presume Point is a class in your example because mutating structs is not a good idea, due to the boxing and unboxing, right?)

svick commented 8 years ago
  1. Can you declare a mutable variable of an immutable type?
  2. Does this proposal mean that marking an existing function as pure is a breaking change? E.g. var newList = list.Append(item); newList = newList.Append(anotherItem); would work only for non-pure Append(), right?

    And does it also mean that marking an existing immutable type as immutable is a breaking change too?

  3. Can you somehow convert between mutable and immutable variants of the same type? (It would probably require copying.)
  4. How is this going to be enforced? Only by the compiler, or does it require CLR changes?
alrz commented 8 years ago

@svick

  1. No. C++ even has a mutable keyword to be able to mutate variables inside pure functions but, to quote @jaredpar from #159 (comment)

    that is precisely the problem that immutable types attempt to solve. They can be used without any context on how the type was created or care about who else has a reference to them. Once the possibility of mutations are introduced, even in a specified context, that guarantee goes away and they are just another mutating value.

  2. (a) No, because pure methods only return immutable objects (by default) in an immutable context. (b) Probably yes, because Foo foo = new Foo(); implies that foo is mutable. I don't think that it can be transparent, because variables are either explicitly immutable or implicitly mutable. However, we can relax this restriction to not require immutable on variables for immutable types. But it might become inconsistent and problematic over time.
  3. For structs this is probably fine (with copying) but I don't think that it would apply to reference types.
  4. I think #159 already needs CLR support, so does this too.
HaloFour commented 8 years ago

@alrz I'm working with an existing example from the CLR. System.Drawing.Point being a struct with a self-mutating Offset method. It's a problem for readonly variables given that there is nothing that the compiler can really do to prevent that the underlying value type is actually being modified:

readonly Point point = Point.Empty;
point.Offset(2, 2);
Debug.Assert(point.X == 2 && point.Y == 2); // oops, point was changed

If that was combined with some enforcement of pure functions then this situation could potentially be avoided.

Where these kinds of proposals run into problems is that short of CLR changes all you have are compiler hints, which could very easily be ignored or faked. There's nothing that enforces that a method decorated with PureAttribute (or any other arbitrary metadata) to actually be pure.

alrz commented 8 years ago

@HaloFour So I think it is actually good to make it a compiler error in an immutable context. Because it doesn't make sense and it is dangerous! "It's a problem for readonly variables" this is not their intent. readonly just means a read-only variable. that's it. readonly doesn't and don't have to enforce immutability of the underlying type. makes sense to me. "CLR changes" Yes, I think so. As I said, I'm relying on #159 which has the same problem AFAIK.

HaloFour commented 8 years ago

@alrz I'd largely agree. Even without CLR enforcement I think that immutable variables (and I imagine fields) could bring a lot of value.

Now for the $64,000 question, since let appears to be taken would the shorthand syntax here be val? :wink:

alrz commented 8 years ago

@HaloFour I'd rather choose same keywords for variables and types, val class doesn't make much sense. I even wanted to use immutable for pure functions too, since they always return immutable values so immutable T F() {} looks good. but I preferred to use const from C++, otherwise it might be confusing. Actually it's nice as long as you don't mind 3 tightly related keywords. I'd prefer one: const.

const class Foo {
    const Bar bar;
    const Foo Baz() { ... }
}

Because there is no reason to declare an immutable variable of primitive types (currently valid for constants). As for const string str = null; with #5032 string would be a non-nullable so it is already needed to be written like const string? str = null; because immutable variables are required to be non-nullable it can be known that what was the user intention. This can also potentially introduce #2401.

HaloFour commented 8 years ago

const already has a meaning with local variables and fields. const string str = null; is already legal C# code. Even if the result would be functionally similar I think it would be very confusing if the same syntax could result in two different things and behaviors depending on use.

vladd commented 8 years ago

What is the value of an immutable view onto a mutable object? An optimizer cannot eliminate reads because of aliasing. Thread-safety is not guaranteed by the same reason.

C++'s const needs hacks like const_cast and mutable, are we going to have them as well?

About the technical side, how are we going to know for an existing object, which methods are safe to call? If we are going to require explicit marking, this seems to exclude the existing types from being used with the proposed feature.

alrz commented 8 years ago

@vladd Aliasing wouldn't be a problem, because every immutable object is indeed immutable. You cannot cast an immutable object to a mutable one. This is not an exact port of C++ const in the managed code. For an immutable object, all variables, fields and parameters shall be annotated with immutable or be inferred by var for variables. Existing ImmutableXXX types remain untouched because their intention is to provide immutability in an unchecked context. However, using immutable variables would be preferable because they provide deeper immutability guarantees via pure functions etc.

vladd commented 8 years ago

@alrz But if so, how would immutable List<T> list = new List<T>( ... ); work? The constructor is going to create a mutable object, and could have stored a mutable reference to it somewhere.

However if we are talking about the types which are immutable intrinsically, they wouldn't need immutable qualifier, right?

alrz commented 8 years ago

@vladd "The constructor is going to create a mutable object" creating a mutable object woudn't be a problem because pure functions cannot mutate them anyways. But it mustn't leak them beside of this obviously, meaning that to be able to create an immutable instance of a mutable type, the constructor in question must be pure. I'd rather leave the definition of pure to the #7561. "they wouldn't need immutable qualifier, right?" You do need to do so for ImmutableArray<T> for example (though via the type name itself), you can't pass it to a parameter of type T[] right? Same would apply to immutable types. You should be explicit about it — only an immutable variable can hold an immutable type instance.

vladd commented 8 years ago

@alrz Well, we cannot be sure that the constructor doesn't leak this -- at least for the classes which are not under our control (e.g., List<T>).

And what about just simple List<T> mut = new List<T>(); immutable List<T> imm = mut;? Are we going to prohibit this?

alrz commented 8 years ago

@vladd If List<T> is meant to be used in an immutable context it should provide a pure constructor. As I said, properties also should be pure to be callable via an immutable instance. The good thing about records is that the generated constructor and properties' getters are implicitly pure so you can use them with an immutable variable "for free". "Are we going to prohibit this?" You are assigning a mutable List<T> to an immutable variable. If this wasn't prohibited then mutation to the object mut practically mutates imm which is not desirable — this is due to the fact that List<T> is a reference type. Same with a struct would just work because you have a copy of the value itself rather than a copy of a reference.

vladd commented 8 years ago

@alrz Okay, this is already better. But this means that the existing classes all need to be extended to mark the immutable constructors (or to add immutable ones and mark them, too), right?

alrz commented 8 years ago

@vladd We've been already through this for the whole System.Collections.Immutable package, but in that case it was "duplicating" instead of "extending". I think making the existing types compatible with this feature would be much less work to do, and yet it'll worth it. I can think of this as the same path that has been taken to introduce async/await into the language — It indeed affects the whole .NET ecosystem.

vladd commented 8 years ago

@alrz As for me, I see this feature as well better in the form of classes being completely immutable from scratch. Making "two classes -- normal and immutable -- on the price of one" does not seem to work. E.g., for immutable classes you need a different API: ImmutableList<T>.Add must return a new list, etc., and not just be unavailable. Having impure methods just removed in the immutable version of class doesn't seem to be a good idea for me.

alrz commented 8 years ago

@vladd Then you woudn't be able to guarantee its immutability. This is the part that I directly stole from C++ and it seems OK.

vladd commented 8 years ago

@alrz If a class declares itself to be completely pure/immutable, there shouldn't be a big problem for the compiler to ensure/enforce this property, right? (At least I believe so.)

alrz commented 8 years ago

@vladd Yes, then every method has to be pure which will be verified in particular. And then all of them are callable from its instances.

axel-habermaier commented 8 years ago

@HaloFour: Actually, your example doesn't work that way:

    using System;
    using System.Drawing;

    class Program
    {
        static readonly Point point = Point.Empty;

        static void Main(string[] args)
        {
            point.Offset(2, 2);
            Console.WriteLine(point);
        }
    }

prints {X=0,Y=0}. The reason is that the compiler makes a copy of the readonly field precisely to prevent modifications of point:

IL_0000: ldsfld valuetype [System.Drawing]System.Drawing.Point ConsoleApplication12.Program::point
IL_0005: stloc.0 // The point is now stored in the local variable at index 0
IL_0006: ldloca.s 0 // The address of local at 0 is loaded onto the stack
IL_0008: ldc.i4.2
IL_0009: ldc.i4.2
// The method is called on the local variable
IL_000a: call instance void [System.Drawing]System.Drawing.Point::Offset(int32, int32)
// The field is loaded again
IL_000f: ldsfld valuetype [System.Drawing]System.Drawing.Point ConsoleApplication12.Program::point
// is then boxed and passed to WriteLine
IL_0014: box [System.Drawing]System.Drawing.Point
IL_0019: call void [mscorlib]System.Console::WriteLine(object)
// The function returns without ever writing the local back to the field
IL_001e: ret

This behavior is a minor source of inefficiency that could have been avoided had the proposal been implemented into the language from day one. I don't think that the behavior can be changed now, however, without any breaking changes.

If the field were not readonly, the compiler would use a ldflda instruction to load the field's address onto the stack, therefore modifying the field's value. As a side node, in the readonly case, Resharper warns that "impure method is called on readonly field of value type".

alrz commented 8 years ago

@axel-habermaier However, same woudn't work with a reference type. This rule exists to make static readonly fields on structs safe to some extent. I can imagine that readonly locals also follow this rule for value types. But there are some other places that cause to boxing, like using interfaces or calling virtual methods on structs, and it is very common. I wish there was a feature to prohibit implicit boxing on structs; or at least cause to cache the box (using some smart types like #int or totally transparently).

About Resharper, how it finds out about purity of functions? If it examines IL, well done! By the way, Return value of pure method is not used is something that can be considered for pure methods as a language feature.

HaloFour commented 8 years ago

@axel-habermaier I was referring to readonly locals, but I suppose that the compiler could always copy the local into another temporary local slot prior to invoking any instance methods.

I also wonder how R# detects this situation short of interpreting the IL of the method and any method it calls.

alrz commented 8 years ago

@svick I've updated the point (2), I didn't quite get it at first.

axel-habermaier commented 8 years ago

@alrz, @HaloFour: There is no magic involved with Reshaper: it relies on the [Pure] attribute (either its own one or the one provided by .NET). All methods marked with the attribute are considered not to change any fields of the struct. Jetbrains annotated most of .NET with the attribute (and others) where appropriate. Unfortunately, Resharper does not seem to check whether you're actually not modifying any fields when implementing such a method by yourself.

Consequently, Resharper's solution is not as safe and convenient as C++'s const, but it already helps capturing a large variety of common bugs, which is very helpful.

alrz commented 8 years ago

@axel-habermaier Re "Jetbrains annotated most of .NET with the attribute" you mean outside of .NET framework itself, right? meaning that they have a list of (presumably) pure methods somewhere. That was my first guess. But I thought that if it's relying on IL it is indeed magic.

axel-habermaier commented 8 years ago

@alrz: Yes, outside of .NET. No IL inspection is involved.

shahid-pk commented 8 years ago

i don't know if that would be plausible but it will be nice to expand "let" and "var" like swift. so we would get the type inferred version. let one = 1; // immutable and not read only var one = 1; // mutable , similar to what it is right now
And the none type inferred version. let int one = 1; // immutable int one = 1; // mutable, like what it is right now OR var int one = 1; // mutable, expand var for declaring mutability for sake of consistency

Generally i think it well be great if both "var" and "let" could be used for type inference as well as mutability and immutability respectively. Definition of immutable types can stay like what this proposal is proposing.

Note : I am just starting with Roslyn and language grammars for that matter . so i don't know if this would be plausible or possible without breaking people but still it will be great if possible.

alrz commented 8 years ago

@shahid-pk FYI, let is already proposed for destructuring statements (#6400). However, I think any other keyword wouldn't be better, because in practice there will be more (see #7561 for discussion).

mariusmg commented 8 years ago

Why not reuse the "readonly" in variable declaration and change it's behaviour to also include true immutability ?

readonly string ff;

Having both readonly and immutable/let/whatever will just needlessly complicate things.

HaloFour commented 8 years ago

@mariusmg I think it would be more confusing to have readonly mean something different for locals than it does for fields. Proposal #115 already aims to have readonly locals and parameters.

aluanhaddad commented 8 years ago

If existing classes such as System.Collections.Generic.List<T> can be extended with pure methods, can those methods differ from existing methods only in terms of purity? In other words, is immutablity taken into account when performing overload resolution?

alrz commented 8 years ago

@aluanhaddad No, because CLR doesn't allow methods with the same name and signature, and also, since non-immutable objects are allowed to call methods regardless of their purity, you can't have an overload resolution based on it.

aluanhaddad commented 8 years ago

@alrz Right, that's currently the case, I was asking if this proposal implied such a change.

MI3Guy commented 8 years ago

I believe you could use modopt and modreq to represent purity and have different methods with the signature only differing by purity.

alrz commented 7 years ago

Closing in favor of readonly receivers (https://github.com/dotnet/csharplang/issues/421) as it addresses use cases mentioned here.