dotnet / roslyn

The Roslyn .NET compiler provides C# and Visual Basic languages with rich code analysis APIs.
https://docs.microsoft.com/dotnet/csharp/roslyn-sdk/
MIT License

C# Language Design Review, Apr 22, 2015 #3910

Closed. MadsTorgersen closed this issue 8 years ago.

MadsTorgersen commented 9 years ago

C# Language Design Review, Apr 22, 2015

Agenda

See #1921 for an explanation of design reviews and how they differ from design meetings.

  1. Expression tree extension
  2. Nullable reference types
  3. Facilitating wire formats
  4. Bucketing

Expression Trees

Expression trees are currently lagging behind the languages in terms of expressiveness. A full scale upgrade seems like an incredibly big investment, and doesn't seem worth the effort. For instance, implementing dynamic and async faithfully in expression trees would be daunting.

However, supporting ?. and string interpolation seems doable even without introducing new kinds of nodes in the expression tree library. We should consider making this work.
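The point that `?.` can be expressed without new node kinds can be sketched with today's System.Linq.Expressions API: a hypothetical lowering of `s => s?.Length` into a null test plus a conditional (the shape shown is an illustration, not the compiler's actual lowering):

```csharp
using System;
using System.Linq.Expressions;

static class NullConditionalLowering
{
    // Builds the tree for `s => s?.Length` by hand, using only
    // expression tree nodes that already exist.
    public static Func<string, int?> Build()
    {
        var s = Expression.Parameter(typeof(string), "s");
        var body = Expression.Condition(
            // s == null ? (int?)null : (int?)s.Length
            Expression.Equal(s, Expression.Constant(null, typeof(string))),
            Expression.Constant(null, typeof(int?)),
            Expression.Convert(
                Expression.Property(s, nameof(string.Length)),
                typeof(int?)));
        return Expression.Lambda<Func<string, int?>>(body, s).Compile();
    }

    static void Main()
    {
        var f = Build();
        Console.WriteLine(f("abc")); // 3
        Console.WriteLine(f(null) == null); // True
    }
}
```

A LINQ provider walking this tree sees only familiar `ConditionalExpression` and `MemberExpression` nodes, which is the appeal of this approach.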

Nullable reference types

A big question facing us is the "two-type" versus the "three-type" approach: We want you to guard member access etc. behind null checks when values are meant to be null, and to prevent you from sticking or leaving null in variables that are not meant to be null. In the "three-type" approach, both "meant to be null" and "not meant to be null" are expressed as new type annotations (T? and T! respectively) and the existing syntax (T) takes on a legacy "unsafe" status. This is great for compatibility, but means that the existing syntax is unhelpful, and you'd only get full benefit of the nullability checking by completely rooting out its use and putting annotations everywhere.

The "two-type" approach still adds "meant to be null" annotations (T?), but holds that since you can now express when things are meant to be null, you should only use the existing syntax (T) when things are not meant to be null. This certainly leads to a simpler end result, and also means that you get full benefit of the feature immediately in the form of warnings on all existing unsafe null behavior! Therein of course also lies the problem with the "two-type" approach: in its simplest form it changes the meaning of unannotated T in a massively breaking way.

We think that the "three-type" approach is not very helpful, leads to massively rewritten over-adorned code, and is essentially not viable. The "two-type" approach seems desirable if there is an explicit step to opt in to the enforcement of "not meant to be null" on ordinary reference types. You can continue to use C# as it is, and you can even start to add ? to types to force null checks. Then when you feel ready you can switch on the additional checks to prevent null from making it into reference types without '?'. This may lead to warnings that you can then either fix by adding further ?s or by putting non-null values into the given variable, depending on your intent.

There are additional compatibility questions around evolution of libraries, but those are somewhat orthogonal: Maybe a library carries an assembly-level attribute saying it has "opted in", and that its unannotated types should be considered non-null.

There are still open design questions around generics and library compat.

Wire formats

We should focus attention on making it easier to work with wire formats such as JSON, and in particular on how to support strongly typed logic over them without forcing them to be deserialized to strongly typed objects at runtime. Such deserialization is brittle, lossy and clunky as formats evolve out of sync; extra members, for example, aren't kept and reserialized on the other end.

There's a range of directions we could take here. Assuming there are dictionary-style objects representing the JSON (or other wire data) in a weakly typed way, options include:

We'd need to think about construction, not just consumption.

var thing = new Thing { name = "...", price = 123.45 };

Maybe Thing is an interface with an attribute on it:

[Json] interface { string name; double price; }

Or maybe it is something else. This warrants further exploration; the right feature design here could be an extremely valuable tool for developers talking to wire formats - and who isn't?
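One direction the notes hint at can be sketched today without language support: a strongly typed "view" over the weakly typed dictionary, where the dictionary stays the source of truth so unrecognized members survive a round trip. The `ThingView` class, its `name`/`price` members, and the dictionary shape are all hypothetical, taken from the `Thing` example above:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: typed accessors over weakly typed wire data.
// The dictionary is authoritative, so unknown members are preserved.
public class ThingView
{
    private readonly Dictionary<string, object> data;

    public ThingView(Dictionary<string, object> data) => this.data = data;

    public string name
    {
        get => (string)data["name"];
        set => data["name"] = value;
    }

    public double price
    {
        get => (double)data["price"];
        set => data["price"] = value;
    }
}

public class Demo
{
    static void Main()
    {
        var raw = new Dictionary<string, object>
        {
            ["name"] = "widget",
            ["price"] = 123.45,
            ["extra"] = "unknown member, kept for reserialization"
        };
        var thing = new ThingView(raw);
        Console.WriteLine(thing.price); // 123.45
        Console.WriteLine(raw.ContainsKey("extra")); // True
    }
}
```

A language feature could generate such views from an annotated interface, which is roughly what the `[Json] interface` idea above gestures at.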

Bucketing

We affirmed that the bucketing in issue #2136 reflects our priorities.

dsaf commented 9 years ago

There are additional compatibility questions around evolution of libraries, but those are somewhat orthogonal: Maybe a library carries an assembly-level attribute saying it has "opted in", and that its unannotated types should be considered non-null.

Agreed: https://github.com/dotnet/roslyn/issues/1648#issuecomment-87845381. Perhaps in that case IDE support of attributes should also be improved - could be a part of the #711 story. At the moment all assembly-level attributes are kind of hidden.

paulomorgado commented 9 years ago

Expression Trees

I agree that dynamic and async should be kept out, but everything else should, ideally, be in.

Nullable reference types

An extreme effort is being made to not change the CLR, but I started thinking like this:

  1. To add nullable value types, a new value type was added to store the value and nullability of the nullable value type.
  2. The CLR knows about that nullable wrapper and knows, for instance, not to box null values.
  3. Value types are non nullable, therefore, when I want a nullable value type, I need to annotate it int?.
  4. It makes no sense to write int! because you can't change the nullability of int.

In this case, reference types are the dual of value types:

  1. Reference types are nullable, therefore, if I want a non nullable reference type, I need to annotate it - string!.
  2. To create non nullable reference types, a non nullable wrapper should be created. That wrapper should also be a value type in order to be non nullable.
  3. The CLR must know this type and also enforce its rules and constraints.
  4. It makes no sense to write string? because you can't change the nullability of string.

That wrapper could be something like this:

public struct NonNullable<T>
    where T : class
{
    private readonly T value;

    public NonNullable(T obj)
    {
        if (obj == null)
        {
            throw new NullReferenceException();
        }
        value = obj;
    }

    public T Value
    {
        get
        {
            // If parameterless struct constructors ever make it
            // into the language, this test can be avoided.
            if (value == null)
            {
                throw new NullReferenceException();
            }
            return value;
        }
    }
}
HaloFour commented 9 years ago

@paulomorgado

A struct wrapper has been discussed before. The inherent problem with such a type is that its default value wraps (and must wrap) null, e.g.: var oops = default(NonNullable<string>); You still end up with the same problem.
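The `default` escape hatch can be demonstrated directly. This is a minimal sketch of such a wrapper (hypothetical, modeled on the struct in the previous comment): `default(NonNullable<string>)` never runs the constructor, so the wrapped reference is still null.

```csharp
using System;

// Minimal sketch of a NonNullable<T> wrapper, to show why `default`
// defeats it: default(T) zero-initializes the struct and bypasses
// every constructor, leaving the wrapped reference null.
public struct NonNullable<T> where T : class
{
    private readonly T value;

    public NonNullable(T obj)
    {
        if (obj == null)
        {
            throw new ArgumentNullException(nameof(obj));
        }
        value = obj;
    }

    public T Value
    {
        get
        {
            if (value == null)
            {
                throw new NullReferenceException();
            }
            return value;
        }
    }
}

public class Demo
{
    public static bool DefaultWrapsNull()
    {
        var oops = default(NonNullable<string>); // no constructor runs
        try { var _ = oops.Value; return false; }
        catch (NullReferenceException) { return true; }
    }

    static void Main()
    {
        Console.WriteLine(new NonNullable<string>("hello").Value); // hello
        Console.WriteLine(DefaultWrapsNull()); // True
    }
}
```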

Unless something has changed recently all I've heard about non-nullability is that the types would remain exactly as they are and that attributes will be used to notate whether a parameter or whatever may be null and the compiler (or an analyzer) will perform flow analysis to show warnings if it detects the possibility that you will pass a null reference to that parameter. The language would add a syntax to simplify the annotations, but that's all they would be, e.g.:

public char FirstCharacter(string! value) {
    return value[0];
}

would be:

public char FirstCharacter([NotNull] string value) {
    if (value == null) throw new ArgumentNullException(nameof(value));  // potentially emitted by the compiler
    return value[0];
}
paulomorgado commented 9 years ago

@HaloFour, the real problem here is that there is no way (so far) to have a non null default value for reference types.

What you are showing in your very small example can be achieved with code contracts.

More complex examples would require null checks on all assignments of value, which can be easy to do for variables, parameters and even properties, but not for fields.

whoisj commented 9 years ago

We still need:

Object o(); // where o is a reference and cannot be null due to RAII
object! myobject = o; // assignment works because o cannot be null
HaloFour commented 9 years ago

@paulomorgado Indeed, but an intermediate struct only moves the problem. And I agree that non-nullability does overlap with method contracts, particularly with the example given. But without massive CLR changes the best that I think could/should be accomplished is flow analysis of how the variables are used, with notation to warn/error if it could lead to an NRE. Internally within a method that requires no additional metadata, but anything exposed would require an attribute.

I think that I'm in the two-type camp, but I think I'd prefer T to denote a nullable reference and T! to denote a non-nullable reference. My problem with the approach of adding T? to denote a nullable reference and leaving T to denote a non-nullable reference is that it seems to create a dialect where the language means two things based on some kind of switch that is enabled somewhere. All existing code will just spew warnings and probably discourage people from adopting the feature. Then again, having to manually opt-into the feature would also likely discourage its use. Damned either way.

@whoisj Does the RAII syntax really buy anything beyond what var o = new object(); provides? The compiler should be able to infer non-nullability. My concern is that the RAII syntax might incorrectly lead people to think that the type is allocated on the stack or something.

whoisj commented 9 years ago

Does the RAII syntax really buy anything beyond what var o = new object(); provides? The compiler should be able to infer non-nullability. My concern is that the RAII syntax might incorrectly lead people to think that the type is allocated on the stack or something.

@HaloFour The RAII style could imply a local readonly object!, which would be useful. However, you're correct in that this is a non-owning reference and no collection will happen when it leaves scope. This is true with RAII if shared_ptrs are used as well - and what are references but cousins of shared pointers?

vladd commented 9 years ago

@whoisj There's no such notion as an "owning reference" in the language at the moment -- do you propose to introduce it? (And the difference between shared_ptr and references is that having no more references doesn't trigger GC on the referenced object -- that's why there's no RAII in C#.)

whoisj commented 9 years ago

There's no such notion as an "owning reference" in the language at the moment -- do you propose to introduce it? (And the difference between shared_ptr and references is that having no more references doesn't trigger GC on the referenced object -- that's why there's no RAII in C#.)

@vladd I do not think RAII is really possible in C#, at least not without a lot of heavy lifting, making it hardly worth talking about (at least for now). No, my point was that RAII semantics can work here. Using something like object my(); to create a guaranteed not-null value would be nice. That value would leave scope and be up for collection unless a reference was added to it such as object! nevernull = my;.

It's just semantics.

paulomorgado commented 9 years ago

@HaloFour

I think that I'm in the two-type camp, but I think I'd prefer T to denote a nullable reference and T! to denote a non-nullable reference. My problem with the approach of adding T? to denote a nullable reference and leaving T to denote a non-nullable reference is that it seems to create a dialect where the language means two things based on some kind of switch that is enabled somewhere. All existing code will just spew warnings and probably discourage people from adopting the feature. Then again, having to manually opt-into the feature would also likely discourage its use. Damned either way.

+1

dsaf commented 9 years ago

@HaloFour

All existing code will just spew warnings and probably discourage people from adopting the feature.

Not if they opt out. This could be suggested by IDE/compiler.

Then again, having to manually opt-into the feature would also likely discourage its use.

Exactly. I have tried using JetBrains' [NotNull] but it just pollutes everything. Making it a single character does not help much. It's the very definition of statically typed language "ceremony".

Why would you want to use null anyway (legacy code and integration aside)? The best practice should become the implicit default.

PS: JavaScript managed to cope: http://www.w3schools.com/js/js_strict.asp

mattwar commented 9 years ago

For the null checking feature the default would be not-null. You would use a ? on your type to indicate that it can have null values. There would be no ! syntax.

paulomorgado commented 9 years ago

For the null checking feature the default would be not-null. You would use a ? on your type to indicate that it can have null values. There would be no ! syntax.

-1

HaloFour commented 9 years ago

@dsaf

Not if they opt out.

Thus creating a dialect. The issue #3330 was explicitly closed by @gafter because it proposed adding such options to make T be treated as non-nullable giving the reason, "We're not going to break backward compatibility, and we avoid creating dialects."

Why would you want to use null anyway (legacy code and integration aside)?

Because that is the language semantics over six versions; it's what people will expect and what the vast majority of shared knowledge on the subject specifies. I agree that in the majority of cases the developer probably does not expect the value to be null, and because of this it would require marking more types as non-nullable than vice versa. But we're also talking about a team who decided to roll back a change to add a warning about a situation involving legal but pointless and unexpected code, #663.

PS: JavaScript managed to cope

JavaScript has done a massive number of really stupid things over the years. But it also started out in an exceptionally bad place. Despite all of this, the latest browsers will still happily execute the most asinine JavaScript 1.0.

@mattwar

For the null checking feature the default would be not-null. You would use a ? on your type to indicate that it can have null values. There would be no ! syntax.

I noticed that from the proposal. I worded my preference for the opposite two-type system, although as stated I think both are not without their problems and I'm less strongly entrenched in my position than I probably come across. How does the team intend to reconcile a semantic change of that nature given those other language decisions that have been made?

dsaf commented 9 years ago

@HaloFour This is not directed at you, but I would argue that effectively killing Silverlight (which was used by enterprises, among others) is an actual real breaking change, and so are many other Microsoft decisions in the dark 2009-2014 period. Being brave enough to fix an early design mistake while still giving older projects a way to work is a different proposition.

Despite all of this, the latest browsers will still happily execute the most asinine JavaScript 1.0.

I don't propose that pre-C#7 versions will stop working.

HaloFour commented 9 years ago

@dsaf

Yeah, but unlike Silverlight it appears that MS wants to keep C# around for a little while longer. This isn't a design mistake, it's a paradigm, and a very common one. Either way this doesn't align with what has been stated by members of the team.

I don't propose that pre-C#7 versions will stop working.

No, you propose that pre-C#7 stop compiling, or at least require changing dialects and fixing a lot of new errors.

dsaf commented 9 years ago

@HaloFour

No, you propose that pre-C#7 stop compiling, or at least require changing dialects and fixing a lot of new errors.

Not really. The moment you open a legacy solution in Visual Studio 2015 / MonoDevelop it will ask you whether you want to upgrade to C# 7. If you pick 'Yes' it will then ask whether you want to opt in or out. If you pick 'Opt-out' it will then automatically modify your AssemblyInfo.cs with the corresponding assembly-level attribute. This option will of course be available at any moment later on. This is the approach I had suggested early in the previous discussion and it seems to be considered by the team now:

There are additional compatibility questions around evolution of libraries, but those are somewhat orthogonal: Maybe a library carries an assembly-level attribute saying it has "opted in", and that its unannotated types should be considered non-null.

HaloFour commented 9 years ago

@dsaf

Which depends on a tool to convert between dialects, which not everyone uses, and prevents the copy&paste of existing code into the newly wrecked project. Doing this at the file or class level makes more sense, and is exactly what @gafter said would not happen.

dsaf commented 9 years ago

@HaloFour

I cannot see someone going to the effort of downloading a new version of the compiler, skipping the mainstream IDEs, and then ignoring compiler output and failing to find out which compilation flag to use. Not sure about class-level, but they seem to have a perfect keyword already - unsafe :).

HaloFour commented 9 years ago

@dsaf

I can easily see someone clicking through whatever they're asked accidentally converting their project over and then wondering what the heck is happening when anything that they try to write with the knowledge of the language that they have no longer compiles, or produces a plethora of new warnings. That person will find 13 years of existing C# knowledge on the Internet which will be counter to this behavior. C# 7.0 will simply be broken.

The unsafe keyword is not an example of this having happened. Beyond the obvious "it's been there since v1.0", it also only exposes new language features. unsafe has zero effect on any existing code. Apply it to an existing class or method and the IL generated will be exactly the same until the developer opts into unsafe behavior. The closest thing that could be an analog would be async, which of course only applies at the method level, and would only break copy&pasted code that happened to try to use await as an identifier. The potential for breaks and confusion there was exceptionally low. The potential for breaks and confusion from changing the semantics of all reference types project-wide at the flip of a switch is exceptionally high.

mattwar commented 9 years ago

We are a bit behind posting language design notes. And the following is not an excerpt from the notes, just me trying to explain from what I recall.

The current thinking is that there will be an opt-in mode for compilation (the null checking feature won't be on by default). When on, potential null assignments to variables that cannot be null or dereferences of variables that can be null will generate warnings, unless the operations can be determined null safe via flow analysis.

If you upgrade your project to C# 7.0, this feature won't be on until you enable it manually. There will be separate controls to ignore null annotations in libraries in case you were using the feature with libraries that convert to exposing null annotations before you are ready. If libraries don't have null annotations (or you disable them) then the reference types from those assemblies will be treated as old-style references (we cannot draw inferences from them, and will not generate warnings.)

The null checking will only generate warnings, and explicitly nullable reference types will be encoded as custom attributes so signatures stay the same. The purpose of these annotations (and syntax) is to help you declare intent for your APIs to pass null values (or not) and to find bugs in your code. The compiler is not expected to prove that nulls cannot exist in non-null reference types (there are many edge cases with static constructors, array allocation and concurrency), and it will not optimize IL based on this.
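The opt-in, warnings-only behavior described above can be sketched concretely. The `#nullable enable` directive shown is the per-file switch the feature eventually shipped with in C# 8; at the time of this thread the exact opt-in mechanism was still open, so treat it as illustrative:

```csharp
#nullable enable  // opt-in switch; without it, code compiles as before
using System;

public class Demo
{
    // `string?` declares "meant to be null"; plain `string` means
    // "not meant to be null". Both erase to the same runtime type.
    public static string? Find(string key) => key == "known" ? "value" : null;

    static void Main()
    {
        string? maybe = Find("known");

        // Console.WriteLine(maybe.Length); // would warn: possible null dereference

        if (maybe != null)
        {
            // Flow analysis proves `maybe` is non-null here, so no warning.
            Console.WriteLine(maybe.Length); // 5
        }
    }
}
```

Note that violating the annotations produces warnings only; nothing stops null from reaching a non-null variable at runtime, exactly as the comment above describes.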

HaloFour commented 9 years ago

@mattwar As a glorified analyzer I guess that makes sense. It is certainly less disruptive than reversing the notation. I'd probably be a little less apprehensive if we didn't have six versions of C# semantics that will be somewhat altered by this change. Hopefully decent flow analysis will reduce the potential for unexpected warnings. Would that aspect of it be treated as a standard analyzer, complete with code fixes?

dsaf commented 9 years ago

@mattwar It sounds like Code Contracts 2.0 with one attribute that has a one-character language-integrated syntax. Could you consider providing a way of declaring a language-integrated synonym for any attribute? The low guarantee of T! or the new T? will be quite misleading and will pollute the core language, unless people know it's just shorthand for an attribute. It should really be a library and/or a custom analyzer instead of a dedicated language syntax. Syntax-level features should be reserved for checks with strong compile-time guarantees.

mattwar commented 9 years ago

It almost seems appropriate to make it just an analyzer with an attribute, but we need the null-ness to flow through the type system in order to do the null checks properly, so we've been thinking about it as part of the language.

bbarry commented 9 years ago

It would be nice if one could hook into the type system from analyzers, in the same way these null checks will need to.

I can imagine other useful checks that could be done with such an analyzer capability.

dsaf commented 9 years ago

@mattwar

...so we've been thinking about it as part of the language.

Is it being designed as part of the general method contracts feature #119? It seems that they are conceptually related, especially if the guarantees are not strong.

mattwar commented 9 years ago

@dsaf, not yet, but that does make sense that we should consider compatibility with a possible contracts feature.

bkoelman commented 9 years ago

Regarding nullability, https://github.com/bkoelman/ResharperCodeContractNullability can be used today and may be of interest for this discussion. It empowers ReSharper's nullability analysis engine by reporting a diagnostic/fix on all applicable members that would benefit from annotation. This way, the user is guided in annotating his/her codebase (and keeping it annotated).

dsaf commented 9 years ago

@bkoelman I tried those and they result in noisy signatures (so called "ceremony") without giving any strong contract guarantees. From my personal perspective I would almost never use nullable arguments, that is why I am always arguing for opt-in nullability.

tomlokhorst commented 9 years ago

@MadsTorgersen: We think that the "three-type" approach is not very helpful, leads to massively rewritten over-adorned code, and is essentially not viable.

As a Swift user (which uses the "three-type" approach), I disagree.

Granted, Swift is a bit different, it's a new language and it has the annotations swapped. T? is used for nullable, T for non-nullable, and T! for unspecified nullability (called an implicitly unwrapped optional). In the C# "three-type" approach, those last two would be swapped, which doesn't look as nice, but otherwise there's no semantic difference.

Also, Swift has explicit syntax for dealing with nullable values, which is nice:

let optionalPerson: Person? = maybeGetPerson();
if let person = optionalPerson {
  // `person` is always set in this if
}
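The closest C# analog to Swift's `if let` is the declaration pattern that shipped in C# 7: the bound variable is only in scope, and known non-null, inside the `if`. The `Person` class and `MaybeGetPerson` helper below are hypothetical stand-ins for the Swift example:

```csharp
using System;

public class Person
{
    public string Name = "Ada";
}

public class Demo
{
    // Stand-in for Swift's maybeGetPerson(): may return null.
    public static object MaybeGetPerson(bool found) => found ? new Person() : null;

    static void Main()
    {
        // Roughly: `if let person = optionalPerson { ... }`
        if (MaybeGetPerson(true) is Person person)
        {
            // `person` is definitely assigned (and non-null) in this if.
            Console.WriteLine(person.Name); // Ada
        }
    }
}
```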

In my Swift app, I use variables with unspecified nullability for dealing with old (UI) frameworks. Those frameworks have weird initialisation where (non-null) fields aren't set in the constructor but later on. In practice, it is great to have the implicitly unwrapped optional (the so called "third type"). Since these older frameworks have been in use for years and there's never any null, it's just not encoded in the types.

As of Swift 2, Apple has gone through most of the existing libraries and has annotated all types with either _Nullable, _Nonnull or _Null_unspecified. Which, I would argue, Microsoft needs to do as well. If the largest body of .NET code I'm dealing with (the base class library), isn't annotated with nullability specifiers, it doesn't help me much to add them to my own code.

Once all the code I'm using has been annotated, the experience using my existing code would be:

string foo = SomeAnnotatedBclMethod();
// ^ Warning: Throwing away nullability information, use `string?` or `string!`

In my opinion, nullability should be a fundamental part of the language. That includes proper interop with existing libraries with unspecified nullability (requiring a "third type"). But that also means you might need to make some difficult decisions, e.g. adding something like Swift's if let syntax for handling multi-threading and disallowing non-nullability of array elements for types without public default constructors.

Also, I made a picture! https://twitter.com/tomlokhorst/status/633282420259880960

EDIT: To clarify: I'm arguing for the inclusion of the "unspecified nullability" case and for making nullability a fundamental part of the language. Not specifically for choosing the T! syntax for non-nullability, I don't like that either.

In principle, I dislike the idea of having a switch for two different dialects of the language, but it's arguable that it might be needed in this case.

This switch could include a code converter, where this existing C# 6 code:

string foo = GetFoo();
int x = foo.Length; // Might crash at runtime

Gets translated to this C# 7 new-style-syntax:

string! foo = GetFoo();
int x = foo.Length; // Still might crash at runtime, but now the user can clean this up

Incidentally, Swift also includes a (mandatory) code converter. Between Swift 1 and Swift 2, the do-while loop syntax was replaced with repeat-while (because they wanted to use the do keyword elsewhere). When opening a Swift 1 project, the IDE prompts the user to update to new style syntax, because the old syntax doesn't compile anymore.

dsaf commented 9 years ago

@tomlokhorst

...those last two would be swapped, which doesn't look as nice, but otherwise there's no semantic difference...

Well, as the person you quoted said:

...leads to ... over-adorned code...

Things rarely need to be nullable, so ! will be everywhere.

govert commented 9 years ago

My preference would be a switch that puts the language into null-safe mode - i.e. the two-type option where nullable reference types are indicated as string? and where string means a never-null type.

Legacy assembly references consumed in null-safe mode would have a nullness info wrapper generated automatically (either once-off and published, or generated at compile-time) that does static analysis on the compiled IL to wrap the assembly usage in null-safe behaviour where possible. I would guess for a useful proportion of methods, the correct nullness types can be derived. The aim of the nullness info wrapper is to guarantee (within the type system) correctness for as large a portion of code as possible. So legacy method compiled as returning string would be converted to string? by the wrapper, for use in the null-safe mode.

Maybe additional annotation (either via attributes in the legacy assembly, or externally defined) could inform how the wrapper is generated. So if the ReSharper [NotNull] annotation is present, then the wrapper includes a runtime check if static analysis fails.

The compatibility wrappers might work a bit like TypeScript type annotation files.

Code compiled in null-safe mode might target both a null-safe version of the runtime, with proper non-null reference types, and older runtimes through a wrapper assembly that includes runtime null checks. Nullness type safety must be baked into the runtime, with backward compatibility possibly involving a runtime performance cost.

I would prioritize some path that in the long term will take both the runtime and C# to the right place (which to me means the two-type approach, without explicit ! annotations) even if that takes 10 years. The road there would be filled in with tooling, generated wrappers etc. to bridge existing compiled code both ways.

paulomorgado commented 9 years ago

I'm OK with just code contracts and declaration expressions.

OJacot-Descombes commented 8 years ago

We do not really want a new kind of type. We want to be sure that a parameter, field, local variable or return value and so on will never be null when we use it. Therefore, in T! x; x is still a T as usual, which is basically nullable (x.GetType() == typeof(T)). So ! really means "null-checked" and not "non-nullable". If it is a class field, it will be null when the class is created! However, the compiler must ensure that x will not be null when it is used. Therefore the compiler must examine all code paths and verify that a non-null value has been assigned to it before it is used. Very much like it checks for unassigned values today.

If a null-checked value is used, we know that we can use it safely without checking for null first. A warning should be issued if we use the null-conditional operator (?.), null-coalesce operator (??) or an explicit check (if(x != null)) on it.

alrz commented 8 years ago

@MadsTorgersen I didn't know how this would be received, so I'll just write it here as a comment on wire formats. JSON is not only a wire format, as you mentioned; it's now a successor to XML for configuration and similar use cases. My idea is a JSON-compatible syntax for lists and dictionaries.

var dict = new {
  "list": [ 1, 2, 3 ],
  "key1": "value",
  "key2": {
    "key1": "value",
    "key2": 123
  }
};

The whole expression returns a Dictionary<string,object> (in this case).
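The proposed literal is not valid C# today, but one plausible desugaring uses the index initializers that already shipped in C# 6 (this is a sketch of what the feature might lower to, not part of the proposal itself):

```csharp
using System;
using System.Collections.Generic;

public class Demo
{
    // One plausible desugaring of the proposed JSON-style literal,
    // using existing dictionary and collection initializers.
    public static Dictionary<string, object> Build() =>
        new Dictionary<string, object>
        {
            ["list"] = new List<object> { 1, 2, 3 },
            ["key1"] = "value",
            ["key2"] = new Dictionary<string, object>
            {
                ["key1"] = "value",
                ["key2"] = 123
            }
        };

    static void Main()
    {
        var dict = Build();
        Console.WriteLine(dict.Count); // 3
        Console.WriteLine(((List<object>)dict["list"]).Count); // 3
    }
}
```

The proposed syntax would mainly buy the quoted-key, colon-separated notation and inference of the nested container types.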

rsmckinney commented 8 years ago

How will static types materialize? There's no standard format for JSON typing, so even if .NET/C# were to adopt one, it wouldn't do much good since most JSON is provided without a schema. So I assume the plan is to infer the type from sample JSON like everyone else, right? That being the case, how will C# bootstrap this process? Will you rely on tooling, such as VS, to generate types based on a JSON URI?

If C# were to provide structural typing a la Gosu, you could generate a nesting of structural interfaces mirroring a JSON object and back the interfaces using a late bound dictionary object, probably via dynamic typing already built into the language.

It seems to me a nesting is necessary given the anonymous nature of nodes in a JSON object tree. Otherwise, how will you distinguish between identically named fields having different inferred type structures?

Just thinking out loud here as I also tackle this subject independently.

Cheers.

gafter commented 8 years ago

Design notes have been archived at https://github.com/dotnet/roslyn/blob/future/docs/designNotes/2015-04-22%20C%23%20Design%20Review.md but discussion can continue here.