dotnet / csharplang

The official repo for the design of the C# programming language
11.49k stars 1.02k forks source link

`in` usable as `ref` in older compiler versions causes source break if recompiled with v7.2 #1308

Open tannergooding opened 6 years ago

tannergooding commented 6 years ago

This bug actually requires compiling an assembly with v7.2+ of the compiler, referencing the created assembly from v7.1 or prior, and then moving that assembly forward to v7.2+.

Repro

  1. Using v7.2+ of the compiler, create a new class library
  2. Add both a regular and an in overload for a method

    public class C
    {
    public void M(int i)
    {
    }
    
    public void M(in int i)
    {
    }
    }
  3. Create another class library and ensure it uses a v7.1 or prior compiler A. Downgrading the language version of the compiler will not reproduce this bug B. Referencing the Microsoft.Net.Compilers v2.4.0 NuGet package is probably the easiest way
  4. Set the code of the library to the following

    public class Test
    {
    public void Scenario()
    {
        int i = 0;
        C c = new C();
    
        c.M(i);
        c.M(ref i);
    }
    }
  5. Observe that the code compiles fine A. The first call, c.M(i), calls the void M(int) overload B. The second call, c.M(ref i), calls the void M(in int) overload
  6. Move the class library forward to v7.2 or higher
  7. Observe that the code will not compile A. CS0121 - The call is ambiguous between the following methods or properties: 'C.M(int)' and 'C.M(in int)' B. CS1615 - Argument 1 may not be passed with the 'ref' keyword
tannergooding commented 6 years ago

FYI. @VSadov, @jcouv, @jaredpar

tannergooding commented 6 years ago

This repro's going all the way back to the v2.0 of the native compiler (non-Roslyn). I assume it also repros on v1.1 and v1.0 of the native compiler, but I don't have those installed locally.

The first issue is already resolved for 7.3, as the byval overload will win by default and users will have to specify in to call the other overload.

The second issue is much more complicated.

VSadov commented 6 years ago

‘in’ parameters are not always “poisoned”. That was a deliberate choice so that libraries and callers could be updated in stages. There is always a workaround.

In the case “B”, once caller switches to 7.2, can he use ‘in’ keyword?

tannergooding commented 6 years ago

That probably depends. Some users may expect it to continue working without a workaround required.

It is definitely strange, however, that a new language feature is consumable in a lower version of the compiler and in such a way that it is a breaking change later on (especially given how strict breaking changes normally are).

sharwell commented 6 years ago

I would expect to be able to use ref to pass an argument for an in parameter, since the resulting conversion is narrowing. I wonder if removing CS1615 would be a viable solution to this.

tannergooding commented 6 years ago

I would expect to be able to use ref to pass an argument for an in parameter, since the resulting conversion is narrowing. I wonder if removing CS1615 would be a viable solution to this.

There are other benefits to doing this as well, such as making the transition from ref to in for existing code possible in large frameworks.

Today, that is a binary compatible, but not a source compatible change (provided you are not dealing with interfaces or virtual methods, in which case it adds a modreq and is always breaking). This prevents some frameworks (such as CoreFX) from making the transition even if in is more appropriate.

sharwell commented 4 years ago

If we allow the use of ref to pass an argument to an in parameter, it may become possible to fix the definition of Volatile.Read. Currently if you have an in int parameter, you cannot use Volatile.Read to read from it without going through Unsafe.AsRef.

jaredpar commented 4 years ago

If we were doing in parameters over again I think it's likely we'd allow ref to work for ref or in parameters. The source compatibility case is fairly compelling. The issue though is that we're not starting from a clean slate and hence have to consider the compat implications. For example it would break the following code:

class Widget { }
static class Extensions1{
    public static void M(this Widget w, in int i) { }
}

static class Extensions2 {
    public static void M(this Widget w, ref int i) { }
}

class Program
{
    public static void Main()
    {
        var w = new Widget();
        int i = 0;
        w.M(ref i);
    }
}

In order to allow this and preserve compatibility it would require us to add significant complications to overload resolution.

tannergooding commented 4 years ago

In order to allow this and preserve compatibility it would require us to add significant complications to overload resolution.

This complexity would effectively be preferring the explicit ref version if there are both in and ref choice, correct? (I would guess this would include choosing ref, as a more exact match, if there is an in instance method and a ref extension).

I'd still be in favor of something like this happening eventually as it would reduce the number of places we have to use ref T Unsafe.As(in T) to workaround older APIs that took an ref but treat it as in.

jaredpar commented 4 years ago

This complexity would effectively be preferring the explicit ref version if there are both in and ref choice, correct?

Essentially that is the case. It's a bit more complicated though once you start mixing instance and extension methods here and various combinations of in and ref arguments in the same function call.

I'd still be in favor of something like this happening eventually as it would reduce the number of places we have to use ref T Unsafe.As(in T) to workaround older APIs that took an ref but treat it as in.

I think the only way this is going to move forward if there is a compelling set of existing APIs for which this is a problem. I agree Volatile.Read is the poster child for doing this feature but you could also just add Volatile.ReadEx, add a fixer to change code from Read to ReadEx and it would be significantly cheaper than a language feature.

tannergooding commented 4 years ago

In the framework, we are primarily using Unsafe.AsRef with other unsafe APIs: https://source.dot.net/#System.Private.CoreLib/Unsafe.cs,8abf048672bbc8cd,references

The primary use-cases I can think of are Volatile.Read, Unsafe.Read, MemoryMarshal.CreateReadOnlySpan (none of which are particularly compelling, especially given their low-level nature).

I'll open an API proposal for the above three cases and see what API review thinks about secondary methods that take in.

mattleibow commented 4 years ago

This may not be a typical use case for supporting the ref with the in, but I have an old-ish graphics API that in some places uses a struct. I sometimes have two overloads Op(Matrix) and then Op(ref Matrix). Reasons why, and all that.

But, it would be nice to switch the ref variants to in so that the user can type the nice non-ref versions, but the complier makes sure to do the ref part.

Basically, I want to be able to type:

var result = Matrix.Concat(Matrix.Rotation(10), Matrix.Scale(3));

and then the compiler does the nasty work of:

var rot = Matrix.Rotation(10);
var scale = Matrix.Scale(3);
var result = Matrix.Concat(ref rot, ref scale);

I could just change my code from the ref overloads to the in, but this will cause the issue from the OP.

I understand this is work, but it would be really nice to have even though

it would be significantly cheaper than a language feature

We could potentially change existing code to use in and then get the benefits of ref. I am assuming that we could take some graphics library, change the refs to in and then we keep all the features, but the code can be nicer.

mattleibow commented 4 years ago

I was just thinking about this some more with regards to the issue with ambiguous in vs ref. It is not really too far out there to expect this to work.

Assume:

static class Extensions1 {
    public static void M(this object w, int i) { }
}
static class Extensions2 {
    public static void M(this object w, in int i) { }
}

If I write:

var j = new object();
var t = 3;
j.M(t);

The compiler picks Extensions1. If I slap an in there:

var j = new object();
var t = 3;
j.M(in t);

Now we are talking Extensions2. How hard is it to assume that we may want to add to this and do the same for ref but distinguish between ref and in instead of in and <nothing>?

mattleibow commented 4 years ago

Also, the OP's observation of:

Downgrading the language version of the compiler will not reproduce this bug

sounds like a bug - the language version should be respected, right?

In fact, I think this whole thing is a bug. Here I am building a library using C# 8 because of the power I have :). What if my library is used by a enterprise on C# 7? This is a breaking change for them, and all they did was update? Or, is is that a library now also has some minimum language version to be used?

jaredpar commented 4 years ago

sounds like a bug - the language version should be respected, right?

The language version flag is a heuristic to alert the developer when they're using language features defined above the limit they specify. While the heuristic is very strong and the compiler team goes to lengths to make the flag as robust as possible there are edge cases that will always fall through. The most notable is subtle changes to how overload resolution change between language releases.

The language version flag though does not control how the compiler bind items. It for instance won't revert back to older overload resolution rules, name lookup, etc ...

In fact, I think this whole thing is a bug.

This is not a bug but instead a deliberate design decision.

In retrospect I wish we'd pushed harder and gotten it such that ref could be used for in or ref parameters. But the decision was explicitly against allowing that at the time and this was one of the fall outs of that decision.

What if my library is used by a enterprise on C# 7? This is a breaking change for them, and all they did was update?

Changing a ref parameter to an in parameter is an API breaking change. That's conceptually no different than changing from ref to out.

The more interesting behavior is if the API was defined from day one to be M(in i). In that case a C# 7.0 customer could take a dependency on the API using M(ref someInt). Then at the point the customer moves to 8.0 it would become a break as the compiler would require in in that scenario.

The case for this being a "breaking change" is much stronger. This hits at our core goal of your code should be forward compatible (new compilers don't break your code). At the same time there are cases where consuming C# X+1 features in C# X can lead to breaking changes when you update to C# X+1. Concrete example is the following:

// Library compiled with C# 5
public class Util {
  public static async Task M() { throw null; }
}

// App compiled with C# 4
static void Repro() {
  Util.M();
}

In this case the App will compile cleanly with C# 4 but once you upgrade to C# 5 it will get a warning.(CS4014). These cases do exist but they're very rare and typically occur because we can't find a better way of making the new scenario. But yes they do classify as breaking changes.

But, it would be nice to switch the ref variants to in so that the user can type the nice non-ref versions, but the complier makes sure to do the ref part.

The language team explicitly does not want to allow this. The downside of silently passing a struct by ref (inviting unintended mutability) is not outweighed by the requirement of making ref explicit. The C# 1.0 team made this decision and it was re-affirmed when we discussed whether or not to allow in to be implicit.

tannergooding commented 4 years ago

I think the customer/library author issue here is that there are a number of APIs that were added which use ref but they don't mutate the reference. They exist like this because they were added before in existed. Functionally speaking, ref -> in could be a "narrowing" conversion (in the same way Span<T> can implicitly convert to ReadOnlySpan<T>, because the functionality is just a subset).

There are a number of features only available with in (such as being able to pass a static readonly field by reference without copying) and so customers would like to be able to take their existing APIs, switch them to be in (which is currently a binary compatible, but source breaking change in most scenarios -- outside the places where modreq is inserted, like virtual members).

On the other hand, there are "downsides" with in (such as allowing rvalues which may cause a hidden copy) and there are now compat issues to consider, so a compelling set of APIs or use cases need to be provided to justify the cost.

Is that an accurate summary @jaredpar ?

jaredpar commented 4 years ago

@tannergooding

Functionally speaking, ref -> in could be a "narrowing" conversion (in the same way Span can implicitly convert to ReadOnlySpan, because the functionality is just a subset).

That doesn't really help the compat case though. Yes it fixes the specific example I had above but doesn't fix other cases. The compat problem can only really be solved by special casing in and ref here, not be relying on existing conversion tricks.

There are a number of features only available with in (such as being able to pass a static readonly field by reference without copying) and so customers would like to be able to take their existing APIs, switch them to be in

Understood but this is also not a complete description. Assuming we allowed ref to bind to in then only APIs which were non-virtual could be changed in a compatible fashion. APIs which are virtual can't ever be changed in a compat fashion due to the modreq's we attach (for type system integrity reasons). That limits the usefulness of the language work.

a compelling set of APIs or use cases need to be provided to justify the cost.

Yep that's the most interesting piece of data at this point. Essentially APIs which would benefit from changing a parameter from ref to in but also