dotnet / roslyn

The Roslyn .NET compiler provides C# and Visual Basic languages with rich code analysis APIs.
https://docs.microsoft.com/dotnet/csharp/roslyn-sdk/
MIT License
18.69k stars 3.98k forks source link

Proposal: Multiple Source Name Lookup (subsumes static extension methods) #15750

Closed qwertie closed 7 years ago

qwertie commented 7 years ago

Motivation

For the last several years it has been an irritation in my LoycCore project that I cannot write things like Math.Min(x, y) - I must write System.Math.Min(x, y) instead. Why? Because one of the namespaces in my project is Loyc.Math, and inside namespace Loyc the compiler assumes that Math refers to the namespace and not the type (even though, in an expression of the form X.Y(), X cannot possibly refer to a namespace!)

Then, yesterday, I tried to split a large assembly in half: lower-level features in one assembly and extra features in an higher-level assembly that some clients will not use. As part of that I wanted to split a static class in half. In fact I must split the static class, because some features refer to types that only exist in the higher-level assembly, yet the lower-level assembly uses other methods in the same type. But I do not want to invent a new name for the "higher-level" static class; I very much want to use the same name - preferably in the same namespace - because there is no "better" name than the one it has already. Besides, I don't want to break source-level compatibility with existing clients.

Meanwhile, many people have asked for "static extension methods". Can we solve this problem at the same time? I think so!

Proposal: Multiple-Source Name Lookup (MSNL)

1. Class lookup as fallback after namespace lookup

Currently C# prioritizes namespaces over classes, so Math.Min(x, y) fails to resolve if there is a namespace called Math. I propose that if the namespace-based lookup fails, that the compiler try looking for types by the same name afterward. Perhaps this should involve backtracking, because in an expression like X.Y.Z (where X is a namespace), the lookup of X.Y might succeed while X.Y.Z fails. But if we then find a class called X, the resolution might succeed in its entirety.

2. Multiple-type name lookup

By itself, this class is legal in C# 3.0:

namespace Extensions {
    public static class String {
        public static string Join(IEnumerable things, string delimiter)
        {
            StringBuilder sb = new StringBuilder();
            bool first = true;
            foreach (object thing in things)
            {
                if (!first)
                    sb.Append(delimiter);
                first = false;
                if (thing != null)
                    sb.Append(thing.ToString());
            }
            return sb.ToString();
        }
    }
}

But actually using it is another matter entirely. After you do

using System;
using Extensions;

Referring to String gives the error "'String' is an ambiguous reference between 'Extensions.String' and 'string'".

The first problem to solve is this:

String hello = "hello"; // "ambiguous reference"

This ambiguity can be resolved by observing that Extensions.String is a static class and cannot possibly be the type of hello.

Secondly, a lookup procedure should be added to resolve other ambiguity errors, so that the following code works:

void TwoJoins()
{
    // Currently an error; should resolve to MSNL.String.Join()
    string join1 = String.Join(new List<string> { "A", "B", "C" }, ",");
    // Currently an error; should resolve to string.Join()
    string join2 = String.Join(",", new string[] { "A", "B", "C" });
}

Notice that the order of parameters is reversed in Extensions.String compared to String.Join. This is deliberate, because the lookup procedure is designed to prevent "hijacking" (see below).

Both of these calls can already resolve successfully in C#, but only in special cases. All three of the following examples compile today:

using System;
using Extensions;
namespace Extensions {
  class Foo {
    void Method() {
      String.Join(new[] { 1, 2, 3 }, ",");
    }
  }
}
namespace System {
  class Foo {
    void Method() {
      String.Join(",", new[] { 1, 2, 3 });
    }
  }
}
namespace Other {
  using Extensions;
  class Foo {
    void Method() {
      String.Join(new[] { 1, 2, 3 }, ",");
    }
  }
}

If I understand correctly, C# has a concept of priority, where inner usings (and the name of the namespace in which they are located) have higher priority than outer usings.

My proposal would apply to classes that are seen as having "equal" priority.

I call it "multiple-type name lookup", and the way it would work is that, when multiple types with the same name are in scope, the compiler resolves the member after the "." separately in both classes. In this example it would try to find the best applicable overload for System.String.Join(...) and separately search for the best overload of Extensions.String.Join(...). Overall, overload resolution succeeds if it succeeds in exactly one of the classes found.

Note: In part 1 of the proposal I suggested "backtracking" from namespaces to classes. Without backtracking, Foo.Property.X would fail when Property exists in two different Foo classes, even if X exists in only one of the properties. From an implementation perspective, the decision of whether to backtrack in that case seems orthogonal to this case; but users may find it confusing if one part of MSNL uses backtracking and another part does not.

Note: This proposal would not let you write string.Join(new ArrayList(), ",") since string specifically refers to System.String.

Multiple-type name lookup should also be used for new expressions like new String(charArray). Since static classes conceptually have no constructors, this new expression should resolve to the non-static System.String.

Note: This proposal does not allow "extension constructors": if Extensions.String is changed to non-static, you could create a constructor like String(double) and call it with new String(1.5), but this object would not be a string. Plus, String x = new String(1.5) would fail because the first reference to String would become ambiguous.

The hijacking problem

The anti-hijacking rule is that a set of arguments should not match two methods in two different classes (even if one match is clearly "better" under the normal rules of C# overload resolution). The purpose of the rule is to avoid accidents in which new code (perhaps written by a third party, perhaps not) silently "hijacks" a function call. For example, imagine that we have a function "Foo.Lish" and we're trying to call it:

using NS1;
using NS2;
namespace NS1 {
    static class Foo {
        public static void Lish(IEnumerable<char> x) {}
    }
}
class Caller {
  void Call() { Foo.Lish("Hello!"); } // OK
}

All is well. But now, suppose that there is another "Foo" class elsewhere. Another developer, who may be unaware of NS1.Foo, decides to make his own Foo.Lish() method:

namespace NS2 {
    class Foo {
        public static void Lish(string x) {}
    }
}

Without an anti-hijacking rule, the code in Call() that used to call NS1.Foo.Lish() would silently start calling NS2.Foo.Lish() instead. This is dangerous if no one finds out that the meaning of the code has changed.

Anti-hijacking (which is built into the resolution procedure decribed above) is a safety feature to prevent this kind of silent change to a program's semantics. Once an error has been issued, the user can fix the problem by adding qualification (NS1.Foo.List()).

One could argue that the anti-hijacking rule should be suspended if the two classes exist in the same namespace. Consider this:

namespace System
{
    static class String 
    {
        public static string Join(string delimiter, IEnumerable things) { ... }
    }
}

Given that this new String class exists in the same namespace as the System.String in mscorlib, the author is probably aware of the existing class and really wants their Join method to participate in overload resolution with the original System.String - even at the risk of silently changing the meaning of existing code. But this should be discouraged, since the expression typeof(global::System.String) now becomes ambiguous!

Instead, we could define an attribute to resolve the ambiguity:

namespace Extensions
{
    [StaticExtensions(typeof(string))]
    static class String 
    {
        public static string Join(string delimiter, IEnumerable things) { ... }
    }
}

Perhaps StaticExtensionsAttribute should not allow hijacking (i.e. merging overload resolution in System.String and Extensions.String). If not, its purpose would be to resolve the conflict in a case like String.Join(",", new[] { "a", "b" }), where the two separate overload resolutions in System.String and Extensions.String both succeed. Then [StaticExtensions(typeof(string))] would mean "in case of conflict, the member from string is preferred".

This attribute would not necessarily allow you to write string.Join(",", (IEnumerable)e). To me it is not important whether it does or not. If not, then [StaticExtensions] would have no effect when the two class names do not match. Finally, remember that this attribute would not be required, in general, for multiple-type name lookup to work.

3. Optional feature: field/class/namespace conflict resolution

C# already allows you to write this:

public Class Class;
public void Method()
{
    Class.StaticMember(Class.NonStaticProperty);
}

This seems to be special-cased in the compiler, because this similar scenario is illegal:

public IEnumerable<object> Enumerable { get; set; }
public void Method()
{
    var r = Enumerable.Repeat(Enumerable.First(), 100); // `Repeat` fails to resolve
}

and static members normally can't be reached through a field reference:

public Class Class2;
public void Method()
{
    // `StaticMember` cannot be accessed with an instance reference
    Class2.StaticMember(Class2.NonStaticProperty);
}

It's also worth noting that this code is currently illegal:

public string System;
public void Method()
{
    // 'string' does not contain a definition for 'Console'
    System.Console.WriteLine("Hello");
}

I would propose, then, to eliminate that existing special case and replace it with another. When a name appears to refer to a field or property, but lookup fails within that field or property, ignore the field and perform the lookup again. Again, I suspect that this feature would be best implemented with a backtracking approach.

iam3yal commented 7 years ago

@qwertie

For the last several years it has been an irritation in my LoycCore project that I cannot write things like Math.Min(x, y) - I must write System.Math.Min(x, y) instead. Why? Because one of the namespaces in my project is Loyc.Math

You can solve it in two ways:

A. Define an alias like so using SysMath = System.Math;.

B. You can use the new using static statement available since C# 6.0 like so using static System.Math this will import all the static functions so you won't even need to write Math.Min(...) but just Min(...).

Meanwhile, many people have asked for "static extension methods". Can we solve this problem at the same time? I think so!

Check this out #11159 --Extension Everything

qwertie commented 7 years ago

@eyalsk I know. There's a third solution: move my using statements (or just using System) inside the namespace Loyc.Whatever {} block. But since I only occasionally use the Math functions - maybe twice in a given source file - I generally find it easier to type System.Math than to interrupt my coding to visit the top of the file. But it is odd to me that adding an empty namespace called Math or Trace or Console - which are not unreasonable names for namespaces - will break a great many projects.

iam3yal commented 7 years ago

@qwertie

But it is odd to me that adding an empty namespace called Math or Trace or Console - which are not unreasonable names for namespaces - will break a great many projects.

Eric Lippert wrote some articles about it:

Do not name a class the same as its namespace, Part One Do not name a class the same as its namespace, Part Two Do not name a class the same as its namespace, Part Three Do not name a class the same as its namespace, Part Four

Also related:

No backtracking, Part One No Backtracking, Part Two

Just as an advice before you make proposals take the time to read some articles that Eric Lippert wrote or some other members of the design team that work on Roslyn/C# today, as well as follow valuable community members like Jon Skeet.

It can give you some insights to how the design team work, what decisions they take and why certain features aren't implemented by design or were declined by design, what things are non-goals etc...

Finally, you can add these RSS links to your RSS reader, there's many interesting articles on these blogs:

  1. https://blogs.msdn.microsoft.com/dotnet/feed/
  2. https://blogs.msdn.microsoft.com/ericlippert/feed/
  3. https://blogs.msdn.microsoft.com/lucian/feed/
  4. https://ericlippert.com/feed/
  5. http://gafter.blogspot.com/feeds/posts/default
  6. http://blog.paranoidcoding.com/atom
  7. https://codeblog.jonskeet.uk/feed/

p.s. Eric Lippert no longer part of the design team and no longer works at Microsoft but many of his posts are never dated!

qwertie commented 7 years ago

@eyalsk If you prefer to read this proposal as one that doesn't backtrack but merely iterates, that's fine by me. I want this MSNL very much, and if it is restricted to a single level of "lookahead" so that X.Y.Z quits searching after finding a X.Y, it would still have most of the benefit of full backtracking.

Anyway, since I knew C# already had (recursive) backtracking for lambdas - which in pathological cases can be very expensive - it seems clear that the C# team will consider backtracking if the benefits are large enough. Eric Lippert's post "No Backtracking, Part Two" considers the costs of backtracking name resolution without mentioning any substantial benefit. If a feature's costs is known and its benefits are unknown (Edit: and believed to be low) then rejecting it is the correct course of action. Neither Eric nor any of the people replying to his post appears to have noticed that this feature would provide (1) 'static extension methods' without additional syntax, (2) the ability to split static classes across assemblies and (3) compatibility between third-party static classes that are unaware of each other but happen to have the same name.

Also on that post, Eamon Nerbonne points out that backtracking is not the only approach to implementation:

It's all a matter of perspective: C# performs extensive backtracking all over the place. For example; overload resolution is equivalent to a form of (non-recursive) backtracking: when you see a method name, you could optimistically resolve the first match, attempt to compile further and upon failure rollback and try another match, typical backtracking. In practice, the decision to resolve the method name is postponed until the arguments to the call have been analyzed. Why do you choose to think about that form of backtracking as simply a set of possible matches from which the most specific wins (if unique) yet choose to think of other problems as backtracking even though they too could be implemented without backtracking?

Backtracking is an implementation detail. What you're talking about is essentially ambiguity that cannot be resolved without extra information. Backtracking is a poor way of doing so; using lookahead or sets is clearly superior in all but the simplest of situations.

For instance, the compiler could resolve your example DEF.GHI.JKL by maintaining and processing sets of matching symbols rather than single symbols, and only give an error if the final symbol is ambiguous; if memoized, that approach will scale reasonably even in complex cases, even without memoization it's not as bad as backtracking. That feature may not be desirable, but the implementation no longer involves backtracking. Also, as you say, the language's parser uses lookahead – which is also equivalent to backtracking, yet implemented without it.

Indeed, I suspect that a non-backtracking approach might end up working better in an IDE context - even if the algorithm selected ends up being equivalent to backtracking.

qwertie commented 7 years ago

But it is odd to me that adding an empty namespace called Math or Trace or Console - which are not unreasonable names for namespaces - will break a great many projects.

Eric Lippert wrote some articles about it:

Do not name a class the same as its namespace, Part One

Also, I was not thinking that these namespaces should contain a class with the same name as the namespace - although it seems like, for the most part, the reasons given in Part One and Part Two are both based on the fact that feature I'm proposing here does not exist. And consider that the reason in part 3 doesn't always apply:

The point of namespaces is to organize types into a hierarchy that is easy to understand.

That is, the point of namespaces is not just to keep similarly-named things separated, but rather, to group things that have something in common together so that you can find them. If you don’t think that there are two or more things that could go into a namespace, then it is probably not a good namespace.

I made a namespace called LLParserGenerator to group together the many classes of my LL(k) parser generator. But one of those classes - the main one that coordinates all the others - is naturally enough named LLParserGenerator. But I digress.

Thanks for the blog links @eyalsk. I've been meaning to find a good RSS aggregator, any recommendations? Some RSS feeds contain really crappy or truncated renditions of their blog so I've been wanting to find something that could also read certain sites by screen-scraping...

qwertie commented 7 years ago

Hmm, I definitely like the proposal in #11159 - it doesn't do anything for third-party libraries that are unaware of each other, but at least it would solve my most pressing problem. It looks like I'd have to write something like

public static extension class MyStaticClass2 : MyStaticClass { ... }

and introduce a useless extra name for my class, but whatever. Luckily it was proposed by @gafter so I expect it has a real shot in C# 8. :) I may as well close my proposal.

iam3yal commented 7 years ago

@qwertie

considers the costs of backtracking name resolution without mentioning any substantial benefit. If a feature's costs is known and its benefits are unknown (Edit: and believed to be low) then rejecting it is the correct course of action.

Well, just because he didn't write about this substantial benefit doesn't mean he didn't know about it or people didn't notice the possibility but maybe he didn't thought it's useful enough to include it in his post.

Neither Eric nor any of the people replying to his post appears to have noticed that this feature would provide (1) 'static extension methods' without additional syntax, (2) the ability to split static classes across assemblies and (3) compatibility between third-party static classes that are unaware of each other but happen to have the same name.

Maybe they did notice it and there are other implications with it that opens a whole can of worms that you don't know about? I mean this is certainly reasonable, you make the same assumption about them so this is just a different side of the coin.

Also on that post, Eamon Nerbonne points out that backtracking is not the only approach to implementation:

All roads lead to Rome so in this case the path you take is meaningless, the point is that the idea of backtracking can confuse people because it might not always be obvious what was resolved.

Thanks for the blog links @eyalsk. I've been meaning to find a good RSS aggregator, any recommendations? Some RSS feeds contain really crappy or truncated renditions of their blog so I've been wanting to find something that could also read certain sites by screen-scraping...

I'm using QuiteRSS for a very long time but it doesn't support screen-scraping, at least I don't think it does. 😄

qwertie commented 7 years ago

the point is that the idea of backtracking can confuse people

It seems I got a little confused and imagined that backtracking was to be avoided for performance. Sorry about that. But most people code in Visual Studio, which has multiple features to help humans to not be confused. (Edit: hold on, how could one claim that my proposal is more confusing than extension methods, let alone "extension everything"? At least my proposal has the virtue of putting 'static extension methods' in a class with the actual name that people will use to refer to it. To me, writing StaticClass.Foo() which actually turns out to mean ExtensionClass.Foo() is more, not less, confusing.)

you make the same assumption about them so this is just a different side of the coin

I had the impression you were citing those blog posts as reasons why my proposal should not or would not be implemented - I'm just arguing that the content of Eric's posts doesn't support such a conclusion. I do not need to prove that Eric was not thinking a certain thing. I will say this: his posts tend to have a lot of detail. I'd be surprised if he was aware of the benefits of something like this proposal and chose not to mention them.

iam3yal commented 7 years ago

@qwertie

(Edit: hold on, how could one claim that my proposal is more confusing than extension methods, let alone "extension everything"? At least my proposal has the virtue of putting 'static extension methods' in a class with the actual name that people will use to refer to it. To me, writing StaticClass.Foo() which actually turns out to mean ExtensionClass.Foo() is more, not less, confusing.)

Extension everything solves more than just the problem of extending existing static classes so the benefit here just worth it.

Sometimes you have an intuition for how things work, backtracking makes you think even more about it because you have more cases to think about and for complex cases you might lose track and it might be harder to reason about the code, especially for the new guy that needs to maintain it.

You're also asking to relax some rules that might break some assumptions for some people and finally this may or may not have impact on performance.

I had the impression you were citing those blog posts as reasons why my proposal should not or would not be implemented

Not at all, I just advised you to check these blogs so you would get more information in the hope that they might be useful to you and help you get some answers.

I'll just tell you this when a proposal is marked with Discussion it means chances are extremely slim for anything to happen, however, the discussion might lead a new issue that will get tag with one of the Workflow tags as described here.

I do not need to prove that Eric was not thinking a certain thing.

I didn't say you need to prove anything to anyone. 😉

I'd be surprised if he was aware of the benefits of something like this proposal and chose not to mention them.

I wouldn't be so surprised because I don't think that when you discuss details about one thing whether it's naming or backtracking you don't go ahead and say "Hey! this can also allow us to implement static extension methods" because this can take the focus out of the point he's trying to make with these posts.

They do so many design discussions and think about design many hours a day, every single day, I have hard time to believe no one ever thought about it, if you did, chances are that someone thought about it too.