dotnet / roslyn

The Roslyn .NET compiler provides C# and Visual Basic languages with rich code analysis APIs.
https://docs.microsoft.com/dotnet/csharp/roslyn-sdk/
MIT License

Enhanced C#: a friendly hello #11324

Closed qwertie closed 7 years ago

qwertie commented 8 years ago

I'm terribly embarrassed.

For the last few months I've been working on a tool called LeMP that adds new features to C#. I recently published its "macro" reference manual. This month I was going to start publicizing my "Enhanced C#" project when I discovered that the design of C# 7 had already started well before C# 6 was officially released - and even more shocking, that this design work was being done "in public" right on GitHub!

It kills me that I didn't realize I could have participated in this process, and that "my" C# was drifting apart from C# 7 for over a year. Oh well - it is what it is, and I hope that something useful can still be salvaged out of my work.

So, this post is to inform you about Enhanced C# - where it came from, and what it offers that C# 7 does not.

A brief history

As a class project in my final year of university, I extended a compiler with a new feature (unit type inference with implicit polymorphism), but (to make a short story shorter) the authors of the language weren't interested in adding that feature to their language. This got me thinking about our "benevolent dictatorship" model of language development and how it stopped me, as a developer, from making improvements to the languages I relied on. Since I had already been coding for 15 years by that time, I was getting quite annoyed about writing boilerplate, and finding bugs at runtime that a "sufficiently smart compiler" could have found given a better type system.

So in 2007 I thought of a concept for a compiler called "Loyc" - Language of your choice - in which I wanted to create the magical ability to compile different languages with a single compiler, and also allow users to add syntax and semantics to existing languages. This system would democratize language design, by allowing third parties to add features to existing languages, and allowing language prototypes and DSLs to seamlessly interoperate with "grown up" languages like C#. But my ideas proved too hard to flesh out. I wanted to be able to combine unrelated language extensions written by different people and have them "just work together", but that's easier said than done.

After a couple years I got discouraged and gave up for a while (instead I worked on data structures (alt link), among other things), but in 2012 I changed course with a project that I thought would be easier and more fun: enhancing C# with all the features I thought it ought to have. I simply called it Enhanced C#. It started as a simple and very, very long wish list, with a quick design sketch of each new feature. Having done that, I reviewed all the feature requests on UserVoice and noticed a big gaping hole: I hadn't satisfied one of the most popular requests, "INotifyPropertyChanged". So at that point I finally went out and spent three weeks learning about LISP (as I should have done years ago), and some time learning about Nemerle macros. At that point (Oct. 2012) I quickly refocused my plans around a macro processor and called it EC# 2.0, even though 1.0 was never written. I realized that many of the features I wanted in C# could be accomplished with macros (and that a macro processor doesn't require a full compiler, which was nice since I didn't have one) so the macro processor became my first priority.

So "Loyc", I eventually decided, would not be a compiler anymore, but just a loose collection of concepts and libraries related to (i) interoperability, (ii) conversions between programming languages, (iii) parsing and other compiler technology, which I now call the "Loyc initiative"; I've had trouble articulating the theme of it... today I'll say the theme of Loyc is "code that applies to multiple languages", because I want to (1) write tools that are embedded in compilers for multiple languages, and (2) enable people, especially library authors, to write one piece of code that cross-compiles into many languages. One guy wants to call it acmeism but that doesn't seem like the right name - I'd call it, I dunno, multiglotism or simply, well, loyc.

EC# and Roslyn

Roslyn's timing didn't work out for me. When I conceived EC#, Roslyn was closed source. I researched it a bit and found that it would only be useful for analysis tasks - not to change C# in any way. That wasn't so bad; but I wanted to explore "radical" ideas, which might be difficult if I had to do things the "Roslyn way". That said, I was inspired by Roslyn; for instance, the original implementation of "Loyc trees" - the AST of EC# - was a home-grown Red-Green tree, although I found my mutable syntax trees to be inconvenient in practice (probably I didn't design them right the first time) and rewrote them as green-trees-only (immutable - I thought I might rewrite the "red" part later, but I got used to working with immutable trees and now I don't feel a strong need for mutable ones).

By the time MS announced they were open-sourcing Roslyn (April 2014), I had been working on Enhanced C# and related projects (LLLPG, Loyc trees and LES) for well over a year, and by that point I felt I had gone too far down my own path to consider trying to build on top of Roslyn (today I wish I could have Roslyn as a back-end, but I don't think I have time, nor a volunteer willing to work on it).

LeMP

EC# is not a "compiler" in the traditional sense, but it's still useful and usable as-is thanks to its key feature, the Lexical Macro Processor, or LeMP for short. It is typically used as a Visual Studio extension, but is also available as a command-line tool and a Linux-compatible GUI.

Through macros, I implemented (in the past few months) several of the features that you guys have been discussing for more than a year:

(They aren't as polished as the C# 7 features will be, because of technical limitations of lexical macros and because I'm just one guy.)

It also has numerous other features:

The other parts of EC# that exist - the parser and "pretty printer" - support some interesting additional features such as symbols, triple-quoted string literals, attributes on any expression, etc. However, the majority of the syntactic differences between EC# and C# 6 are designed to support the macro processor.

An important theoretical innovation of Enhanced C# is the use of simple syntax trees internally, vaguely like LISP. This is intended to make it easier to (1) convert code between programming languages and (2) communicate syntax trees compactly.

What now?

Well, I'm not 100% decided about what to do now, knowing that the C# open design process exists and that C# 7 is shaping up to be really nice.

I don't intend to throw the whole thing away, especially since there are major use cases for EC# that C# 7 doesn't address. So in the coming weeks I will change the pattern matching syntax to that planned for C# 7, implement the new syntax for tuple types (minus named parameters, which cannot be well-supported in a lexical macro), and add those "record class" thingies (even though I don't think the C# team has taken the right approach on those.)

But in the long run, is it worthwhile to continue working on EC#, or should I instead devote my time to lobbying the C# team to do the features I want? (beware, I can talk a lot...)

In fact, those are far from my only options - I've been closely following the development of WebAssembly and I'd like to do something related to interoperability and WebAssembly, mainly because .NET has not turned out to be the cross-language interoperability panacea that the world needs. And I'd love to make the world's most widely useful programming language (which is not EC#, because we all know how hard it is to improve a language given the backward compatibility constraint). The main reasons to keep going with EC# are (1) that I have a large codebase already written, and (2) that after 8 years alone I finally have a volunteer that wants to help build it (hi @jonathanvdc!)

I do suspect (hope) there are some developers that would find value in EC# as a "desugaring" compiler that converts much of C# 7 to C# 5. Plus, LeMP is a neat tool for reducing boilerplate, "code find and replace" operations, and metaprogramming, so I really want to polish it up enough that I finally win some users.

There is so much more I could say, would have liked to say, and would still like to say to the C# design team... but in case this is the first you've heard of Enhanced C# or LeMP, you might find this to be a lot to take in - just like for me, C# 7 was a lot to take in! So I'll avoid rambling much longer. I hope that, in time, I can win your respect and that you will not "write me off" in a sentence or two, or without saying a word, an eventuality I have learned to emotionally brace for. I definitely have some opinions that would be opposed by the usual commentators here - but on the other hand, I think the new C# 7 features are mostly really nice and I'll be glad to have them.

So if this wasn't TLDR enough for you, I hope you'll enjoy learning about EC# - think of it as how C# 7 might have looked in a parallel universe.

Links:

CyrusNajmabadi commented 7 years ago

I'm trying to wrap my head around what the actual syntax for your macros are. How does the parser find them? What does it do with them? In the code you presented, you have:

    ImplementNotifyPropertyChanged
    {
        public string CustomerName { get; set; }
        public object AdditionalData { get; set; }
        public string CompanyName { get; set; }
        public string PhoneNumber { get; set; }
    }

So presumably a macro usage is, what:

Macro:
     Identifier       {            MacroElement_list       }

?

If that's the case, how does the parser know what it can parse within the braces?

I still don't see how debugging, navigation, refactoring, etc, work with any of this... For example, any code that referenced "FieldName" would be broken in any sort of refactoring scenario. Can you help clarify how this would work?

CyrusNajmabadi commented 7 years ago

Also, you've given an example of your macros for declaration level constructs, can you show it for statement/expression level constructs. For example, you mentioned:

Creating ‘out’ variables in-situ, e.g. int? Parse(string s) => int.Parse(s, out int x) ? (int?)x : null;

Can you show how you did that?

qwertie commented 7 years ago

I'm trying to wrap my head around what the actual syntax for your macros are.

Sorry if I wasn't clear. In my system, syntax is completely orthogonal to the macro system; the parser knows nothing about macros, and the macro processor knows nothing about syntax (in fact, it knows nothing about C#).

The parser produces a programming-language-independent tree called a Loyc tree, and the macro processor is looking at the target of every "call" in that tree ("calls" include both method calls and everything else except identifiers and literals.) A macro can target any call to an identifier (or any plain identifier). So a macro can target methods, classes, properties, calls, constructors, variables, multiplications, or almost anything else. The only thing macros can't target is literals, or "everything".
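The idea that operators and constructs are all just "calls" can be sketched in a few lines. Below is a hypothetical Python sketch (not LeMP's actual API; the node shapes and names are invented): leaves are strings, calls are `(target, args)` tuples, and a macro registered against the target `*` rewrites multiplications, exactly as another macro could target a method-like identifier.

```python
# Hypothetical sketch (not the real LeMP API): leaves are strings, calls are
# (target, args) tuples, and macros are looked up by the target of each call.

MACROS = {}

def macro(target_name):
    """Register a rewrite function for calls whose target is target_name."""
    def register(fn):
        MACROS[target_name] = fn
        return fn
    return register

@macro("*")
def strength_reduce(args):
    # In a Loyc-style tree a multiplication is just a call to '*', so a macro
    # can target it like anything else.  x * 2  =>  x << 1  (illustrative only)
    if args[1] == "2":
        return ("<<", (args[0], "1"))
    return ("*", args)

def process(node):
    """Expand macros bottom-up over the whole tree."""
    if isinstance(node, str):          # identifier or literal leaf; macros
        return node                    # don't target bare leaves in this sketch
    target, args = node
    args = tuple(process(a) for a in args)
    fn = MACROS.get(target)
    return fn(args) if fn else (target, args)

print(process(("*", ("x", "2"))))      # ('<<', ('x', '1'))
```

The point of the sketch is only the dispatch mechanism: the processor never needs to know what `*` means in any particular language; it only matches call targets.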

Obviously a macro system implemented on Roslyn would have to be somewhat different.

qwertie commented 7 years ago

For example, you mentioned:

Creating ‘out’ variables in-situ, e.g. int? Parse(string s) => int.Parse(s, out int x) ? (int?)x : null;

Once you understand the orthogonality of macros to the programming language in which they are used, you can see that there is no difference between declaration-level macros, statement-level macros and expression-level macros.

Supporting "out variables in-situ" was very hard to do, by the way, because C# doesn't let you write sequences like int x; int.Parse(s, out x) as an expression.

Therefore, creating ‘out’ variables in-situ required what is essentially an entire compiler pass, implemented as a 605-line macro written in EC#, that eliminates 'sequence expressions' and variable declarations in expressions. For this input:

int? Parse(string s) => int.Parse(s, out int x) ? (int?)x : null;

its output is:

int? Parse(string s) { int x; return int.Parse(s, out x) ? (int?) x : null; }

Note that writing out var x doesn't work, since var x; becomes a compiler error.

To invoke this "compiler pass" one can invoke #useSequenceExpressions at the top of the source file; but as a shortcut one can write #ecs; which enables all EC# features that require a macro at the top of the file (there are currently two such macros).
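The core of that pass, hoisting embedded variable declarations out of an expression into preceding statements, can be sketched language-independently. A toy Python sketch with invented node shapes (a leaf is a string, a call is `(head, args)`, and a declaration is `('decl', type, name)`; this is not LeMP's real implementation):

```python
# Toy sketch of the hoisting idea: declarations embedded in an expression are
# lifted out as statements, and only the variable reference remains behind.

def hoist_decls(expr, out):
    """expr: str (atom), ('decl', type, name), or (head, args)."""
    if isinstance(expr, str):
        return expr
    if expr[0] == "decl":
        _, ty, name = expr
        out.append(f"{ty} {name};")    # hoisted to a statement before the expr
        return name                    # the expression keeps just the name
    head, args = expr
    return (head, tuple(hoist_decls(a, out) for a in args))

# int.Parse(s, out int x) -- the 'out int x' part is modeled as a decl node:
stmts = []
body = hoist_decls(("int.Parse", ("s", ("decl", "int", "x"))), stmts)
print(stmts)   # ['int x;']
print(body)    # ('int.Parse', ('s', 'x'))
```

The real macro is far more involved (it must handle evaluation order, nested sequence expressions, and statement contexts), but the rewrite above is the essential shape of the transformation.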

vladd commented 7 years ago

@qwertie Well, if macros really know nothing about the language, how would a macro processor distinguish between a code line and a comment? Isn't it forced to use just text replacement with all its awkwardness (akin to C preprocessor)?

qwertie commented 7 years ago

@vladd the macro processor is given a language-independent syntax tree. That tree may have come out of the EC# parser, or some other parser.

@CyrusNajmabadi

I still don't see how debugging, navigation, refactoring, etc, work with any of this... For example, any code that referenced "FieldName" would be broken in any sort of refactoring scenario. Can you help clarify how this would work?

We've talked about this. Renaming the field would have to either fail or force expansion of the macro that generated it... I just considered the latter problem briefly and the problem of limiting how much expansion occurs (avoiding expanding all/most macros in the file) is a bit vexing too.

Let's take stock of my current thinking about "what if EC#-style macros were redesigned for Roslyn?", keeping in mind that I don't know much about Roslyn internals. I don't see how I can give firm answers on some of these issues without building some kind of prototype.

The EC# macro processor doesn't currently have a concept of attribute macros, which is a performance issue (e.g. contract macros have to scan every method signature for contracts even though perhaps no contracts exist anywhere) and also a composability issue (two macros that recognize two different attributes on methods pretty much have to be specifically designed to get along or else it's not possible to combine them on a single method.) I originally planned (but did not implement) a syntax [[foo]] which resembles an attribute but is meant to call macros; however, such a feature doesn't solve a certain problem with contract attributes that I won't get into, because this is getting long.

My first wedding anniversary just started, so TTYL :)

qwertie commented 7 years ago

Btw there are some things that would greatly improve the experience of using EC# in Visual Studio. Where should I go for help?

CyrusNajmabadi commented 7 years ago

Sorry if I wasn't clear. In my system, syntax is completely orthogonal to the macro system; the parser knows nothing about macros, and the macro processor knows nothing about syntax (in fact, it knows nothing about C#). ... The parser produces a programming-language-independent tree called a Loyc tree, and the macro processor is looking at the target of every "call" in that tree

Sorry, this is getting more confusing. Again, how would the parser actually parse:

ImplementNotifyPropertyChanged
    {
        public string CustomerName { get; set; }
        public object AdditionalData { get; set; }
        public string CompanyName { get; set; }
        public string PhoneNumber { get; set; }
    }

If the parser knows nothing about macros, then the above code would end up with very broken syntax. It certainly wouldn't create 'call' nodes that a macro processor could handle.

CyrusNajmabadi commented 7 years ago

Supporting "out variables in-situ" was very hard to do, by the way, because C# doesn't let you write sequences like int x; int.Parse(s, out x) as an expression.

I still don't know how you even got past the parsing phase. How did you parse the actual code that contains an 'out var'. i don't care how you transformed it. I just care how you actually parsed it. You said that your system required adding no new syntax. So i'm trying to wrap my head around how you handled a construct which is, by its very nature, new syntax :)

CyrusNajmabadi commented 7 years ago

Mitigations: There could be two expansion modes, "IDE mode" for speed and "build mode" for the "real" expansion.

This violates a core design goal of roslyn. That the experience you get in the IDE is the same as you get when you build. We want to ensure that you never experience a situation where the IDE tells you one thing, but the build tells you another. If Macros make that unfeasible, then it would have to provide an absolutely enormous amount of value to make them worthwhile.

CyrusNajmabadi commented 7 years ago

Btw there are some things that would greatly improve the experience of using EC# in Visual Studio. Where should I go for help?

You're treading entirely new ground. You can stay here for help. I'll try the best i can, but what you're asking for may take an enormous amount of work**

--

** Which is what i was saying before. As we tried going down this path ourselves, we realized it would be many devs spread over many teams in order to accomplish this. This is not a small amount of work. It needs design and resources spread over the entire product. And it needs a huge amount of buy-in in order to get the minimal-viable-product developed.

CyrusNajmabadi commented 7 years ago

Debugging: I don't know, lots of issues to consider, but I'm optimistic that a fairly good experience debugging the original code is possible much of the time. If debugging proves difficult, the user should have a way to debug expanded code

You've now made the design space much larger. Saying things like "should have a way" effectively means we need to design and cost precisely that solution. If that solution is necessary for a "minimum viable product" then that has to be factored in. If that solution requires work from other teams (like the debugger team), then that has to be established up front so we can know if we can get all work approved before starting anything.

CyrusNajmabadi commented 7 years ago

That tree may have come out of the EC# parser, or some other parser.

Presumably for Roslyn it would come from the Roslyn parser. So i ask again, what is the syntax for Macros that Roslyn would have to recognize and parse in C# code (we'll get to VB later)?

CyrusNajmabadi commented 7 years ago

Navigation: no doubt I haven't considered all the issues in this area.... anyway, IDE mode implies slow macros leave out some of their output, so "Find All References" could miss references appearing in expanded code.

This would be very concerning. A core value proposition of Roslyn is that it enables the types of accurate features that people can depend on. If that value prop goes out when people use Macros, then it very much undermines core principles and values that we're trying to deliver and that people expect to have.

We do not introduce new language features without strongly considering the experience they will have in the IDE. Indeed, that's exactly the role i serve on the language design team. My primary purpose there is to ensure that we can introduce new features in a manner whereby they serve both language goals and IDE goals. Right now the Macros, as you've described them, come with an enormous number of 'take backs' in terms of the bar we've set for C#/VB. We would have to either resolve those issues, or decide that macros were worth lowering our bar. Both of these seem difficult :)

CyrusNajmabadi commented 7 years ago

A pleasant macro system would require a number of syntax changes similar to those in EC#. I'd suggest especially: merging the syntax of top-level statements, declaration statements, executable statements and property statements

I remain very confused. You previously said that your system required no syntax changes. i.e. "Enhanced C# does not allow new syntax." If you do not allow new syntax... why are you now saying that you would require a number of syntax changes. And if you don't allow new syntax, how did you accomplish things like supporting out-var?

I really can't reconcile many of these statements that seem contradictory. For now, i'm going to assume you do require new syntax (as indicated in the first line). If so, please indicate what your syntax additions actually are. For example, what is your syntax addition that enable the INotifyPropertyChanged code that you mentioned already. What is your syntax addition that enables a user to provide 'out-var's through macros? etc. etc.

CyrusNajmabadi commented 7 years ago

@qwertie Currently, i find this 'proposal' to be far too massive, disorganized, and unclear. I think a way forward would be to start over with new proposals that have very small scope. i.e. "I would propose these specific syntax changes to the language. Here are the grammar changes for it, and what purpose it would serve."

We could then discuss each individual piece fully, ensuring that we'd thought through all the issues and concerns of each one. Right now the enormity of everything you're discussing here, and the jumping around from ideas and issues, is clouding any progress here. (I mean...i still don't actually know what you're actually proposing, let alone how to deal with all the issues that could arise).

Starting just with syntax will be helpful as it will ground things and will help understand what sort of code the user could create and then what later processing systems could do with it.

CyrusNajmabadi commented 7 years ago

Final note: It's unclear to me what value these macros have over our original SourceGenerator proposals. The benefit of the SourceGenerator approach was that you could take in C# code, manipulate it (using normal Roslyn APIs) and just produce new trees that the rest of the pipeline would operate on. There was no need for a new 'macro language' for manipulating trees. The macro language was just any .net code that wanted to operate on Roslyn's object model.

Such an approach was possible without adding any new syntax to C# at all. Your proposal seems to indicate that you would be able to do things that would traditionally require new syntax (like primary-constructors, or out-vars), but it's still unclear to me how that would work. And, if your approach does not allow for new syntax, it's unclear to me what value your system would have over what we were looking at.

qwertie commented 7 years ago

How did you parse the actual code that contains an 'out var'. i don't care how you transformed it. I just care how you actually parsed it. You said that your system required adding no new syntax.

I feel like you must have missed the message in which I talked about the fact that I added lots of new syntax to EC#. Some of that syntax would make sense without a macro system; some of it would not.

CyrusNajmabadi commented 7 years ago

I'm definitely quite confused (as several messages seem contradictory)**. But, for now, i'm going to go with the explicit claim that new syntax is required and that you introduced new syntax to support these features.

If that's the case, and you required syntax changes to be able to support 'out-var' then why would i need macros in order to support out-var? What do macros buy me? Since i had to introduce the new syntax for out-var in the first place... why would i then use macros to implement out-var?

--

** (Again, this is why i'd like a new thread that starts with precisely the set of syntactic changes you want in the language to support your proposal).

qwertie commented 7 years ago

I probably confused you by saying "the parser knows nothing about macros". Sorry about that. In my own mind the syntax is independent, because the parser can do whatever, it's just making a tree, and whether there's a macro system running after it or some other system doesn't matter to the parser. But understandably you don't think about it the same way - you think of C# as a single integrated thing, where certain changes to the parser were designed for the macro system and therefore the parser "knows" about macros. So, sorry for that. Still, note that in principle the macro processor could work (but not support things like "out var") without changes to the parser. Edit: e.g. one of the things I'd like to do someday is take various other parsers - Python, C++ - and hook them up to the macro processor.

CyrusNajmabadi commented 7 years ago

Still, note that in principle the macro processor could work (but not support things like "out var") without changes to the parser.

HOW? If the parser does not change, then how do you handle things like your INotifyPropertyChanged example?

The syntax you presented would be rejected by the C# parser. And if it was rejected any sort of 'processor' would have a heck of a time trying to do anything with the tree we produced.

CyrusNajmabadi commented 7 years ago

In my own mind the syntax is independent

How can the syntax be independent? If Macros run on the tree the parser produces, then the parser has to understand some sort of Macro syntax so it can generate the right sort of nodes that the Macro processor will run on. If it doesn't, then the tree is going to be massively broken, and it will be enormously painful for any sort of processor to have to work on that tree.

qwertie commented 7 years ago

Without changes to the parser, you'd have to make do and write it with syntax that already exists, maybe something like

class ImplementNotifyPropertyChanged {
    public string CustomerName { get; set; }
    public object AdditionalData { get; set; }
    public string CompanyName { get; set; }
    public string PhoneNumber { get; set; }
}

The replace macro would similarly have to be designed to "make do". It would be pretty ugly, but doable.

qwertie commented 7 years ago

To make an analogy, List<T> can hold objects of type Foo without having any awareness of Foo. The macro processor doesn't have generic type parameters, but it does process LNode objects, which are language-independent. So in that sense it processes C# without knowing anything about C#.

CyrusNajmabadi commented 7 years ago

What are LNode objects? What information do they contain? How does one get one?

qwertie commented 7 years ago

LNode is the .NET implementation of Loyc trees. The API is described here.

iam3yal commented 7 years ago

@qwertie I tried to read this post a few times myself, and while I understand most of what you're saying, I really, strongly recommend you start fresh and create a new issue, explaining things in the following manner:

  1. This is the problem.

  2. This is the solution.

  3. This is the syntax.

  4. This is an example of a macro.

  5. This is how it's used at the callsite.

  6. This is the generated code.

In my opinion you shouldn't even think about EC# when describing this at all; this would make it a lot easier to understand and allow @CyrusNajmabadi and others to see how this fits within Roslyn, if ever.

CyrusNajmabadi commented 7 years ago

I'm confused again. Are you saying Roslyn would be translating nodes into some other API and calling into that to do work? That sounds quite expensive. Trees can be huge, and we already do a tremendous amount of work to not realize them, and to be able to throw large parts of them away when possible.

CyrusNajmabadi commented 7 years ago

Agreed. I'm getting high level ideas and concepts. But when i try to dive deeper, i'm seeing contradictions and not-fully-fleshed-out ideas.

Many of your ideas also seem predicated on a whole host of assumptions. i.e. "we could do X, (with the implication that Y and Z are also done). And to do Y and Z, we'd need these other things as well." I can't wrap my head around a clear set of concepts and work items that you're actually proposing, and how each one of them would work.

Most of this feels like you have grand ideas in your head, and you're giving quick sketches based on assumptions that are scattered around in a whole host of places :)

Condensing and focusing would make this conversation much simpler.

qwertie commented 7 years ago

I've been switching back and forth between two tasks - if I thought you were asking me about how EC#/LeMP works then I described EC#/LeMP. But you've also been asking about the IDE experience and things like that, so for those questions I've switched gears and tried to figure out (mostly on the fly) how one would, in broad strokes, translate concepts from LeMP to Roslyn. So this conversation is sort-of two conversations interleaved, which would be bewildering if you're not mentally distinguishing the two or if you haven't understood the EC#/LeMP side of things. Probably at certain points I didn't explain some things well enough, and I'm sorry about that. This got pretty long so I think we should start a new thread, but right now I need to go on an anniversary trip with my wife.

CyrusNajmabadi commented 7 years ago

I've been switching back and forth between two tasks

I think that switch was not clear enough for me :D And it would be better to just discuss specifically what we would want to do with Roslyn and C# here.

but right now I need to go on an anniversary trip with my wife.

Congrats! I look forward to hearing from you once you get back!

jonathanvdc commented 7 years ago

Hi everyone. I'm a small-time EC# contributor, and I'm currently working on ecsc, a command-line EC# compiler. I'm not as knowledgeable about EC# and LeMP as @qwertie, but I thought I'd try and shed some light on how macros work in EC# – perhaps a different perspective can be helpful. I'll try to explain what LNodes are, what the parser does, and what the macro processor (LeMP) does.

LNodes

EC#'s syntax trees are represented as LNode instances. An LNode can be one of the following:

Every LNode also has a list of attributes, which are also encoded as LNode instances. Attribute lists are empty most of the time, though.

It is worth noting at this point that there is no such thing as an "invalid" LNode. For example, #if(f(x)) makes no sense – it's an if statement with neither a 'then' nor an 'else' clause – but it's a perfectly legal LNode, because an LNode is just a data structure. It does not have some implicit meaning.

In ecsc, nonsensical syntax trees like #if(f(x)) are only caught by the semantic analysis/IRgen phase. This differs from how C# traditionally operates, i.e., every statement has well-defined semantics from the get-go.
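A minimal sketch of those node kinds may help. The following is hypothetical Python (the real LNode API lives in the Loyc .NET libraries; these class names are invented): an identifier, a literal, or a call, each carrying an attribute list, with nothing preventing a semantically nonsensical tree from being built.

```python
# Hypothetical sketch of the three LNode kinds described above; not the real
# Loyc API, just an illustration of "an LNode is only a data structure".

from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class LNode:
    attrs: tuple = ()        # attributes are themselves LNodes; usually empty

@dataclass(frozen=True)
class IdNode(LNode):         # an identifier, e.g. x or #if
    name: str = ""

@dataclass(frozen=True)
class LiteralNode(LNode):    # a literal, e.g. 42 or "hello"
    value: Any = None

@dataclass(frozen=True)
class CallNode(LNode):       # a call: target applied to arguments
    target: LNode = None     # what is being "called" (often an IdNode)
    args: tuple = ()

# "#if(f(x))" is a structurally legal tree even though it is semantically
# nonsense; nothing in the data structure assigns it a meaning.
nonsense = CallNode(target=IdNode(name="#if"),
                    args=(CallNode(target=IdNode(name="f"),
                                   args=(IdNode(name="x"),)),))
```

Because the tree is immutable plain data, "validity" is entirely a question for later phases (macro expansion and semantic analysis), not for the tree itself.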

The parser

Let me get this out of the way first: you seem to be under the impression that the EC# parser is aware of which macros have been defined. That is not the case; there is no such magic.

The EC# parser is a relatively simple tool. It takes source code as input, and produces a list of LNodes as output. It does this according to a number of rules. These make the statement below legal (though they don't assign any semantics to it).

ImplementNotifyPropertyChanged
{
    public string CustomerName { get; set; }
    public object AdditionalData { get; set; }
    public string CompanyName { get; set; }
    public string PhoneNumber { get; set; }
}

ecsc has a pair of options (-E -syntax-format=les) that can be used to coerce it to print the syntax tree. Technically speaking -E will expand macros first, and then print the syntax tree. But I haven't defined ImplementNotifyPropertyChanged in this context, so it won't get expanded.

$ ecsc ImplementNotifyPropertyChanged.ecs -platform clr -E -syntax-format=les -fsyntax-only
'ImplementNotifyPropertyChanged.ecs' after macro expansion: 
ImplementNotifyPropertyChanged({
    @[#public] #property(#string, CustomerName, @``, {
        get;
        set;
    });
    @[#public] #property(#object, AdditionalData, @``, {
        get;
        set;
    });
    @[#public] #property(#string, CompanyName, @``, {
        get;
        set;
    });
    @[#public] #property(#string, PhoneNumber, @``, {
        get;
        set;
    });
});

ImplementNotifyPropertyChanged.ecs:1:1: error: unknown node: syntax node 'ImplementNotifyPropertyChanged' cannot be analyzed because its node type is unknown. (in this context)

    ImplementNotifyPropertyChanged
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Again, let me stress that ImplementNotifyPropertyChanged gets parsed fine. The compiler only flags it as an error when it notices that all macros have been expanded and it doesn't know what an ImplementNotifyPropertyChanged node's semantics are.

The macro processor, LeMP

LeMP takes a list of LNodes as input, and produces a list of LNodes as output. Macros are used to do this transformation, but it might as well be a black box from a compiler pipeline perspective – it's not tied to any other component in the compiler.

Anyway, the basic idea is that LeMP's input contains nodes which the semantic analysis pass doesn't understand, and macros then transform those nodes. LeMP's output (hopefully) consists of nodes that semantic analysis understands completely. So the way it works is: the parser produces a syntax tree which need not have fixed semantics, and macro expansion is that syntax tree's one and only chance to get its act together before the semantic analysis pass converts it into compiler IR.
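To make that pipeline concrete, here is a toy macro-expansion loop in Python. Everything here is invented for illustration (the function names, the tuple encoding of nodes, the toy `unless` macro); the real LeMP also recurses into child nodes and does much more.

```python
# Toy sketch of a LeMP-style macro processor (invented code, not the real
# LeMP API). Macros are functions keyed by a call node's target name; the
# node list is re-scanned until no macro applies, so macro output may
# itself contain macro calls.
def expand(nodes, macros, max_passes=100):
    for _ in range(max_passes):
        changed = False
        result = []
        for node in nodes:
            name = node[0] if isinstance(node, tuple) else None
            if name in macros:
                result.extend(macros[name](node))  # a macro returns 0+ nodes
                changed = True
            else:
                result.append(node)                # unknown nodes pass through
        nodes = result
        if not changed:
            return nodes
    raise RuntimeError("macro expansion did not terminate")

# A toy macro: ("unless", cond, body) -> ("#if", ("#not", cond), body)
macros = {"unless": lambda n: [("#if", ("#not", n[1]), n[2])]}
tree = [("unless", "ready", "wait()"), "other_stmt"]
print(expand(tree, macros))
# -> [('#if', ('#not', 'ready'), 'wait()'), 'other_stmt']
```

Note that `expand` is a pure list-of-nodes-in, list-of-nodes-out transformation, which is why LeMP can sit in the pipeline as a black box: no other compiler component needs to know how the rewriting happened.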

I'd love to show you the expanded version of @qwertie's ImplementNotifyPropertyChanged example, but I can't do that at the moment because ecsc relies on the Loyc NuGet package instead of the EC# master branch; inline macro definitions via `replace` are a relatively new feature in LeMP. Sorry about that.

I can show you how an ADT is expanded though. Consider the following example:

public abstract alt class Option<T>
{
    public alt None<T>();
    public alt Some<T>(T Value);
}

Without macro expansion, this gets parsed as:

@[#public, #abstract, @[#trivia_wordAttribute] #alt] #class(#of(Option, T), #(), {
    @[#public] #fn(alt, #of(None, T), #());
    @[#public] #fn(alt, #of(Some, T), #(#var(T, Value)));
});

We can force macro expansion by adding using LeMP; to the top of the file. That'll make LeMP import its standard macros. The resulting syntax tree is

#import(LeMP);
@[#public, #abstract] #class(#of(Option, T), #(), {
    @[#public] #cons(@``, Option, #(), {
        });
});
@[#public] #class(#of(None, T), #(#of(Option, T)), {
    @[#public] #cons(@``, None, #(), {
        });
});
@[#public] #class(#of(Some, T), #(#of(Option, T)), {
    @[#public] #cons(@``, Some, #(#var(T, Value)), {
        #this.Value = Value;
    });
    @[#public] #property(T, Value, @``, {
        get;
        @[#private] set;
    });
    @[#public] #fn(#of(Some, T), WithValue, #(#var(T, newValue)), {
        #return(#new(#of(Some, T)(newValue)));
    });
    @[System.ComponentModel.EditorBrowsable(System.ComponentModel.EditorBrowsableState.Never), #public] #property(T, Item1, @``, {
        get({
            #return(Value);
        });
    });
});
@[#public, #static, @[#trivia_wordAttribute] #partial] #class(Some, #(), {
    @[#public, #static] #fn(#of(Some, T), #of(New, T), #(#var(T, Value)), {
        #return(#new(#of(Some, T)(Value)));
    });
});
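To convey what the macro accomplishes semantically, here is a rough Python analogue of the expanded `Option<T>` above. This is a conceptual sketch, not LeMP output: `NoneOpt` stands in for `None<T>` (since `None` is reserved in Python), and Python's dynamic typing replaces the generic parameter.

```python
# Conceptual analogue of the alt-class expansion above (sketch, not LeMP
# output): the macro turns each alternative into a subclass with a
# constructor, a read-only property, and a "copy and update" WithX method.
class Option:
    pass

class NoneOpt(Option):                 # plays the role of None<T>
    pass

class Some(Option):                    # plays the role of Some<T>(T Value)
    def __init__(self, value):
        self._value = value

    @property
    def value(self):                   # corresponds to the generated Value property
        return self._value

    def with_value(self, new_value):   # corresponds to WithValue(T newValue)
        return Some(new_value)

s = Some(3).with_value(5)
print(s.value)  # -> 5
```

In other words, the macro mechanically generates the boilerplate of an algebraic data type: a base class, one subclass per alternative, plus constructors, properties, and "with" methods.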

How is this in any way relevant to Roslyn?

¯\_(ツ)_/¯

I just thought I'd give you some background. That's all. :)

CyrusNajmabadi commented 7 years ago

Let me get this out of the way first: you seem to be under the impression that the EC# parser is aware of which macros have been defined. That is not the case; there is no such magic.

I can't reconcile this with the code examples given. If the parser is unaware of 'macros', how could it successfully parse:

ImplementNotifyPropertyChanged
{
    public string CustomerName { get; set; }
    public object AdditionalData { get; set; }
    public string CompanyName { get; set; }
    public string PhoneNumber { get; set; }
}

This is not legal C#. If you tried to parse this today then the parser would 'go off the rails', creating tons of skipped tokens and missing tokens. If that's the case, then the transformation step would have a heck of a time trying to figure out what happened. If you want a good tree, then the parser is going to need to know about macros.

Or, alternatively, we can use the approach we took with SourceGenerators. Namely, we used an existing piece of syntax (i.e. '[attributes]') to mark where we wanted generators to run. But if it isn't an existing piece of syntax, then I'm not sure how the system can work without the parser having to know about the syntax of these guys.

jonathanvdc commented 7 years ago

Right. So the thing is that the EC# parser doesn't think about what it's parsing in the same way a traditional parser – like Roslyn's C# parser – does.

IIRC, the EC# grammar defines something called block-calls, and what you're seeing is really just an example of that. Basically, anything that looks like identifier { ... } gets parsed as a call node: identifier({ ... }). The parser doesn't stop and consider if the syntax tree is meaningful: only macros and semantic analysis can define a syntax tree's semantics.

Macros don't define new syntax. They merely transform the parse tree in a way that assigns semantics to constructs that don't have semantics yet. The EC# parser was designed with macros in mind – which is exactly why it successfully parses source code that is meaningless without a macro processor – but it doesn't interact with the macros. It just builds a syntax tree, and leaves the task of transforming said tree to the macros.
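Under a drastically simplified grammar, the block-call rule can be sketched as a tiny standalone parser. The `parse_statement` function and its tuple encoding below are invented for illustration; the real EC# parser is far more general. The key property being demonstrated is that no macro lookup happens anywhere in the parse.

```python
import re

# Toy illustration of the block-call rule (invented code, not the EC#
# parser): "identifier { ... }" is parsed as a call node identifier({ ... }),
# whether or not any macro by that name exists.
def parse_statement(src):
    m = re.fullmatch(r"\s*(\w+)\s*\{(.*)\}\s*", src, re.DOTALL)
    if m:
        target, body = m.group(1), m.group(2).strip()
        # No macro table is consulted here -- the identifier is just a name.
        return ("call", target, ("braces", body))
    return ("other", src.strip())

# Parsed as a call node even though no such macro is defined:
print(parse_statement(
    "ImplementNotifyPropertyChanged { public string CustomerName { get; set; } }"))
```

Because the rule is purely syntactic, the same parser handles a macro that exists, a macro that doesn't, and a typo'd macro name identically; only later phases can tell the difference.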

So the EC# parser will parse the example you listed as exactly this.

ImplementNotifyPropertyChanged({
    @[#public] #property(#string, CustomerName, @``, {
        get;
        set;
    });
    @[#public] #property(#object, AdditionalData, @``, {
        get;
        set;
    });
    @[#public] #property(#string, CompanyName, @``, {
        get;
        set;
    });
    @[#public] #property(#string, PhoneNumber, @``, {
        get;
        set;
    });
});

And that will work even if no macro called ImplementNotifyPropertyChanged is defined – in fact, the parser isn't even aware of which macros are defined when it is parsing away at the source code.

I understand that this can be hard to wrap your head around. But you should really try to think of the EC# parser as something that parses data rather than code, akin to an XML parser. An XML parser will happily parse <CompilerOption key="out" value="bin/Program.exe" />, despite the fact that it has no idea of what a CompilerOption node's semantics are. It's entirely up to the program that runs the XML parser to make sense of what a CompilerOption node is.

Similarly, the EC# grammar defines legal constructs whose semantics are to be defined by the user, in macro form. The parser mindlessly parses its input according to the grammar, and then hands the syntax tree off to the macro processing phase. That's all there is to it, really. Conceptually, it's a pretty dumb system, but it works beautifully.

CyrusNajmabadi commented 7 years ago

IIRC, the EC# grammar defines something called block-calls

...

Ok. So there is new syntax defined, and the parser does need to be aware of this :)


Macros don't define new syntax.

I don't understand. You just said the syntax for macros was: identifier { ... }. That's new syntax. C# doesn't have that syntax today.

CyrusNajmabadi commented 7 years ago

The parser doesn't stop and consider if the syntax tree is meaningful:

By and large, neither does Roslyn's parser**. But the parser still needs to know what syntax is valid or not. It needs to know what language constructs are in the language. And so it needs to know what the syntax is for macros. Otherwise, it will be completely thrown off when it sees these constructs. I mean, you don't have to take my word for it. Just toss the above syntax into a file and you'll get errors like:

Severity    Code    Description Project File    Line    Suppression State
Error   CS1022  Type or namespace definition, or end-of-file expected   
Error   CS1022  Type or namespace definition, or end-of-file expected   
Error   CS0116  A namespace cannot directly contain members such as fields or methods   
Error   CS0116  A namespace cannot directly contain members such as fields or methods   
Error   CS0116  A namespace cannot directly contain members such as fields or methods   

--

** Technically not true. But all the cases where Roslyn's parser does this should be moved out to higher layers. This is what I did when I wrote the TS parser. There's no need for that stuff to live in the parser. It's just there for legacy reasons.

jonathanvdc commented 7 years ago

Ok. So there is new syntax defined, and the parser does need to be aware of this :)

Yes, absolutely. EC# defines new syntax. But that has nothing to do with the ImplementNotifyPropertyChanged macro in particular.


I don't understand. You just said the syntax for macros was: identifier { ... }. That's new syntax. C# doesn't have that syntax today.

Yeah. As far as I can tell, @qwertie crafted the identifier { ... } syntax specifically for macros. But macros can operate on any syntax node. Heck, a macro can even transform syntax nodes that already have well-defined semantics today. In fact, ecsc implements foreach as a macro.

So I'd much rather say that identifier { ... } is a syntax to make using macros easier, but it's not the syntax, because EC# macros can operate on any syntax.

Does that clarify things a little? :)

qwertie commented 7 years ago

Macros don't define new syntax.

I don't understand. You just said the syntax for macros was:

Well, "macros" is not what defined new syntax. It was "Enhanced C#" (a.k.a. me) that defined the new syntax.

As far as I can tell, @qwertie crafted the identifier { ... } syntax specifically for macros.

That's basically true, but indirectly. So here's the whole story.

I decided that, unlike some existing languages with LISP-style macro systems, I wanted a macro system in which macros would not add new syntax, because I believed parsers should be able to succeed without awareness of macros. Also, as a C++ programmer I was well aware that the C++ parser was linked to the symbol table - in general, C++ is ambiguous and requires a symbol table to resolve those ambiguities. Even if C++ didn't have #define macros, the situation would be analogous to languages where macros define syntax. For example, the statement X * Y; may be a multiplication or a pointer declaration depending on whether X is a type. This has at least two disadvantages: source files can no longer be parsed in isolation or out of order, and every tool that wants a syntax tree must effectively perform semantic analysis first.

Also, if macros can define new syntax then their meaning can be slightly harder to guess. By analogy, we could view unknown macros the way we view foreign languages. Consider trying to read Spanish vs Tagalog. You don't really understand either language, but Spanish has both words and grammar that are more similar to English, so you can glean more information from a Spanish text than a Tagalog text - perhaps you can even guess the meaning correctly. If macros can add arbitrary syntax, then when you look at an unknown macro you don't even know to what extent custom syntax has been added. So if you see something like "myMacro foo + bar;" then probably the macro accepts an expression, but you can't be sure; it's really just a list of tokens, and usually in these systems, you can't even know whether the semicolon marks the end of the macro or if it keeps going after that.

So instead I decided to preserve C#'s tradition of "context-free" parsing by ensuring every source file can be parsed without knowledge of macros. However, if macros wouldn't be allowed to add syntax then they would require changes to the language, such that the existing syntax was usually sufficient for them. This new syntax should be useful for multiple unforeseen purposes, and also consistent with the existing flavor of C#.

My main strategy was to "generalize" C#. Part of this generalization was taking the existing syntactic ideas of C# and extending their patterns in a logical way. For example, all the "space" constructs - namespace, class, struct, interface, enum - have similar syntax, so I combined them into a single grammar production.

As I designed this, I had very few actual macros in mind. For instance, remember alt class BinaryTree<T>? I generalized "contextual keywords" long before I thought of creating alt class. The historical precedent seemed compelling enough by itself, e.g. partial, yield, async (not to mention add, remove, etc.) demonstrate the value of contextual keywords. And obviously, the C# team would always design new syntax in a way that is consistent with old syntax, so it made sense to "entrench" any obvious patterns that were developing - making them available both to future features in the compiler itself, and macro authors as well.

Another part of "generalizing C#" was "squashing" multiple grammar productions together. In part this was to give macros flexibility, but I also wanted to make the EC# parser simpler, or at least no more complex, than the C# parser. (Currently it totals 2500 lines including about 500 lines of comments - or 5600 lines including 800 comments after LeMP expands it. Roslyn's C# parser is about 10,000 lines with 800 comments, though it's not fair to directly compare since, for example, my parser still lacks LINQ, while Roslyn has more blank lines and is more paranoid due to its use in an IDE.)

Finally, I realized that "generalized C#" by itself isn't sufficient for all macros, so I added a few extra constructs on top of it.

CyrusNajmabadi commented 7 years ago

Macros don't define new syntax.

I don't understand. You just said the syntax for macros was:

Well, "macros" is not what defined new syntax. It was "Enhanced C#" (a.k.a. me) that defined the new syntax.

...

If you defined new syntax for macros... then macros did indeed define new syntax.

C# does not contain this grammar production. In order for the C# parser to parse out macros, it would need to understand this new syntax. I do not see how we can do macros (like you do them) without defining new syntax here.

CyrusNajmabadi commented 7 years ago

All the "space" constructs - namespace, class, struct, interface, enum - have similar syntax, so I combined them.

There is a tension here. We've avoided overlapping things when there are significant deviations between the forms. For example, namespaces can have dotted names; the rest can't. If the node supports dotted names, then all downstream consumers need to figure out what to do when they encounter a dotted name in any of these other entities. Alternatively, the parser might never accept dotted names for the rest, but now everyone needs to know that they should assume the name is never dotted. Either way, the node is no longer a source of confident information about what you might get.

There's the question of 'when does this end' as well. After all, methods and properties create 'spaces' too (i.e. where locals and whatnot live). Should we merge methods with the above list? You could just take the above list and add an optional parameter list before the braces...

At the end of the day, you could try to merge everything into one type (I've seen systems that do this). Pros are that you only ever deal with one type. Cons are the amount of information you need to handle.

CyrusNajmabadi commented 7 years ago

Finally, I guess I'm just not seeing what purpose macros actually serve over the SourceGenerator proposal. As you've mentioned, they cannot introduce new syntax. So all they can do is take existing syntax and manipulate it to produce new syntax. But that's what SourceGenerators did. That's something Roslyn is optimized for, as it allows very extensible transformation of syntax.

The problem was not in making it possible for people to manipulate syntax (we have plenty of experience and features that do that today). The problems stemmed from how you make a cohesive, fast, and trustworthy set of tools when this is a fundamental building block of your system.

Because source-transformation is now a core primitive, we have to assume it will be used pervasively by many. And that means every single feature we build into the product needs to work well with these features.

gafter commented 7 years ago

We are now taking language feature discussion in other repositories, such as https://github.com/dotnet/csharplang/issues for C# language features.

Features that are under active design or development, or which are "championed" by someone on the language design team, have already been moved either as issues or as checked-in design documents. For example, the proposal in this repo "Proposal: Partial interface implementation a.k.a. Traits" (issue 16139 and a few other issues that request the same thing) are now tracked by the language team at issue 52 in https://github.com/dotnet/csharplang/issues, and there is a draft spec at https://github.com/dotnet/csharplang/blob/master/proposals/default-interface-methods.md and further discussion at issue 288 in https://github.com/dotnet/csharplang/issues. Prototyping of the compiler portion of language features is still tracked here; see, for example, https://github.com/dotnet/roslyn/tree/features/DefaultInterfaceImplementation and issue 17952.

In order to facilitate that transition, we have started closing language design discussions from the roslyn repo with a note briefly explaining why. When we are aware of an existing discussion for the feature already in the new repo, we are adding a link to that. But we're not adding new issues to the new repos for existing discussions in this repo that the language design team does not currently envision taking on. Our intent is to eventually close the language design issues in the Roslyn repo and encourage discussion in one of the new repos instead.

Our intent is not to shut down discussion on language design - you can still continue discussion on the closed issues if you want - but rather we would like to encourage people to move discussion to where we are more likely to be paying attention (the new repo), or to abandon discussions that are no longer of interest to you.

If you happen to notice that one of the closed issues has a relevant issue in the new repo, and we have not added a link to the new issue, we would appreciate you providing a link from the old to the new discussion. That way people who are still interested in the discussion can start paying attention to the new issue.

Also, we'd welcome any ideas you might have on how we could better manage the transition. Comments and discussion about closing and/or moving issues should be directed to https://github.com/dotnet/roslyn/issues/18002. Comments and discussion about this issue can take place here or on an issue in the relevant repo.

You may find that the original/replace code generation feature tracked at https://github.com/dotnet/csharplang/issues/107 is related to this proposal.