Enhanced C#: a friendly hello

dotnet / roslyn

The Roslyn .NET compiler provides C# and Visual Basic languages with rich code analysis APIs.

https://docs.microsoft.com/dotnet/csharp/roslyn-sdk/

MIT License

19.11k stars 4.04k forks source link

Enhanced C#: a friendly hello #11324

Closed qwertie closed 7 years ago

qwertie commented 8 years ago

I'm terribly embarrassed.

For the last few months I've been working on a tool called LeMP that adds new features to C#. I recently published its "macro" reference manual. This month I was going to start publicizing my "Enhanced C#" project when I discovered that the design of C# 7 had already started well before C# 6 was officially released - and even more shocking, that this design work was being done "in public" right on GitHub!

It kills me that I didn't realize I could have participated in this process, and that "my" C# was drifting apart from C# 7 for over a year. Oh well - it is what it is, and I hope that something useful can still be salvaged out of my work.

So, this post is to inform you about Enhanced C# - where it came from, and what it offers that C# 7 does not.

A brief history

As a class project in my final year of university, I extended a compiler with a new feature (unit type inference with implicit polymorphism), but (to make a short story shorter) the authors of the language weren't interested in adding that feature to their language. This got me thinking about our "benevolent dictatorship" model of language development and how it stopped me, as a developer, from making improvements to the languages I relied on. Since I had already been coding for 15 years by that time, I was getting quite annoyed about writing boilerplate, and finding bugs at runtime that a "sufficiently smart compiler" could have found given a better type system.

So in 2007 I thought of a concept for a compiler called "Loyc" - Language of your choice - in which I wanted to create the magical ability to compile different languages with a single compiler, and also allow users to add syntax and semantics to existing languages. This system would democratize language design, by allowing third parties to add features to existing languages, and allowing language prototypes and DSLs to seamlessly interoperate with "grown up" languages like C#. But my ideas proved too hard to flesh out. I wanted to be able to combine unrelated language extensions written by different people and have them "just work together", but that's easier said than done.

After a couple years I got discouraged and gave up awhile (instead I worked on data structures (alt link), among other things), but in 2012 I changed course with a project that I thought would be easier and more fun: enhancing C# with all the features I thought it ought to have. I simply called it Enhanced C#. It started as a simple and very, very long wish list, with a quick design sketch of each new feature. Having done that I reviewed all the feature requests on UserVoice and noticed a big gaping hole: I hadn't satisfied one of the most popular requests, "INotifyPropertyChanged". So at that point I finally went out and spent three weeks learning about LISP (as I should have done years ago), and some time learning about Nemerle macros. At that point (Oct. 2012) I quickly refocused my plans around a macro processor and called it EC# 2.0, even though 1.0 was never written. I realized that many of the features I wanted in C# could be accomplished with macros (and that a macro processor doesn't require a full compiler, which was nice since I didn't have one) so the macro processor became my first priority.

So "Loyc", I eventually decided, would not be a compiler anymore, but just a loose collection of concepts and libraries related to (i) interoperability, (ii) conversions between programming languages, (iii) parsing and other compiler technology, which I now call the "Loyc initiative"; I've had trouble articulating the theme of it... today I'll say the theme of Loyc is "code that applies to multiple languages", because I want to (1) write tools that are embedded in compilers for multiple langauges, and (2) enable people, especially library authors, to write one piece of code that cross-compiles into many langauges. One guy wants to call it acmeism but that doesn't seem like the right name - I'd call it, I dunno, multiglotism or simply, well, loyc.

EC# and Roslyn

Roslyn's timing didn't work out for me. When I conceived EC#, Roslyn was closed source. I researched it a bit and found that it would only be useful for analysis tasks - not to change C# in any way. That wasn't so bad; but I wanted to explore "radical" ideas, which might be difficult if I had to do things the "Roslyn way". That said, I was inspired by Roslyn; for instance the original implementation of "Loyc trees" - the AST of EC# - was a home-grown Red-Green tree, although I found my mutable syntax trees to be inconvenient in practice (probably I didn't design them right the first time) and rewrote them as green-trees-only (immutable - I thought I might rewrite the "red" part later, but I got used to working with immutable trees and now I don't feel a strong need for mutable ones.)

By the time MS announced they were open-sourcing Roslyn (April 2014), I had been working on Enhanced C# and related projects (LLLPG, Loyc trees and LES) for well over a year, and by that point I felt I had gone too far down my own path to consider trying to build on top of Roslyn (today I wish I could have Roslyn as a back-end, but I don't think I have time, nor a volunteer willing to work on it).

LeMP

EC# still is not a "compiler" in the traditonal sense, but it's still useful and usable as-is thanks to its key feature, the Lexical Macro Processor, or LeMP for short. It is typically used as a Visual Studio extension, but is also available as a command-line tool and a Linux-compatible GUI.

Through macros, I implemented (in the past few months) several of the features that you guys have been discussing for more than a year:

Creating ‘out’ variables in-situ, e.g. int? Parse(string s) => int.Parse(s, out int x) ? (int?)x : null;
Code Contracts via annotations on the method signature
Tuples (positional only) with deconstruction
Algebraic data types
Patern matching

(They aren't as polished as the C# 7 features will be, because of techical limitations of lexical macros and because I'm just one guy.)

It also has numerous other features:

A maximally flexible alternative to "primary constructors"
Code quotations and code pattern matching (comparable to LISP and Nemerle), which is useful for writing macros, for code analysis, code generation and even (potentially) reading/writing JSON and LES files (a use case I haven't written about yet).
Method forwarding, for doing the decorator pattern more easily.
Declaring variables and writing code sequences in expressions (like the ; operator that was sadly not added to C# 6)
The "quick binding" operator
on_finally which works like Swift's defer, and related macros (on_return, on_throw)
replace and unroll for generating boilerplate (although after reading about Nim, I think there's a better way to do the unroll feature)
A with statement based on the With statement in Visual Basic
An LL(k) parser generator called LLLPG (massive chicken-and-egg problem there: you write LLLPG grammars in EC#, while the EC# grammar is written in LLLPG)
And last but not least, users can write their own macros.

The other parts of EC# that exist - the parser and "pretty printer" - support some interesting additional features such as symbols, triple-quoted string literals, attributes on any expression, etc. However, the majority of the syntactic differences between EC# and C# 6 are designed to support the macro processor.

An important theoretical innovation of Enhanced C# is the use of simple syntax trees internally, vaguely like LISP. This is intended to make it easier to (1) convert code between programming languages and (2) to communicate syntax trees compactly.

What now?

Well, I'm not 100% decided about what to do now, knowing that the C# open design process exists and that C# 7 is shaping up to be really nice.

I don't intend to throw the whole thing away, especially since there are major use cases for EC# that C# 7 doesn't address. So in the coming weeks I will change the pattern matching syntax to that planned for C# 7, implement the new syntax for tuple types (minus named parameters, which cannot be well-supported in a lexical macro), and add those "record class" thingies (even though I don't think the C# team has taken the right approach on those.)

But in the long run, is it worthwhile to continue working on EC#, or should I instead devote my time to lobbying the C# team to do the features I want? (beware, I can talk a lot...)

In fact, those are far from my only options - I've been closely following the development of WebAssembly and I'd like to do something related to interoperability and WebAssembly, mainly because .NET has not turned out to be the cross-language interoperability panacea that the world needs. And I'd love to make the world's most widely useful programming language (which is not EC#, because we all know how hard it is to improve a language given the backward compatibility constraint). The main reasons to keep going with EC# are (1) that I have a large codebase already written, and (2) that after 8 years alone I finally have a volunteer that wants to help build it (hi @jonathanvdc!)

I do suspect (hope) there are some developers that would find value in EC# as a "desugaring" compiler that converts much of C# 7 to C# 5. Plus, LeMP is a neat tool for reducing boilerplate, "code find and replace" operations, and metaprogramming, so I really want to polish it up enough that I finally win some users.

There is so much more I could say, would have liked to say, and would still like to say to the C# design team... but in case this is the first you've heard of Enhanced C# or LeMP, you might find this to be a lot to take in - just like for me, C# 7 was a lot to take in! So I'll avoid rambling much longer. I hope that, in time, I can win your respect and that you will not "write me off" in a sentence or two, or without saying a word, an eventuality I have learned to emotionally brace for. I definitely have some opinions that would be opposed by the usual commentators here - but on the other hand, I think the new C# 7 features are mostly really nice and I'll be glad to have them.

So if this wasn't TLDR enough for you, I hope you'll enjoy learning about EC# - think of it as how C# 7 might have looked in a parallel universe.

Links:

EC# for normal coders (because EC# for programming language nerds is a bit old)
LeMP macro reference manual

aL3891 commented 8 years ago

You did all this stuff by yourself? That's pretty darn impressive! I for one hope you stick around these repos, sounds like you have some good insights!

dsaf commented 8 years ago

You might be interested in https://github.com/JetBrains/Nitra . I think it will eventually allow "extending" C# in an IDEA-grade IDE (https://www.jetbrains.com/rider ?).

...I've been closely following the development of WebAssembly and I'd like to do something related to interoperability and WebAssembly, mainly because .NET has not turned out to be the cross-language interoperability panacea that the world needs.

Sadly, Microsoft has not yet described the future toolchain for C# - WebAssembly development. Saying "we have LLILC" is not really an answer. Hopefully they understand that TypeScript is just a temporary work-around.

HaloFour commented 8 years ago

But in the long run, is it worthwhile to continue working on EC#, or should I instead devote my time to lobbying the C# team to do the features I want? (beware, I can talk a lot...)

I think that depends a lot on what you want.

For acceptance in the mainstream I'd think that you'd have more impact with Roslyn, both lobbying and participating. While I believe that any feature must be championed by an LDM member to be considered for acceptance, having someone with experience in proving out the feature and who can actually develop it would reduce their burden and likely lower the barrier a bit which may allow for a faster evolution of the language.

But you would have to endure the politics of the committee and for someone who has gone their own for so long that might not be ideal for you. If you wanted to keep it on your terms it might be worthwhile to consider forking Roslyn. You'd have a lot to relearn but at least in theory you can keep your changes up to date with the evolution of C#.

Note that several features that you've mentioned (pattern matching, records) got punted to beyond C# 7.0, and they are very likely to change. So rather than adopting what has already been proposed here I'd suggest using EC# as a proof of concept for an existing syntax which can have an impact on how the feature will shape up for potentially C# 8.0.

qwertie commented 8 years ago

@aL3891 Thanks very much! Though I did it all myself, I'd stress that I didn't want to do it alone (I mean, think about your colleagues, have you learned anything from them? I've missed that by not having any).

@dsaf Thanks for the information! Nitra is an impressive project that maybe I ought to learn about (though I guess it could be hard to fit it in with the work I've already done). I wonder what Rider offers that, say, Xamarin Studio doesn't (because competing directly with VS Community seems ... impractical)

P.S. I don't really get how LLILC is different from the AOT compilation that Mono had already.

@HaloFour I'm definitely looking to have some kind of real-world impact, but I'm not sure if the C# team would be interested in replicating the main feature of EC#: a macro system or a compiler plug-in system. Plus, the design of EC#/LeMP would probably be difficult to adapt to Roslyn, so ... I'm not sure how to actually get a real-world impact. :confused:

aL3891 commented 8 years ago

I suggest you open issues for the individual features of EC# that you'd like to see in c# and reference the work you've done in each area and then the discussions can go from there :) It may not always be possible to adapt your implementation directly but i'm sure the team will find it interesting none the less, as @MadsTorgersen said (I think it was) said on channel 9 one time, There aren't a whole lot of people out there designing languages, so its nice to stay together!

dsaf commented 8 years ago

@qwertie

...wonder what Rider offers that, say, Xamarin Studio doesn't...

Built-in ReSharper obviously :).

qwertie commented 8 years ago

It did not escape my notice that no one from Microsoft was interested. I took my leave, tail between legs... progress on EC# since then has been minimal, but it's not cancelled, I'm still working on it.

CyrusNajmabadi commented 8 years ago

i'm interested :)

But, as Halo pointed out, the entirety of what's going on in this issue is enormous. It's simply too large to do anything with in its current state. Extracting out useful pieces and working toward getting them implemented is likely the best path forward.

Note that this bit concerns me:

They aren't as polished as the C# 7 features will be, because of techical limitations of lexical macros

We've looked into areas like this before, and a large issue is that things often work well for more 'toy' scenarios, but fall over when you need to really deal with the full complexity of the language. For us to do anything it really needs to be designed so that it will work well in that context.

Thanks!

qwertie commented 8 years ago

@CyrusNajmabadi first of all, thank you very much for saying something (and also thanks to aL3891, dsaf & HaloFour - I appreciated your replies; it's just that I really had my heart set on some kind of response from an 'insider'.)

I am curious what you mean that things "fall over when you need to really deal with the full complexity of the language"? I have found that macros work well for much more than just 'toy' scenarios. Let's see...

I have noticed some difficulty in composability and conflict resolution of macros written by different people that operate on the same construct (e.g. two macros modify a method - what order should they run in?), but at least a set of "standard" macros can be designed together and compose in the right way.
I'm also aware of the challenge of integrating macros with refactoring, but it seems solvable. Some operations could fail when using macros that do fancy things, though, and renames probably shouldn't be done in realtime like VS2015 does.
I expect that a macro system would be much harder to implement in Roslyn than in Enhanced C# due to the complexity of syntax trees in the former. [EDIT: Hmm... there's a good chance I'm wrong about that.] An alternative to actually implementing a macro system would be some sort of change to the compiler to allow alternate front-ends. This could allow interested parties to use my existing macro system by switching the extension on a source file to 'ecs'. I bet someone would also write a VB front-end that converts to a C# syntax tree so that you could mix languages in one project (albeit not seamlessly - if the front end can only deal with syntax, the VB code would end up being case-sensitive).

dsaf commented 8 years ago

@qwertie

...I really had my heart set on some kind of response from an 'insider'.)

You have actually received a response from Gafter straight away - marking something as "Discussion" means that a suggestion is being rejected on the spot.

My opinion on this topic:

EC# cannot be widely popular because C# doesn't suck. The situation with TypeScript - JavaScript for example is entirely different and even then TypeScript is kind of "meh" unless a front end is predicted to be quite complex.
It's important to point out that C# is open-source but not community-driven. The only viable way of directly contributing to C# design is reduced to this:

https://github.com/dotnet/roslyn/issues?q=is%3Aopen+is%3Aissue+label%3A%22Up+for+Grabs%22+label%3A%22Feature+Request%22+label%3A%22Area-Language+Design%22

Alternatively consider this (not sure if this one is still alive):

https://careers.microsoft.com/jobdetails.aspx?ss=&pg=0&so=&rw=3&jid=208941&jlang=EN&pp=SS

CyrusNajmabadi commented 8 years ago

How do you define an 'insider'?

CyrusNajmabadi commented 8 years ago

I am curious what you mean that things "fall over when you need to really deal with the full complexity of the language"

i mean things like properly working in complex constructs like async/await or 'yield'. Or in constructs where variables are captured into display classes. Or with constructs that need to understand the intricacies of reference/value types, especially across complex generic constraints. etc. etc.

After this, you also have to figure out how this impacts the IDE/editing-cycle. We put many months into exploring a system that would do nothing but just allow tree transforms, along with generating the results of those into files that could be introspected, and the problem space was still enormous. How does debugging work? How do IDE features (like 'rename') work? How do safe transformatoins of code work?

Think about it this way:

We want intellisense to be extremely accurate and very fast. How do you accomplish that in systems that allow arbitrary transformation without bounds on transformation cost?

Finally, arbitrary extensibility is also a major concern for us in terms of being able to rev the language ourselves. Now, anything we do in the language has the potential to stomp on someone's arbitrary extensibility plugin. What if some company internally created their own 'async/await' plugin. What happens now when the next version of C# comes out?

qwertie commented 8 years ago

How do you define an 'insider'?

Someone on one of the Roslyn teams. But I would have been happy with any Microsoftie.

constructs that need to understand the intricacies of reference/value types, especially across complex generic constraints. etc. etc.

Well, the beauty of user-defined stuff is that it doesn't have to be perfect because MS isn't responsible for supporting it. Also, many macros do something simple enough that there's little that could go wrong and few feature interactions to consider. Plus, a lot of macros would be one-off things made by one user for one project; those things need not work beyond that one little context they were made for.

We put many months into exploring a system that would do nothing but just allow tree transforms

Interesting. Are discussions about it available to read?

We want intellisense to be extremely accurate and very fast. How do you accomplish that in systems that allow arbitrary transformation without bounds on transformation cost?

In general you can't, but note that we technically have this problem already with WinForms controls. In theory they can misbehave on the design surface; in practice most people are happy, and happier than they would be if the design surface didn't run custom code. There are mitigations:

Decouple updating the program tree (the directory of classes, methods, etc.) from most user-facing operations (this is done already, I think)
Provide a hint to macros (or other units of custom transformation) that they are running in an IntelliSense context, to help slow macros avoid expensive parts (I'm thinking of my parser generator, which could skip grammar analysis and generate methods without bodies in that case.)
Measure the running time of all macros: per-macro aggregate time and slowest single invocation. If there's a performance problem, the IDE can put up tips like "[FooMacro]() is slowing down Intellisense" so VS doesn't take the blame. And of course you'd need to inject a thread abort if a macro enters an infinite loop. You'd want to watch their memory usage too (is there a mechanism in the CLR for that?) The build process would also need some way of informing users about performance problems.
Have a dialog box for "intellisense performance" which, in addition to a profile of built-in intellisense, would summarize macro performance and allow users to disable badly-behaved macros at design time.
Typically a slow macro would only be used in one or two files, so the IDE could learn to process those files last for the purpose of passive look-up (e.g. dot-completion). Refactoring does require full processing though.

Roslyn doesn't do incremental parsing, does it? I wouldn't know how to mix that with a macro system.

Refactoring is the biggest challenge I know of. Perhaps some refactorings like "extract method" should just operate on the original source, macros be damned. Others (renames and parameter reorder) could update the final output, then map those changes to the original source code. I seems doable, but it requires the transformation be performed again immediately in order to find side effects (things that changed other than the requested thing) and failures (where the requested refactoring didn't work properly) and those problems would have to be brought to the user's attention.

What if some company internally created their own 'async/await' plugin. What happens now when the next version of C# comes out?

Then that company would have two ways to do async, I guess? Sorry for being naïve, but so far I'm not seeing a major practical problem. To me it's like the problem of "what if we allow users to define their own classes, and then we add a new class to the BCL with the same name? Hello ambiguity errors!" I knew that was a risk back when I defined my own WeakReference<T>, but I did it anyway. It seems to me it should be the user's decision whether to take that risk. (BTW my macro system has a prioritization feature for some scenarios like this.)

CyrusNajmabadi commented 8 years ago

Someone on one of the Roslyn teams.

That would be me :)

CyrusNajmabadi commented 8 years ago

Well, the beauty of user-defined stuff is that it doesn't have to be perfect because MS isn't responsible for supporting it.

One of the arguments i thought you were making was that by implementing this, we could then provide many of the features we've been working on for C# 7 and onwards by layering on this system. That's only true if this subsystem if capable enough to handle all the complexity that we'd need to manage with all our features.

CyrusNajmabadi commented 8 years ago

Roslyn doesn't do incremental parsing, does it? I wouldn't know how to mix that with a macro system.

Yes, Roslyn does fairly extremely incremental parsing. It tries to reuse, down to the token level, all the data it can :)

CyrusNajmabadi commented 8 years ago

Refactoring is the biggest challenge I know of. Perhaps some refactorings like "extract method" should just operate on the original source, macros be damned. Others (renames and parameter reorder) could update the final output, then map those changes to the original source code. I seems doable, but it requires the transformation be performed again immediately in order to find side effects (things that changed other than the requested thing) and failures (where the requested refactoring didn't work properly) and those problems would have to be brought to the user's attention.

Yes. And you've now taken a system that should take a few seconds max, and made it potentially take minutes (depending on how many transformations are being done, and how costly they all are). :)

CyrusNajmabadi commented 8 years ago

Then that company would have two ways to do async, I guess? Sorry for being naïve, but so far I'm not seeing a major practical problem.

We've now released a new version of C# that they can't use. Or which may break their code.

"what if we allow users to define their own classes, and then we add a new class to the BCL with the same name? Hello ambiguity errors!"

We've actually implemented language features to help avoid that. Both through things like namespaces, as well as aliasing (::) (which people do use to ensure that names won't collide).

Allowing for abitrary new syntax to be introduced is problematic. Consider that you introduced something like "out-vars" before we did. But perhaps you did it with different semantics than the ones we're putting the language. Now, what happens when someone upgrades? Does the core language take precedence? Could we subtly change code without anything catching it?

CyrusNajmabadi commented 8 years ago

Perhaps some refactorings like "extract method" should just operate on the original source, macros be damned.

The problem with this is that features themselves are complex. Extract method, for example, needs a fine-grained understanding of data flow and control flow to make appropriate decisions. How does it do this over code that may change arbitrarily because of macros?

Consider just something simple:

void Foo()
{
    <SomeMacro1>

    Normal...
    CSharp...
    var result = Code...

    <SomeMacro2>
}

The user wants to extract out the code in the middle. But maybe <SomeMacro2> ends up using 'result'. Normally extract method would see that 'result' was unused after the extracted region, and it would pull it entirely into the new method. Now, it would need to know that the value was actually used by <SomeMacro2> in order to make sure the value got passed out.

And that's just a simple case :)

qwertie commented 8 years ago

@dsaf Thanks. What's the importance of the 'up for grabs' tag?

consider this (not sure if this one is still alive)

Thanks for the heads up; too bad it has no date on it. If it's more than a few months old, I probably applied for it already.

CyrusNajmabadi commented 8 years ago

There are mitigations:

For certain. But now the problem space has gotten much larger.

This is a primary concern here: the value produced by this has to warrant the enormous amount of work that needs to happen here. And it has to justify all that work and not have major downsides that we would have to absorb.

Or, in other words, there are limited people resources to be able to do all of this. A suggestion like this would take a massive amount of effort to thread through the compiler (just the infrastructure), and then would have all that additional work to get workign properly in the IDE. Just the testing would be massive difficult as each feature would now have to deal with not only arbitrary code, but arbitrary macros.

To given some examples. We did something like 'Analyzers', and that was vastly smaller than what you're discussing in scope. Analyzers themselves too several devs an entire product cycle to fit into Roslyn. And it's still getting tons of work because of the deep impact is has on the system, and all the perf issues we need to address.

In order for us to take on this work, we'd need clear understanding of exactly what value we'd be getting once we finished. Right now that value isn't clear. For example, as mentioned earlier, we likely would not be able to use this system for our own language features. That would mean we'd be investing in something with very little payoff for our own selves. It also means we wouldn't be directly utilizing (dogfooding) our own features. Which means ensuring a high enough level of quality would be quite difficult. etc. etc.

CyrusNajmabadi commented 8 years ago

Are discussions available ot read?

Work was done here: https://github.com/dotnet/roslyn/blob/master/docs/features/generators.md https://github.com/dotnet/roslyn/issues/5292

CyrusNajmabadi commented 8 years ago

What's the importance of the 'up for grabs' tag?

It means we're happy with anyone taking it on and providing a solution.

Technically, anything is 'up for grabs', but the ones with that particular label are things we think non-full-time developers could reasonably take on.

qwertie commented 8 years ago

Yes, Roslyn does fairly extremely incremental parsing.

Wow! Somehow I overlooked the incrementalness of the parser when I looked at its code.

And you've now taken a [refactoring] system that should take a few seconds max, and made it potentially take minutes.

Hmm. If the solution is big enough and the macros are slow enough, yes. But no one has to use macros, and if the user is informed of what's slowing down the process, they will be encouraged to do something about slow macros.

One of the arguments i thought you were making was that by implementing this, we could then provide many of the features we've been working on for C# 7 and onwards by layering on this system.

Ah, I see why you would think that, since I had done exactly that with my system. And if C# were a new language then yes, you'd want to design it so that core features would be part of some grand extensibility scheme. But in the case of EC#, part of the reason I did so many features as macros was so that I'd have a payoff without the trouble of writing an actual compiler! Plus I wanted to explore just how much can be accomplished with lexical macros (= syntax-tree-processor macros) alone. And it's a lot.

While some built-in features of C# could be done as macros, I see a macro system more as

an incubator for ideas - just to see what power users do with it
as a way of reducing pressure to add new features to the language - you prioritize the features that are served least well by macros
a way to give developers features that will never meet the team's famous threshold for adding features to C#, or that don't have a single best solution. Classic examples: things that auto-implement INotifyPropertyChanged; parser generators; and since you mentioned dogfooding, macros for code analysis and generation, which should be handy in Roslyn itself.
a replacement for T4 templates that is far more convenient to use.

We've actually implemented language features to help avoid that. Both through things like namespaces, as well as aliasing (::) (which people do use to ensure that names won't collide).

My macro system uses namespaces pretty much the same way (if it had more users, I'd add support for :: too.)

Allowing for arbitrary new syntax to be introduced is problematic.

I agree; Enhanced C# does not allow new syntax. I edited C#'s grammar to make it flexible enough that new syntax wouldn't be needed in most cases. For example, there are several macros now that have the syntax of a method definition, like replace Square($x) => $x * $x;.

Consider that you introduced something like "out-vars" before we did. But perhaps you did it with different semantics than the ones we're putting the language. Now, what happens when someone upgrades? Does the core language take precedence? Could we subtly change code without anything catching it?

Yeah... I recognize the tension. Probably it's better to show an "ambiguity" error rather than risk subtly changing the meaning of existing code. If the macro author knows the new feature is coming (and has the same semantics) he could mark it as having a low priority so that the new feature takes priority when it becomes available; and for end-users there could be another mechanism to prioritize, or at least import selectively.

Now, it would need to know that the value was actually used by in order to make sure the value got passed out.

True, there would be cases where 'extract method' might do the wrong thing... although in this example, if SomeMacro2 does something with result without the variable having been passed to it explicitly, it's probably either a badly designed macro (because why would it do that?) or one for which the dev doesn't need/want the refactoring engine to care, because the change in behavior is expected, like some debug/logging/profiling macro that doesn't affect user-facing behavior.

I understand MS has high standards... but I think if a feature provides a lot of value, it should be done even if interactions with other features is imperfect. I suspect you're looking at this as "if the UX is not 100% rock-solid, we can't do it." Whereas I'm looking at it more like "few things have a worse user experience than generating C# with T4 templates. Let's make something akin to T4 that's pleasant, if not quite perfect, see what people do with it, and learn from that experience when we make our next new language in 10 years." To me, as a 'power user', I hate how repetitive my code often is, and wonder if I'd be happier switching to Rust (though Rust drops OOP and GC, both of which I'd rather have than not have) or Nemerle (which, er, I can't recall why I didn't. Maybe because I wanted so much to write a self-hosting compiler!)

So to me, it would be enough to put up a warning. It could detect if any macros are used within the body of a method and say "Caution: this method uses user-defined macro(s). In the presence of certain macros, 'extract method' could produce code that is invalid, or that behaves differently. You may need to verify manually that the refactored code is correct."

qwertie commented 8 years ago

Having said all that, point taken, any kind of compile-time metaprogramming is a big, difficult feature.

I just thought of something that I never think about, because I don't use ASP.NET. You know how you can write blocks of C# code in <% %> in an aspx file and intellisense works in there? How do they do that? Is the solution necessarily tied to the Roslyn C# parser, or could I somehow write a VS plugin that would work like aspx, but use my EC# parser instead? And if so, who out there has the knowledge of how to do that - and may be willing to share it with me?

jnm2 commented 8 years ago

Before I speak bluntly, LeMP is jaw-droppingly impressive. It has features that pull me, from method forwarding to the accessible implementation of the build-your-own-language philosophy. Even though it's clearly not possible for Roslyn to adopt the same methodology as EC#, I absolutely think that it's worth examining all the concepts that Roslyn can take away from project. The work you've done is cool and highly intelligent.

One thing does bother me. As a consumer of C# and Visual Studio, who dreams of the ability to add my own pet language features like await? in a similar way to writing Roslyn analyzers, I have always imagined hooking into the parser and then transforming an already-parsed syntax tree. The thought of having to implement the language extension as a text preprocessor horrifies me. Text processing is full of edge cases that are factorially hard to forsee and harder to get right in a maintainable way. I want to deal with the purest semantic level possible. I was similarly frustrated every time I tried to use ReSharper's Custom Pattern search or code analysis. I don't want to operate on text, I have experienced that to be brittle and dangerous and at best a workaround, but rather on a semantic model of the C# language which goes straight to the IL compiler.

CyrusNajmabadi commented 8 years ago

And you've now taken a [refactoring] system that should take a few seconds max, and made it potentially take minutes. Hmm. If the solution is big enough and the macros are slow enough, yes. But no one has to use macros, and if the user is informed of what's slowing down the process, they will be encouraged to do something about slow macros.

I think this is oversimplifying things. One doesn't need a slow macro for this. Just a macro that may have a wide effect.

they will be encouraged to do something about slow macros.

This necessitates two things. First, we need a system to be presenting this to the user. That has to be designed and built into the entire products. Second, you are adding features now that can take away from the experience and force users into unpleasant choices. Say, for example, a team takes a dependency on some macro that they find really useful. They're using it for months as they grow their codebase up. Then, at some point they find htat things have gotten slower and slower and it's the fault of this macro. What do they do now? Removing the macro is devastating for them, as they'll have to go and change all of that code in their system that depends on it. And giving up on all these features they care about is equally disappointing for them.

CyrusNajmabadi commented 8 years ago

I see a macro system more as an incubator for ideas - just to see what power users do with it

In that regard, people can and do just use Roslyn. You don't need a macro system when you literally can fork things and just implement the new feature you want. I mean, that's literally how we incubate all our own ideas :)

CyrusNajmabadi commented 8 years ago

a way to give developers features that will never meet the team's famous threshold for adding features to C#, or that don't have a single best solution. Classic examples: things that auto-implement INotifyPropertyChanged; parser generators; and since you mentioned dogfooding, macros for code analysis and generation, which should be handy in Roslyn itself. a replacement for T4 templates that is far more convenient to use.

Note: this is precisely what we were working on. And even trying to do basic stuff here turned out to be massively complex once you took the whole experience into account. Take, for example, simple features like:

Navigation. Say a macro introduces symbols. How can the user introspect and understand the symbols? Navigation would likely only take them to the invocation of the macro, leaving them to have to try to decipher what had actually happened. We'd need somethign to help out here, and that would spike costs.
Debugging. How do you debug these sorts of things? We've gotten enormous amount of feedback over the years that people find it massively difficult to debug things like type generators. Now we'd be taking that same problem and pushign it to the normal coding cycle. In order to do this sort of feature we would have to have some sort of realistic Debugging story. And that means another large spike in costs.
Refactorings. Already mentioned. But this would numerous problems and could easily lead to 'safe' refactorings (like 'rename') now breaking your code. That's both a big issue with our goals for refactorings and can definitely erode user trust.

These are just three of many areas we found impacted when we started investigating this space. And each of those three area breaks up into many other areas we'd have to look at and consider.

This is not me saying the idea is bad. This is me saying: the costs are huge. Ergo, the rewards must warrant it.

CyrusNajmabadi commented 8 years ago

I agree; Enhanced C# does not allow new syntax. I edited C#'s grammar to make it flexible enough that new syntax wouldn't be needed in most cases.

I'm a little confused as to what you're proposing C# adopt in future releases. I went through your enormous gist and it's got a lot of ideas and sketches, but is somewhat scattershot.

Could you clarify which parts, precisely, of ec# you'd like us to add? And could you give core examples of that addition so we can direct the discussion around them? Thanks!

CyrusNajmabadi commented 8 years ago

a way to give developers features that will never meet the team's famous threshold for adding features to C#, or that don't have a single best solution.

So this is tricky. If this doesn't meet our own bar, then we'd be very hesitant to add it into the language. After all, if we weren't living and breathing this feature every day, then the chance for it to have major issues would be quite high. As we've discovered, it's only through day-to-day dogfooding that we really can shake down a feature effectively. This has been the case with everything we've produced. Someone, like me, will create a feature and test the heck out of it. Then, a month later when the team starts really using it in our day-to-day development, i'll get a wave of subtle issues reported that i missed originally.

We absolutely must have that in order to ship a high quality enough feature. If people aren't using this as part of their core cycle, and seeing how their debugging experience is impacted, or how their LiveUnitTesting experience is impacted, or how their intellisense experience is impacted, or how their refactoring experience is impacted, or how this impacted customers who use "Open Folder" and open 100MB of source :)) then we won't get the critical mass we need to ensure that the feature is going to effective solving problems for customers in the real world.

CyrusNajmabadi commented 8 years ago

Whereas I'm looking at it more like "few things have a worse user experience than generating C# with T4 templates. Let's make something akin to T4 that's pleasant, if not quite perfect, see what people do with it, and learn from that experience when we make our next new language in 10 years."

So, to be clear, this was an exact scenario that our investigation was attempting to make better. :)

And, i want to also be clear: We did not scrap this idea. The idea is still there, and we are still interested in it. It's just that as we started doing work here, we quickly realized the enormity of the scope this would have, and that this would involve many developers over many months. That was simply too high a cost for our schedule, and we deprioritized it against the other work we're doing.

I'm seriously hoping we pick that up again for C# 8.0. But i also think that what we'd deliver would be a lot less 'ambitious' than what i think i'm seeing you desire. We're going to go with scenarios and schemes we think we can nail across many of the axes that i outlined above. If we can succeed on that, we'll ship, and then use the feedback we get to judiciously improve things moving forward. i.e. very similar to what we did with analyzers. We started with a core kernel that had a design we could believe in. And, over time, we continually enhance based on our own needs and the needs of the community.

If we start this back up again during C# 8, it would be great to have your input!

CyrusNajmabadi commented 8 years ago

I just thought of something that I never think about, because I don't use ASP.NET. You know how you can write blocks of C# code in <% %> in an aspx file and intellisense works in there? How do they do that?

At a high level, it's a rather simple system (though the devil is in the details). Here are the broad strokes on how it works. First, ASP opens the HTML file and parses out all the HTML structure. During this it identifies all the <%%> regions. It then spits out a second 'code-behind' file that is a normal C# file with #line and #default regions where it spits in scaffolding code, as well as the code in the <%%>. This is the code-file that the Roslyn system interacts with.

Now, in the Editor lots of amazing work happens. A ITextBuffer is created for that crazy C# file. Roslyn is powering the experience for that file. An ITextBuffer is created for the HTML file. ASP powers the experiences outside the <%%> blocks, and it leaves the <%%> blocks alone. Then we use IProjectionBuffers. (https://msdn.microsoft.com/en-us/library/microsoft.visualstudio.text.projection.iprojectionbuffer.aspx) to grab portions of each buffer which we stitch together into one final buffer which gets presented to the user. This 'projected' buffer should be character identical to the original file. But it's actually a projection of other files, which have IDE experiences driven by different components and different teams.

Overall this works really well, but there's a lot of complexity at some points. For example, the new Razor syntax which just uses "@" to move into the embedded language, and which has no 'end' delimiter :)

Is the solution necessarily tied to the Roslyn C# parser, or could I somehow write a VS plugin that would work like aspx, but use my EC# parser instead? And if so, who out there has the knowledge of how to do that - and may be willing to share it with me?

Technically, you could 'host' C# yourself. Using these interfaces:

https://msdn.microsoft.com/en-us/library/microsoft.visualstudio.textmanager.interop.ivscontainedlanguage.aspx

But, these interfaces are ANCIENT and PAINFUL. :(

I'd love for them to be rev'ed in the future to be more modern (i.e. using the modern editor concepts that came with the WPF editor), and more debuggable. Right now we basically know that only ASP hosts us so the code (on both sides) makes huge amounts of assumptions. Assumptions that would almost certainly break if you tried this yourself :(

CyrusNajmabadi commented 8 years ago

I want to deal with the purest semantic level possible. I don't want to operate on text, I have experienced that to be brittle and dangerous and at best a workaround, but rather on a semantic model of the C# language which goes straight to the IL compiler.

There are a few problem with that.

First: Note that you talk about the information flowing one direction. From semantics to IL. However, in a system like ours, information needs to flow in the reverse direction as well. i.e. to the toolchain that is sitting on top of the compiler. Take, for example, a feature like FindAllReferences. It would somehow need to be aware of how you've impacted the semantics so that it could find references to symbols that you were now using. How would we do this? We don't want FindReferences to have to call into every macro as that could be staggeringly slow. So now we need a system where you can inject, and yet flow information out in a performant (and indexable) fashion.

This would impact things like CodeLens. You wouldn't want it saying "0 references" to some symbol when it was actually referenced by some code you injected. That would make people believe they could remove code, when really that might break things, or subtly change semantics.

This could impact things like 'rename'. Today rename can check to ensure that semantics did not change across runs. The moment you have arbitrary semantic injection, how can we tell if your rename was safe?

@qwertie mentioned "So to me, it would be enough to put up a warning. It could detect if any macros are used "

But that would essentially impact all features. Would every feature basically have a warning saying "sorry, you're using Macros... so all bets are off"? That would be a terrible experience and many people would be complaining that we were not providing a suitable feature with the level of integration they expected.

Second: This presumes that Roslyn even has a 'pure' clean semantic layer that you can plug into. Trust me when i say that right now, it doesn't. Indeed, this lack of a clean semantic layer is one of the reasons that IOperation got delayed. This was our attempt to expose the semantic layer in a very clean way only for querying (i.e. definitely not for mutation). Even just exposing that layer for clean querying turned out to be problematic and we discovered that we're going to need to invest a lot there before we can expose that.

Once we get to that point, then we can start considering what it might be like to allow for some sort of mutation ability to be provided. But note that providing such an ability is also staggeringly difficult when we just barely start poking at the surface of things. For example, say you have someone who says "hey, if i see pattern XXX in my method body then i want to run some special code that changes semantics". However, that code that runs then wants to inject a Class into the system. however, when it injects that class, it changes the semantics of everything (including the semantics that the current generator cares about).

The introduction of that Class changes all type binding and means that any reference to any type needs to be recomputed. Most of the interesting sorts of code generation use cases that customers have asked for end up doing this.

And this is just the case where you have one generator. Say you have many generators (i.e. one for contract validation, one for logging, one for ensuring certain patterns and company practices in your code). How do these all run? Are they ordered? What if you need them to loop? Do you somehow have to keep running them all until they reach a fixed point?

Auto-mutation is an area where things get complex SUPER fast. :-/

qwertie commented 8 years ago

I'll respond to @jnm2 first because it's easier :)

I have always imagined hooking into the parser and then transforming an already-parsed syntax tree. The thought of having to implement the language extension as a text preprocessor horrifies me.

Me too. The D language has a lot of great features, but the way some of them work makes me wince a little. Like when you do compile-time codegen, it has to be done by generating strings of source code and I very much disliked that design. In EC#/LeMP you only deal with syntax trees; you can do some custom syntax with "token literals" which are trees of EC# tokens.

I was similarly frustrated every time I tried to use ReSharper's Custom Pattern search

I haven't used that, but that reminds that I've always wanted a non-regex text search option that would implicitly insert a whitespace regex [ \t\n]* at every apparent word boundary so that searching for void Foo can find void Foo... I think I would leave the option on ALL the time for all file types. Sometimes I marvel that MS does these impressive massive features, but misses some of the little things.

But it occurs to me that doing a syntax search - like class $name : $(.._), IFoo, $(.._) { $(..body) } to find any class derived from IFoo - is very easy in Loyc and perhaps a more limited form of that would be straightforward in Roslyn too.

[@CyrusNajmabadi] But that would essentially impact all features. Would every feature basically have a warning saying "sorry, you're using Macros... so all bets are off"?

I'm pretty sure rename - the world's most important refactor IMO - can be done well much of the time despite macros. Perhaps I could make a prototype to explore the idea... but if the symbol appears within the output of a macro, you do have to run macros again and see if there are side-effects or failures. Now, such effects can't really be avoided, I mean, given the algebraic data type macro:

public abstract alt class BinaryTree<T> where T: IComparable<T>
{
    alt Leaf(T Value);
    alt Node(T Value, BinaryTree<T> Left, BinaryTree<T> Right);
}

it produces "withers" methods like Node<T>.WithLeft(). Renaming the Left property should succeed but with the side effect of changing the "wither", and renaming WithLeft should fail outright since it doesn't exist in the original code.

Would every feature basically have a warning saying "sorry, you're using Macros... so all bets are off"?

That reminds me of a third option for 'extract method' beyond (1) ignoring macros, (2) ignoring macros but going to the work of analyzing the result to figure out whether the refactor produced a semantically identical result. Option (3) is to offer to expand macros before doing the extract method operation. I suspect the same could be done in general for any refactor involving macros. We could call option (4) "do the extract method on the result of macro expansion, then magically reverse macro expansion" - that may not be possible in general [edit: but maybe some candidates for reversal can be detected and attempted...].

The introduction of that Class changes all type binding and means that any reference to any type needs to be recomputed. Most of the interesting sorts of code generation use cases that customers have asked for end up doing this.

Yes, letting a program analyze/use itself, using that information to expand itself arbitrarily is a case of "there be dragons!", at least when declaration order is not supposed to matter. I noticed this when I was learning D, and constructed the following example to illustrate one of the subtle problems that can occur:

const int C1 = Overloaded(3);

int CallFunction(int x) { return (int)Overloaded(x); }
long Overloaded(long i) { return i*i; }
static if (CallFunction(3) != 3) // compile-time if
{
  int Overloaded(int j) { return j; }
}

const int C2 = CallFunction(3);

In this case, C1 is 3, but C2 is 9, which is weird because C1 and CallFunction both call Overloaded(3). I assumed this was not the only paradox D could have lurking in it, so I kept the problem in mind when I designed the "original" EC# - a series of design sketches that made C# more like D. I solved the paradox (the equivalent EC# code would have produced two compiler errors) but my system was a bit limited in the metaprogramming department. Without something like macros, my language wouldn't solve a variety of problems that I thought it should, such as the INotifyPropertyChanged problem, so I switched gears and really focused on learning about Lisp macros for awhile - putting the rest of my ideas on the back burner.

Lexical macros don't produce paradoxes like you see in D, since they have no access to semantic information and cannot affect the syntax tree outside themselves. Nemerle allows macros that can look up semantic info; earlier I had some trouble finding info about such macros, but I just found this and I'll maybe read that now.

qwertie commented 8 years ago

I think this is oversimplifying things. One doesn't need a slow macro for this. Just a macro that may have a wide effect.

What do you have in mind?

Say, for example, a team takes a dependency on some macro that they find really useful. They're using it for months as they grow their codebase up. Then, at some point they find htat things have gotten slower and slower and it's the fault of this macro. What do they do now? Removing the macro is devastating for them, as they'll have to go and change all of that code in their system that depends on it. And giving up on all these features they care about is equally disappointing for them.

Well, in most cases, hiding it from IntelliSense might be sufficient. But if the macro (or other language extension) makes a lot of changes, losing visibility of those changes would be annoying. If it's open source, they could submit optimizations or fork the code. If it's both complex and closed source (and I'd be wary of using closed-source macros), they could write their own macro whose job is to run a fast (and simple) approximation of what the original macro does. The macro would be set up to override the original macro in an IntelliSense context only.

In that regard, people can and do just use Roslyn. You don't need a macro system when you literally can fork things and just implement the new feature you want. I mean, that's literally how we incubate all our own ideas :)

I'm surprised you say that, because I see two massive barriers:

It sounds difficult - I wouldn't have a clue how to isolate my fork from the original Roslyn in VS such that both versions remain usable; how to set up my project to use a modified Roslyn; or how best to distribute my forked version of Roslyn to others. (And wouldn't it be a big download?) And since C#'s parser isn't currently designed with any macro-friendly features, there's a good chance one would want to modify the parser... that's not easy even for me.
Forked versions by different people aren't easily combined.

Most ideas are too small and don't provide enough benefit to try overcoming these barriers.

qwertie commented 8 years ago

I think this is oversimplifying things. One doesn't need a slow macro for this. Just a macro that may have a wide effect.

Let's see, here are some macros in LeMP that might work the way you're thinking:

namespace Foo;: all it does is wrap the file in a namespace decl. The macro itself is fast, but I guess it could disrupt any 'incremental updating' that Roslyn does. Since I'm not familiar with Roslyn's incremental update process, I didn't think about this at first.
#useSequenceExpressions - EC# supports arbitrary executable statements inside expressions. This macro does its best to translate such funky code into plain C#, which is needed by one of my favorite features, quick-binding variables. If C# 8 supported macros, I'm certain it would also have a feature that eliminates the need for this macro.
#useSymbols - helps translate @@symbol literals to plain C#. This macro must process the entire source file. If C# 8 supported macros but not symbols, this is one macro you could safely shut off. It doesn't affect any top-level declarations and could perhaps be shadowed by a using static method to keep intellisense working.

The using System(.Collections, .Text, .Linq) macro comes to mind - it only has a local effect, but in turn it affects how the rest of the file is interpreted. However, I suppose this is not the kind of "wide effect" you were thinking of.

CyrusNajmabadi commented 8 years ago

What do you have in mind?

Anything that introduces a top level declaration. It will mean reanalysis of everything in that compilation and any downstream compilations.

Well, in most cases, hiding it from IntelliSense might be sufficient.

This doesn't actually solve the problem. Indeed, it just makes the user think that something is broken.

If it's open source, they could submit optimizations or fork the code.

This is a lot of presumptions. Even if it's their own code, they might be unable to optimize things in the manner you're specifying. We have to consider these situations and we have to have an answer that is acceptable :)

I'm surprised you say that, because I see two massive barriers: It sounds difficult

Precisely teh issue with Macros as well :D

CyrusNajmabadi commented 8 years ago

all it does is wrap the file in a namespace decl.

That means the semantics of everything in that file change. It means the semantics of everything in that compilation need reanalysis. It means every downstream project needs reanalysis :)

here are some macros in LeMP that might work the way you're thinking: namespace Foo;

I'm confused. You mentioned that your system added no new syntax. But this is new syntax that you're saying is a LeMP macro. I'm not sure how to reconcile this.

If this is new syntax, that's very problematic. What happens if we end up adding that syntax in the future? Now this code is broken for the user. If it's not new syntax, then how is this working?

CyrusNajmabadi commented 8 years ago

it only has a local effect, but in turn it affects how the rest of the file is interpreted.

This is, by definition, not a local effect... If a file can be reinterpreted, then that can effect that compilation and all downstream compilations.

qwertie commented 8 years ago

Precisely teh issue with Macros as well :D

Huh? Macros are easy. In many languages you can write one in a couple of lines of code. Even the EC# and Nemerle models where you have to create a separate assembly to hold your macros - one might argue that the added difficulty is a good thing since macros should be considered a last-resort solution when other mechanisms won't solve a given problem.

This is, by definition, not a local effect... If a file can be reinterpreted, then that can effect that compilation and all downstream compilations.

Right, but it's a known quantity and the work is done: Roslyn is already designed to deal with this precisely this kind of cascading effect.

CyrusNajmabadi commented 8 years ago

Renaming the Left property should succeed

Yes, it should. But how do you verify that it actually has? Let's use the simple example you mentioned (Withers). If we have the following code:

class BinaryExpression {
     Expression Left, Right;
}

class Whatever {
   void Foo() {
       BinaryExpression e = null;
       var v = e.WithLeft(...);
   }
}

If the user renames 'Left', then 'WithLeft' needs to update. Otherwise, their code will be broken post-rename.

CyrusNajmabadi commented 8 years ago

Huh? Macros are easy.

Clearly not. As i mentioned, a whole host of areas become majorly problematic. Again, cases like: Debugging, Navigation, Refactoring.

CyrusNajmabadi commented 8 years ago

Right, but it's a known quantity

How can it be a known quantity? We have no idea what the macro will produce.

and the work is done: Roslyn is already designed to deal with this precisely this kind of cascading effect.

Roslyn is designed to deal with precisely the cases we know about for C#. Indeed, Roslyn was designed, at every level, to deal with the problem space given the constraints of the language. We take great advantage of our knowledge of waht can/can't happen in the language. Macros throw most (if not all) of our optimization opportunities out the window.

They also introduce areas that we have no known solution or design for. Or, they would require complete redoing of certain areas. Take, for example, 'FindReferences' (as i mentioned before). How does FindReferences work in a world with Macros? How do we know if the Macro generated code ends up referencing the variable in question? The only way to know is to have to actually perform and analyze all macros before doing the FAR operation. As any edit might impact any macro, we have to do this. That means that every FAR now takes a hit as we have to reanalyze the entire solution.

Note: this is the issue LiveUnitTesting faces. Almost any edit can have an effect on tests. So they always have to re-execute all tests. This is fine in a world where that's providing background/ambient information. It's not ok when the user expects FindReferences to return in seconds, and it takes minutes as all macros are reexecuted and reanalyzed.

qwertie commented 8 years ago

How can it be a known quantity? We have no idea what the macro will produce.

Yes, but we know where a macro produces its output: right in the same spot. So processing a change to using System(.Collections, .Text, .Linq) has no more complexity than if the user selected the line and pasted in

using System.Collections;
using System.Text;
using System.Linq;

As any edit might impact any macro, we have to do this.

Hold on. If you restrict your view to EC#-style lexical macros (as I have been doing implicitly, as I haven't thought much about semantic-level macros) this is not true: lexical macros can't look at the contents of other files, and incidentally, macro expansion is highly parallelizable as a result (I defined exactly one macro that looks at another file - includeFile - and it seems fair to make that a built-in macro for IntelliSense purposes. Plus if you make it illegal for user macros to access the outside world, you could potentially run them in some kind of security sandbox).

Roslyn is designed to deal with precisely the cases we know about for C#. Indeed, Roslyn was designed, at every level, to deal with the problem space given the constraints of the language. We take great advantage of our knowledge of waht can/can't happen in the language. Macros through most (if not all) of our optimization opportunities out the window.

I see. Supposing you change Roslyn to expand macros as a matter of course, can you mention an example of a particular optimization that might be lost as a result?

If the user renames 'Left', then 'WithLeft' needs to update. Otherwise, their code will be broken post-rename.

Ahh, right! My thinking was flawed, I didn't actually think of that. I mean, I realized that of course the name of WithLeft would change as a side effect, and that this change could be detected. But somehow I didn't think about the fact that any code calling WithLeft would be broken unless the rename operation also figures out how to rename all uses of WithLeft. And even if we can figure out how to do that, I'm not certain it's a good idea. Can the rename option know for certain that the new "WithLeftRenamed" method is really "the same method" as the original "WithLeft" method? Maybe. But if it tries to rename WithLeft too, there's the potential for cascading effects on other macro expansions. I haven't thought it through, but it's a little scary. A system that gives up and says "side effect detected: renaming Left caused the WithLeft method to disappear from a macro expansion" would at least avoid such complications...

...and yes, irritate the user a little. I would say "but at least they're getting some benefit from a language feature that wouldn't have otherwise existed." Then I guess you would say "well, if we hadn't spent all this time writing a macro system, we might have spent the time instead adding a new ADT-like language feature with withers built right into the core of C#, and our version of the feature wouldn't have this problem". And, well, that's true. But then again, you might have decided not to do the feature after all. With macros devs can get lots of features very quickly that the Roslyn team either won't ever do, or won't do right now, which is when they want it done. Plus, the macro feature can be seen as a data-gathering exercise, as you'll see which ideas become the most popular. If a popular macro great as-is, you can just let people keep using it; whereas if the macro system imposes annoying shortcomings, you can prioritize making it a built-in feature without those annoyances. (meanwhile some shops will disagree with the whole philosophy of macros and outlaw them... you won't see me working at one of those places. I couldn't believe all the people complaining about var back in the day!)

I'm confused. You mentioned that your system added no new syntax. But this is new syntax that you're saying is a LeMP macro. I'm not sure how to reconcile this.

Ahh. Now you mentioned you read through my "gist" - that is, the EC# 1.0 design sketches. I guess you focused less on "EC# for PL nerds" which, though out-of-date, explains my thinking for the macro system.

There are basically three categories of syntactic changes in EC#. All categories are hard-coded in the parser:

First, there are changes designed to make possible the features of EC# 1.0. So right now the EC# parser accepts things like alias Foo = Bar; or trait TFoo { } that I sketched out but didn't implement. It turned out, though, that some of these features could be implemented as macros! So that's what I did, e.g. in the case of method forwarding (==>) and @@symbol literals (which, in the sketch document, has the syntax $symbol). Doing this made sense for me, but of course, it would be a bit silly for Roslyn to do that.
Second, there are syntactic changes designed specifically for macros. Notable items in this category are block-call expressions like match (expr) {...} and quote { code }, and token literals.
Third, there are a series of "regularizations" to C# which make the syntax more regular and uniform. I basically "squashed" different kinds of syntax together into "one thing". For instance, there is no separate syntax for "declaration space" (inside a class) and "executable space" (inside a method) and "property space" (immediately inside a property); all those context accept the same kind of syntax, which is simply called a "statement" (the biggest challenge with this was constructors, which look a lot like method calls). Also, a method's formal argument list has the same grammar as the syntax you would use to call a method. The purpose of all this regularization is to (1) simplify the parser (at the cost of adding more complexity in validation later on) and (2) give macros freedom to accept interesting syntax with no changes to the parser. Most notably, if you write foo { stuff } there is no way for the parser to know if stuff should be parsed like a class body or like a method body. Good thing, then, that there's no difference between the two! (no doubt C# Interactive had to grapple with this same problem - kudos to the team on that fantastic work btw. I LOVE IT SO MUCH!).

namespace Foo; is a bit special since it kind-of fits in both categories 1 and 3. (1) It wasn't in the EC# 1 design sketches but I think of it as something that could reasonably be built in to the language, and (3) namespace Foo is a regularized construct in the sense that you can also write class Foo : Bar; or enum Foo; (or even namespace Foo : Bar, Baz). Edit: just to be clear, the syntax is regularized but the macro is not. class Foo; can be parsed, but has no meaning, as no macros have been written to give it a meaning.

~~For more information, please read PL nerds part 3.~~ EDIT: actually don't, it's too out of date.

CyrusNajmabadi commented 8 years ago

Could you break down simply the following:

here are some macros in LeMP that might work the way you're thinking: namespace Foo; I'm confused. You mentioned that your system added no new syntax. But this is new syntax that you're saying is a LeMP macro. I'm not sure how to reconcile this.

Do macros add new syntax or not? If they don't add new syntax, can you give a very simple explanation of what the grammar of your macros are?

Finally, can you state specifically what limitations there are on what macros can use as input to their work, and exactly what they can produce as the results of their operation?

For example, you mentioned INotifyPropertyChanged. How would your macro system help out here?

CyrusNajmabadi commented 8 years ago

I haven't thought it through, but it's a little scary. A system that gives up and says "side effect detected: renaming Left caused the WithLeft method to disappear from a macro expansion" would at least avoid such complications...

I would be very loathe to add such a feature with such a limitation. It goes against a core principle we have in terms of what the user experience should be for our language.

If we did a feature like this it would be precisely because we would want the entire experience around it to be great. And that means that it should work great at a minimum for the scenarios like Debugging, Navigation, Refactoring, etc.

qwertie commented 8 years ago

Do macros add new syntax or not?

They do not. Was my previous answer on that topic helpful?

Finally, can you state specifically what limitations there are on what macros can use as input to their work, and exactly what they can produce as the results of their operation?

EC# macros take the Lisp concept of macros and apply it to C#. So, the macro processor proceeds independently on each source file, top-to-bottom and outside-in (I think more parallelism could be squeezed out some of the time, but conceptually it's top-to-bottom, outside-in.) Each macro invocation is replaced with its result; so typically a macro takes one syntax tree as input and produces one tree as output.

My system also has some extra features: for avoiding and dealing with conflicts between macros; to allow macros to produce multiple (or zero) nodes as output, rather than just one ("splicing"); to allow macros to scan code below it (not just children) and optionally "drop" the code below it; and to allow macros to process child nodes first, in violation of the usual outside-in ordering.

Specifically: for each node, the macro processor looks up all macros that are registered to process nodes of that name (if this were implemented in Roslyn, I guess Roslyn would look for macros associated with the current type of SyntaxNode, and macros could also be limited to a particular name, e.g., a call to Foo() but not a call to Bar()). If any are found, the list of macros is grouped by priority (although most macros have the same priority, PriorityNormal) and the highest-priority macros are executed first.

Each macro can "accept" by returning a syntax tree or "decline" by returning null; macros are also passed an IMacroContext which, among other things, tells them about the ancestor nodes of the current node and allows them to write warning and error messages.

From the macro processor's perspective, a macro invocation "succeeds" if exactly one macro does not return null. If two or more macros return a result, a message is normally printed that the invocation was ambiguous (normally an error, but this is downgraded to a warning if the two macros produced the same output, and a macro can further request that the warning be suppressed.) If all macros return null, errors and warnings produced by all macros are delivered to the user. (I'll skip other subtleties about warnings/errors).

If there are multiple priority groups, lower priorities execute only if higher priority macros all return null. If all macros returned null and there are no warnings/errors, a generic warning is produced ("2 macro(s) saw the input and declined to process it: namespace1.macro1, namespace2.macro2") unless those macros use Passive mode (which means "it's normal for this macro to produce no output").

A macro can request that children be processed first. (This is optimized to happen once if competing macros ask for the same thing.) Sometimes this is necessary, but there's a performance risk; if a macro processes children first but then returns null, some other macro that didn't process children first might accept and then the work must be repeated (my system also wastes the effort if all macros decline, but atm I can't think of any reason it must be that way.)

Finally, a macro can "drop" all nodes after itself and incorporate them into its own results; that's how the macro for namespace Foo; works, and other macros like on_finally and LLLPG.

Oh, and I just added a feature where macros can define other macros in the current scope (i.e. new macros disappear at }). Just one macro uses the feature so far, which I will demonstrate below.

CyrusNajmabadi commented 8 years ago

What is your syntax for macros?

qwertie commented 8 years ago

For example, you mentioned INotifyPropertyChanged. How would your macro system help out here?

(Sorry for taking so long - I hit a couple of bugs in LeMP while writing this.)

While certainly we could envision a macro specifically designed to implement INotifyPropertyChanged, I think it would be slightly niftier to show how EC#'s replace macro can do the job. We start with something like this and would like to factor out the repetitive code, namely ChangeProperty<T> (potentially repeated once per class) and the repetition between the properties.

public class DemoCustomer : INotifyPropertyChanged
{
    public event PropertyChangedEventHandler PropertyChanged;

    /// Common code shared between all the properties
    protected bool ChangeProperty<T>(ref T field, T newValue, 
        string propertyName, IEqualityComparer<T> comparer = null)
    {
        comparer ??= EqualityComparer<T>.Default;
        if (field == null ? newValue != null : !field.Equals(newValue))
        {
            field = newValue;
            if (PropertyChanged != null)
                PropertyChanged(this, new PropertyChangedEventArgs(propertyName));
            return true;
        }
        return false;
    }

    private string _customerName = "";
    public  string CustomerName
    {
        get { return _customerName; }
        set { ChangeProperty(ref _customerName, value, "CustomerName"); }
    }

    private object _additionalData = null;
    public  object AdditionalData
    {
        get { return _additionalData; }
        set { ChangeProperty(ref _additionalData, value, "AdditionalData"); }
    }

    private string _companyName = "";
    public  string CompanyName
    {
        get { return _companyName; }
        set { ChangeProperty(ref _companyName, value, "AdditionalData"); }
    }

    private string _phoneNumber = "";
    public  string PhoneNumber
    {
        get { return _phoneNumber; }
        set { ChangeProperty(ref _phoneNumber, value, "PhoneNumber"); }
    }
}

We can factor out the common stuff like this:

replace ImplementNotifyPropertyChanged({ $(..properties); })
{
    // ***
    // *** Generated by ImplementNotifyPropertyChanged
    // ***
    public event PropertyChangedEventHandler PropertyChanged;

    protected bool ChangeProperty<T>(ref T field, T newValue, 
        string propertyName, IEqualityComparer<T> comparer = null)
    {
        comparer ??= EqualityComparer<T>.Default;
        if (field == null ? newValue != null : !field.Equals(newValue))
        {
            field = newValue;
            if (PropertyChanged != null)
                PropertyChanged(this, new PropertyChangedEventArgs(propertyName));
            return true;
        }
        return false;
    }

    // BTW: This is a different `replace` macro that can pattern-match any syntax tree.
    // The [$(..attrs)] part of this example is supposed to put all attributes into a list
    // called `attrs`, but it doesn't actually work because pattern matching on attributes 
    // isn't implemented yet. This is relevant because in EC#, modifiers like "public" are 
    // considered to be attributes.
    replace ({
        [$(..attrs)] $Type $PropName { get; set; }
    } => {
        replace (FieldName => concatId(_, $PropName));
        private $Type FieldName;
        [$attrs]
        $Type $PropName {
            get { return FieldName; }
            set { ChangeProperty(ref FieldName, value, nameof($PropName)); }
        }
    });

    $properties;
}

This defines a macro called ImplementNotifyPropertyChanged that accepts a list of properties within a braced block. Although LeMP itself doesn't let users define macros on-the-fly, it does allow macros to define other macros, which is the technique used here. replace is a standard macro that creates a new macro, scoped to the current block, that outputs a specified syntax tree and performs replacements. (If the "current block" is at the top level of a source file, the macro is available at any lower point in the file.)

You can use the macro like this:

public class DemoCustomer : INotifyPropertyChanged
{
    public DemoCustomer(string n)
    {
        CustomerName = n;
    }

    ImplementNotifyPropertyChanged
    {
        public string CustomerName { get; set; }
        public object AdditionalData { get; set; }
        public string CompanyName { get; set; }
        public string PhoneNumber { get; set; }
    }
}

Naturally people would want to use ImplementNotifyPropertyChanged in all source files that implement INotifyPropertyChanged, so they could put its defnition in a common file, say, ImplementNotifyPropertyChanged.ecs and then use includeFile("ImplementNotifyPropertyChanged.ecs") to import it in a given file.