dotnet / roslyn

The Roslyn .NET compiler provides C# and Visual Basic languages with rich code analysis APIs.
https://docs.microsoft.com/dotnet/csharp/roslyn-sdk/
MIT License

[Proposal] enable code generating extensions to the compiler #5561

Closed mattwar closed 7 years ago

mattwar commented 9 years ago

Often when writing software, we find ourselves repeatedly typing similar logic over and over again, each time just different enough from the last to make generalizing it into an API impractical. We refer to this type of code as boilerplate, the code we have to write around the actual logic we want to have, just to make it work with the language and environment that we use.

One way to avoid writing boilerplate code is to have the computer generate it for us. After all, computers are really good at that sort of thing. But in order for the computer to generate code for us, it has to have some input to base it on. Typical code generators are design-time tools that we work with outside of our codebase and that generate source we include with it. These tools usually prefer their input to be XML or JSON files that we either manipulate manually or conjure into existence with some WYSIWYG editor by dragging, dropping, and clicking. Other tools run at build time, invoked by our build system just before our project is built, but they too are driven by external inputs like XML and JSON files that we must maintain separately from our code.

These solutions have their merits, but they are often intrusive, requiring us to structure our code in particular ways so that the generated code merges well with what we've written. The biggest drawback is that these tools require entire facets of our codebase to be defined in another language, outside of the code we use to write our primary logic.

Some solutions, like post-build rewriters, do a little better in this regard, because they operate directly on the code we’ve written, adding new logic into the assembly directly. However, they too have their drawbacks. For instance, post-build rewriters can never introduce new types and APIs for our code to reference, because they come too late in the process; they can only change the code we wrote to do something else. Even worse, assembly rewriters are very difficult to build, because they must work at the level of IL or assembly language, doing the heavy lifting to re-derive the context of our code that was lost during compilation, and generating new code as IL and metadata without the luxury of a compiler to do it. For most folks, choosing this technique to build tools that reduce boilerplate code is typically a non-starter.

Yet the biggest sin of all is that these solutions require us to manipulate our nearly unfathomable build system, and in fact require us to have a build system in the first place, and who really wants to do that. Am I right?

Proposal: Code Injectors

Code injectors are source code generators that are extensions to the compiler you are using, as you are using it. When the compiler is instructed to compile the source code you wrote, code injectors are given a chance to examine your code and add new code that gets compiled in along with it.

When you type your code into an editor or IDE, the compiler can be engaged to provide feedback that includes the new code added by the code generators. Thus, it is possible to have the compiler respond to your work and introduce new code as you type that you can directly make use of.

You write a code injector similarly to how you write a C# or VB diagnostic analyzer today. You may choose to think of code injectors as analyzers that, instead of reporting new diagnostics after examining the source code, augment it by adding new declarations.

You define a class in an assembly that gets loaded by the compiler when it is run to compile your code. This could easily be the same assembly you have used to supply analyzers. This class is initialized by the compiler with a context that you can use to register callbacks into your code when particular compilation events occur.

For example, ignoring namespaces for a moment, this contrived code injector gives every class defined in source a new constant field called ClassName that is a string containing the name of the class.

[CodeInjector(LanguageNames.CSharp)]
public class MyInjector : CodeInjector
{
    public override void Initialize(InitializationContext context)
    {
        // Register a callback that runs for each symbol the compiler declares.
        context.RegisterSymbolAction(InjectCodeForSymbol);
    }

    public void InjectCodeForSymbol(SymbolInjectionContext context)
    {
        if (context.Symbol.TypeKind == TypeKind.Class)
        {
            // Add a partial declaration that the compiler merges into the user's class.
            context.AddCompilationUnit($@"partial class {context.Symbol.Name}
                      {{ public const string ClassName = ""{context.Symbol.Name}""; }}");
        }
    }
}

This works because of the C# and VB partial class language feature: the generated declaration and the hand-written one are merged into a single type.
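
For illustration, here is a sketch of how the two halves would merge, assuming the injector above has run (the Customer class is a made-up example):

// Hand-written half:
partial class Customer
{
    // ClassName comes from the generated half of the partial class.
    public override string ToString() => ClassName;
}

// Generated half (added by the injector):
partial class Customer
{
    public const string ClassName = "Customer";
}
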
Of course, not all code injectors need to be in the business of adding members to the classes you wrote, and especially not adding members to all your classes indiscriminately. Code injectors can add entirely new declarations, new types and APIs that are meant simply to be used by your code, not to modify it.

Yet, the prospect of having code injectors modify the code you wrote enables many compelling scenarios that wouldn’t be possible otherwise. A companion proposal for the C# and VB languages #5292 introduces a new feature that makes it possible to have code generators not only add new declarations/members to your code, but also to augment the methods and properties you wrote too.

Now, you can get rid of boilerplate logic like all that INotifyPropertyChanged code you need just to make data binding work. (Or is that so last decade that I need a better example?)
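
For reference, this is the kind of hand-written boilerplate such an injector could generate instead (PersonViewModel and its property are made-up examples):

using System.ComponentModel;

public class PersonViewModel : INotifyPropertyChanged
{
    public event PropertyChangedEventHandler PropertyChanged;

    private string _name;
    public string Name
    {
        get { return _name; }
        set
        {
            if (_name == value) return;
            _name = value;
            // The repetitive part: raise the event with the property's name.
            PropertyChanged?.Invoke(this, new PropertyChangedEventArgs(nameof(Name)));
        }
    }
}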

Subjects not covered in this proposal but open for discussion too

  1. Ordering of Injectors – this concerns which injectors are run first, and the order in which the new sources are presented to the compiler. This is of interest to the supersedes feature proposed in #5292
  2. Callback events – beyond callbacks for type symbols declared in source, what other callback patterns would be useful for code generators, keeping in mind that these will likely need to be invoked by the IDE as well.
  3. Deduplication – being smart about multiple injection event handlers that generate the same source code, so it is added only once.
  4. Recursion - Can generated code trigger additional code injection events for the newly injected declarations? I’d rather the answer be No, since this will make the system much simpler.
  5. More?
daveaglick commented 9 years ago

Looking forward to seeing where this discussion goes. The last time I remember metaprogramming being discussed, or more generally compile-time hooks, it sounded like the team wanted a little more time to see how things shook out (https://github.com/dotnet/roslyn/issues/98#issuecomment-71780597). Hopefully, now that it's 8 months later and the library and tooling are in the wild, it's a good time to revisit.

I personally really like the idea of using code diagnostics and fixes to drive this functionality. We already have great tooling around developing and debugging them, and Roslyn already knows how to apply them. Once developed, there's also already a mechanism for explicitly applying them to target code to see how they'll look.

I can envision a variety of use cases for this concept, from simple one-off fixes to things like entire AOP frameworks based on applying code fixes in the presence of attributes that guide their behavior.

sharwell commented 9 years ago

I wrote the code generation portions for the C# targets of ANTLR 3 and ANTLR 4 using a pattern similar to XAML. Two groups will shudder at this statement: developers working on MSBuild itself, and the ReSharper team. For end users working with Visual Studio, the experience is actually quite good. There are some interesting limitations in the current strategy.

Reference limitations

It is possible for C# code to reference types and members defined in the files which are generated from ANTLR grammars. In fact, from the moment the rules are added to the grammar (even without saving the file), the C# IntelliSense engine is already aware of the members which will be generated for these rules.

However, the code generation step itself cannot use information from the other C# files in the project. Fortunately for ANTLR, we don't need the ability to do this because the information required to generate a parser is completely contained within the grammar files.

Undocumented build and IntelliSense integrations

The specific manner in which a code generator integrates with the IntelliSense engine (using the XAML generator pattern) is undocumented. This led to a complete lack of support for the code completion functionality described in the previous section in other IDEs and even in ReSharper.

ghord commented 9 years ago

Let's deal with the problem of the undefined order of such transformations. In normal OO code, this pattern is conceptually similar to the decorator pattern. Take a look at this code:

var logger = new CachingLogger(new TimestampedLogger(new ConsoleLogger()));

vs

var logger = new TimestampedLogger(new CachingLogger(new ConsoleLogger()));

The nice thing is that we have to manually specify the order of wrapping the class. This seems like the most obvious answer: Let the programmer specify the order.

I think this could be done cleanly with attributes that would somehow be wired up to trigger these transformations:

[INotifyPropertyChanged, LogPropertyAccess]
public class ViewModel
{
   //gets change notification and logging
   public int Property { get; set; } 
}

They could be specified at assembly/class/member level to represent all kinds of transformation scope.

The problem is that until now, attributes have served only as metadata; now they would instead modify source code. Maybe user-defined class modifiers would be better:

public notify logged class ViewModel
{
    public int Property { get; set; }
}

Where notify and logged are user defined somehow.

mattwar commented 9 years ago

@ghord That's a good idea. If we limit the code generators to working only on symbols with custom attributes explicitly specified, we can order the code generators by the order of the attributes as they appear in the source.
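
A sketch of how that could look inside the hypothetical injector callback from the proposal; ISymbol.GetAttributes() is a real Roslyn API and, for a single declaration, returns attributes in source order:

public void InjectCodeForSymbol(SymbolInjectionContext context) // hypothetical API
{
    // Run injections in the order the user listed the attributes.
    foreach (var attribute in context.Symbol.GetAttributes())
    {
        switch (attribute.AttributeClass?.Name)
        {
            case "NotifyPropertyChangedAttribute":
                // inject INotifyPropertyChanged support first
                break;
            case "LogPropertyAccessAttribute":
                // then wrap property accessors with logging
                break;
        }
    }
}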

daveaglick commented 9 years ago

@mattwar @ghord While I think the use of attributes to guide the code generation process could work (it's worked well for PostSharp, for example), I'd love to see a more general solution that isn't directly tied to a specific language syntax or feature. That's why I mentioned being able to apply analyzers and code fixes as a possible approach.

The way I would envision this working is that the compiler would be supplied with a list of analyzers and code fixes to automatically apply just before actual compilation. It would work as if the user had manually gone through the code and applied all of the specified code fixes by hand before compiling.

Benefits

I suspect that this could be achieved with a minimal amount of changes to existing Roslyn, at least functionality-wise (though it may take some serious refactoring - I have no idea). Of course, the compiler would need a mechanism for specifying the analyzers and code fixes and applying them during compilation. Note the following:

There would also be some synergy between authors of conventional analyzers and code fixes and those intended for code generation. Existing code fixes could be adapted or possibly applied wholesale during the code generation stage (if specified). The tooling and process would be the same, so skills could be leveraged for either.

Challenges

I do see the following questions or complications with this approach:

An example that uses attributes

Getting back to the use of attributes, one of the big examples of this approach that I've been thinking about is using it to build out a full AOP framework similar to what PostSharp does. In this case, an analyzer would be written that looks for the presence of specific attributes as defined in a referenced support library. When it finds them, it would output diagnostics that a code fix would then act on. The code fix would then apply whatever code generation is appropriate for the attribute.

My favorite PostSharp aspect is OnMethodBoundaryAspect, which lets you execute code defined in your aspect attribute class before method entry and after method exit. Something similar could be constructed by having a code fix inject calls to methods on a class derived from a specific attribute, for any method that has that attribute applied to it.
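
To make that concrete, here is a sketch under invented names (TraceAttribute and OrderService are not real PostSharp or Roslyn types); the code fix would wrap the original body in entry/exit calls like this:

using System;

public class TraceAttribute : Attribute
{
    public void OnEntry() { Console.WriteLine("entering"); }
    public void OnExit() { Console.WriteLine("exiting"); }
    public void OnException(Exception e) { Console.WriteLine("threw: " + e.Message); }
}

public class OrderService
{
    [Trace]
    public void PlaceOrder(int id)
    {
        // original, hand-written logic
    }

    // Conceptually, the wrapper the code fix would generate:
    public void PlaceOrderTraced(int id)
    {
        var aspect = new TraceAttribute();
        aspect.OnEntry();               // before method entry
        try
        {
            PlaceOrder(id);             // the original body
            aspect.OnExit();            // after normal exit
        }
        catch (Exception e)
        {
            aspect.OnException(e);      // after exceptional exit
            throw;
        }
    }
}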

You could potentially build up an entire AOP framework by creating analyzers and code fixes that act on pre-defined attributes and their derivatives. The point, though, is that you wouldn't have to. The code generation capability could be as flexible and general as analyzers and code fixes themselves, which, because they directly manipulate the syntax tree, can do just about anything.

MgSam commented 9 years ago

Very happy to see a proposal for this on the table. Are attributes how you envision applying a Code Injector? I think the proposal needs to specify the syntax for applying them.

Is the idea that CodeInjections all take place prior to build, so that you can see and possibly interact with the members they generate? If so, I think being able to interact with the generated code is another huge benefit that you should mention in your proposal. When using PostSharp, anything you have it generate doesn't exist until build time, so you can't reference any of it in your code.

@ghord The problem with your proposal on ordering is that you might not define the injections in the same place. For example, you could have a code-injection attribute NotifyPropertyChanged on a class and then a code-injection attribute Log on a method. Which one should be applied first? I think you need a way of explicitly specifying an overall ordering when you invoke a CodeInjector (probably just an integer).

paulomorgado commented 9 years ago

Using a property on an attribute to specify order is not new to the framework:

DataMemberAttribute.Order Property

But I think that, if order is important, then something is wrong.

Notifying property change is something that is expected from the consumers of an object, not something the object expects itself. So, as long as it's done, the order doesn't matter.

Logging is the same thing. If you want to log the notification, then that is not logging the object but logging the notification extension.

Is there any compelling example where one extension influences the other, order matters, and it can still be considered good architecture?

MgSam commented 9 years ago

@paulomorgado Yes, there are any number of use cases. For example, you want to have some authorization code run before some caching code. PostSharp has several documentation pages about ordering.

MrJul commented 9 years ago

Rather than ordering at the use site, why not let injectors specify their dependencies using something akin to those PostSharp attributes @MgSam is linking to? (Or OrderAttribute in VS.) Depending on the order of the attributes at the use site seems very brittle to me and prevents using them at different scopes.

ghord commented 9 years ago

There are some issues with attributes which we will have to overcome for this to work:

  1. Partial classes and members in separate files: attributes from which part of the class/member take priority?
  2. Assembly attributes: attributes from which file take priority?

We could make the order alphabetical by file name. I'm pretty sure that in 99% of cases the order won't matter, but leaving undefined behavior like this in the language is very dangerous - an application could crash or not depending on the order in which transformations are applied.

paulomorgado commented 9 years ago

@ghord, what in this proposal influences assembly attributes?

Inverness commented 9 years ago

I think code generation support for the compiler would be fantastic. I'd love to be able to do something similar to what PostSharp provides.

PostSharp's more limited free version, and the requirement to submit an application to get an open source project license, make me unwilling to consider it for anything but larger projects at work that we would invest money in.

I'd like to be able to have great AOP tools for everyday/hobby projects without additional hassle.

@daveaglick For debugging, if code generation is only happening after things are sent to the compiler, wouldn't inserting line directives into the syntax tree preserve the integrity of the debugging experience?

I did make a syntax tree rewriter for Roslyn to implement a simple method boundary aspect, using a Roslyn fork to get it hooked in at compile time. Line directives ensured there was no issue with debugging. It was an interesting experience, and an example of something I'd like to be able to do without jumping through hoops.
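
For reference, the mechanism is the standard C# #line directive: injected statements are marked hidden, and the user's own statements are mapped back to their original file, so stepping behaves as if the generated wrapper weren't there. A sketch (the file name, line number, and OnEntry/OnExit helpers are illustrative):

public void PlaceOrder(int id)
{
#line hidden                      // injected code: the debugger skips it
    OnEntry();
#line 42 "OrderService.cs"        // map the original body back to the user's file
    Console.WriteLine("placing order " + id);
#line hidden
    OnExit();
#line default                     // restore normal line mapping
}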

One issue I had, though, was that working at the syntax tree stage deprived me of information that was only available at the bound tree stage. Is there a way to know about type relationships at this point? When you see an attribute on a class, how will you know that it subclasses MethodBoundaryAspect or whatever?

AdamSpeight2008 commented 9 years ago

Is this like F#'s type providers?

mattwarren commented 9 years ago

It's great to see this being proposed, I remember asking a while back if this was being considered.

I think it should be possible to have the modified source code written out to a temp folder to make debugging easier, either by default or controlled via a flag.
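
A minimal sketch of that idea, assuming the rewriting step hands you the final SyntaxTree instances (the generatedTrees parameter is hypothetical; the Roslyn calls are real):

using System.Collections.Generic;
using System.IO;
using Microsoft.CodeAnalysis;

static class GeneratedSourceDumper
{
    public static void Dump(IEnumerable<SyntaxTree> generatedTrees, string folder)
    {
        Directory.CreateDirectory(folder);
        int i = 0;
        foreach (SyntaxTree tree in generatedTrees)
        {
            // Use the tree's own path if it has one; otherwise invent a name.
            string name = string.IsNullOrEmpty(tree.FilePath)
                ? "generated_" + i++ + ".cs"
                : Path.GetFileName(tree.FilePath);
            File.WriteAllText(Path.Combine(folder, name), tree.GetText().ToString());
        }
    }
}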

I also think that having to apply an attribute to the parts of the source code that can be rewritten is a nice idea, as it makes the feature less magical and easier to reason about.

mattwarren commented 9 years ago

@AdamSpeight2008 I don't think so. I see this feature more as a compiler step that lets you modify code before it's compiled, but crucially the result isn't meant to be seen by the person who wrote the code; it happens in the background when the compiler runs.

My understanding of type providers is that they integrate more with the IDE and help you when you are writing code against a particular data source (by providing IntelliSense, generating types that match the contents of a live database, etc.).

Antaris commented 8 years ago

Hi Team,

Just thought I'd share my thoughts regarding this, coming from a DNX/ICompileModule background.

Since the introduction of ICompileModule, it has enabled a few avenues for tackling common runtime problems as part of the compilation process. The key areas I've used compile modules for so far include:

In all instances this has allowed me to improve the runtime experience by reducing the moving parts of my application. In hindsight, not having access to compile modules through the early evolution of DNX was perhaps for the best; now that I know what I have already achieved, I've come to depend on and expect a certain level of functionality, so my expectations of the replacement for ICompileModule are set quite high.

I've already taken the approach of branching my code and implementing a reflection-based alternative to all the functionality I've mentioned above, but obviously I'd still prefer to tackle these sorts of tasks at compile time, because I want to make my framework as performant as possible. I've done this because by the time ASP.NET 5 ships it will have migrated from DNX to the dotnet CLI, and until there is metaprogramming support in Roslyn I have to provide an alternative.

Broadly summing up, what I'd like from Roslyn's implementation of metaprogramming is:

My last point really relates to how ICompileModule instances are currently configured (and this may be more of a dotnet CLI issue than a Roslyn one): currently you have to drop code files into compiler/preprocess, and a dynamic preprocess assembly is generated and compiled ahead of the main compilation, then referenced and executed. It would be a nicer experience to simply have a key/value pair in the project.json file, something like precompile: [<identifier-or-path-to-module>]. Again, that's probably more of a CLI issue.

cc @davidfowl

fubar-coder commented 8 years ago

It would be great if we could enable those extensions using NuGet packages.

Inverness commented 8 years ago

ICompileModule seems to be a worthwhile standard to adopt from DNX.

alrz commented 8 years ago
            context.AddCompilationUnit($@"partial class {context.Symbol.Name} 
                      {{ public const string ClassName = ""{context.Symbol.Name}""; }}");

So we have to write a HUGE "interpolated string", without real-time compiler verification, autocompletion, etc? What about typesafe, debuggable macros?

AdamSpeight2008 commented 8 years ago

@alrz What if we combine this with T4 and #174:

template CompilationUnit foo ( SymbolContext context ) 
{
  partial class <#= context.Symbol.Name #>
  {
    public const string ClassName = "<#= context.Symbol.Name #>" ; 
  }
}
alrz commented 8 years ago

@AdamSpeight2008 That would be nice actually. I don't know why macros are off the table, per @gafter's comment:

we do not want to add a macro system to C# or VB.NET.

But this one's a little horrible.

AdamSpeight2008 commented 8 years ago

Why isn't this also considered a macro system?

alrz commented 8 years ago

A "macro system" would make implementing these kinds of libraries a piece of cake, and it is not limited to that use case (REST APIs).

AdamSpeight2008 commented 8 years ago

When the compiler is instructed to compile the source code you wrote, code injectors are given a chance to examine your code and add new code that gets compiled in along with it.

This proposal is also "injecting" at compile time, so what is the difference? @alrz In my addition to this proposal, the "template" (or injected code) is a separate construct, i.e. it isn't a string. (#174, see comment 5)

mattwar commented 8 years ago

@alrz you don't have to write a huge interpolated string; you just have to produce a string and pass it to this API. It would probably be perfectly fine to use T4 or some other template engine to do this. You could also use Roslyn's syntax-building APIs.
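
For example, the injector's output could be built with Roslyn's real SyntaxFactory API instead of string interpolation (a sketch; only AddCompilationUnit from the proposal remains hypothetical):

using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using static Microsoft.CodeAnalysis.CSharp.SyntaxFactory;

static class ClassNameUnitBuilder
{
    // Builds: partial class <className> { public const string ClassName = "<className>"; }
    public static CompilationUnitSyntax Build(string className)
    {
        var field = FieldDeclaration(
                VariableDeclaration(PredefinedType(Token(SyntaxKind.StringKeyword)))
                    .AddVariables(VariableDeclarator("ClassName")
                        .WithInitializer(EqualsValueClause(
                            LiteralExpression(SyntaxKind.StringLiteralExpression,
                                              Literal(className))))))
            .AddModifiers(Token(SyntaxKind.PublicKeyword), Token(SyntaxKind.ConstKeyword));

        return CompilationUnit()
            .AddMembers(ClassDeclaration(className)
                .AddModifiers(Token(SyntaxKind.PartialKeyword))
                .AddMembers(field))
            .NormalizeWhitespace();
    }
}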

AdamSpeight2008 commented 8 years ago

@alrz and @mattwar see #8065.

alrz commented 8 years ago

@mattwar It'll become huge and error-prone if you actually use this feature. If we have to fall back to T4 or some other template engine, what's the point of this proposal? It provides a Roslyn API, sure, but the fact that we have to build up a "string" to inject code doesn't feel right.

@AdamSpeight2008 I'm not against macros; I think they would be more powerful and safer to use compared to interpolated-string injection!

mattwar commented 8 years ago

@AdamSpeight2008 Something like that is certainly interesting, but outside the scope of this proposal.

mattwar commented 8 years ago

@alrz You have to produce text for the compiler to consume at some point. That's what a template engine like T4 does. This proposal is not tied to any particular means of producing that text.

alrz commented 8 years ago

@mattwar If so, what's the advantage of this feature over macros (which seemingly have been dismissed in favor of this proposal)?

AdamSpeight2008 commented 8 years ago

Just to be clear, #8065 is not a macro system; it is just a way of producing the "code / expression" string. (True, a macro system could be built using them.)

mattwar commented 8 years ago

@alrz I don't think macros are related to this proposal.

vbcodec commented 8 years ago

For AOP, this is pretty much the wrong idea. Devs want to write code, not hand strings of code to some internals of the compiler. Roslyn is a big and complicated tool, and the average dev won't write AOP code this way. Besides this, there is no way to debug the provided code.

#5292 is much better.

m0sa commented 8 years ago

+1 for DNX style preprocessing

The dotnet cli currently just ignores the preprocess sources - https://github.com/dotnet/cli/issues/1812

mattwar commented 8 years ago

@vbcodec when you refer to devs, do you mean the devs writing the aspects or the devs using the aspects? Why do you think you won't be able to debug the code injectors? You certainly will be able to debug them, since they are just dotnet libraries; you can, for example, debug the compiler and step through the individual code injectors. Or are you worried you won't be able to debug the injected code? You can do that too: the injected code will appear as source that the debugger can find, and the debugger will be able to step through it and set breakpoints in it as well. That is a far cry better than most existing AOP solutions.

m0sa commented 8 years ago

As mentioned in https://github.com/dotnet/cli/issues/1812#issuecomment-196079253, this feature is going to be the underlying provider for ASP.NET Core view precompilation. Because of this, I think it should be high priority. ASP.NET Core won't see any serious usage without view precompilation, simply because without it errors in view files that should surface at compile time only surface at run time.

Inverness commented 8 years ago

I'd like some clarification as to whether or not the code generators feature being worked on will allow existing code to be replaced entirely, as opposed to only adding new code using partial classes and the new replace keyword.

I see SourceGeneratorContext on the features/generator branch only offers AddCompilationUnit() for mutation.

cston commented 8 years ago

@Inverness The approach in the features/generator branch allows replacing a method implementation using replace. The original method is still part of the compilation, but not callable outside of the replacement.

Inverness commented 8 years ago

@cston Yes I looked over the code and saw this. There are two things I'd like clarified:

Does this work for other type members like fields and properties?

Do these code generators function independently of Visual Studio? Specifically, can I run the compiler from the command line and have the code generators applied? I assume this is the case for build-server scenarios.

m0sa commented 8 years ago

And also, does it work for replacing whole classes, not just their members?

mattwar commented 8 years ago

@m0sa the code generators just add new source files to the compilation. The supersedes (now replace/original) language feature allows you to replace existing members of a class from a declaration in the same class (though realistically it's from a separate partial declaration of the same class).
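
To make that concrete, a sketch of the replace/original shape proposed in #5292 (proposed, never-shipped syntax; Log is a made-up helper):

// Hand-written half:
partial class Customer
{
    public void Save()
    {
        // original, hand-written logic
    }
}

// Generator-emitted half: `replace` supersedes Save, and `original(...)`
// invokes the superseded implementation.
partial class Customer
{
    replace public void Save()
    {
        Log("saving");   // injected pre-logic
        original();      // run the hand-written body
        Log("saved");    // injected post-logic
    }
}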

davidfowl commented 8 years ago

@mattwar can you add more than just source files?

mattwar commented 8 years ago

@davidfowl what kind of files would you like to add?

cston commented 8 years ago

@Inverness It should be possible to replace methods, properties, and event accessors. Other type members can be added but not replaced. And generators would be executed by the command-line compilers.

davidfowl commented 8 years ago

@mattwar adding resources (replacing resgen), adding/removing references.

Inverness commented 8 years ago

@cston That is useful for typical codegen scenarios, but why limit code generation to that? Simply allow the Compilation instance to be replaced. This would allow both editing existing syntax trees and editing references, as @davidfowl suggested.

jods4 commented 8 years ago

I'm probably going a little crazy, but I would consider using this to replace C# features with efficient codegen, in particular LINQ and string interpolation.

There are (several) issues on GitHub about the perf of these features, but they are hard to optimize in general because of edge cases in their specs that have to be supported (e.g. interpolation has to handle the culture properly and formats everything, including strings).

With such a facility in place, the developer could opt in to optimized codegen at specific points.

m0sa commented 8 years ago

@jods4 You can already do all of this if you run Roslyn "from code": wire everything up on your own and call the Emit API yourself, if you really want to. At Stack Overflow we do (using StackExchange.Precompilation), because we really, really want to do as much as possible at compile time. For example, we have compilation hooks that bake localized strings into the assembly, doing additional optimizations when we detect that a parameterized string (we have something similar to string interpolation, but with additional localization features, like pluralization based on the value of multiple numerical tokens, and markdown formatting) is used in a razor view, which is pretty straightforward as soon as you have the semantic model. There, we avoid allocating the full string by directly calling WebViewPage.Write on its tokens. The concept is the same as in my blog post, where I discuss how to replace StringBuilder.Append($"...") calls with StringBuilder.AppendFormat.
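
For readers who want to experiment, the basic shape of such a transformation is a CSharpSyntaxRewriter driven by the semantic model (a sketch of the approach, not the actual StackExchange.Precompilation code; the rewrite itself is elided):

using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Finds sb.Append($"...") calls so they can be rewritten into a form
// that avoids materializing the interpolated string.
class AppendInterpolationRewriter : CSharpSyntaxRewriter
{
    private readonly SemanticModel _model;
    public AppendInterpolationRewriter(SemanticModel model) { _model = model; }

    public override SyntaxNode VisitInvocationExpression(InvocationExpressionSyntax node)
    {
        var member = node.Expression as MemberAccessExpressionSyntax;
        if (member != null
            && member.Name.Identifier.Text == "Append"
            && node.ArgumentList.Arguments.Count == 1
            && node.ArgumentList.Arguments[0].Expression
                   is InterpolatedStringExpressionSyntax)
        {
            // The semantic model confirms this really is StringBuilder.Append.
            var symbol = _model.GetSymbolInfo(member).Symbol;
            if (symbol?.ContainingType?.ToDisplayString() == "System.Text.StringBuilder")
            {
                // ... build and return an AppendFormat invocation from the
                // interpolation's text parts and expressions (elided).
            }
        }
        return base.VisitInvocationExpression(node);
    }
}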

The actual problem here is that there's no streamlined interface for such hooks in the command-line & dotnet CLI tools. I really hope that at the end of this, we get something akin to analyzers in terms of pluggability, but with the power to modify the Compilation before it hits Emit.

Here are some open points that I think should be considered:

I'm OK with not solving the questions above; a low-level thing where we can just hook in and replace the Compilation would do just fine for starters. We can always add nice APIs on top of it at a later time.

jods4 commented 8 years ago

@m0sa Thanks! That's interesting and useful. As you noted, today setting this up is complicated. Where I work we would not accept that, at least not on "normal" projects. Being able to do it as easily as adding analyzers to your solution (i.e. pulling a Nuget package) would drastically change the game.

What we do instead is avoid C# productivity features on hot paths. LINQ becomes handwritten code, same for string interpolation. It's sad to pass on those, but what we lose in productivity and readability we win in perf.

Of course, your needs seem much more stringent than ours!

Vannevelj commented 8 years ago

@m0sa Regarding your first remark, "One of the big caveats is that as soon as you modify a SourceTree, the SemanticModel you had becomes invalid": is this not solved by the DocumentEditor, which allows for multiple modifications to a single Document? Or how does this not apply to your scenario?