Closed mattwar closed 7 years ago
Looking forward to seeing where this discussion goes. The last time I remember metaprogramming being discussed, or more generally compile-time hooks, it sounded like the team wanted a little more time to see how things shook out (https://github.com/dotnet/roslyn/issues/98#issuecomment-71780597). Hopefully now that it's 8 months later and the library and tooling is in the wild it's a good time to revisit.
I personally really like the idea of using code diagnostics and fixes to drive this functionality. We already have great tooling around developing and debugging them, and Roslyn already knows how to apply them. Once developed, there's also already a mechanism for explicitly applying them to target code to see how they'll look.
I can envision a variety of use cases for this concept, from simple one-off fixes to things like entire AOP frameworks based on applying code fixes in the presence of attributes that guide their behavior.
I wrote the code generation portions for the C# targets of ANTLR 3 and ANTLR 4 using a pattern similar to XAML. Two groups will shudder at this statement: developers working on MSBuild itself, and the ReSharper team. For end users working with Visual Studio, the experience is actually quite good. There are some interesting limitations in the current strategy.
It is possible for C# code to reference types and members defined in the files which are generated from ANTLR grammars. In fact, from the moment the rules are added to the grammar (even without saving the file), the C# IntelliSense engine is already aware of the members which will be generated for these rules.
However, the code generation step itself cannot use information from the other C# files in the project. Fortunately for ANTLR, we don't need the ability to do this because the information required to generate a parser is completely contained within the grammar files.
The specific manner in which a code generator integrates with the IntelliSense engine (using the XAML generator pattern) is undocumented. This led to a complete lack of support for the code completion functionality described in the previous section in other IDEs and even in ReSharper.
Let's deal with the problem of the undefined order of such transformations. In normal OO code, this pattern is conceptually similar to the decorator pattern. Take a look at this code:
var logger = new CachingLogger(new TimestampedLogger(new ConsoleLogger()));
vs
var logger = new TimestampedLogger(new CachingLogger(new ConsoleLogger()));
The nice thing is that we get to manually specify the order in which the class is wrapped. This seems like the most obvious answer: let the programmer specify the order.
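For concreteness, the three logger classes composed above might look like this (the interface and class bodies are my own illustration; only the composition lines appear in the comment above):

```csharp
using System;

public interface ILogger
{
    void Log(string message);
}

// Terminal logger: writes the message as-is.
public class ConsoleLogger : ILogger
{
    public void Log(string message) => Console.WriteLine(message);
}

// Decorator: prepends a timestamp, then delegates to the inner logger.
public class TimestampedLogger : ILogger
{
    private readonly ILogger inner;
    public TimestampedLogger(ILogger inner) => this.inner = inner;
    public void Log(string message) => inner.Log($"{DateTime.Now:O} {message}");
}

// Decorator: suppresses consecutive duplicate messages before delegating.
public class CachingLogger : ILogger
{
    private readonly ILogger inner;
    private string last;
    public CachingLogger(ILogger inner) => this.inner = inner;
    public void Log(string message)
    {
        if (message == last) return;
        last = message;
        inner.Log(message);
    }
}
```

Note the behavioral difference between the two compositions: with `CachingLogger` outermost, duplicates are filtered before a timestamp is attached, while with `TimestampedLogger` outermost every message gets a unique timestamp first, so the cache never sees a duplicate. That is exactly the kind of ordering decision the programmer needs to control.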
I think this would be simple to do cleanly with attributes, which would somehow be wired up to cause these transformations:
[NotifyPropertyChanged, LogPropertyAccess]
public class ViewModel
{
//gets change notification and logging
public int Property { get; set; }
}
They could be specified at assembly/class/member level to represent all kinds of transformation scope.
The problem is that, until now, attributes have served as a metadata-only mechanism - here they would instead modify source code. Maybe user-defined class modifiers would be better:
public notify logged class ViewModel
{
public int Property { get; set; }
}
Where `notify` and `logged` are user-defined somehow.
@ghord That's a good idea. If we limit the code generators to only working on symbols with custom attributes explicitly specified, we can order the code generators by the order of the attributes as specified in the source.
@mattwar @ghord While I think the use of attributes to guide the code generation process could work (it's worked well for PostSharp, for example), I'd love to see a more general solution that isn't directly tied to a specific language syntax or feature. That's why I mentioned being able to apply analyzers and code fixes as a possible approach.
The way I would envision this working is that the compiler would be supplied with a list of analyzers and code fixes to automatically apply just before actual compilation. It would work as if the user had manually gone through the code and applied all of the specified code fixes by hand before compiling.
I suspect that this could be achieved with a minimal amount of changes to existing Roslyn, at least functionality-wise (though it may take some serious refactoring - I have no idea). Of course, the compiler would need a mechanism for specifying the analyzers and code fixes and applying them during compilation. Note the following: perhaps a new `DiagnosticSeverity` is needed to indicate potential code generation, or maybe a `DiagnosticSeverity` of `Hidden` could be used with some other indication. Regardless, diagnostics can be used to identify where code generation should take place.

There would also be some synergy with this approach between existing authors of conventional analyzers and code fixes and those intended to be used for code generation. Existing code fixes could also be adapted or possibly applied wholesale during the code generation stage (if specified). The tooling and process would be the same, so skills could be leveraged for either.
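The diagnostics-as-markers idea can be sketched with today's analyzer API. The Roslyn types below are real, but the rule ID, category, and the `NotifyPropertyChangedAttribute` it looks for are made-up examples:

```csharp
using System.Collections.Immutable;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.Diagnostics;

[DiagnosticAnalyzer(LanguageNames.CSharp)]
public class GenerationSiteAnalyzer : DiagnosticAnalyzer
{
    // Hypothetical descriptor: Hidden severity marks a code generation site
    // without surfacing anything to the user.
    private static readonly DiagnosticDescriptor Rule = new DiagnosticDescriptor(
        id: "GEN001",
        title: "Code generation site",
        messageFormat: "'{0}' is eligible for code generation",
        category: "CodeGeneration",
        defaultSeverity: DiagnosticSeverity.Hidden,
        isEnabledByDefault: true);

    public override ImmutableArray<DiagnosticDescriptor> SupportedDiagnostics
        => ImmutableArray.Create(Rule);

    public override void Initialize(AnalysisContext context)
    {
        context.RegisterSymbolAction(ctx =>
        {
            // Flag any type decorated with a (hypothetical)
            // [NotifyPropertyChanged] attribute from a support library.
            var type = (INamedTypeSymbol)ctx.Symbol;
            if (type.GetAttributes().Any(a =>
                    a.AttributeClass?.Name == "NotifyPropertyChangedAttribute"))
            {
                ctx.ReportDiagnostic(
                    Diagnostic.Create(Rule, type.Locations[0], type.Name));
            }
        }, SymbolKind.NamedType);
    }
}
```

A generation-oriented code fix registered for "GEN001" could then rewrite or augment the flagged class, reusing the existing analyzer/code-fix pipeline end to end.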
I do see the following questions or complications with this approach:
- How the set of analyzers and code fixes to apply would be specified, perhaps in a `.config` file (or equivalent).
- Whether the `.pdb` or other debugging artifacts can still trace back to the original code?

Getting back to the use of attributes, one of the big examples of this approach that I've been thinking about is using it to build out a full AOP framework similar to what PostSharp does. In this case, an analyzer would be written that looks for the presence of specific attributes as defined in a referenced support library. When it finds them, it would output diagnostics that a code fix would then act on. The code fix would then apply whatever code generation is appropriate for the attribute.
My favorite PostSharp aspect is `OnMethodBoundaryAspect`, which allows you to execute code defined in your aspect attribute class before method entry and after method exit. Something similar could be constructed by having a code fix inject calls to methods contained in a class derived from a specific attribute for any method that has said attribute applied to it.
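A rough sketch of what such a code fix could emit, assuming a hypothetical `MethodBoundaryAttribute` base class in a support library (none of these names come from PostSharp or the proposal):

```csharp
// Hypothetical aspect base class defined in a referenced support library.
public abstract class MethodBoundaryAttribute : System.Attribute
{
    public virtual void OnEntry(string methodName) { }
    public virtual void OnExit(string methodName) { }
}

public class LogAttribute : MethodBoundaryAttribute
{
    public override void OnEntry(string methodName) =>
        System.Console.WriteLine($"Entering {methodName}");
    public override void OnExit(string methodName) =>
        System.Console.WriteLine($"Exiting {methodName}");
}

public class Service
{
    // Original user-written method:
    // [Log]
    // public void DoWork() { /* work */ }

    // What the code fix might rewrite it into:
    [Log]
    public void DoWork()
    {
        var aspect = new LogAttribute();
        aspect.OnEntry(nameof(DoWork));
        try
        {
            // ... original method body ...
        }
        finally
        {
            aspect.OnExit(nameof(DoWork));
        }
    }
}
```

The try/finally ensures `OnExit` runs even when the original body throws, which is the behavior PostSharp users typically expect from a boundary aspect.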
You could potentially build up an entire AOP framework by creating analyzers and code fixes that act on pre-defined attributes and their derivatives. The point, though, is that you wouldn't have to. The code generation capability could be as flexible and general as analyzers and code fixes themselves, which because they directly manipulate the syntax tree can do just about anything.
Very happy to see a proposal for this on the table. Are attributes how you envision applying a Code Injector? I think specifying the syntax for applying them is needed in the proposal.
Is the idea that CodeInjections all take place prior to build so that you can see and possibly interact with the members they generate? If so, I think being able to interact with the generated code is another huge benefit that you should mention in your proposal. When using PostSharp, anything you have it generate doesn't exist until build time, so you can't reference any of it in your code.
@ghord The problem with your proposal on ordering is that you might not define the injections in the same place. For example, you could have a code-injection attribute `NotifyPropertyChanged` on a class and then a code-injection attribute `Log` on a method. Which one should be applied first? I think you need a way of explicitly specifying an overall ordering when you invoke a CodeInjector (probably just an integer).
Using a property on an attribute to specify order is not new to the framework: see the `DataMemberAttribute.Order` property.
But I think that, if order is important, then there's something wrong.
Notifying property change is something that is expected from the consumers of an object, not something that the object expects itself. So, as long as it's done, the order doesn't matter.
Logging is the same thing. If you want to log the notification, then that is not logging the object but logging the notification extension.
Is there any compelling example where one extension influences the other, order matters, and it can still be considered good architecture?
@paulomorgado Yes, there are any number of use cases. For example, you want to have some authorization code run before some caching code. PostSharp has several documentation pages about ordering.
Rather than ordering at the use site, why not let injectors specify their dependencies using something akin to those PostSharp attributes @MgSam is linking to? (Or OrderAttribute in VS.) Depending on the order of the attributes at the use site seems very brittle to me and prevents using them at different scopes.
There are some issues with attributes which we will have to overcome for this to work:
We could make the order alphabetical according to file names. I'm pretty sure that in 99% of cases the order won't matter, but leaving undefined behavior such as this in the language is very dangerous - the application could crash or not depending on the order in which transformations are applied.
@ghord, what in this proposal influences assembly attributes?
I think code generation support for the compiler would be fantastic. I'd love to be able to do something similar to what PostSharp provides.
PostSharp's more limited free version, and the requirement to submit an application to get an open source project license, make me unwilling to look at it for anything but larger projects at work that we would invest money in.
I'd like to be able to have great AOP tools for everyday/hobby projects without additional hassle.
@daveaglick For debugging, if code generation is only happening after things are sent to the compiler, wouldn't inserting line directives into the syntax tree preserve the integrity of the debugging experience?
I did make a syntax tree rewriter for Roslyn to implement a simple method boundary aspect. I used a Roslyn fork to get this hooked in during compile time. Line directives ensured there was no issue with debugging. It was an interesting experience and an example of something I'd like to be able to do without jumping through hoops.
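For illustration, here is how injected statements can be fenced off with `#line` directives so the debugger steps only through the user's original code (the class and statements are made up; the directives are standard C#):

```csharp
public class Service
{
    public void DoWork()
    {
#line hidden
        // Injected instrumentation: hidden from the debugger, so stepping
        // jumps straight to the user's original statements below.
        System.Console.WriteLine("Entering DoWork");
#line default
        // Original user code, with correct file/line mapping restored.
        System.Console.WriteLine("doing work");
#line hidden
        System.Console.WriteLine("Exiting DoWork");
#line default
    }
}
```

`#line hidden` suppresses sequence points for the injected statements, and `#line default` (or an explicit `#line <number> "<file>"`) restores the mapping, so breakpoints and stack traces still point at the code the user actually wrote.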
One issue I had, though, was that working at the syntax tree stage deprived me of information that was only available at the bound tree stage. Is there a way to know about type relationship information at this point? When you see an attribute on a class, how will you know that it subclasses MethodBoundaryAspect or whatever?
Is this like F#'s type providers?
It's great to see this being proposed, I remember asking a while back if this was being considered.
I think that it should be possible to have the modified source code written out to a temp folder, to make debugging easier. Either by default or controllable via a flag.
I also think that having to apply an attribute to the parts of the source code that can be re-written is a nice idea, as it makes the feature less magical and easier to reason about.
@AdamSpeight2008 I don't think so, I see this feature more as a compiler step that lets you modify code before it's compiled. But crucially this isn't meant to be seen by the person who wrote the code, it happens in the background when the compiler runs.
My understanding of type providers is that they integrate more into the IDE and help you when you are writing code that works against a particular data source (by providing intellisense, generate types that match the contents of a live database, etc)
Hi Team,
Just thought I'd share my thoughts regarding this, coming from a DNX/`ICompileModule` background.

Since the introduction of `ICompileModule`, it has enabled a few avenues for tackling some common runtime problems as part of the compilation process. The key areas I've used compile modules for so far include:

1. Module discovery - discovering types that implement an `IModule` interface and generating at compile time a `ModuleProvider` which is given a static list of types. I still maintain the ability to reference new modules, and then through compilation that dynamic set of modules becomes a static set of modules. This was achieved by doing this workflow.
2. `DbContext` composition - because the framework I am creating is modular by design, so too must the `DbContext` stuff be. I've done a considerable amount of customisation of the EF7 stack to enable better support for modular `DbContext`s, including support for multiple contexts with shared entities, and baking in support for cross-database navigation properties. This enables modules to define a `DbContext` and add appropriate `DbSet<Entity>` properties, regardless of the source module. An `ICompileModule` is provided to walk the `DbSet<Entity>` properties of a context; for each of those entities on the `DbContext`, I then search for appropriate instances of an `IEntityConfiguration<Entity>` type for the given entity, which wraps up the `IModelBuilder` grunt work in isolation. This enables me to compose the configuration of a `DbContext` at compile time, and it is one of the foundation aspects of my modular `DbContext` approach.
3. Migrations - because of the modular `DbContext` work, the standard EF migrations wouldn't work, as they are not designed with multi-tenancy in mind. So I've had to roll my own migrations, and again, using an `ICompileModule`, I was able to look through the items marked as resources at `compiler/resources/data/<version>/**.sql` and generate a class at compile time that provided a descriptor of a versioned migration. E.g. it would generate a (potential) series of classes like `MigrationToV1_0_0` and `MigrationToV1_0_1`, and a custom DNX build command would allow me to deploy those migrations against a target.

In all instances this has allowed me to improve the runtime experience by reducing the moving parts of my application. In hindsight, not having access to compile modules through the early evolution of DNX was perhaps best, as now I've come to depend on and expect a certain level of functionality; because I now know what I have already achieved, my expectation of the replacement to `ICompileModule` is now set quite high.
I've already taken the approach of branching my code and implementing a reflection-based alternative to all the functionality I've mentioned above, but obviously I'd still prefer to tackle these sorts of tasks at compile time, because I want to make my framework as performant as possible. I've done this because, by the time ASP.NET 5 ships, it will have migrated from DNX to the dotnet CLI, and therefore until there is metaprogramming support in Roslyn I have to provide an alternative.
Broadly summing up, what I'd like from Roslyn's implementation of metaprogramming is:

- Access to `IAssemblySymbol` information from references.

My last point really relates to how `ICompileModule` instances are currently configured (and this may be more of a dotnet CLI issue rather than a Roslyn one): currently you have to drop code files at `compiler/preprocess`, and a dynamic preprocess assembly is generated and compiled ahead of the main compilation, then referenced and executed. It would be a nicer experience for the `project.json` file to simply have a key/value pair, something like `precompile: [<identifier-or-path-to-module>]`. Again, that's probably more of a CLI issue.
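For readers unfamiliar with the DNX hook being discussed, its shape is roughly the following. The context classes here are paraphrased stand-ins, not the exact DNX signatures, and the module body is a sketch of the module-discovery scenario described above:

```csharp
using Microsoft.CodeAnalysis.CSharp;

// Stand-ins for the DNX context types (paraphrased from
// Microsoft.Dnx.Compilation.CSharp; the real shapes carry more members).
public class BeforeCompileContext
{
    public CSharpCompilation Compilation { get; set; }
}

public class AfterCompileContext { }

public interface ICompileModule
{
    void BeforeCompile(BeforeCompileContext context);
    void AfterCompile(AfterCompileContext context);
}

// Sketch: a module that injects a generated ModuleProvider class, turning a
// dynamic, reflection-discovered set of modules into a static list.
public class ModuleProviderCompileModule : ICompileModule
{
    public void BeforeCompile(BeforeCompileContext context)
    {
        const string source = @"
public static class ModuleProvider
{
    // generated: static list of IModule implementations found at compile time
}";
        // Replace the compilation with one containing the generated tree.
        context.Compilation = context.Compilation.AddSyntaxTrees(
            CSharpSyntaxTree.ParseText(source));
    }

    public void AfterCompile(AfterCompileContext context) { }
}
```

The key property is that the module receives the in-progress `Compilation` and can hand back a modified one before emit, which is the capability several commenters below ask Roslyn to expose in a streamlined way.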
cc @davidfowl
It would be great if we could enable those extensions using NuGet packages.
ICompileModule seems to be a worthwhile standard to adopt from DNX.
context.AddCompilationUnit($@"partial class {context.Symbol.Name} {{ public const string ClassName = ""{context.Symbol.Name}""; }}");
So we have to write a HUGE "interpolated string", without real-time compiler verification, autocompletion, etc? What about typesafe, debuggable macros?
@alrz What if we combine this, with T4 and #174
template CompilationUnit foo ( SymbolContext context )
{
partial class <#= context.Symbol.Name #>
{
public const string ClassName = "<#= context.Symbol.Name #>" ;
}
}
@AdamSpeight2008 That would be nice actually. I don't know why macros are off the table, per @gafter's comment:

> we do not want to add a macro system to C# or VB.NET.

But this one's a little horrible.
Why isn't this also considered a macro system?
A "macro system" would make implementing these kinds of libraries a piece of cake, and it is not limited to that use case (REST APIs).
> When the compiler is instructed to compile the source code you wrote, code injectors are given a chance to examine your code and add new code that gets compiled in along with it.

This proposal is also "injecting" at compile-time, so what is the difference? @alrz In my addition to this proposal I have the "template" (or injected code) be a separate construct, i.e. it isn't a string. (#174, see comment 5)
@alrz you don't have to write a huge interpolated string, you just have to produce a string and pass it to this API. It would probably be perfectly fine to use T4 or some other template engine to do this. You could also use Roslyn's syntax-building APIs.
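As a sketch of that last option, the same partial class could be produced with Roslyn's syntax-building APIs rather than a string. The builder class here is hypothetical; the `SyntaxFactory` calls are today's public API:

```csharp
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using static Microsoft.CodeAnalysis.CSharp.SyntaxFactory;

// Builds: partial class <Name> { public const string ClassName = "<Name>"; }
public static class ClassNameBuilder
{
    public static CompilationUnitSyntax Build(string className)
    {
        // public const string ClassName = "<className>";
        var field = FieldDeclaration(
                VariableDeclaration(
                    PredefinedType(Token(SyntaxKind.StringKeyword)),
                    SingletonSeparatedList(
                        VariableDeclarator("ClassName")
                            .WithInitializer(EqualsValueClause(
                                LiteralExpression(
                                    SyntaxKind.StringLiteralExpression,
                                    Literal(className)))))))
            .AddModifiers(
                Token(SyntaxKind.PublicKeyword),
                Token(SyntaxKind.ConstKeyword));

        // partial class <className> { ... }
        return CompilationUnit()
            .AddMembers(
                ClassDeclaration(className)
                    .AddModifiers(Token(SyntaxKind.PartialKeyword))
                    .AddMembers(field))
            .NormalizeWhitespace();
    }
}
```

This gets you compiler-checked construction and well-formed output at the cost of verbosity, which is precisely the trade-off being debated in this exchange.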
@alrz and @mattwar see #8065.
@mattwar It'll become huge and error-prone if you actually use this feature. If we have to fall back to T4 or some other template engine, what's the point of this proposal then? It provides a Roslyn API, sure, but the fact that we have to build up a "string" for injecting code doesn't feel right.
@AdamSpeight2008 I'm not against macros, I think they would be more powerful and safer to use compared to interpolated string injection!
@AdamSpeight2008 Something like that is certainly interesting, but outside the scope of this proposal.
@alrz You have to at some point produce text for the compiler to consume. That's what a template like T4 is doing. This proposal is not tied to any particular means of producing that text.
@mattwar If so, what's the advantage of this feature over macros (which seemingly have been dismissed in favor of this proposal)?
Just to be clear #8065 is not a macro system, it is just a way of producing the "code / expression" string. (True a macro system could be built using them)
@alrz I don't think macros are related to this proposal.
For AOP, this is pretty much the wrong idea. Devs want to write code, not provide strings of code to some internals of the compiler. Roslyn is a big and complicated tool, and the average dev won't write AOP code this way. Besides this, there is no way to debug the provided code.
+1 for DNX style preprocessing
The dotnet cli currently just ignores the preprocess sources - https://github.com/dotnet/cli/issues/1812
@vbcodec when you refer to devs, do you mean the devs writing the aspects or the devs using the aspects? Why do you think you won't be able to debug the code injectors? You certainly will be able to debug them since they are just dotnet libraries. So, for example, you can debug the compiler and step through the individual code injectors. Or are you worried you won't be able to debug the injected code? Because you can do that too. The injected code will appear as source that the debugger can find. The debugger will be able to step through and set breakpoints in that as well. Which is a far cry better than most existing AOP solutions.
As mentioned in https://github.com/dotnet/cli/issues/1812#issuecomment-196079253, this feature is going to be the underlying provider for ASP.NET Core view precompilation. Because of this, I think it should be high priority. ASP.NET Core won't see any serious usage without view precompilation, simply because its absence turns what would be compile-time errors from view files into run-time errors.
I'd like some clarification as to whether or not the code generators feature being worked on will allow existing code to be replaced entirely, as opposed to only adding new code using partial classes and the new replace keyword.
I see SourceGeneratorContext on the features/generator branch only offers AddCompilationUnit() for mutation.
@Inverness The approach in the features/generator branch allows replacing a method implementation using `replace`. The original method is still part of the compilation, but not callable outside of the replacement.
@cston Yes I looked over the code and saw this. There are two things I'd like clarified:
Does this work for other type members like fields and properties?
Do these code generators function independent of Visual Studio? Specifically, can I run the compiler from command line and have code generators be applied? I assume this is the case for build server scenarios.
And also, does it work for replacing whole classes, not just their members?
@m0sa the code generators just add new source files to the compilation. The supersedes (now replace/original) language feature allows you to replace existing members in a class from a declaration in the same class (though realistically it's from a separate partial declaration of the same class).
@mattwar can you add more than just source files?
@davidfowl what kind of files would you like to add?
@Inverness It should be possible to `replace` methods, properties, and event accessors. Other type members can be added but not replaced. And generators would be executed by the command-line compilers.
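A sketch of what that might look like with the proposed #5292 replace/original feature (the keywords come from the feature branch, so the exact details may shift):

```csharp
// User-written declaration:
partial class ViewModel
{
    public int Property { get; set; }
}

// Generator-emitted partial declaration that replaces the property
// to add change notification; 'original' refers to the replaced member.
partial class ViewModel
{
    replace public int Property
    {
        get { return original; }
        set
        {
            original = value;
            OnPropertyChanged(nameof(Property));
        }
    }
}
```

The original auto-property still backs the storage, but callers outside the replacement bind to the generated accessors, which is how boilerplate like INotifyPropertyChanged could be layered on without touching the user's file.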
@mattwar adding resources (replacing resgen), adding/removing references.
@cston That is useful for typical code gen scenarios, but why limit code generation to that? Simply allow the Compilation instance to be replaced. This would allow both the editing of existing syntax trees, and the editing of references as @davidfowl suggested.
I'm probably going a little crazy, but I would consider using this to replace C# features with efficient codegen, in particular LINQ and string interpolation.
There are (several) issues on GitHub about the perf of these features, but they are hard to optimize in general because of edge cases in their specs that they have to support (e.g. interpolation has to handle the culture properly and formats everything, including strings).

With such a facility in place, the developer could opt in to optimized codegen at specific points. For example, string interpolation could be replaced with efficient `String.Concat` code. When possible, interpolation could be replaced by a constant string at compile-time. (Those optimizations have to be opt-in because they don't faithfully implement the spec.)

@jods4 all of this you can already do, if you run Roslyn "from code", as in, wire up everything on your own and call the emit API yourself, if you really want to. At Stack Overflow we do (using StackExchange.Precompilation), because we really really want to do as much as possible at compile-time. For example we have compilation hooks that bake localized strings into the assembly, doing additional optimizations when we detect that a parametrized string (we have something similar to string interpolation, but with additional localization features, like pluralization based on the value of multiple numerical tokens, and markdown formatting) is used in a razor view, which is pretty straightforward as soon as you have the semantic model. There, we avoid allocating the full string by directly calling `WebViewPage.Write` on its tokens. The concept is the same as in my blog post where I discuss how to replace `StringBuilder.Append($"...")` calls with `StringBuilder.Format`.
The actual problem here is that there's no streamlined interface for such hooks in the command-line & dotnet CLI tools. I really hope that at the end of this, we get something akin to analyzers in terms of pluggability, but with the power to modify the `Compilation` before it hits `Emit`.
Here are some open points that I think should be considered:

- One of the big caveats is that as soon as you modify a `SourceTree`, the `SemanticModel` you had becomes invalid. The way we worked around it was by calculating all the modifications first in the form of `TextChange`s, and then applying them in batch via `SourceText.WithChanges`. It would be nice to have an API around that.
- `#line` directives. It would be nice to get such support out of the box from the API above.

I'm OK with not solving the questions above; a low-level thing we can just hook in and replace the `Compilation` with would do just fine for starters. We can always add nice APIs on top of it at a later time.
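The batching workaround described in the first point might look like this (standard `Microsoft.CodeAnalysis.Text` APIs; the specific edits are invented for illustration):

```csharp
using System.Collections.Generic;
using Microsoft.CodeAnalysis.Text;

public static class BatchEditExample
{
    public static SourceText ApplyAll(SourceText text)
    {
        // Compute every edit against the ORIGINAL text first, so the
        // semantic model stays valid while positions are being calculated...
        var changes = new List<TextChange>
        {
            new TextChange(new TextSpan(0, 0), "// injected header\n"),
            // ... more changes, all with spans relative to the original text ...
        };

        // ...then apply them in one batch, producing the final text once.
        return text.WithChanges(changes);
    }
}
```

Because all spans are computed against the unmodified text, no edit invalidates the positions used by a later one, which is exactly the invariant `SourceText.WithChanges` expects.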
@m0sa Thanks! That's interesting and useful. As you noted, today setting this up is complicated. Where I work we would not accept that, at least not on "normal" projects. Being able to do it as easily as adding analyzers to your solution (i.e. pulling a Nuget package) would drastically change the game.
What we do instead is avoid C# productivity features on hot paths. LINQ can be handwritten code, same for string interpolation. It's sad to pass on those, but what we lose in productivity and readability we win in perf.
Of course, your needs seem much more stringent than ours!
@m0sa Regarding your first remark "One of the big caveats is, that as soon as you modify a SourceTree, the SemanticModel you had becomes invalid": is this not solved by the `DocumentEditor`, which allows for multiple modifications to a single `Document`? Or how does this not apply to your scenario?
Often when writing software, we find ourselves repeatedly typing similar logic over and over again, each time just different enough from the last to make generalizing it into an API impractical. We refer to this type of code as boilerplate, the code we have to write around the actual logic we want to have, just to make it work with the language and environment that we use.
One way of avoiding writing boilerplate code is to have the computer generate it for us. After all, computers are really good at that sort of thing. But in order for the computer to generate code for us, it has to have some input to base it on. Typical code generators are design-time tools that we work with outside of our codebase, that generate source that we include with it. These tools usually prefer their input to be XML or JSON files that we either manipulate manually or have some WYSIWYG editor that lets us drag, drop and click it into existence. Other tools are build-time, run by our build system just before our project is built, but they too are driven by external inputs like XML and JSON files that we must manipulate separately from our code.
These solutions have their merits, but they are often intrusive, requiring us to structure our code in particular ways that allow the merging of the generated code to work well with what we've written. The biggest drawback is that these tools require entire facets of our codebase to be defined in another language outside of the code we use to write our primary logic.
Some solutions, like post-build rewriters, do a little better in this regard, because they operate directly on the code we've written, adding new logic into the assembly directly. However, they too have their drawbacks. For instance, post-build rewriters can never introduce new types and APIs for our code to reference, because they come too late in the process. So they can only change the code we wrote to do something else. Even worse, assembly rewriters are very difficult to build because they must work at the level of the IL or assembly language, doing the heavy lifting to re-derive the context of our code that was lost during compilation, and to generate new code as IL and metadata without the luxury of having a compiler to do it. For most folks, choosing this technique to build tools that reduce boilerplate code is typically a non-starter.
Yet the biggest sin of all is that these solutions require us to manipulate our nearly unfathomable build system, and in fact require us to have a build system in the first place, and who really wants to do that. Am I right?
Proposal: Code Injectors
Code injectors are source code generators that act as extensions to the compiler you are using, as you are using it. When the compiler is instructed to compile the source code you wrote, code injectors are given a chance to examine your code and add new code that gets compiled in along with it.
When you type your code into an editor or IDE, the compiler can be engaged to provide feedback that includes the new code added by the code generators. Thus, it is possible to have the compiler respond to your work and introduce new code as you type that you can directly make use of.
You write a code injector similarly to how you write a C# or VB diagnostic analyzer today. You may choose to think of code injectors as analyzers that, instead of reporting new diagnostics after examining the source code, augment the source code by adding new declarations.
You define a class in an assembly that gets loaded by the compiler when it is run to compile your code. This could easily be the same assembly you have used to supply analyzers. This class is initialized by the compiler with a context that you can use to register callbacks into your code when particular compilation events occur.
For example, ignoring namespaces for a moment, this contrived code injector gives every class defined in source a new constant field called ClassName that is a string containing the name of the class.
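The proposal's original code sample is not reproduced here, but based on the API surface discussed elsewhere in this thread (`SourceGeneratorContext`/`AddCompilationUnit`), it might look roughly like this; every injector-specific name below is hypothetical and not final:

```csharp
using Microsoft.CodeAnalysis;

// Hypothetical injector marker attribute and interface; the proposal's
// exact API shape was still under discussion.
[CodeInjector(LanguageNames.CSharp)]
public class ClassNameInjector : ICodeInjector
{
    public void Initialize(CodeInjectionContext context)
    {
        // Ask to be called back for every named type declared in source.
        context.RegisterSymbolAction(ctx =>
        {
            // Add a new compilation unit containing a partial declaration of
            // the same class; the partial feature merges it with user code.
            ctx.AddCompilationUnit($@"partial class {ctx.Symbol.Name}
{{
    public const string ClassName = ""{ctx.Symbol.Name}"";
}}");
        });
    }
}
```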
This works because of the existence of the C# and VB partial class language feature.
Of course, not all code injectors need to be in the business of adding members to the classes you wrote, or especially not adding members to all the classes you wrote indiscriminately. Code injectors can add entirely new declarations, new types and APIs that are meant to simply be used by your code, not to modify your code.
Yet the prospect of having code injectors modify the code you wrote enables many compelling scenarios that wouldn't be possible otherwise. A companion proposal for the C# and VB languages, #5292, introduces a new feature that makes it possible to have code generators not only add new declarations/members to your code, but also augment the methods and properties you wrote.
Now, you can get rid of boilerplate logic like all that INotifyPropertyChanged code you need just to make data binding work. (Or is that so last decade that I need a better example?)
Subjects not covered in this proposal but open for discussion too