Reboot build chain codegen

mgravell commented 7 years ago

There have been lots of thoughts of compilation extensibility points for .NET, including "dnx compile modules", "roslyn generators", etc. The reality is: due to various factors including complexity, changing needs (IDE support? just the compiler chain?), and time pressures (getting vWhatever shipped), the result of this is: nothing has changed. There's a charplang feature that is going nowhere, and the build-time options are all closed. It is acknowledged that the IDE side of things (i.e. the current hypothetical approach) is probably prohibitively expensive, so: that almost certainly won't happen.

It is also perceived as a niche feature, which subtracts from the perceived need.

My aim, here, is to challenge the line above. While I fully agree that it is a minority feature in terms of counting the implementors, I propose that it is not a niche feature in terms of reach. Quite the opposite, especially when we consider that netstandard now happily reaches into a lot of platforms that don't have active meta-programming support (no JIT, AOT-only).

There are any number of scenarios that can benefit here, but due to my projects, the ones that leap most immediately into my mind are things like:

serializers (Json.Net, protobuf-net, etc)
ORM-like tools (EF, dapper, etc)
RPC tools for implementing stubs
UI templating tools (razor, etc)

Additionally, deferring meta-programming to runtime even on those platforms that support it means there is unnecessary overhead at runtime, doing something every startup that could be done once and forgotten. In some cases, this reflection/emit work can be quite considerable. For platforms that don't support it, you have to use reflection workarounds, so you have an entire additional codebase / implementation to support and maintain, plus poor performance.

The point is: just about every single app is going to need either data, or UI - probably both. And every app wants to be performant. The number of implementors of these tools is indeed minimal, but the reach is immense.

The current status-quo leaves poor options. Additionally, with the ever-increasing trend towards async, even on platforms that support runtime emit, it is less and less desirable to use the emit API, as it is tortuous to correctly implement async code in this way. A lot of work has been done in the language to solve the complex problems of async, and it makes perfect sense to use the roslyn chain to do that injection.

How can we pragmatically move forward these needs, without drowning in the big goal complexity that has stalled "generators"? Would rebooting the "compile modules" work be viable?

(also: paging @jaredpar and @davidfowl)

mgravell commented 7 years ago

Ideal scenario for me as a library author:

consumer adds nuget packages and knows nothing about the voodoo
then a miracle occurs
assembly with code injected / mutated by the tool is ejected

then a miracle occurs

The "compile modules" work is presumably a good starting point for a minimal viable product of step 2. Obviously the same miracle should happen for all of dotnet build, msbuild and IDE build. I can see that there might be security issues if step 1 is completely silent, but by the same token: installing a library that does runtime meta-programming has some similar issues and nobody blinks an eye. If that is a concern, I wonder whether it would be reasonable to do something like a warning:

CS{NUMBER} Compiler module {name} discovered; to execute modules, add <compilerModules>true</compilerModules> to the project file {path}

(i.e. kinda similar to "CS0227 Unsafe code may only appear if compiling with /unsafe")

ReubenBond commented 7 years ago

Another for the list of uses for codegen is RPC proxies & stubs, like the GrainReference & IGrainMethodInvoker implementations we have in dotnet/orleans.

Currently, we use a mix of IL and Roslyn, but Roslyn code is much easier to maintain and debug. IL code is a necessary last resort, used for generating serializers/copiers for inaccessible types and fields & optimizing the results of reflection (eg, generating code to call some constructor).

Perform Roslyn-based code generation at runtime adds a considerable amount of time to startup as we need to scan types and perform deep accessibility checks (if we generate code for this type, will it even compile or will the compiler whinge about accessibility?) That's before we even get to generating syntax and invoking the compiler.

In order to generate code at compile time, we have a NuGet package which builds the target assembly, loads it and generates code for it, and then builds it again with that generated code included. That slows down builds and it's not pretty, but the metadata model exposed by reflection is nicer to work with than the one exposed by Roslyn during compilation time (at least the APIs I could find when I wrote those code generators in mid-2015). If it's possible for us to eventually move to a supported mechanism, like compiler modules, then that's very attractive.

Examples of codegen/tooling in Orleans: Roslyn-based: GrainReferenceGenerator.cs, GrainMethodInvokerGenerator.cs, SerializerGenerator.cs IL-Based: ILSerializerGenerator.cs, GrainCasterFactory.cs Tooling: Orleans.SDK.targets

Code generation is hugely important to the .NET ecosystem: it lets us augment the features provided by the language/runtime. As Marc was indicating, features which support codegen will never be touched by the vast majority of users, but nearly all of those users directly benefit from those features via the libraries which they consume.

EDIT: this issue is quite old, and we've evolved our code generation approach since then. We use Roslyn both for consumption (analysis) and production of source. Code generator library: https://github.com/dotnet/orleans/tree/master/src/Orleans.CodeGenerator MSBuild integration: https://github.com/dotnet/orleans/tree/master/src/Orleans.CodeGenerator.MSBuild

mgravell commented 7 years ago

@ReubenBond great example; and indeed your IL code looks very familiar - very comparable to protobuf-net's emit stage. Just a tip for anyone who might need to fight Roslyn (Reuben makes the point that the API is hard at times): roslynquoter is great if you want to figure out the Roslyn code needed for a particular scenario.

ReubenBond commented 7 years ago

I used Roslyn quoter in the beginning, and now LINQPad or VS with their syntax tree displays if I ever need it. It's a verbose API, but it's fine. Consuming syntax trees is harder than generating them, I feel, but maybe I just don't have enough experience with that aspect of roslyn yet. It feels stringly compared to using reflection on compiled code.

sharwell commented 7 years ago

For me, (source) code generation tools can be grouped into two categories:

Those that can represent the generated code strictly as a set of new (generated) files added to the build, without altering the effective content of any non-generated source files in the project
Those that alter code defined in non-generated code, e.g. Code Contracts' handling of Contract.Requires<TException> (this was actually an IL generation tool, but it should still serve as an example)

It's hard for me to tell which of these is more important for the people blocked on this issue.

For tools falling into the first category, it seems a reasonable option is a build-time tool which runs the code generation step prior to invoking the compiler. It aligns closely with the way other code generation tools currently operate, and it's always seemed to work well for ANTLR from both a tooling and an IDE experience perspective (though that wasn't a Roslyn-based generator). I'd be interested to see details on the reasons why this approach doesn't produce the expected results when Roslyn comes into the mix.

Tools falling into the second category can be further broken into two sub-categories: those than can represent the output of the code generation transformation as pure C# code (i.e. code that can be passed to an unaltered compiler such that the compiled binary executes as expected from the code generation tool output), and those that cannot. I have substantial but somewhat different concerns for these sub-categories, but if tools aren't actually falling into these groups then it doesn't seem necessary to go down this path right now.

vbfox commented 7 years ago

I think a good example of working Codegen with IDE support in the current .Net environment is F# type providers.

They cover only 1. of @sharwell classification but they already exists today and work relatively well in the IDEs (one of their currently missing feature is accessing types from the same compilation but there's an RFC).

Writing them is a niche feature for sure, but using them is pretty common and they already cover most database access, and serialization/deserialization needs.

mgravell commented 7 years ago

@sharwell I strongly suspect that "1" would fulfil the 80%+ case - and quite possibly the 98% case if we consider partial class etc. Frankly, that would be a great place to start. It is conceptually similar to what CodeGeneration.Roslyn does - but the point here would be to formalize something in this area, rather than workarounds, hacks, etc.

sharwell commented 7 years ago

@mgravell Suppose Antlr4.net40.targets worked on both MSBuild for desktop and MSBuild for .NET Core (cross platform). The overall behavior of the file could be modified for a different tool, e.g. a Roslyn-based tool, which gets plugged in where Antlr4ClassGenerationTask currently sits. This file already supports incremental builds, debugging, and cleanly integrates with both Visual Studio and ReSharper IntelliSense. Would you perhaps be unblocked with this file as-is (modified to provide the same feature set for a different code generator)? Or perhaps with a modification to work the same way on additional platforms?

mgravell commented 7 years ago

@sharwell I don't really know enough about that tool to offer an opinion. If it would work in a x-plat, x-target, x-tooling way that allows me to get extra code into the build based on analysis of the current code, then I guess "probably"?

sharwell commented 7 years ago

@mgravell It should be possible. You could cover several scenarios from the start by writing your MSBuild task against the new .NET Standard builds of the MSBuild APIs. You could then expand on that support by multi-targeting the build task (to .NET Standard plus MSBuild 14 to cover Visual Studio 2015), and updating the targets file to reference the correct task. The build task is just a wrapper that invokes your code generation tool with the correct input files and options, and feeds information about the outputs back for the rest of the build.

mgravell commented 7 years ago

oh god, I'm going to have to learn how to "do" msbuild, aren't I...

sharwell commented 7 years ago

@mgravell You were already working on IL code generation without API assistance, how bad could it be? :trollface:

In all seriousness though, start with the .targets file I linked and the .props file next to it.

Replace everything Antlr4 with some other name
Remove properties not related to your use, such as Java vendor
Remove the AvailableItemName lines and the ItemDefinitionGroup
Instead of passing @(Antlr4) to the task (all items with the Antlr4 build action), you'll want to pass @(Compile) (all the source code files)

It's probably not a one sitting thing, but it's not horrible either. This build definition has been hardened by many, many years of use in commercial settings.

davidfowl commented 7 years ago

@sharwell that requires a "double compile" no? Which is pretty sucky. What you want is to be handed a roslyn compilation so that tools can just look at that VS rewriting that boilerplate in every build task that needs the same compilation.

sharwell commented 7 years ago

@davidfowl It does (require two "core compilations"), but maybe it's not a problem. The C# compilation core is often only a minority of the overall build time, and the code generation component could reuse much of the same information that is already getting computed for the main build (e.g. the location of references).

davidfowl commented 7 years ago

@sharwell That seems unfortunate (I know emit takes up more time than making the compilation itself). We do 99% of the work with analyzers already, that's why this feels like such a PITA.

jaredpar commented 7 years ago

My aim, here, is to challenge the line above. While I fully agree that it is a minority feature in terms of counting the implementors, I propose that it is not a niche feature in terms of reach.

Don't think this needs to be challenged. Everyone involved with the generators features agrees on this. Else we wouldn't have spent so much time on it.

How can we pragmatically move forward these needs, without drowning in the big goal complexity that has stalled "generators"?

One of the bigger issues is scenarios. There is only a limited amount of time we can spend on language features. Even features with known advantages and well thought out designs get put on the back burner because of more immediate needs.

Take for example ref returns and span<T>. These are features that are well understood, have known / measured benefits, mature designs and available implementations to reference. That's been the case for at least 5+ years now, much longer for ref returns. Yet we're now just getting to them in the core language because we had motivating scenarios to push them to the top: kestrel / pipelines + unity performance.

Generators still have a number of open design issues but we're quite certain it's a costly feature. The actual compiler / language changes are moderate but the developer experience cost is very high. In order to get this going we need some very compelling scenarios.

Also the scenarios need to essentially show why we can't just use existing options: T4 templates, single file generators, etc ... Can a moderate investment there close the gap that full language generators are not needed?

Would rebooting the "compile modules" work be viable?

Compiler modules are harder to create a developer experience around because they could change literally anything about a Compilation (at least in the versions I looked at). The generators feature we worked on for C# 7.0 was very carefully designed to be an augmenting generator to limit the churn a generator could have to the compilation.

Finally due want to take a second to talk about the state of the feature. We put a lot of work into this for C# 7.0 across a number of teams. Eventually we cut it for 7.0 for pragmatic reasons and decided to revisit it during the next major release. Hence the inactivity here isn't a reflection of "we've given up" but more of a reflection on "shipping C# 7.0 - 7.2 is a ton of work and we need our brain power over there". Once we start ramping up again an C# 8.0 this will get revisited and we'll look at it again compared to all the other work we want to do.

sharwell commented 7 years ago

@davidfowl Note that I wouldn't say this is optimal, and small overhead definitely adds up over time and in combination with other things. My real hope from this is it unblocks users who need this functionality in the short term by providing a very reliable and tolerably-performing path to a desired outcome, and then use those results to start being very specific about the goals of compiler-integrated transforms.

davidfowl commented 7 years ago

@sharwell How does it affect incremental builds? Do you have any experience running multiple code generators in the same project?

sharwell commented 7 years ago

How does it affect incremental builds?

The approach I'm using for ANTLR 4 works seamlessly with incremental builds, including incremental clean (a change which causes output files to be renamed will delete previous outputs during incremental build).

Do you have any experience running multiple code generators in the same project?

No, but the order can be defined in MSBuild. By default, all generators run before the compilation step but are otherwise unordered relative to each other. If you have a code generator which depends on another code generator, it can be accounted for on an as-needed basis.

jmarolf commented 7 years ago

@mgravell What is the experience you want for:

Build/Compilation
Debugging
Browsing Generated Source

With code gen? There are many other problems to solve but I don't understand which position you are taking. Do you want something similar to roslyn generators or do have something else in mind?

jcouv commented 7 years ago

FYI @cston

clairernovotny commented 7 years ago

As another take on this, Refit (which does compile-time code generation), uses a Mustache file for its template language. Sure, it only works per-language, but that's ok for us.

It's really easy to update the template. The generation is called in a task before CoreCompile. https://github.com/paulcbetts/refit/tree/master/InterfaceStubGenerator.Core

Still, I'd love to see "the right way" to do this in a more standard way.

mgravell commented 7 years ago

I will definitely be looking into theGenerateStubsTask in refit and that .targets file, thanks for the pointer, @onovotny.

Call it a personal weakness:

I can write code to parse a 3rd-party DSL; no problem
I can emit C# accordingly: fine!
or IL: that's just dandy
but msbuild files? that scares and confuses me...

Out of curiosity; does that .targets file and the tool assembly deploy into consumers via nuget?

clairernovotny commented 7 years ago

In Refit, the tools get packed in a build\tools folder in the nuget package: https://github.com/paulcbetts/refit/blob/master/Refit/Refit.csproj#L28-L46

The only thing that gets deployed to the client itself is the main Refit library, that has the implementation called by the generated code.

One thing I will mention is that writing non-trivial Build Tasks is harder than it should be for .NET Core due to needing to deal with publish steps to get the right output and then needing a custom AssemblyLoadContext to load your task's dependencies alongside the dll: https://github.com/paulcbetts/refit/blob/master/InterfaceStubGenerator.BuildTasks/ContextAwareTask.cs (thanks to @AArnott for that one).

xoofx commented 7 years ago

In the list of scenarios that @mgravell listed above:

serializers (Json.Net, protobuf-net, etc) ORM-like tools (EF, dapper, etc) RPC tools for implementing stubs UI templating tools (razor, etc)

I will add to the UI templating tools, UI framework (e.g XAML) where we need to generate bindings/binary representation of the UI tree. Same goes for grammar languages...etc. I will add also IL patching, very narrow, but again, I had to use this in multiple projects and IL post processing hurts compilation times... The list goes on...

As @jaredpar asked, I'm wondering why this list of scenarios are not enough compelling?...

In my experience, I had already to work on two other similar problems to generate custom serializers and codegen from a scripting language. The fact that we don't have a simple hook in the compiler to generate everything as part of the compilation process is very annoying because it is inducing a post step compilation (with complications when you have to IL merge back), which caused much longer compilation times (typically, I remember some people complaining in our team that compilation time was bad for C#, but it was because of our extra passes that we couldn't really internalize as part of the regular Roslyn compilation process - I had even to make a server running in the background so that we could avoid NGEN steps or JIT slowing down the whole process by several seconds). For that scenario alone, we wouldn't even care about having access to Intellisense for these serializers, as they are completely internal stuffs...

The same would go for generating DLLImport interop for CoreRT/LLILC...

Alone XAML is quite a big typical scenario, compilation steps involved currently in the compilation pipeline (msbuild+VS experience)...

So, obviously, there are quite a few big scenarios that we have to workaround today and are hurting significantly our development process (and the overall feeling that people could have about .NET building process)

I completely understand the complexity of getting something full-featured covering all the cases, but couldn't we work step by step on this? Like:

1) provide a plugin API for running pre/post processing compilation steps - without Intellisense, similar to how analyzers are plugged. Get the feedback from the community. This would be a tech preview, added to the official compiler, but not ready for prime-time/broad usages. Breaking changes possible later. 2) take into account Intellisense more seriously and add support for it

For 1), it could help already many projects that don't require Intellisense. It would help the compiler team to get more feedback. For 2), it could likely introduce breaking changes for the Compilation plugin API, but it would be fine, as long as 1) is an internal stuffs that you need to activate.

Thoughts?

jaredpar commented 7 years ago

@xoofx

As @jaredpar asked, I'm wondering why this list of scenarios are not enough compelling?...

The key here is enough. The scenarios are compelling, but the work involved in making code generators work as a first class language + IDE feature is extremely high. It would likely take at minimum 3 developers the majority of a release cycle to complete. There are a lot of other features you could do with that man power.

Also we have to weigh existing solutions here. How much better could these scenarios get if we invested a fraction of the time in the experience around single file generators / T4 templates?

couldn't we work step by step on this?

I don't think the first step is going to give us a lot of actionable feedback. Definitely we'd like feedback in that area as it's new and has design holes. But it also represents the smallest amount of work. The second step, Intellisense, is basically where all of the crazy comes into play. Very likely the problems associated with making that function would force us to redesign the compiler layer.

xoofx commented 7 years ago

The key here is enough. The scenarios are compelling, but the work involved in making code generators work as a first class language + IDE feature is extremely high. It would likely take at minimum 3 developers the majority of a release cycle to complete. There are a lot of other features you could do with that man power.

I understand the challenges of the IDE part. But, many scenarios don't require IDE/Intellisense experiences because everything that is generated is internal/unknown to the developer. At least for the following cases:

serializers
mappers
rpc stubs
IL/calli for custom dllimport

The first 3 could generate additional files (in obj) that would be only required for debugging experience (and I don't expect this case to be that complicated to integrate).

The compiler would allow to generate code as part of the compilation process (i.e as part of Roslyn process - exactly like analyzers -), it wouldn't need complicated scenarios like triggering recompilation on user key stroke, navigating to generated code...etc.

The only scenario that may require a bit more work in Roslyn is the last one where we would need to have pluggability access at the IL generator level (and not only at the syntax level, or a way at the syntax level to pass through custom IL generator)

I don't think the first step is going to give us a lot of actionable feedback. Definitely we'd like feedback in that area as it's new and has design holes. But it also represents the smallest amount of work.

Considering that this feature alone would cover many of the existing scenarios, by far the most popular (in terms of impact to end-users) and assuming that it would not require any IDE modifications (expect maybe displaying code generators assemblies along analyzers in assemblies references), it sounds very reasonable that it should be possible to add this without tremendous amount of man power (I quote you here "the smallest amount of work." 😉 )

Would you agree with this or are there any side effects of this feature that I'm missing?

Also we have to weigh existing solutions here. How much better could these scenarios get if we invested a fraction of the time in the experience around single file generators / T4 templates?

Above the following scenarios, single file generators/T4 templates is not the most common scenario. T4 templates are working fine today for the few cases where you don't need to inject new code based on your existing code.

xoofx commented 7 years ago

@jaredpar could you confirm my question above?

Also, @mgravell , @ReubenBond, from your experience, can you confirm that codegen for the cases above (serializers/mappers/rpc stubs...etc.) usually don't require any navigation/intellisense/dynamic recompilation on the fly and that the generated code doesn't need to be accessed from "manual code"?

clairernovotny commented 7 years ago

@xoofx I would say that for those cases, and I include Refit, then no, it doesn't need IntelliSense support.

That said, in another case, like @AArnott's NerdBank.GitVersioning, it generates a static ThisAssembly class that contains useful members. This is currently done in a pre-compile step but could easily be turned into a generator. The data from the ThisAssembly class should be available to IntelliSense.

mgravell commented 7 years ago

(quietly nodes in agreement with the sage words of @onovotny)

ReubenBond commented 7 years ago

@xoofx in the Orleans case, codegen never requires any intellisense. It's all for behind-the-scenes support classes.

jaredpar commented 7 years ago

@xoofx

I understand the challenges of the IDE part. But, many scenarios don't require IDE/Intellisense experiences because everything that is generated is internal/unknown to the developer.

If there is no need for an IDE experience then a single file generator via MSBuild is a very attractive solution. Why not invest a small amount of effort there to make the experience more toolable?

Would you agree with this or are there any side effects of this feature that I'm missing?

There are two experiences that aren't accounted for here though: debugging and ENC. Imagine the generated code has an exception, or as the developer I simply want to step through it.

What will be the experience when I step into that file? Will intellisense be available, syntax highlighting, etc ... Getting that to function is not a trivial task. Not fixing it though will make the experience seem rather broken.

ENC is a whole other bag. What should the experience be when the deveolper edits the generated code? The compiler can't re-run the generator otherwise it would destroy the user edit. Also can't not run the generator or it wouldn't be able to account for other edits the developer made to the normal code.

benaadams commented 7 years ago

Imagine the generated code has an exception, or as the developer I simply want to step through it.

Isn't that a similar issue to Linq.Expressions?

xoofx commented 7 years ago

If there is no need for an IDE experience then a single file generator via MSBuild is a very attractive solution. Why not invest a small amount of effort there to make the experience more toolable?

A single file generator means that you need to commit the generated files as part of your repo, which is something that you don't want (e.g size of generated file, merging conflicts alone would be super annoying), moreover if you don't want to allow any changes to the code. Also what is the other IDE story support for them? (VSCode, Rider?...). The single file generator would have to be triggered automatically on every single changes... not sure VS is well equipped for this. There are also stuffs that you absolutely can't do with single file generator, like replacing empty method body (for the DllImport scenario typically) or if a code generator needs to generate nested serializer inside type in order for them to be able to access private fields... (by introducing behind the scene a "partial" even if the original class/struct doesn't have it)

So single file generator is not an option for the scenarios above, at least for me, as I have already used them in the past, precisely for a serializers scenarios, and it was an awful experience for developers... what do you think @mgravell @ReubenBond @onovotny ?

There are two experiences that aren't accounted for here though: debugging and ENC. Imagine the generated code has an exception, or as the developer I simply want to step through it. What will be the experience when I step into that file? Will intellisense be available, syntax highlighting, etc ... Getting that to function is not a trivial task. Not fixing it though will make the experience seem rather broken.

If we want debugging experience, it would require to generate files on the disk, so there is definitely some work to do here. I'm also concerned that the generator would need access to Roslyn syntax tree (with syntax inferences ready), it would then generate some files, these files would have to be added to the current assembly compilation unit, would it re-trigger a whole recompilation? (or is it possible to do this with Roslyn without recompiling everything, I don't know)

So this is definitely something important, but would this feature alone would take several man years? I fail to see exactly what would make this so difficult to add to Roslyn...

ENC is a whole other bag. What should the experience be when the deveolper edits the generated code? The compiler can't re-run the generator otherwise it would destroy the user edit. Also can't not run the generator or it wouldn't be able to account for other edits the developer made to the normal code.

ENC should not be allowed for them (if there would be any generated files)

clairernovotny commented 7 years ago

I agree with the @xoofx's concerns around SingleFileGenerators, namely that they're tied to an IDE and requires the artifact to be checked in (for the same reasons).

The tool we have should work from pure notepad + CLI builds. We have this today with a code-generating pre-compile Task injected with targets, but that's hard to maintain and is limited to C# since we hard-code the template.

Being able to step-in to the generated code is definitely important, however I don't see any need to support editing it. In fact, given that it is re-generated "constantly," the editor shouldn't allow any changes to the generated file.

jaredpar commented 7 years ago

@xoofx

A single file generator means that you need to commit the generated files as part of your repo, which is something that you don't want

Why do you need to commit the file? Again, imagine we made moderate investments in the scenario. For example making the single file generator run as a part of the build.

Also, whether you check in generated files is matter of preference. In Roslyn we check in our generated files because we found that it has a number of benefits: simplifies our development steps (restore, open solution), lets us SourceLink in 100% of our source code, and allows simple stepping / debugging.

There are also stuffs that you absolutely can't do with single file generator, like replacing empty method body

Why not? Or rather why do you think the code generators feature would allow this but a single file generator would not?

The design we settled on for code generators did not allow for generators to modify developer code. Instead it added a couple of small language features (think partial methods on steriods) that allowed for generated code to more cleanly replace those method.s

So this is definitely something important, but would this feature alone would take several man years? I fail to see exactly what would make this so difficult to add to Roslyn...

Where did I say this feature (Debugging + ENC) would take several man years?

ENC should not be allowed for them (if there would be any generated files)

This is your opinion and I can guarantee you it's not shared by a significant number of our customers.

xoofx commented 7 years ago

@jaredpar

Why do you need to commit the file? Again, imagine we made moderate investments in the scenario. For example making the single file generator run as a part of the build.

Can you explain how you would do this exactly? How this could run as part of a single pass/process within the compilation process (Roslyn)? Again, if you are proposing what we are already doing, by customizing special msbuild target, compiling first the assembly, reading metadata from it, generating file from the code, recompiling the assembly with the generated file (or merge back an assembly generated separately)... we have explained that in addition to add significant complexity to our compilation process, compilation time is hurting the whole experience

Why not? Or rather why do you think the code generators feature would allow this but a single file generator would not?

Afaik, single file generator cannot modify existing code (adding an attribute to a class for example), unless you have prepared your code to do so (add partial, even if you don't know if the generator is going to add this attribute...)....

How would you code the Pinvoke generator to replace DllImport with proper dll loading, calli and so on? Unless this is going to use again the slow route of generating the whole DllImport outside of the main compilation process? (as .NET Native is doing)... or using some custom internal of Roslyn we can't use?

Where did I say this feature (Debugging + ENC) would take several man years?

You said above "take at minimum 3 developers the majority of a release cycle to complete" (for the whole thing, not the reduced scenario we are focusing at right now), so I just played here to extrapolate the bits, to make it more... dramatic. I'm glad that it will be much less 😋

This is your opinion and I can guarantee you it's not shared by a significant number of our customers.

My opinion? In this discussion, we are a couple of customers, promoters of .NET in our companies, MVP, OSS contributors of some major projects (btw, you know, the kind of projects "your customers" are most likely using, sometimes, without telling us or even just thanking us) and we have been here to confirm some major use cases, requirements...etc. and we are really glad to help the whole .NET OSS platform here by discussing with you in public. It would certainly help if these customers could raise their voice directly here so that we could get the full picture...

But fair enough, this discussion seems to indicate that beyond analyzers, there is currently zero chance to add plug-able extensions for codegen scenarios to Roslyn in any short/medium term future.

At least, we have tried. 😉

jmarolf commented 7 years ago

I get that the outcome we want is "codegen features sooner", but what subset and when? If we are going to do this in a multi-step fashion what is the acceptable minimum feature set?

My reading of is it: Step 0: Better single file generator experience that is more integrated with the compiler giving better performance Step 1: Add more APIs so that nuget and the rest of the toolchain knows about these generators Step 2: Better APIs so that dynamic compilation of serializes is possible Step 3: Have analyzer-like\type-provider-like api that allows for generated types to be consumed in the IDE (intellisense etc.) Step 4: Have generated code be emitted to pdb so debugging works Step 5: Have Enc work with generated code

Where am I wrong in terms of what everyone wants? what would the ideal ordering of feature delivery be?

ReubenBond commented 7 years ago

@xoofx agreed: we also do not want users to have to commit the code we generate. In order to reduce the chance of this happening, we've taken to emitting the code into the obj directory, whereas it was previously in Properties.

This scenario is too important to give up on. Code generation in C# today is restricted to very few developers with few scenarios because it's so difficult (mostly because tooling/language support is lacking) and there's no blessed path. Look at Android, though, where a large chunk of the most popular libraries make use of Java's Annotation Processing Tool (APT) for codegen. Eg: Butterknife, Dagger, Retrofit, Robolectric, AndroidAnnotations, Parceler, IcePick. Those libraries make the ecosystem vastly better and they make the language more powerful and friendlier for application developers. They extend the scenarios we've discussed in this thread to also include UI binding & customization, application lifecycle, threading, testing, and dependency injection. We don't need all of those things in .NET since we have superior reflection support, but the scenarios are interesting nonetheless.

The outlined steps look good to me. My preference is: 0, 4, 1, 2, with very little desire 3 & 5.

Step 0: I understand this as "better APIs for processing and generating code at build time"

Step 1: I understand this as "code generators can be installed via packages and are exposed to tooling other than the compiler (for diagnostic purposes / transparency / management?)"

Step 2: Perhaps we could attach our own deserialization & copy constructors to a type, as well as other methods & properties. That should remove the desire to have some way to access private types/fields from generated code (see #11149). Currently we jump through hoops to make things "just work" in cases where the user has a private type or private/readonly fields. Eg, the generated C# code calls into methods which generate IL at runtime so that we can sidestep accessibility rules.

Step 3: For our scenarios, this is not necessary. When Orleans was originally released, we did have some user-exposed generated code. We would generate static classes for users to consume based upon their interfaces. The experience was not ideal, and we've since replaced it with non-static classes with generic methods. I'm not discounting the value of this in general. I'm sure it's very useful, just for our particular scenarios (RPCs & Serialization)

Step 4: This should be step 1 if it's not included in step 0. Without this we would end up instrumenting generated code (printf) so that we have some way of debugging it. Ultimately, we can live without it.

Step 5: I don't see the need. A developer should not expect ENC edits to persist after the debugging session has ended. It would be more surprising if they did persist. It's not mutable code & conceptually does not need to ever exist on disk. When we perform runtime code generation, we just pass syntax trees directly to the compiler, no textual C# code exists and certainly not on disk.

jaredpar commented 7 years ago

@xoofx

Afaik, single file generator cannot modify existing code

I covered this in my earlier comment. The generator feature we were designing didn't allow for code modification either. Hence single file generators are just as powerful here as the language feature.

Again, if you are proposing what we are already doing, by customizing special msbuild target, compiling first the assembly, reading metadata from it, generating file from the code, recompiling the assembly with the generated file (or merge back an assembly generated separately)... we have explained that in addition to add significant complexity to our compilation process, compilation time is hurting the whole experience

Not what I'm suggesting. I'm trying to dig into why single file generators don't work for the scenarios. Is it the development experience, the way in which they execute in MSBuild, the lack of access to Compilation objects, etc ...

You said above "take at minimum 3 developers the majority of a release cycle to complete" (for the whole thing, not the reduced scenario we are focusing at right now), so I just played here to extrapolate the bits, to make it more... dramatic.

How is creating unnecessary drama helping to move this conversation forward?

I'm glad that it will be much less

When did I say it would be much less?

It would certainly help if these customers could raise their voice directly here so that we could get the full picture...

The information I provided pretty much sums up their position. The ENC experience should work for the entirety of the C# code that comprises their assemblies. Whenever there is a gap we get pretty direct feedback about it.

davidfowl commented 7 years ago

I feel as passionately about this issue as the people on this thread (DNX compile modules were awesome 😄). @jaredpar would it make sense to convert a few real projects to what your proposal would be (better single file generators)? That way we could see how much pain there actually is(and there's a lot today), and how it scales across the various IDEs we need to support now (VS, VS Code, VS for Mac, Rider etc).

xoofx commented 7 years ago

@jaredpar

I covered this in my earlier comment. The generator feature we were designing didn't allow for code modification either. Hence single file generators are just as powerful here as the language feature.

I understand, but it seems that previous generator feature didn't have in mind scenarios for rewriting DllImport. Compare to serializers, this is less critical, but when we will have one day to work with CoreRT/LLILC to make DllImport code generation fast as part of build process, it will require to integrate it as part of the compilation process, otherwise it will slow down the whole compilation.

Something to keep in mind, but let's not take this scenario into account for now.

I'm trying to dig into why single file generators don't work for the scenarios. Is it the development experience, the way in which they execute in MSBuild, the lack of access to Compilation objects, etc ...

Assuming that we are talking about the way IVsSingleFileGenerator is currently working:

1) It is VS specific, design time only, running only from the IDE 2) It is running against a "trigger" file on which you assign a custom tool. It is inadequate when changes are coming from any code changes in your project (case of serializers, rpc..etc.) as It is triggered only if you change the trigger file. 3) It implies to generate a file on the disk and to add the generated file to the source control repo

For serializsers/rpc/db mapping, these are laborious constraints. I fail to see how to workaround these without integrating the generator as part of the Roslyn compilation process.

How is creating unnecessary drama helping to move this conversation forward? When did I say it would be much less?

My usage of rhetoric and humor doesn't get through here, so my apologize for the interference.

The ENC experience should work for the entirety of the C# code

If generated code is on the disk (like in obj\...), I don't see why it would not work (although, as we said, for generated code, we don't want to persist changes, as they don't make sense)... and if generated code is not on the disk, the user couldn't see it, so there should be no problem either.

xoofx commented 7 years ago

Note that for IVsSingleFileGenerator, in additions to the points above, this interface is also completely lacking any context of compilation objects.

What we are looking for is something very similar to Roslyn Analyzers (in terms of distribution, discover-ability, message reporting...), but that would run just before analyzers. I actually wrote a hack/proof of concept a few months ago in this branch at compilation-rewriter. This is not how it should be done, but it gives roughly an idea where this generator could be called in current Roslyn compilation process (A dedicated code would extract the pre-process outside of AnalyzerDriver, remove inheritance from DiagnosticAnalyzer, try to share a common base class and provide a default base class CompilationRewriter...etc)

daveaglick commented 7 years ago

I totally agree with @xoofx that analyzers provide a compelling basis for code generators. There's already a nice dev-time experience for analyzers that could be at least partially leveraged and developers are starting to understand how they work, what they're capable of, and how to use them. Adapting the analyzer pattern to automatic compile-time application seems like it would be easier for developers to grok than a whole new code generation paradigm.

sharwell commented 7 years ago

I just started work on a task that uses Roslyn-based code generation to provide a feature: https://twitter.com/samharwell/status/883767229896159232

I have no idea how it's going to turn out. Definitely intended to be a learning experience.

ReubenBond commented 6 years ago

How can we help to make progress here?

We can use Roslyn's code analysis APIs to build better experiences for users and improve the ecosystem if we have support from the toolchain. In my case, I want to be able to take a project as it's being built, analyze its syntax, and emit additional syntax before the build continues.

mgravell commented 6 years ago

I suspect that mine and Reuben's use cases are virtually identical.

Also emphasis: this kind of support would be a major boost to UWP (and unity / xamarin) users who currently get a third rate story from any tools that are meta-programming heavy. I've had quite a lot of frustrated conversations with those users (especially of late).

Think of the users! :)

On 15 Dec 2017 9:42 p.m., "Reuben Bond" notifications@github.com wrote:

How can we help to make progress here?

We can use Roslyn's code analysis APIs to build better experiences for users and improve the ecosystem if we have support from the toolchain. In my case, I want to be able to take a project as it's being built, analyze its syntax, and emit additional syntax before the build continues.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/dotnet/roslyn/issues/19505#issuecomment-352119038, or mute the thread https://github.com/notifications/unsubscribe-auth/AABDsPi5jVURF9LYdThCFA77MqE959-Yks5tAufdgaJpZM4Na13S .

jcouv commented 6 years ago

Tagging @KathleenDollard

pentp commented 6 years ago

I also have two scenarios that would benefit from a proper compile time codegen solution.

An ORM-like API that currently uses double compilation (csc, Mono.Cecil, csc) to generate support functions that are invisible to the end user.
A localization API that could benefit from compile time codegen, but currently does not because there isn't a good way to make it work seamlessly (except for WebForms and Razor where the compliation units can be augmented and the result is quite nice but needs two separate codegen implementations).

In both cases I would like to modify the CompilationUnit before the assembly is emitted, but there is no need for any IDE/IntelliSense support (in fact, I don't want the end users to see any generated/replaced code, I just want them to use normal API calls that have a basic fallback implementation). IntelliSense, debugging and ENC are not needed at all. For the ORM support functions I actually use #line hidden to hide the generated code from the PDB.

KathleenDollard commented 6 years ago

I would love to see life in this topic again. I think our modern view would result in a good outcome, specifically, the notion of micro generation/translations rather than uber generations to solve all the problems of app development at one go.

There is also an interesting feature proposed by Anthony D Green https://github.com/dotnet/vblang/issues/282 This is an interesting finesse around what I have come to believe is a core requirement: the code you are going to emit needs to be checked syntactically while you are building the template. Since this approach actually doesn't use a template, there is innate syntax checking.

dotnet / roslyn

Reboot build chain codegen #19505