Reboot build chain codegen

mgravell commented 7 years ago

There have been lots of thoughts of compilation extensibility points for .NET, including "dnx compile modules", "roslyn generators", etc. The reality is: due to various factors including complexity, changing needs (IDE support? just the compiler chain?), and time pressures (getting vWhatever shipped), the result of this is: nothing has changed. There's a charplang feature that is going nowhere, and the build-time options are all closed. It is acknowledged that the IDE side of things (i.e. the current hypothetical approach) is probably prohibitively expensive, so: that almost certainly won't happen.

It is also perceived as a niche feature, which subtracts from the perceived need.

My aim, here, is to challenge the line above. While I fully agree that it is a minority feature in terms of counting the implementors, I propose that it is not a niche feature in terms of reach. Quite the opposite, especially when we consider that netstandard now happily reaches into a lot of platforms that don't have active meta-programming support (no JIT, AOT-only).

There are any number of scenarios that can benefit here, but due to my projects, the ones that leap most immediately into my mind are things like:

serializers (Json.Net, protobuf-net, etc)
ORM-like tools (EF, dapper, etc)
RPC tools for implementing stubs
UI templating tools (razor, etc)

Additionally, deferring meta-programming to runtime even on those platforms that support it means there is unnecessary overhead at runtime, doing something every startup that could be done once and forgotten. In some cases, this reflection/emit work can be quite considerable. For platforms that don't support it, you have to use reflection workarounds, so you have an entire additional codebase / implementation to support and maintain, plus poor performance.

The point is: just about every single app is going to need either data, or UI - probably both. And every app wants to be performant. The number of implementors of these tools is indeed minimal, but the reach is immense.

The current status-quo leaves poor options. Additionally, with the ever-increasing trend towards async, even on platforms that support runtime emit, it is less and less desirable to use the emit API, as it is tortuous to correctly implement async code in this way. A lot of work has been done in the language to solve the complex problems of async, and it makes perfect sense to use the roslyn chain to do that injection.

How can we pragmatically move forward these needs, without drowning in the big goal complexity that has stalled "generators"? Would rebooting the "compile modules" work be viable?

(also: paging @jaredpar and @davidfowl)

ReubenBond commented 6 years ago

Thank you, @KathleenDollard, that gives me hope that we can make progress on this.

My interim solution was to create an MSBuild target which runs before CoreCompile and passes some context from the current build to my code generator assembly which then creates an AdhocWorkspace, calls GetCompilation, walks the tree, and outputs supplementary syntax to a temporary file which is included in the final build. This is then bundled into a NuGet package which inserts the target. That says nothing about ergonomics, just about how it can be done today.

This is cheaper than what we're doing in dotnet/orleans (full double compilation: each assembly is emitted twice), but more expensive than it needs to be (adds 2.5s to build for a small library). This solution might also be fragile.

I believe that the approach from dotnet/vblang#282 wouldn't be suitable for our needs in Orleans since it seems to only apply to properties. It looks very AOP to me (which is fine). In our case, we generate RPC proxies & stubs, serializers, and assembly-level attributes which help us to quickly locate interesting types in an assembly.

Perhaps we could start with a general purpose code generation feature and introduce friendlier but more restricted features (like dotnet/vblang#282) later?

xoofx commented 6 years ago

I would love to see life in this topic again. I think our modern view would result in a good outcome, specifically, the notion of micro generation/translations rather than uber generations to solve all the problems of app development at one go.

@KathleenDollard the posts in this thread should give a pretty condensed view of years of usage/experience feedback from many major libraries or internal company use cases. We have been through many options in the past already (IL postprocessing + IL merging, IVsSingleFileGenerator generating files, Roslyn pre-analysis+generate files+actual compilation...etc.) and the bottom line is that without an integrated story into the compiler, we are making the compilation story horrible for our users.

The example of @ReubenBond above which adds 2.5s is very similar to the problem I had to fight with IL postprocessing, where I end up creating a server running on the user machine to make sure we don't have to reload all the processing assemblies... still the experience was terrible compare to regular .NET projects (not counting months if not years of trouble when Cecil was not correctly writing debugging information for async code making the debugging experience impossible after a IL postprocess)

In the above discussions, the Edit&Continue seemed to be one of the blocking points but we said that we don't want Edit&Continue for code that is generated by a tool (btw, I would love to hear metrics usages about Edit&Continue, because I have never been using it or any people in my teams). I believe that generating these files in intermediate obj/ directory would still allow a good debugging experience (in case we want to debug generate code). I would go further and say that ENC should be disabled if a Roslyn codegen plugin is present in the project. Users would know about it ("Can't ENC, because XXX requires Roslyn plugin codegen) and would understand the benefits. If they don't want this, they would choose a different RPC/UI/Whatever library (and would realize what they are loosing)

I have given above also an example where and how in the Roslyn toolchain this codegen plugin story could be developed by leveraging the existing Analyzer infrastructure...

So I concur with you, I would love to see life in this topic again! 😉

xoofx commented 6 years ago

FYI, at Unity we have been developing an incremental compiler using Roslyn. In the coming months/years, we are going to move to a model where many parts will be code generated (serializers...etc.), and this code generation will be part of our compiler. We are not forking Roslyn but reusing the different bits to allow this. Still, I would really prefer this kind of work to be integrated/standardized into the Roslyn compiler (As diagnostic analyzers have been done) as it would allow efficient codegen scenarios for a broader audience.

xoofx commented 6 years ago

So I have been in a lengthily discussion on a related issue https://github.com/dotnet/csharplang/issues/107#issuecomment-398663257

TLTR; I would like to know if the folks in the Roslyn/Microsoft teams (@jaredpar @agocke @CyrusNajmabadi @KathleenDollard @jcouv and all the people I have missed sorry) plus the people asking for this feature that have been developping similar scenarios (@mgravell @ReubenBond @xoofx, add your name if you have developed something related to heavy IL patching/codegen scenarios) would be interested to have a meeting together to try to sort out our requirements, the constraints and if we can proceed further?

PathogenDavid commented 6 years ago

I've only been lurking on the relevant GitHub issues, but I'd be interested in participating. We have a lot of generated code using a variety of methods (T4 templates, custom tools using Roslyn, custom tools not using Roslyn, and Fody) for both private and public codegen.

@SimonCropp and the rest of the @Fody team would probably have some valuable input too.

kentcb commented 6 years ago

Another use case from me: my PCLMock library uses codegen to take the busy work out of creating mocks. It uses Roslyn to achieve this, but .NET Core/SDK-style csproj came along and broke some things.

Writing the code to perform the codegen was one thing. The bigger challenge for me was deciding how to make that tooling available to developers (T4 template, console app, scripty...?). The concept of codegen should just be a formal part of the ecosystem and build tooling. As an added bonus, if it was formal, I suspect that the kinds of regressions I've struggled with would be less likely.

As it stands, PCLMock has remained unable to move to .NET standard for many months. 😢

jaredpar commented 6 years ago

@xoofx

I would like to know if the folks in the Roslyn/Microsoft teams ... would be interested to have a meeting together to try to sort out our requirements, the constraints and if we can proceed further?

I'm always up for a discussion. At the same time though most of what I'm interested in digging into how to make a sensible IDE story around generators. The compiler side of code generators is fairly straight forward in pretty much every design we've discussed. The IDE is the challenging part and what ultimately caused us to not do this feature.

The constraints we came up with for code generators and IDE were roughly the following:

The IDE must able to understand as changes happen to the code whether or not generators are out of date.
The IDE must have correct semantic information without running generators on every single key stroke.
Code generators cannot directly modify code the developer has authored.

ReubenBond commented 6 years ago

(I feel this is a more relevant issue for discussing this request than https://github.com/dotnet/csharplang/issues/107)

@KathleenDollard, you mentioned out-of-proc source generation. This could satisfy most of my scenarios.

It could let us consume & emit syntax trees which are included in the compilation, and it would be re-invoked whenever source changes. I assume some simple notifications we could subscribe to so we're not running on each key press or file modification. The tooling would have to ensure all generators are satisfied before the final compilation.

Two scenarios which additive source generation doesn't currently suffice for are:

Serializers usually need/want to access private fields and potentially add a custom serialization constructor so that we can initialize readonly fields and (more importantly these days) get-only autoprops. Currently we use unverifiable IL to read/write private/readonly type members, but that is an issue for AOT platforms.
AOP scenarios need to be able to modify code more generally. I cannot comment on this, since I haven't needed it. I'd be fine with a solution which didn't satisfy AOP's needs immediately.

For my uses:

Using Roslyn APIs to analyze source and to construct syntax trees is fine.
Users do not need to be able to see generated code in intellisense or otherwise access it in their own code. We would have hand-written interfaces for scenarios which might tempt this and generate against the interfaces.
Users should be able to inspect generated code for diagnostic purposes. Eg, today we write it to obj dir.
Users should be able to step through generated code with the debugger.
Generated code should be able to add new members (eg serialization ctor) and access private members of another type specified in a compilation, but removing or modifying members is not a requirement. We have partial today, but I believe requiring users to add partial to everything because of codegen implementation details is an undesirable end-user experience.
Code generators should be easily shippable as NuGet packages.
Code generation should not significantly slow down the developer's inner loop (eg, by significantly slowing the build process)

@jaredpar wrote:

The constraints we came up with for code generators and IDE were roughly the following:

The IDE must able to understand as changes happen to the code whether or not generators are out of date.

The IDE must have correct semantic information without running generators on every single key stroke.

Code generators cannot directly modify code the developer has authored.

I believe these wants align with those constraints. The scary/difficult one is regarding adding new members and accessing privates - that probably needs the most design.

xoofx commented 6 years ago

@ReubenBond all your use cases can be done with a post-compilation step without an IDE integration, the plugin would decide to dump to obj folders the relevant files that it wants to be debuggable. The access to private fields, the fact that it can change existing class to access private fields, the fact that you don't want to have explicit tag partial class, I agree with all your points, that's what we have with IL patching today (except the debugging experience, unless you go through a more complicated way of IL analyzing + generating cs + compiling them + ILMerge them back), and post compilation can simply handle this in a breeze.

Concerning @jaredpar requirements:

The IDE must able to understand as changes happen to the code whether or not generators are out of date. The IDE must have correct semantic information without running generators on every single key stroke.

These requirements are hurting most post-compilation scenarios (and de-facto excluding them) while I don't understand why you would want to generate serializers/ORM mappers/RPC code on every key stroke or whenever there is a single change in the code base that could affect a codegen aware trigger. This is a waste of IDE time, while this code gen should only happen when actual compilation to the disk is happening. We want here a similar treatement than async/await: the users doesn't have to suffer from the internal machinery at IDE time. The difference with async/await is that we could output debuggable code in the obj folder just in case you still want to debug the internals.

Code generators cannot directly modify code the developer has authored.

Post-compilation would output files to obj folders at compilation time - not IDE typing time. This would not be mandatory to output files, as for some AOP scenarios this might be irrelevant (pre-post code changing the body of a method). The IDE has nothing to know about them as it would access them automatically when debugging with the generated PDB informations generated at compilation time. Files are "readonly" by essence there (post-compilation). as they are not part of your project.

At the same time though most of what I'm interested in digging into how to make a sensible IDE story around generators.

This is where the requirements for a strong IDE story around generators is puzzling me. Post-compilation allow a good balance between zero IDE integration work while fitting a large chunk of use cases we have with codegen scenarios today (including debug). By making a strong IDE integration a requirement (making it almost a pre-condition to even discuss in a meeting), mismatching our use cases on several points, is disallowing to make any progress on the subject, and the staling state of this issue, one year later, is attesting sadly the blocking and our misunderstanding on this.

m0sa commented 6 years ago

@jaredpar, as @xoofx say, please re-consider post-compilation.

I think what a lot of people here want is to be able to optimize existing calls (probably with changes directly on the call site, hoisting stuff, etc). All of that code already works without any post-compilation, by doing stuff at run-time (e.g. serializers, ORMs). I like to think of it as comparing Debug and Release builds. You don't expect to be able to step through every C# statement when debugging a optimized release build. I think most of the optimized / generated code would be [DebuggerHidden] / [DebuggerNonUserCode] / [DebuggerStepThrough].

The major pain point here is that #line pragmas must be applied at the call-site, to line up the parameters when the call target gets swapped out, or in case it changes (and it would have to in some cases, e.g. inlining or converting anon type members into parameters in dapper calls).

So maybe we actually want an "optimizers" instead of "generators" feature, which would:

work on method calls decorated with an attribute
allow changing of call sites to those methods, allow them to be expanded arbitrarily, and introducing new members, to classes
require a run-time implementation, so they can work without post-compilation steps
don't need any IDE support

As you can see, this looks very similar to what generators do, but without any IDE support. That's why I think everybody wanting to get the "optimizers" feature was so excited about "generators", but are now bummed that the IDE support requirement is blocking everything.

jaredpar commented 6 years ago

I think what a lot of people here want is to be able to optimize existing calls

Optimizing calls almost always end up doing so by violating the semantics of the original code. Asking for this is essentially asking for generators to change the meaning of the code the developers typed. That will lead to less predictability and many, many subtle issues. I'm highly skeptical of generators that attempt to do this.

As you can see, this looks very similar to what generators do, but without any IDE support.

Even this design requires IDE support. It must be possible to debug the code after the generators have run: F5, step into, ENC, etc ...

xoofx commented 6 years ago

Optimizing calls almost always end up doing so by violating the semantics of the original code. Asking for this is essentially asking for generators to change the meaning of the code the developers typed. That will lead to less predictability and many, many subtle issues. I'm highly skeptical of generators that attempt to do this.

We are not asking Roslyn doing this. But this is something that is sometimes used in our products that we deliver to our customers but though, I have never been in a situation to violate the original code.

Think for example of providing a rewriter for Linq to expand them to use foreach instead. That's a super useful optimization.

Sure, someone with a compiler plugin could do something wrong, but many will do it right and on purpose. And nobody is forcing you to use a plugin that is going wild or is wrong.

Even this design requires IDE support. It must be possible to debug the code after the generators have run: F5, step into, ENC, etc ...

As I explained, post compilation do support debug and you don't need IDE support (because the PDB is already giving the infrastructure to jump around files) by outputing files to the obj/ folders at compilation time.

But for AOP scenarios, usually, you don't want to debug modified call site. If you look at PostSharp debugging experience they don't give you an opportunity to step into the modified call site (and hey, they could violate the code they change, but I'm sure they are careful at designing their product not to do this), but you are still able to step into the callbacks.

But even for AOP and Roslyn post-compilation, with a configuration asked by the user, we could still re-dump the files have been changed to the obj folder in addition with the code inserted, and with pragma #line it should still be able to debug his code and step into the generated code - and step back to the original code that was not modified via pragma #line (but usually, you don't want to debug a Linq->Foreach loop, you assume that it has been battery tested as much as async/await has been tested)... though not sure many people would ask for this.

For the ENC case, I disagree: post-compilation does not have to comply with this, and this should be fine to disallow ENC if there is a post-compilation plugin in the project, because users of this compiler plugin will place the features brought that the library much above an ENC requirement. I have never seen a complain about this in our product, while we were doing pretty heavy IL patching. I don't think that PostSharp users either. ENC doesn't make sense here if we rewrite the body of a method or we change a class to be partial and add some methods to it...etc.

Again, we have been working with customers for years, by providing this kind of workflow, and it has been working fine (except, that the compilation time is just horrible and IL patching - not the one pre-generating to cs and ILMerge back - was not providing debug, but this can be done with Roslyn post-compilation plugins actually)

I'm not saying I don't want pre-compilation (generated code that is accessible by user code), because this one of course will require a deep IDE integration. But pre-compilation is a lot more problematic and it doesn't cover the more larger scenarios used by the post-compilation option.

So we are looking for something where we can have, first, post-compilation with debug in Roslyn (which requires zero IDE integration) which is very easy to add, and pre-compilation later, once all the teams involved for this feature will have been able to find a proper solution to this problem.

xoofx commented 6 years ago

And I still would like to make this meeting, because I feel that we need to talk in real to clear things up more fluently 😉

m0sa commented 6 years ago

It must be possible to debug the code after the generators have run: F5, step into, ENC, etc ...

@jaredpar that's why I'm saying let's call the feature "optimizers". People are not expecting this of optimized code:

Step Into - Debug

Step Into - Release (optimized)

jaredpar commented 6 years ago

@xoofx

Think for example of providing a rewriter for Linq to expand them to use foreach instead. That's a super useful optimization.

That's a prime example though: it's an optimization that changes the underlying semantics of the code. I don't think C# should be creating features that allow the stated semantics of the language to be violated. If that is the goal then why have generators? Wouldn't the language be better off just saying "we can rewrite LINQ to be faster if we please"?

As I explained, post compilation do support debug and you don't need IDE support (because the PDB is already giving the infrastructure to jump around files) by outputing files to the obj/ folders at compilation time.

That can help with general debugging. A lot more work would be needed to address features like ENC.

For the ENC case, I disagree: post-compilation does not have to comply with this, and this should be fine to disallow ENC if there is a post-compilation plugin in the project,

That is your opinion though and it simply doesn't reflect the feedback we get from users. They want ENC to just work.

And I still would like to make this meeting, because I feel that we need to talk in real to clear things up more fluently

Agree. In part because I'm better at talking than writing 😄 Seriously though I've had better success discussing the "violating C# semantics" point in person with people. Mostly because there are very subtle issues that pop up during optimizations that pretty much always violate C# semantics. It's easier to detail this, and the C# team philosophy around it, in a back and forth setting.

xoofx commented 6 years ago

That is your opinion though and it simply doesn't reflect the feedback we get from users. They want ENC to just work.

You are referring to users in general - as a Roslyn team member, I can understand - but on our side we are talking about users that are using apps requiring to generate additional/modified code at compile time. In this category of person, I haven't seen them asking for ENC, or them saying "I won't use your product if you don't have ENC". So that's not really an opinion I have, but the reality we are living today: We are generating code through a non standardized setup (IL patching+custom build tasks) that is hurting the experience of our customers, but we don't any choice, because there are no other solutions to solve this problem today (hence why we are here trying to get this solution to get into Roslyn)...

That's a prime example though: it's an optimization that changes the underlying semantics of the code. I don't think C# should be creating features that allow the stated semantics of the language to be violated. If that is the goal then why have generators? Wouldn't the language be better off just saying "we can rewrite LINQ to be faster if we please"?

Sure, but can Roslyn oversee all the potential optimizations that all C# products around are looking for? That's the fantastic opportunity of having a true compiler plugin story (while the post-compiler plugin story is the easiest one), to allow people to extend the compiler (that we are today already extending through IL patching). It will allow people to prototype more easily new ideas (That sure could make it into Roslyn one day) or to distribute breakthrough to their customers directly.

I can feel from the different discussions with Roslyn team members that you are implicitly worried that it would open a pandora's box, and that the whole integrity of Roslyn is at stake here, that if a plugin starts to output something different than a stock Roslyn compiler, it would be a kind of treason or a super burden for Roslyn... while I'm more convinced that it would open more fantastic opportunities for the community than the few dark side of some rogue compiler plugins 😉

svick commented 6 years ago

@jaredpar

it's an optimization that changes the underlying semantics of the code.

Doesn't every useful code generator/codegen tool do that too? The whole point is that the user writes one thing, but the code that actually executes is different.

I don't think C# should be creating features that allow the stated semantics of the language to be violated. If that is the goal then why have generators? Wouldn't the language be better off just saying "we can rewrite LINQ to be faster if we please"?

The justification I've heard for the reason that the C# compiler won't do that is that it can't see the implementation of LINQ. But a user that decides to install a "make LINQ faster" optimizer/code generator does know if it's appropriate for them. So I don't see the issue here.

agocke commented 6 years ago

Doesn't every useful code generator/codegen tool do that too? The whole point is that the user writes one thing, but the code that actually executes is different.

No, most code generators are things like serializers that take in a spec (in code or an external file like .proto) and generate additional code in the project to implement that specification. Since this is all C# code that's being generated and no user-written code is modified, by definition this can't violate any C# semantics.

SimonCropp commented 6 years ago

Happy to hop on a call and hash things out

NMSAzulX commented 6 years ago

No ENC. Emit code is easy to understand but writting emit is a terrible experience. We need C#-Script、AOP on runtime. I think it is a very important enhancement for the .net core. I can even write life with it. :)

jaredpar commented 6 years ago

@xoofx

I can feel from the different discussions with Roslyn team members that you are implicitly worried that it would open a pandora's box, and that the whole integrity of Roslyn is at stake here

No. I'm more worried about the integrity of C#. There is a reason that C# has a spec and that the compiler adhers to that spec with the exception of compat issues. The language provides strong guarantees about how the code will be interpreted and executed. Once plugins can arbitrarily rewrite the code then all those guarantees go away. It becomes impossible for the developer, and the compiler team who gets the bug report, to understand exactly what a foreach or select clause are doing. It's no longer defined by the spec but instead by the whims of an optimization engine.

jmarolf commented 6 years ago

(IL patching+custom build tasks) that is hurting the experience of our customers

I agree that IL patching as a build step is both:

hard to maintain as a implementer
difficult reason about as a customer

Correct me if I am wrong @xoofx but I going to be working under the assumption that a solution to our problem would be able to create LinqOptimizer using roslyn apis.

So what do we need?

Ability to load compilations (implicitly stating that we only need the context of a csproj file, not a the entire solution)
Ability to examine existing code patterns (similar to analyzer infrastructure)
Ability to modify code (similar to code fix infrastructure)

That last one is really tricky. Suppose we have two "optimizers" installed. Which order do they run in? What if they each modify the same section of code? If there are conflicts when the optimizer is run how are they reported to the user?

Today code fixes are user initiated actions. Most of these ambiguities are resolved by the user being shown the set of code changes that are going to be made, the user deciding that these changes are correct, and additional error dialogs being shown to the user if application fails.

A general purpose optimizer api will need to at least solve these problems to be viable.

order of application for optimizers
error model for failed optimizer application
some strategy for parallelization / merging (so N number of optimizers don't require N number of builds)

In addition you've identified several concerns that the compiler team has about code gen so I'll spell them out:

Being able to completely change language semantics means a user could install an optimizer and then be unable to reason about their code. All arithmetic operations could be revered (+ to - for example) and the user wouldn't know until runtime. This sounds like an odd things to be concerned about ("who would ever write such a thing?" you may ask), but if this was introduced in the public compiler API there would be no half measures. I can personally attest that when you have a userbase as large a C# anything that can be done will be done and C# dialects are not a thing that the compiler team wants to introduce. C# as a language has a lot of explicitness in its design so you can look at a snippet of C# code and know what it is going to do. Losing that feature is not a tradeoff the compiler team is willing to make.

So where does that leave us? I still think that your scenario is a reasonable one. If you have a library that has a nice linq-like api it would be great if you could get the performance your users need without making the api harder to work with. My current thinking is that CodeGeneration.Roslyn is the best place to look at for the optimizer case.

CodeGeneration.Roslyn does all three of these:

Ability to load compilations
Ability to examine existing code patterns
Ability to modify code

It has some design limitations that make it unsuitable to solving all of unitys problems out-of-the-box, but I still think that modifying this solution will be easier because:

It uses existing extensibility mechanism (msbuild instead of the compiler)
It has an api that is better than IL patching

xoofx commented 6 years ago

@jaredpar

Once plugins can arbitrarily rewrite the code then all those guarantees go away. It becomes impossible for the developer, and the compiler team who gets the bug report, to understand exactly what a foreach or select clause are doing. It's no longer defined by the spec but instead by the whims of an optimization engine.

That's a bit exaggerating that the whole integrity of C# would be at stake because a plugin could misbehave. We could perfectly ask the user to report to Roslyn with only safe mode compiler "on". This could be a property in the project to setup.

No. I'm more worried about the integrity of C#. There is a reason that C# has a spec and that the compiler adhers to that spec with the exception of compat issues. The language provides strong guarantees about how the code will be interpreted and executed.

So today, there are thousands of projects using IL patching solutions (either commercial or things like Fody) that are well integrated enough into msbuild and post-compilation tasks, that a user can't make a distinction if it is integrated into Roslyn or not (apart that the build is slightly/significantly slower), so these solutions that have been around for years (even before Roslyn was released) are in power of breaking the whole integrity of C# or put the Roslyn team at a high risk?... Have you been through any recurrent trouble (or even a single trouble?) reported back to Roslyn with a fake compiler bug introduced by these solutions?

jmarolf commented 6 years ago

Have you been through any recurrent trouble (or even a single trouble?) reported back to Roslyn with a fake compiler bug introduced by these solutions?

Yes we have had several compiler bugs where IL weaving caused crashes and it took us a long time to determine that it was not a compiler bug, but an IL weaver bug.

xoofx commented 6 years ago

Yes we have had several compiler bugs where IL weaving caused crashes and it took us a long time to determine that it was not a compiler bug, but an IL weaver bug.

So yep, that confirms that you get trouble anyway, even without a compiler plugin in Roslyn... so, the integrity in C# is not doomed 😉

jaredpar commented 6 years ago

@xoofx

That's a bit exaggerating that the whole integrity of C# would be at stake because a plugin could misbehave.

Disagree. This is not about misbehaving plugins. It's about correctly functioning plugins. The only reason for rewriting code is to meaningfully change the way in which the code executes.

So today, there are thousands of projects using IL patching solutions (either commercial or things like Fody) that are well integrated enough into msbuild and post-compilation tasks

Agree these exists and that they should exists. But they should not exist as part of the compiler.

Have you been through any recurrent trouble (or even a single trouble?) reported back to Roslyn with a fake compiler bug introduced by these solutions?

This is a frequent problem for the compiler team. This is true for all the different ways in which developers can manipulate compiled IL. They are often the highest hit crash count the compiler team deals with.

agocke commented 6 years ago

@xoofx This actually presents an ongoing problem for the compiler team. We already spend a fair amount of time chasing down bugs in other products, especially IL rewriters and obfuscators. Increasing the probability that people use these mechanisms has a direct negative cost to our productivity.

This is especially bad for enterprise customers with support contracts, as we are often required to fix their bugs, but they often can't even show us the binaries. We've had to send engineers to do actual off-site support requests for these bugs, wasting huge amounts of resources.

xoofx commented 6 years ago

This actually presents an ongoing problem for the compiler team. We already spend a fair amount of time chasing down bugs in other products, especially IL rewriters and obfuscators. Increasing the probability that people use these mechanisms has a direct negative cost to our productivity.

This is a frequent problem for the compiler team. This is true for all the different ways in which developers can manipulate compiled IL. They are often the highest hit crash count the compiler team deals with.

So this problem is unavoidable no matter plugins exist or not. But at least, with a plugin infrastructure right into the compiler, we would be able to track through some assembly attributes which plugins have been used. It would at least streamline to identify what has been modifying the code.

But fair enough, I finally got the plot behind the resistance against the post-compilation plugin idea. That's unfortunate, because I believe that not having this as part of Roslyn is actually hurting more Roslyn than it would, because solutions today are actually more dirty and error prone (working at IL level, ms build tasks in the middle...etc.)

agocke commented 6 years ago

I think Jared and I have basically different arguments here, fwiw. I don't like post-compilation modification because, by and large, the results are often buggy that we inevitably have to debug. Jared's a manager and isn't weighted by such pedestrian concerns 😉. I believe his position is more that the compiler shouldn't be in the job of code rewriting, because our job is to produce a translation engine from C# to IL, not an arbitrary code generation platform.

Edit: Change language. "buggy crap" was too strong and I was being a bit tongue-in-cheek here. I also have no idea of the proportion because by definition we don't get reports from people who have working post-compilation rewriters. However, we do see a lot of these bugs.

SimonCropp commented 6 years ago

@agocke

We already spend a fair amount of time chasing down bugs in other products, especially IL rewriters and obfuscators. Increasing the probability that people use these mechanisms has a direct negative cost to our productivity.

I don't like post-compilation modification because, by and large, the results are buggy crap that we inevitably have to debug

This seems unnecessarily antagonistic to many of the people we want to involve in this conversation, ie the people who have experience in IL re-writing.

Also I am certain than many of the owners of those tools that have bugs (as all software does), have spent a non trivial amount of time helping users of their tools (and also MS customers), to debug problems that resolve down to a bugs in MS software and sometimes specially roslyn. I know this is true for Fody (which i maintain).

From my perspective this issue provides us with an opportunity to have a more formalised approach allowing 3rd parties to provide the business value that is best delivered by codegen. Ideally this would result in less, as you say, "buggy crap". One specific example of how this could be achieved is, if there was a plugin based codegen model, after each codegen plugin was run an IL verification (like peverify) could run. Also we could provide testing helpers that run the same verification when someone is unit testing their codegen plugin

agocke commented 6 years ago

Now you're describing an arbitrary codegen platform. Helping users generate IL, making sure the IL is legal, maybe making sure the IL has some subset of the semantics provided by the initial program -- these are just codegen tools. I don't think inserting this stuff into the compiler is good software design and I don't think the goals of the compiler team are well served by us owning this process.

xoofx commented 6 years ago

Got a Roslyn plugin compiler with my branch compilation-rewriter using the existing diagnostic analyzers infrastructure... Now, it is so tempting to proceed further...

agocke commented 6 years ago

Btw I didn't mean Fody or any other tool is buggy. There's a huge amount of ildasm/ilasm going on that is very buggy. A lot of it from within Microsoft.

svick commented 6 years ago

@agocke

This actually presents an ongoing problem for the compiler team. We already spend a fair amount of time chasing down bugs in other products, especially IL rewriters and obfuscators. Increasing the probability that people use these mechanisms has a direct negative cost to our productivity.

To me, this sounds like an issue that new code rewriting infrastructure could help with, if it's designed with that goal in mind. For example, it could mandate how rewriting of a piece of code is enabled (e.g. every C# file to be rewritten has to have some marker in it or have a special extension like .csr), how it's presented in VS, how it's logged or where the rewritten source files are located.

@agocke (explaining @jaredpar's position)

the compiler shouldn't be in the job of code rewriting, because our job is to produce a translation engine from C# to IL, not an arbitrary code generation platform

I don't think it matters much from a user's perspective whether this would be part of the Roslyn project/team or whether it would be a separate project/team. I (and I assume others here, but I can't speak for them) think that this is something that's missing from the .Net ecosystem. And since the community didn't manage to create such a tool, I am asking Microsoft to do it. And I think the Roslyn team is the obvious point of contact and discussion about this, even if making this work would ultimately require creating a separate team (for example).

On the other hand, if the definition of the job of the C# compiler was that narrow, it would still be a black-box csc.exe, not a library for analyzing C# source code or a platform for analyzers and code fixes.

agocke commented 6 years ago

On the other hand, if the definition of the job of the C# compiler was that narrow

I don't think the job of the Roslyn platform is narrow, I just don't think it's an arbitrary code generation platform. I'm sorry if my statements came across as insulting @SimonCropp, I meant them the exact opposite way -- Fody and friends are existing code generation platforms that are good. I just don't think they belong in the compiler pipeline. Code rewriting is an important piece, but it should be one taken deliberately and carefully, with full knowledge of what's going on. By allowing arbitrary codegen in the compiler we have removed our ability to know what code we're generating, which will result in buggy programs and very unclear chains of responsibility. Sometimes it will be a plugin's bug. But sometimes it will be a Roslyn bug. Sometimes it will even be bugs from earlier codegen plugins interacting with the output of another codegen plugin. None of this looks like a robust pipeline to me, and it puts the compiler in the job of intermediating between all of the concerns, which is very much not the compiler's responsibility.

SimonCropp commented 6 years ago

By allowing arbitrary codegen in the compiler we have removed our ability to know what code we're generating, which will result in buggy programs and very unclear chains of responsibility. Sometimes it will be a plugin's bug. But sometimes it will be a Roslyn bug. Sometimes it will even be bugs from earlier codegen plugins interacting with the output of another codegen plugin. None of this looks like a robust pipeline to me, and it puts the compiler in the job of intermediating between all of the concerns, which is very much not the compiler's responsibility.

Replace "compile time" with "runtime", and you have exactly described what the majority of people spend much time on as programmers, debugging interactions between various components. Dont get me wrong this is not ideal, but is the nature of delivering business value. as long as the business value outweighs the friction, then we are ahead in the long run.

I dont see why the compiler should be immune from this equation. It cannot be so black and white, as in "there is no amount of possible value we could deliver to users of roslyn that would convince us to allow people to perform even the slightest amount of code gen". Based on that logic i would like to see the discussion focus on "what can we expose that will deliver significant impact while mitigating the possible negative side effects"

By allowing arbitrary codegen in the compiler we have removed our ability to know what code we're generating, which will result in buggy programs and very unclear chains of responsibility

This would seem to already be the case. As you have already asserted people are using codegen in the wild. The problem is, from my perspective, it is currently the wild west. There is no guidance, little tooling, no agreed apis, no standard conventions. I suspect many of the problems caused codegen are due to this lack.

Ideally i would like to see most (the ones that add real value and are not just me wondering if something is possible) of the fody addins be re-targeted against a supported API from MS. With the end goal of me doing a scorched earth on the Fody core codebase

CyrusNajmabadi commented 6 years ago

This would seem to already be the case. As you have already asserted people are using codegen in the wild. The problem is, from my perspective, it is currently the wild west. There is no guidance, little tooling, no agreed apis, no standard conventions. I suspect many of the problems caused codegen are due to this lack.

Here's my concern about this though. In many of the conversations that have come about on this topic, i've often seen the following back and forth:

Ok. If we were to provide something in this area, we'd need to really ensure the tooling was top notch. We'd have to enforce X, Y and Z. We'd need to make sure that certain experiences had an airtight story (like 'debugging').

To which, the response has been something like:

Oh. You don't need to worry about that. The experience isnt' great. But people put up with it.

So, it's very hard for me to gauge what is and isn't actually important. Or why it is/isn't important to be part of the actual Roslyn compiler platform.

If the existing deficiencies aren't a problem, then why does it need to be part of Roslyn? If it's going to be part of roslyn, i think the team's general position is "it can't be the wild west". but that means then actually putting in all the time and effort to solve those problems. Which means actually investing in all those expensive bits.

xoofx commented 6 years ago

Oh. You don't need to worry about that. The experience isnt' great. But people put up with it. If the existing deficiencies aren't a problem, then why does it need to be part of Roslyn?

Using quotes for something we never said is a bit troubling... We did say that the problem exists but we didn't have the choice other than to live with it. That's different, we wouldn't have had this discussion in the first place if we were not already in the wild west looking for something better

Let me reiterate why this plugin architecture is important to be part of Roslyn:

Compilation time would be largely improved
All code modifier/generators would use the same unified infrastructure (no more custom msbuild tasks, no more hazardous custom IL patching tools)
Using clean SyntaxTree instead of whatever dirty IL rewriter
As simple as using DiagnosticAnalyzer today, by installing a NuGet package
We could have a good debugging experience (in cases where it makes sense)

So, overall a significant better experience compare to what we have today.

From this discussion, it appears that scenarios that could modify the existing code will never get approved by the Roslyn Team while generators adding stuffs could be. It is better than nothing, but it is excluding quite a few chunks of scenarios out there (e.g AOP). I'm probably gonna try to release a lightweight-fork of Roslyn (easily upgrade-able to any new version of Roslyn) with NuGet packages that will allow these kind of compiler plugins, if it can help to federate a bit our wild west.

KathleenDollard commented 6 years ago

Thank you for this list of goals. I think this helps clarify a lot.

I would love to see a similar list of the reasons you think new files cannot solve the problem of editing existing code. Part of the work done prior to the feature cut in 7 was to work to increase the scenarios where separate file generation would solve the problem.

jaredpar commented 6 years ago

@xoofx

Compilation time would be largely improved

I disagree. Build throughput would be improved here but I think compilation time would be slower. This would be moving code today which executes outside the compiler inside it. Hence it won't get better, only worse. But yes build throughput is likely to improve.

We could have a good debugging experience (in cases where it makes sense)

Most designs I think can achieve this but there is significant work involved. I am going to push back on the "when it makes sense" part though. This is essentially "return true" for us. The push back we get from customers when there are gaps in our debugging experience is virtually guaranteed. I'm not basing this on personal whims, but instead my real experiences over the years on this team.

So, overall a significant better experience compare to what we have today.

You've excluded the biggest con of the design though: developers can no longer reason about how the C# code they wrote executes by thinking of C# semantics. It's now up to the decisions of the code generators (even in the cases where the code generators execute exactly as designed). That is likely to be a non-starter for most of the developers involved.

I don't think this is necessary to have a successful code generator story. There are many ways to do non-modifying generators that provide loads of developer productivity and performance enhancements.

My push back on "code is better written like X" is always the same: then write a code fixer. If the code is better expressed as a different C# pattern then encourage developers to just author it that way.

xoofx commented 6 years ago

I disagree. Build throughput would be improved here but I think compilation time would be slower. This would be moving code today which executes outside the compiler inside it. Hence it won't get better, only worse. But yes build throughput is likely to improve.

Yes, I was implicitly talking about the compilation time as a whole (built time in this context), from a user perspective today a user can't separate between the compilation time and the IL patcher time (they are part of their build as a whole), but strictly speaking, that's build time, I agree. My early experiments are showing that build time is a big win using an integrated rewriter into roslyn.

You've excluded the biggest con of the design though: developers can no longer reason about how the C# code they wrote executes by thinking of C# semantics. It's now up to the decisions of the code generators (even in the cases where the code generators execute exactly as designed). That is likely to be a non-starter for most of the developers involved.

I disagree (but you know that 😉 ). I don't know which "most of the developers involved." you are referring to. From the beginning, I'm talking about users that are using products that are doing already codegen IL patching to their assemblies (AOP users, serializers done by a product like Xenko...etc.). These developers using these products never ever came to us saying "Oh, you are modifying the output of Roslyn, it's a huge deal breaker, I won't use your product"

Anyway, I understand the Roslyn team position. So, I'm currently working towards providing a lightweight fork of Roslyn that will provide this compiler plugin infrastructure. People will have to explicitly reference this new compiler in order to use it (though a nuget package). I believe that it will serve as a great playground and provide practical feedback on the subject, without putting Roslyn at risk.

SimonCropp commented 6 years ago

@KathleenDollard

I would love to see a similar list of the reasons you think new files cannot solve the problem of editing existing code. Part of the work done prior to the feature cut in 7 was to work to increase the scenarios where separate file generation would solve the problem.

If i understand your question correctly, you are referring to the approach of partials (classes and methods) as sibling files for code gen?

Some restrictions/scenarios are i can think of

Partial methods are implicitly private, and therefore they cannot be virtual. This often means u cannot have you code base in a compilable state, and hence cannot get to the code gen phase.
Property changed injection
Method timing
Logging injection. eg https://github.com/Fody/Anotar#your-code
IDisposable injection. eg https://github.com/Fody/Janitor#your-code
Default all async code to .ConfigureAwait(false);. eg https://github.com/Fody/ConfigureAwait#your-code

KathleenDollard commented 6 years ago

OK, we're talking each other on this point, my fault for not being clear about what the feature work cut in 7 was. I don't think anyone will argue that the current partials is the right answer.

I just went back and found a design idea on how to have generated code replace code. Here's the code:

class C
{
  public void M()
  {
    // do stuff
  }

  [INPC] public int X {
    get { ... }
    set { ... }
}

// generated
partial class C
{
  public supersedes void M()
  {
    // do something
    superseded(); // calls original method
    // do more
  }

  public supersedes int X {
    get { return superseded; }
    set 
    {
      superseded = value;
      RaiseNPC(nameof(X));
    }
  }
}

I believe this works for non-private, property changed, logging and Disposable injection. I am not sure what method timing is (if it's perf instrumentation, then yes, it would work).

So, the most important of the ones you listed is the last. The supersedes proposal will not help that case. Thank you, that was what I was interested in.

I'll have to noodle on that and thinking of similar scenarios, because I would like to see that particular problem permanently solved with analyzers, because I remain in the camp that the code itself should be correct and understandable by anyone reading it and generation be an enhancement programmers can easily find. But this is an opinion.

svick commented 6 years ago

@jaredpar

My push back on "code is better written like X" is always the same: then write a code fixer. If the code is better expressed as a different C# pattern then encourage developers to just author it that way.

Except there is more than one way in which code can be "better".

For the example of LINQ to foreach optimization, that results in code that performs better, but is harder to understand and maintain. Which is why I would prefer to have LINQ in the source code I'm editing and foreach in the code that's executing.

A code fix is a half-measure: I can use a LINQ query when I first write the code, but then I have to convert it for foreach and from then on, always read and maintain it as foreach. That's better than nothing, but not good enough.

xoofx commented 6 years ago

FYI, I just released the Conan compiler, a lightweight fork of the .NET Compiler Platform ("Roslyn") by adding a compiler plugin infrastructure (as described in this discussion as the "post-compilation" solution)

71 commented 6 years ago

Since we're mentioning PoCs, I thought I'd mention Cometary, which also adds compiler plugins to Roslyn. Instead of being a fork of the compiler, however, it's a simple analyzer that hooks into the inner workings of Roslyn when it is loaded by it, and then rewrites things to load other plugins in memory.

stakx commented 6 years ago

Citing from the "Roslyn Overview" wiki page (with emphasis added by me):

This is the core mission of Roslyn: opening up the black boxes and allowing tools and end users to share in the wealth of information compilers have about our code. Instead of being opaque source-code-in and object-code-out translators, through Roslyn, compilers become platforms—APIs that you can use for code related tasks in your tools and applications.

The transition to compilers as platforms dramatically lowers the barrier to entry for creating code focused tools and applications. It creates many opportunities for innovation in areas such as meta-programming, code generation and transformation, [...].

Having read this issue, I cannot help but thinking that all of this innovation could happen a lot easier if Roslyn provided suitable extensibility points, rather than forcing everyone to come up with their own workaround solutions (inevitable bugs included).

Roslyn's "core mission" was to open up the compiler and enable new scenarios (and simplify existing ones)... why stop now?

71 commented 6 years ago

Absolutely. Without counting Cometary which is more of a hack, I know two different projects that attempt to bring something close to metaprogramming and/or compiler plugins by forking Roslyn: Conan and StackExchange.Precompilation.

Furthermore, the most popular tool right now for extending the compilation process is Fody, which has the advantage of working on all of .NET, but the disadvantage of only allowing the IL to be modified.

jaredpar commented 6 years ago

Having read this issue, I cannot help but thinking that all of this innovation could happen a lot easier if Roslyn provided suitable extensibility points, rather than forcing everyone to come up with their own workaround solutions

This issue is about finding the right extensibility points. As has been said several times already: we want to add generators to the language. In fact we spent a considerable amount of time in the Dev15 time frame doing exactly that. At the same time we have to find a solution that meets the expectations of our users.

SamPruden commented 5 years ago

It's a shame to see this stalled again at the moment.

This issue is about finding the right extensibility points. As has been said several times already: we want to add generators to the language. In fact we spent a considerable amount of time in the Dev15 time frame doing exactly that. At the same time we have to find a solution that meets the expectations of our users.

The issue does seem to be a bit more than that. People are asking for a powerful low level code transformation and generation extensibility point in the compiler. As @stakx said, this could bring uniformity and ease of use to the existing set of post-processing tools, and it does seem to fall within Roslyn's stated goals of being a true CaaS.

The resistance to that request does not actually seem to come from the design challenges on the IDE side, but from some fear that this feature might somehow be too powerful and that people will use it badly. As somebody who would like to be able to write these types of transformations, that feels like it's babying me a bit - any sufficiently powerful tool will inevitably allow me to break things if I use it wrong. I don't understand the distinction between it being okay that this is available through third-party post-processors but not okay when using an official API.

I can actually see one reason why this could be an issue, and I haven't seen it mentioned here so I'll lay it out. In a typical IL rewriter scenario, the person using the tool is expected to be very aware that they're doing so - it's always a conscious and considered decision. However, if rewriters could ship through nuget alongside libraries, we would have a lot more situations where somebody is using a transformation inadvertently. This particularly applies to the LINQ-like API optimisations discussed.

dotnet / roslyn

Reboot build chain codegen #19505