Champion "Replace/original and code generation extensions"

gafter commented 7 years ago

[ ] Proposal added
[x] Discussed in LDM
[ ] Decision in LDM
[ ] Finalized (done, rejected, inactive)
[ ] Spec'ed

Independent generation

Generate artifacts independent and before compile/build. Real artifacts exist on disk, generally in spaces not checked in to source control.

Example: T4

IDE issues vanish because there is no difference between generated and hand written code
It requires a gesture or a watcher - so programmers don't have to think about generation
Some issues would be more graceful if generated code was identified: accidental editing (now solvable) and avoiding refactoring generated code
People associate this approach with some negative history:
- T4 had a horrible execution model (historic)
- Previous versions (that I've seen, except for a spike I did) have been text base - text base is fundamentally broken for a mixed language world (solvable)
- Insufficient delineation of code that runs for the generator and outputted code (solvable)
- Most folks didn't discover pre-compiled templates
- Outdated architectures - in today's world any generation is the capstone (small, critical piece) of an architecture using the many tools we didn't have in 2001
Programmer can learn from artifacts

Early generation

Generate artifacts just before compile publish. Real artifacts exist on disk. Unless something has changed, this is Razor. I think this has aspects of the adjacent approaches and is mostly valuable in tooling, and I don't understand how it works with IDE compilation, so I'm skipping it.

During compilation ("source generation")

Alter the compilation in flight. If new artifacts exist, they are ethereal and if they exist on disk it's an optimization.

Examples: None, we can't do this today

IDE issues
Generation is a natural part of the inner loop for the programmer, they don't have to think about generation
There is no real artifact for the programmer to accidentally change
Either requires writing against the compilation, or uses a generation language we have not yet created (creating templates requires skill)
Programmer cannot see or learn from artifacts and if IDE problems are not handled well, the programmer may have difficulty understanding the execution of their program

Late compilation/IL generation

Do not generate artifacts, but change what IL is emitted.

Examples: PostSharp usage, optimization, async/await, @AnthonyDGreen's vblang #282 Design Safari

Some IDE issues are finessed because the programmer "doesn't need to know about" the generated code
- Debugging arguably presents problems with the "doesn't need to know about" philosophy
Generation is a natural part of the programmer's process, they don't have to think about generation
There is no real artifact for the programmer to accidentally change
Requires writing to the compilation or IL (creating templates requires skill)
Programmer cannot see or learn from artifacts and the programmer may have difficulty understanding the execution of their program

Personal opinion:

We have many approaches to reducing boilerplate code today. Generic inheritance for implementations and functional programming greatly reduce the need for generated code
After all the above refactorings, there remains some boiler plate code that follows relatively narrow well defined patterns that can be solved with IL generation/AOP. async/await and maybe INotifyPropertyChanged are poster children (*)
After all above refactoring and above hidden pattern application (AOP), there remains some code that is boilerplate. Much of this code tends to have a heavy reliance on metadata with relatively simple patterns (EF).

When I look at the problem this way, I simply can't see justification for the high level of effort in source generation in Roslyn.

I've tried. I asked for source generation as the core feature of Roslyn when I first heard of the project (thankfully they ignored me). I absolutely love the vision. But, the execution would have been tricky in the old world where the compiler wasn't a core feature of the IDE. In the new world, even if the problems with the Visual Studio IDE were solved, what about OSS? How could a generative approach be considered real if it didn't work in every editor on every platform?

I would like instead to see good support for independent generation:

"replace"/similar language constructs to allow replacing generated code
better identification of generated status within code
IDE showing which code is generated (watermark/etc & Solution/File explorer)
IDE understanding generated code in key actions (refactoring, replace, optionally search)
watcher becoming an integral part of inner loop or a trigger for out of proc action on compilation
default generator
leadership/guidelines on metadata patterns
spike on a new cross-language generation approach/language

I also feel a deep passion about generative code being a route for programmer learning at many levels.

There are a number of places in .NET that would benefit, including EF and other generation along with scaffolding and project templates replaced with reentrant generation.

I believe the amazing outcome of this work would be community driven architectures. Where tools have allowed this, cool stuff happened (FoxPro, CodeSmith, etc). I want us to do that again.

(*) For AOP, I like a variation of @AnthonyDGreen's vblang #282 Design Safari that I spiked. The variation of Anthony's approach is to demand that all code exist to allow stepping. The attribute methods overload on interfaces, and some other tweaks. It allows programmers to use AOP with almost no new concepts and makes the debugging experience much simpler to create.

Apologies for the length of this comment

orthoxerox commented 6 years ago

@KathleenDollard thanks a lot for the extensive summary!

I have just one pressing question: what is "spike"? I suppose it's some Microspeak, but Raymond Chen hasn't covered it yet in his blog.

KathleenDollard commented 6 years ago

Spike (software development), a small task done to reduce uncertainty about a larger task (wikipedia.org, which is an attribution, not a judgement. Thanks for asking)

xoofx commented 6 years ago

@KathleenDollard thank you for this good summary 👍

I'm perfectly fine with T4 for what it can and cannot do. I'm using it sometimes in my projects and I'm not looking for improvement there. It is doing its job.

I would like instead to see good support for independent generation:

I would like to understand:

What kind of problem this is solving exactly?
Who has been requesting this feature? I'm not able to find a post with an actual library relying on this for its end-users (with a wide impact indirectly on end-users).
Which OSS projects today is actually looking for a better support for this?

Most of the concrete use cases we have been discussing in the posts are coming from Serializers/RPC (@ReubenBond, myself) , ORM (@mgravell) , AOP (a few folks around, including myself)....etc. that are done today mostly through post compilation steps via IL patching (or worse, via reflection emit at runtime) but could be done a lot more efficiently through a Roslyn integration (no debug, no ide, no intellisense, no project-system). I have also usecases related to UI (similar to WPF) which requires "During compilation", but let's say that I could live without that.

I'm really failing to see how independent generation respond to our requirements which are roughly:

We want to be able to generate code based on existing code (that can even modify existing code)
We don't want users to access the generated code from their project (they have ILSpy for that), or to store this code in the project/sourcecontrol, nor to be editable. They are mainly for post scenarios. Worse, if we are making an evolution to the underlying generated code (e.g removing a method), by just upgrading the package, we don't want users to have an un-compilable project.
We are not looking for improving the debugging experience (there are tools to step into post generated IL code actually)
We want to have a very fast compilation experience that would not require to re-parse the files, or re-read all the assemblies, multiple times (because today, we don't have any standard, so each solution has to queue an IL patcher to the build process via custom msbuild target files)
We want to have a standard and integrated way for libraries to provide such a compilation step through a NuGet package (ala Diagnostic analyzers)
We want to work at a higher level than IL level
We want a transparent and fluid experience for the users of our library/nuget compiler plugin

We know that this could be done easily, just with a few weeks of work, inside Roslyn, and would help many of our projects.

KathleenDollard commented 6 years ago

@xoofx End to start:

We know that this could be done easily, just with a few weeks of work, inside Roslyn, and would help many of our projects.

I can't comment on how much work, but I have to say that I don't think doing this halfway is a good idea. It does help a small group of folks vocal here, but I think putting a solution in Roslyn should have broader benefit.

We want to be able to generate code based on existing code (that can even modify existing code)

I think available metadata should include the compilation, so agree on that. I'm not yet sold on modifying existing code.

We don't want users to access the generated code from their project (they have ILSpy for that), or to store this code in the project/sourcecontrol, nor to be editable. They are mainly for post scenarios. Worse, if we are making an evolution to the underlying generated code (e.g removing a method), by just upgrading the package, we don't want users to have an un-compilable project. We are not looking for improving the debugging experience (there are tools to step into post generated IL code actually)

We'll just have to agree to disagree on whether programmers should have easy direct access to the code that runs in production and be able to easily understand and debug it.

We want to have a very fast compilation experience that would not require to re-parse the files, or re-read all the assemblies, multiple times (because today, we don't have any standard, so each solution has to queue an IL patcher to the build process via custom msbuild target files)

Pre-compilation independent generation out of proc runs at the speed of the user and does not affect compilation speed (except having a slightly larger project to parse). I understand this as a comment on today's IL generation scenario.

We want to have a standard and integrated way for libraries to provide such a compilation step through a NuGet package (ala Diagnostic analyzers) We want to work at a higher level than IL level We want a transparent and fluid experience for the users of our library/nuget compiler plugin

Did you look at @AnthonyDGreen's Design Safari approach to an AOP like experience? It would allow you to add attributes (via a NuGet package) that added code at emit time in a very controlled manner. The guard rails I have in mind for that might be too tight. But if you have a scenario you think would not work I'd be happy to noodle on it.

Most of the concrete use cases we have been discussing in the posts are coming from Serializers/RPC (@ReubenBond, myself) , ORM (@mgravell) , AOP (a few folks around, including myself)...

I don't see an issue, but I assume you fleshed out your thoughts on inconsistencies with an independent approach in the bullet points I already answered.

I'd really like you to take a look at the vblang issue. It's AOP via C# code. If it only handles INotifyPropertyChanged, it's viability is quite a bit different than if it also handles your pain points.

If IL is painful for authors, and source-generation is seen as a bigger issue by the team, can we finesse by having attributes actually tell us what they will do and inserting that code.

xoofx commented 6 years ago

Did you look at @AnthonyDGreen's Design Safari approach to an AOP like experience? It would allow you to add attributes (via a NuGet package) that added code at emit time in a very controlled manner. The guard rails I have in mind for that might be too tight. But if you have a scenario you think would not work I'd be happy to noodle on it. If IL is painful for authors, and source-generation is seen as a bigger issue by the team, can we finesse by having attributes actually tell us what they will do and inserting that code.

What would be these attributes? (like FreezableAttribute?) Would they be hardcoded by Roslyn? Usually, we are working with domain specific attributes, and the compiler rewriter is able to understand them. For serializer, they are usually domain specific (various domains can have a very different way/restrictions to handle serialization). So if you are proposing hardcoded attributes, this is too restricting and not what we are using today.

One thing that I forgot (and the reason why "readonly" post-scenarios are important). We don't want to have reverse dependencies, user code starting to reference generated code... with a pre-step solution like "independent generation" this is a bit more difficult to avoid (though maybe you could provide an analyzer that would throw an error in that case, ok).

Also, pre-compilation-step/"independent generation" will just exclude any versatile AOP scenario (unless the attributes allow this, but I don't see how, as AOP is by essence a post-step operation)

But fair enough, if you are implying that "independent generation" is the only plan that is accepted by the Roslyn Team, that's of course better than having nothing, and it will certainly help a few projects out there. But it is going to be a lot more involving than a post-step compilation (you need a close IDE integration story, a way to remove files previously generated by plugins, a way for the plugin to not take into account the generated files when re-generating...etc.), and unlikely to happen in the coming months if not years (let's come back here in one year to challenge this).

But hey, thinking about it, we could have both, post-step done now, pre-step done whenever, and we would be more than happy! 😉

Anyway, on our side, at Unity, we have already started to use our own Roslyn based C# compiler, so we will extend the compilation pipeline from there instead.

agocke commented 6 years ago

Just to say once more for the record, I think we can do source generation, although everyone's pet feature may not be there, and it's tractable to make it work everywhere with a good experience. I think that we will eventually do it, it's simply a matter of priority. We would basically have to cancel C# 8 to get it working in the near-term

KathleenDollard commented 6 years ago

@xoofx I am certainly not saying that pre-compilation is the only plan accepted by the Roslyn Team. We have accepted nothing, but individual members have been thinking hard on this topic, mostly for a long time.

vblang Design Safari has some rules for accessing the methods on the attributes and some conventions, but the choice of attributes and their naming is up to you. You're entirely in control, and I would not otherwise bring it up in a conversation on generative techniques.

As @agocke points out, the source generated solution is a lot of work, not impossible. Personally, I'd like to see consideration of both the vblang #282 approach and the pre-compilation approach. I think they cross up at the middle enough that if source generation is delayed a long time, most scenarios have an avenue forward.

xoofx commented 6 years ago

@KathleenDollard Forgot also a few comments to challenge the "independent generation" approach, I don't need necessarily answers now, but don't forget these cases:

We'll just have to agree to disagree on whether programmers should have easy direct access to the code that runs in production and be able to easily understand and debug it.

For sure, in general, I will prefer a code debuggable rather than a non debuggable code.

But have you heard a story with PostSharp users frustrated by this that it would make the product completely unreliable because generated code is not production compatible if it is not debuggable? I'm all for a good debugging experience, but there are also scenarios where the generated debug code is not where the debugging experience is meaningful, where the generated code is rock solid (battery tested), while user code is usually... not.

Also, don't forget merging branches of people working concurrently on a code base with pre-step generators: It will generate merging conflicts on existing generated code (assuming that the code is checked-in in the repository), conflicts that can't be merged without re-generating the generated code after the merge (oh, and you could merge a code that have no conflicts, but is wrong, because the generated code would be different... )

Pre-compilation independent generation out of proc runs at the speed of the user and does not affect compilation speed (except having a slightly larger project to parse). I understand this as a comment on today's IL generation scenario.

What is going to be editing life cycle? How does the generation is going to be triggered? (on idle?) What if you generate serializers in your project and you remove a field that is used by the serializer? If a users has edited some generated code, should it be overwritten entirely or not? What is going to be the sync policies? What if you rename a field? It will change the generated code (or not) but it should re-trigger a generation? There will be some serious IDE integration story here, quite close to what "source generation" is also trying to challenge.

KathleenDollard commented 6 years ago

@xoofx In my limited experience, PostSharp users are working with a subset of generated code that is rock solid. It seems much harder for project specific generation (with a need to debug, etc) to work in a late-generation scenario - thus making it good for a subset of problems (I hope I already said that).

The question of merge I think is the tip of a broader conversation. All of the portions of the generation need to be available for CI - thus, redoing the generation should not be hard. The merge conflict should occur on the metadata - which I hope is checked in too. It's not just the merge to have a clear story on.

It's important users don't edit generated code. Some of my points about why I think there is some work to be done on a pre-compilation/independent generation approach were due to this. Programmers and source control rules need to know what code is generated and treat it that way. If we stick to a file level, extensions work pretty well.

I don't see being able to trigger an external process on an analyzer as being as complicated to build as source generation, but we really have not explored this approach with the team. I tried to mark my earlier comments as my opinion, and I'm sorry if that wasn't clear. We haven't worked out the details of any approach, except @agocke work on source-generation.

Is it clear that I think AOP and pre-compilation are each valid in a set of scenarios?

m0sa commented 6 years ago

We'll just have to agree to disagree on whether programmers should have easy direct access to the code that runs in production and be able to easily understand and debug it.

FWIW compiler optimizations in release builds are not exactly easily understandable and debuggable either...

xoofx commented 6 years ago

Is it clear that I think AOP and pre-compilation are each valid in a set of scenarios?

Yes, and I was actually half joking when saying "we could have both, post-step done now, pre-step done whenever, and we would be more than happy"

1) Post-compilation:

Pros: Allow AOP scenario and covers most of ORM/Serializers/RPC codegen scenarios that are already used today. Very easy to implement in Roslyn (one person, maybe two to plug the stuffs in msbuild targets to pass the arguments coming from the compiler-rewriters to csc.exe, as it is done currently for diagnostic analyzers)
- Cons: Not debuggable 2) Pre-compilation / source generator:
Pros: Debuggable, you can inspect codegen without going through ILSpy
- Cons: No AOP scenarios. A lot more complex to develop (ide, intellisense, sync...etc., lots of teams to involved), not even sure it will make it one day.

I'm pretty sure that the community would be fine to make a PR for 1) (next week?), while for 2) it is barely impossible. But from the discussions, 1) seems to be considered as an entire no-go (nothing has changed since discussions happened more than one year ago). I'm very practical when it comes to deliver things and so, I'm highly skeptical that 2) will ever make it (plus the fact that it doesn't cover AOP scenarios).

m0sa commented 6 years ago

I'd argue that anything that has a run-time implementation, which works w/o post-compilation, and wants to do post-compilation steps, is an optimization. ORM/Serializers as well as a lot of AOP in the .NET ecosystem would fall under this definition.

What most folks use locally (when developing / debugging simple stuff) is the not-optimized / debug build (see my point above, you can't event step through a simple for loop in it's entirety if you debug the release build with optimizations turned on). Unfortunately, I know a lot of people who are used to dealing with behavior mismatches between debug / release builds, and go digging into IL / JITed code when they stumble across such a case.

It feels like that the main pain point for debugging in this (optimization) scenario is the #line directive, especially when there is a chain of transformations on a single file. To address this, I'd take inspiration from source-maps. They work nicely throughout the pipeline (e.g. typescript -> javascript -> minifiers/optimizers -> bundlers). The original / intermediate files can nowadays be stored as embedded sources in PDBs (https://github.com/dotnet/roslyn/issues/12625)

mgravell commented 6 years ago

We'll just have to agree to disagree on whether programmers should have easy direct access to the code that runs in production and be able to easily understand and debug it.

Compare to the situation today. The people using the tools that we're talking about already don't have that. Nobody has complained. Yes, I understand the issue of rogue tooling doing nasty things, but you pretty much have that today the moment you install any package. Being readable source code isn't the thing that adds safety.

LokiMidgard commented 6 years ago

The people using the tools that we're talking about already don't have that. Nobody has complained.

I don't like the way how IL manipulation works. But it is the only way to do some tasks (like replacing a method with another implementation e.g. INotifyProeprtyChanged).
But I would never complain that a project uses IL manipulation. After all nobody forces me to use it.

That saied I would like to see the original replace feature implemented even without source code generation support in the compiler, vs and intelisense. There are already usecases (INotifyPropertyChanged may be the prominetesd) where we dont need support to generate code on every keystroke. If no new methods/clases are generated ony ones that are there replaced. those two keywords would be enogh to have a value. The only reason I see to stop this untill the source code generating stuff is done, if we create problems with how the keywords are implemented.

In addition there are already source code generating librarys out there which could benefit from this. Including those integrated in Visual Studio today. E,g, The Windows Form designer creates the void Dispose(bool disposing) method, you cannot change it because generated files tend to be regenerated and overiding you changes, you can not override it, for that you would need to inherited your from. What you want is replace the method. So the two keywords can not only benefit new tools but also existing ones.

amis92 commented 6 years ago

Undebuggable (?) compiler-generated, well-known and well-defined desugared code (often with support for sugar debugging, as is the case with e.g. iterators and async) cannot be compared to hiding externally generated source code. It was said a couple of times already, but the expectations differ hugely between compiler platform and 3rd parties.

I'm definitely expecting a full-blown, debug-supported, source code based, and tooling-friendly scenario from the compiler platform. Cutting down a couple of compilations isn't worth enabling poor-man's integration within compiler. This platform's strength is it's integrity as well, and I feel like this integrity would be a little broken if it enabled magical weaving of source trees that doesn't have reflection in source code. You can do that today (Fody etc), but it's not what I'd like to see flourish in decades to come, so to speak.

That said, I'd also like to ask what AOP scenarios can't be covered by source generation as proposed in OP?

Pre-compilation / source generator:

Pros: Debuggable, you can inspect codegen without going through ILSpy

Cons: No AOP scenarios. [...]

popcatalin81 commented 6 years ago

Being readable source code isn't the thing that adds safety.

It's very important, I personally, would say it's Paramount. Otherwise, you might push users into debugging hell, considering how much a generator can modify the code and alter the semantics.

(I'm not implying that fair playing library designers would do this intentionally ...)

Edit:

On the other hand, we've used reflection emit successfully over the years, so black box reflection emit code generation is not unusable for the general case, because of the restrictions it inherently has IE: cannot modify existing code unless it was an explicit override point exposed like virtual properties.

xoofx commented 6 years ago

That said, I'd also like to ask what AOP scenarios can't be covered by source generation as proposed in OP?

@amis92 sure, almost all AOP scenarios are incompatible with it. Source generation is assuming that the generated source is a strong part of the project right? The generated code is accessible by your code (which is a chicken egg problem, so you need to assume that there is some pre-compilation step involved here)

Now, what's happening if you have a file Program.cs which an AOP is going to emit a pre and post callback on it:

public static void Main()
{
     Console.WriteLine("Hello World");
}

AOP is supposed to modify your code to write this, by somehow rewriting Program.cs by a virtual Program1.cs:

public static void Main()
{
#line hidden
     CallSomeCodeProlog();
#line 3 "Program.cs"
     Console.WriteLine("Hello World");
#line hidden
     CallSomeCodeEpilog();
}

So assuming that this file is part of your project now. How can compilation work now that we have Program.cs and Program1.cs? That's the whole point of AOP: you can't have it as part of your project because it is a post-processing step that can modify existing code. Same apply if you want to produce a serializer library that will transform existing class with "partial," access private fields...etc. You would have to modify the existing code.

Post-compilation steps allow this kind of scenarios because modified file are not part of your project, but they still can be debugged (because you can dump them to obj folder and use proper #line pragma as I have shown in the example above)

Again, we are not talking into the vacuum of a theory here, we have been practicing this for years (with thousands of users reached indirectly through our products)

xoofx commented 6 years ago

The problem with Source Code generator described here, is that it is very very similar to what we would do with a post-compilation step (similar API to rewrite Compilation, replace SyntaxTree, output to obj files, debuggable...etc.) but the difference is that the generated code is part of your project and user can access it, so it is a pre-compilation step in the end. And that's very different, because it is bringing a lot more trouble for the IDE story (and even more problems that we are not looking for, like generating serializers on every single key stroke or on whatever agreement on the sync it would do)

amis92 commented 6 years ago

@xoofx I'd imagine two solutions:

Rewrite and replace completely

// Program.generated.cs
public static replace void Main()
{
#line hidden
    CallSomeCodeProlog();
#line 3 "Program.cs"
    Console.WriteLine("Hello World");
#line hidden
    CallSomeCodeEpilog();
}

Call back to the original

// Program.generated.cs
public static replace void Main()
{
#line hidden
    CallSomeCodeProlog();
#line 3 "Program.cs"
    original();
#line hidden
    CallSomeCodeEpilog();
}

The developer story would be that the IDE detects complete method replacement and shows the Main in Program.cs greyed out or sth with clear visual that the method is replaced and a navigable list of replacements that call into original and single base call (not necessarily from the original source file).

I do assume that we have replace/original keyword support, and multiple generator support (multiple replacements of single member), which both are suggested in the original proposal. I don't think this feature can exist without these additions.

I'd imagine that multiple replacement would be resolved by assembly-level attribute that defines the order in which generators' replacements are called into each other.

// AssemblyInfo.cs

[assembly: SourceGeneratorsOrder(
    typeof(BoundaryLoggerGenerator), // outer-most
    typeof(ValidationGenerator), // middle
    typeof(LinqRewriterGenerator) // inner
)]

// Program.cs
public partial class Program
{
    public void Print(int[] numbers)
    {
        var query = from x in numbers
                    where x > 3
                    select x.ToString();
        query.Select(Console.WriteLine);
    }
}

// Program.LinqRewriterGenerator.generated.cs
partial class Program
{
    replace void Print(int[] numbers)
    {
        foreach (var x in new[] {1,2,3,4})
        {
            if (x > 3)
            {
                Console.WriteLine(x.ToString());
            }
        }
    }
}

// Program.ValidationGenerator.generated.cs
partial class Program
{
    replace void Print(int[] numbers)
    {
        if (numbers == null)
            throw new ArgumentNullException(nameof(numbers));
        original(numbers);
    }
}

// Program.BoundaryLoggerGenerator.generated.cs
partial class Program
{
    replace void Print(int[] numbers)
    {
        Console.WriteLine("Entering Program.Print(int[])");
        original(numbers);
        Console.WriteLine("Exiting Program.Print(int[])");
    }
}

And the result of the call to Program.Print would be:

Program.Print // Program.BoundaryLoggerGenerator.generated.cs calls:
Program.Print // Program.ValidationGenerator.generated.cs calls:
Program.Print // Program.LinqRewriterGenerator.generated.cs doesn't call any other original/replacement

The original code was never called. If it was called, how would you debug the query.Select invocation?

xoofx commented 6 years ago

I do assume that we have replace/original keyword support,

That's a requirement that AOP doesn't want nor you would like to have this for serliazers or ORM mapper or....etc. The purpose of AOP (or serializers...etc.) is to be able to work horizontally/vertically on your code without you having to modify your codebase. It would mean that you would have to make all method or type "replace" by default. That's not sustainable.Plus the case I used with Program.cs is really too simple. When you start to have other things in the same file that are not modified or partial class...etc. It is going to be just a nightmare to get all of this working.

And what you propose here, is actually serving the whole point of why post compilation is a lot more easier. Your solution is bringing such a complexity for IDE integration, intellisense, project system...etc. and is basically why the whole codegen issue is being blocked (including the post-compilation scenario)

We have already a solution (post-compilation) that is perfectly viable, proven working (by existing solutions like IL patching that are less flexible), allow debugging and has zero impact on IDE/project-system...etc.

xoofx commented 6 years ago

Again, not saying that pre-compilation doesn't bring interesting scenario (e.g access to generated code, useful for UI framework for example) nor that it is either a fight between pre or post. Both have their goals and should exist on their own. But in practice, post-compilation is the only one that could make it in a very short term, assuming that the Roslyn team agree with that (instead of trying to fit everything into a single solution, that everybody sees as so complex that it will probably not make it)

LokiMidgard commented 6 years ago

If I understood it correctly ther would be no requirement to mark all your methods with replace. Only a method replacing another (which will in most cases be generated) would use this keyword and an replace keyword to call the original one. It is not like virtual

And I actually don't see the difference between post compile IL magic and pre-compile source code generation. As far as I know one issue is that there is a problem with performance of generating when it will be used by intelisense. It does not matter if a method will be created using IL manipulation or generating of source code. In both cases the the worst case is that the generator runs on every keystroke to update the methods that apear in intelisense. Post compile generating does not change this. The only thing that would change this is, we are not allowed to generate new stuff but only to replace already existing one. In this case again it does not matter if post or pre compile generation is used.

xoofx commented 6 years ago

And I actually don't see the difference between post compile IL magic and pre-compile source code generation.

There is a structural difference: pre-compilation allows the user code to reference generated code, while post-compilation doesn't allow this. The implication of this is following:

As far as I know one issue is that there is a problem with performance of generating when it will be used by intelisense. It does not matter if a method will be created using IL manipulation or generating of source code. In both cases the the worst case is that the generator runs on every keystroke to update the methods that apear in intelisense. Post compile generating does not change this.

That's incorrect. Post-compile doesn't have to happen on IDE time and is only relevant at compile time. No IDE integration is needed at all (except the debugging side that doesn't need modification to the IDE as it is done already through PDB loading)

LokiMidgard commented 6 years ago

There is a structural difference: pre-compilation allows the user code to reference generated code, while post-compilation doesn't allow this.

That's not nessasary true. You could ignore all generated source code so it will only used in compilation and not interfere with the IDE expirience. I'm not sure if this works the other directions. You could defenetly predict the outcome of a generator and add intelisense for those methods, but I don't think the compiler allows to compile against methods that would exist after post il modification step.

I'm actually not a fan of IL manipulation, and try to avoid it if possible. So I must admit my perspective is a litle biased I think. Normaly I use custom tools that generate source code before compile. But as soon as I need to replace something, like propertys for INotifyPropertyChanged I currently have no good alternative. So those two keywords would help me alot even without first party support for code generator (still would like to see that).

Liander commented 6 years ago

@mattwar, @CyrusNajmabadi It is a long thread but as I understand, it is still VS tooling that is the roadblock preventing progress. Is there a way that one can divide this feature into smaller pieces to remove the blocking? When I was thinking about ways to simplify I thought of the following that you can look at for inspiration: 1) Support only method-body rewrite as part of compilation and see the resulting generated code as external code to your project in the IDE. The result is that you won’t get any changes in signatures/meta-data, and having the generated code not seen as part of the project from the IDE tooling perspective should make it invisible for refactoring, intellisense, navigation, etc. As for the debug experience it could behave much like ‘just-my-code’ debugging where the generated pieces are marked hidden and the original pieces are referenced with line directives. What I am aiming for here is that there won’t be hardly any IDE changes at all, while getting an IDE experience that is comparable with the ‘just-my-code’ experience. 2) Extend the debugging experience in the IDE by being able to deselect a flag like “Enable Just My Code in Aspect Methods” which could behave like its sibling but in this case the pdb-file is not downloaded but instead being generated from a version with no hidden line directives. You would then be able to debug in a read-only sense just like debugging other external code. 3) As last step you would include extending types and source code with new signatures as you have planned, like with partial classes. This is the piece that you are struggling with, isn’t it?

I think the last step is the least valuable piece of this feature which is the piece that involves the IDE the most. You would get very far with rewriting just existing methods. Most aspects have that as the only requirement. Some aspects however, like INotifyPropertyChange, would require that the user has defined some class with the dependent signature, like a property of type PropertyChangedEventHandler in this case. I think that is very reasonable thing to provide and I wouldn’t mind that at all. It would be a fantastic first cut of this feature.

My biggest take-away is that I want to see requirements being specified that makes the IDE-blocking go away and have a first aim for something simpler yet almost equally powerful.

jmarolf commented 6 years ago

@Liander if you don't want IDE support you can do https://github.com/AArnott/CodeGeneration.Roslyn with MSBuild tasks and roslyn without any compiler changes. What scenarios do you need for which this is insufficient if IDE support is unnecessary?

Liander commented 6 years ago

@jmarolf I do want to use the IDE, and as I understand it that is blocked because the feature is too heavy on it. That is the reason for requesting a somewhat stripped handling.

default0 commented 5 years ago

I haven't really read all 100+ comments on this issue, but most of the use-cases of custom per-project code generation can be met today with Build Tasks + CS-Script (ie .csx Intellisense support). I wrote a private NuGet package that adds a build task that looks for .csx files in a project and executes them while providing them with context of the projects handwritten source code (thanks Roslyn <3) and an ability to emit codefiles. One downside as of now is that generated codefiles have to be added to the project by hand, but aside from tooling support obviously being somewhat subpar (.csx editing experience in VS 2017 is pretty subpar w/o CS-Script currently), this seems to address most major concerns: -Custom code generators can easily be written (it's as simple as adding a .csx-file to the project) -You get intellisense for generated code (VS just sees it as yet another .cs-file in your project) -You can handle it properly through git/version control (just make sure the generated source files are named "Something.generated.cs" and add a *.generated.cs pattern to .gitignore to avoid conflicts on generated code) -You can package and ship more general code generators (INotifyPropertyChanged et al) to others just as easily (since code-generators are recognized, loaded & executed from referenced assemblies of the project automatically by the same build task as well)

The main reasons why this approach falls short of a "built-in" solution are that you have to trigger a compilation to update generated code, which can be annoying (but should usually not be that big of a deal), that - as mentioned earlier - Tooling-Support for .csx files is in a pretty sorry state without 3rd party extensions as of now and that custom code generators in a project cannot have symbolic references to handwritten code in the project they are generating code for (because that would create a circular dependency for the compilation) and finally that you don't have replace/original so you sometimes have to opt for a bit less nice handwritten code as basis for code generators (fe declaring [PropertyChanged] private int property; instead of public replace int Property { get; set; } for the code generator to successfully auto-complete your implementation of INotifyPropertyChanged).

My main point is that I do not see a whole lot of value in a full-blown effort for code generators since the way in which they offer to address the paradigm does not offer much above what is already possible with just MSBuild and a bit of clever NuGet magic - so I'd prefer the C# team either rethink the way they wanna spend their efforts (since the current discussion seems like they are planning to do a lot of work long-term for something that is quite possible today already) or just design/add something for this paradigm that isn't already possible, like single-file inline code generation (think dlang's "mixin").

The current needs are imho best served by enabling better tooling around library solutions for this paradigm (fe easier "nesting" of files á la Something.cs > Something.designer.cs in Visual Studio (ie not having to edit the project file by hand for it every single time)), not by adding keywords to the language.

Pzixel commented 5 years ago

@default0 did you see https://github.com/AArnott/CodeGeneration.Roslyn ? I've done several project using it, e.g. this and this. It does it job pretty well, but some built-in feature (like good macro system) would be really appreciated.

default0 commented 5 years ago

I did, and it's part of the reason why I'm saying that replace/original and the current codegen proposal can largely be met by libraries + NuGet packages that are possible to implement today :-)

LokiMidgard commented 5 years ago

Esspecially if you write a library with https://github.com/AArnott/CodeGeneration.Roslyn it would be nice to have replace/origin feature.

Pzixel commented 5 years ago

There are still issues with depdendencies and IDE features... I'd like Rust approach much better, when you have distinct test/build/runtime dependencies, where you can write a procedural macro that is first-class sitizen instead of handcraft shmuzy tools. So first class feature is very required.

default0 commented 5 years ago

@Pzixel The .csx-based approach I am using for my private NuGet package enables the use of different sets of dependencies for code generation and for the running program (which is used to fe avoid dragging Roslyn into my runtime dependencies for no reason). So this, too, is possible without any compiler changes.

@LokiMidgard replace/original does not enable any simplifications of code generator interfaces nor does it enable any new scenarios. You can design code generators just fine without it (fe using private fields with an attribute instead of public properties as signifiers for auto-implementation of INotifyPropertyChanged).

LokiMidgard commented 5 years ago

There are different scenarios not only INotifyPropertyChanged. Changing a method implementation would not be so easy, e.g. A Diagnostic attribute that measures the time a methods need. Or Transaction that enters an transaction at the beginning of an method and leaves it at the end.

What I want, if a user uses one of my librarys, in best case he would write code as he is used. In C# e.g. use Propertys instead of fields. And then uses generators attributes to change his implementation.

default0 commented 5 years ago

Pseudocode for something that measures the time needed to execute a method:

[Measured]
private void measureMe()
{
    // do something you want to measure
}

Generates:

public void MeasureMe()
{
    // generated measure init code
    measureMe();
    // generated measure finishing code
}

Works just fine. Same concept for transactions. Again, yes, replace/original do make this a bit nicer, but it does not enable new scenarios. Just design code generators differently.

generateui commented 5 years ago

I have read most of the comments of the past threads. @CyrusNajmabadi: thanks for the continuous push for quality in Roslyn. That is much appreciated! @KathleenDollard: Thanks for summarizing the thread, I feel it is an actual reflection.

The different concrete usecases I had and have are the following (some I actually had multiple times):

Generate implementations [of interface IX residing in assembly AX] into assembly AY, where AY already has partial classes. This usecase requires 1) AX to build first, then 2) generate code and finally 3) build assembly AY.
Generate classes [based on metadata in xml file X1 residing in assembly AX] into assembly AY. This requires 1) generate code then 2) build all assemblies.
Generate runtime assemblies [based on runtime AST instance of language LX in assembly AX] using generated C# into a new in-memory assembly AY. Subsequently calling code residing in AY from AX. This requires runtime 1) runtime generation of C# and 2) runtime creation of an assembly.
Generate C# code [based on dsl metadata residing in assembly AX] into assembly AX. This requires 1) generating C# code then 2) building the assemblies.
Generate MC++ code [based on C++ header files] into assembly AX. This requires 1) generating MC++ code then 2) building the assemblies.
As (5), but with custom generated code for certain C++ types. Same requirements as (5), but using a pluggable code generator.
Fast IL mutation. We had (and still have) various classes of serious bugs. A solution to these classes of bugs is to insert "fail fast code" automatically into assemblies. Spiking one of the various IL emitters (e.g. Fody proved drastically (~50%-100%) lowering the build times, which was unacceptable. We had a few notorious hard to track down IDisposable bugs surfacing in complex situations. Inserting assertions in an automated fashion would allow us to catch these much easier. Just applying an IL weaver for a one-off debug build would not help us here, as we needed real-world use.
Improving our DI/IoC container performance á la Dagger. We currently use compiled lambdas, which gives us reasonable performance (compared over our old reflection based system). However, I want to generate the full instantiation graph into C#. This would give us (I expect) a small but noticeable performance improvement. This requires 1) compile DI API assembly AX then 2) compile plugin assemblies AYs 3) generate code 4) compile plugin assemblies AYs again. Step 4 could be substituted to instead generate code into an empty project P and then build assembly AP from that.
Have plugin implementors reuse our code generators. The usecase here is OpenTK; We allow downstream plugin implementors to provide shaders. These shaders are then used to implement interfaces using generated C# classes. These generate C# representation of shaders can plugin developers use further in their plugin implementations. Practically, I'd like to have plugin implementors: 1) include a shader file into their project 2) a generated C# class (based on the shader) automatically appears in the project. The plugin developer does not need to bother with the code generator.
Runtime implementation of (generic) interfaces. In my API, I provide IInterface<T> where T : IAnotherInterface. Whenever I encounter implementations of IAnotherInterface in plugin assembly AX, I provide a runtime implementation of IInterface<PluginClass>.

Above usecases are concrete examples of problems I needed to solve in the past 5 years. I could probably dig deeper into my brains for more, but I'm afraid it will produce outdated requirements which can currently be solved in better ways. Apologies for including similar cases, but I felt it would give a better picture on the problems I encounter.

In general, the most common and therefore important usecase for me has been generating C# code during build-time. The order of this is significant and should be configurable:

generate code pre-compile-anything (2, 4, 5, 6, 9)
generate code before compiling an assembly (8?)
generate code after compiling an assembly (1, 8?)

As runtime code generation seems out of scope for this discussion, please ignore (3) & (10). I included them to paint a complete picture.

(6) probably does not strictly differ from (5) as seen from a Roslyn API perspective (the pluggability is internal in the codegenerator).

(9) would be a nice-to-have, but I can see library authors benefitting from such a feat (reusable code generators).

As a side note, I do want editing of generated code (ENC not needed, but would be nice). I want this to debug code generators (e.g. what would happen if I change this generated code to that? scenarios)

generateui commented 5 years ago

Further, the current state of affairs for compile-time code generation has usability problems. A few that come to mind:

T4 templates are not type-safe (refactoring C# code is not done in T4 templates)
You can refactor generated code, which never makes sense
Including T4 code generation into the build script is avoided as it is a very slow process. In one build definition, going around this introduces more expensive problems. As such, the build time is increased with around a minute of modern quadcore Intel laptops. Considering the relative simplicity of these T4 templates, it is quite out-of-balance.
Some generators only generate C# code with fully-qualified type names, making the code hard to read. This is a code generators implementation problem, but I can see a nice Roslyn API potentially solving this problem.
As code generation is not included in the build definition, generated code is included in the repository. This pollutes the diffs.
As code generation is not included in the build definition, users must manually invoke code generators. This requires users to know all input dependencies the code generator has. This proves to be problematic in practice, as it is easy to forget, and new users of the codebase must be explicitly instructed somehow.
Visual Studio runs code generators just by opening a file using a certain editor (I'm looking at you, WinForms & XSD designer). This should not be a problem in theory, but these generators include a timestamp which therefore changes the file thus gets included in the VCS stage.
No dependency-checking is performed on codegenerator dependencies. A code generator taking input of assembly AX to generate code into assembly AX can be problematic, as it potentially introduces a cyclic self-dependency. Something that warns the user for simple cases would be helpful, but nice to have. Quite likely, it's impossible to write a checker that checks all potential usecases.
Dependency analysis wrt code generators and their dependencies is limited. For example, I can see what dependencies a T4 template files has, but that's about it. A good visualization of some sorts to know the dependencies of a codegenerator would ease solving dependency-hell.

(6) is the most important issue for me to see solved. All the others are sharp edges and have workarounds, but (6) has introduced a fair share of bugs, which to me is unacceptable. I have not found a workaround for (6) that is reliable. Basically, (6) requires a disciplined human process, which is better implemented using an automated solution.

gafter commented 5 years ago

/cc @davidwrighton

LokiMidgard commented 4 years ago

I've noticed Generators without language change under 9.0 candidates in Language Version Planing.

Is it based on this discussion (just without the replace/original keywords)?
There was no issue assosiated with that entry.

Pzixel commented 4 years ago

Unfortunately, this feature looks like a poor AOP support. You won't be able to do interestring things like automatically derive interface/shapes(when they come), add members and doing other stuff. I understand IDE concerns but without that features this feature has little use.

TLDR: C# needs its own shapeless

LokiMidgard commented 4 years ago

@Pzixel Why should it not allow derive interfaces or add mebers? (As long as the changed class is partial)

From the name I would assume it would only ommit fancy and relay helpfull Keywords. Unless you have found mor information then the title.

Of course it would be nice if you wouldn't need to make those classes partial. But if ommitting this will bring code generation earlier, I'm fine with it.

I already use source generation today in form of scripts or small programms that that run in preBuild or before CoreCompile. However this is kind of quirky today and offten runs in problems. Having an offical supported way to use code generation can make it only better.

Pzixel commented 4 years ago

For some reason I thought it cannot add new members. But now I reread the docs

To add members to an existing class, the generated source will define a partial class. In C# this means the original class definition must be defined as partial.

Adding partial keyword is a minor inconvenience. If it works then it's great.

In the end, I'd like to have seeing something like

[Derive(typeof(IComparable<>), typeof(IEquitable<>)]
public partial class Point {
    public int X { get; set; }
    public int Y { get; set; }
}

...

var comparision = new Point(10, 20).CompareTo(new Point(20,3));

HaloFour commented 4 years ago

IMO doing it without the language changes is probably more about proving out the tooling support of the feature first. I would assume that adding new members or types would cause much more impact on the tooling support than replace/original would as the IDE needs to discover those types/members in order to offer them via autocomplete.

Pzixel commented 4 years ago

Yes, that's what I meant when saying "I understand IDE concerns". But without it it's just a toy feature "look, we can inject logging in the beggining and the end of the method. Woah". IMHO, of course.

vbcodec commented 4 years ago

https://github.com/dotnet/roslyn/projects/54 What does this means ? Seems like wish list. Hard to say if this is serious commitment or another prototypical attempt.

CyrusNajmabadi commented 4 years ago

Wouldn't trying to prototype this indicate a serious commitment?

vbcodec commented 4 years ago

Actually not. You may build prototype after prototype, but without serious commitment, never able to push it through RTM phase.

CyrusNajmabadi commented 4 years ago

We're not going to be able to make this happen without prototyping it. It's a challenging and complex space. I cannot be designed in absence as this is really about a tooling feature (like analyzers), and much less of a language feature.

MS is currently paying people to do this and allocating the time to make this happen, ahead of a long list of other with that could be fine instead. If you don't see that as commitment, then I don't know what to tell you.

dotnet / csharplang