dotnet / csharplang

The official repo for the design of the C# programming language

Champion "Replace/original and code generation extensions" #107

Open gafter opened 7 years ago

gafter commented 7 years ago

See also

CyrusNajmabadi commented 6 years ago

Compare that to being able to manipulate the syntax tree directly in any way you want.

Here's the thing though. You can already do that. Just run a pre-step that manipulates the syntax tree however you want. And then feed that into a normal roslyn compilation. As people in this thread have stated, it's seemingly acceptable if that compiled code is something you cannot navigate to, or refactor, or get intellisense on, or debug. So why do you need roslyn for anything? It would be totally simple to do this, and then get completely full power to do whatever you want to the syntax...
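That pre-step can be sketched with the public Roslyn API (a hedged example; the rewriter and names here are hypothetical, and the Microsoft.CodeAnalysis.CSharp package is assumed):

```csharp
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical pre-step: manipulate the syntax tree however you want,
// then feed the result into a normal Roslyn compilation.
class UpperCaseMethodRewriter : CSharpSyntaxRewriter
{
    public override SyntaxNode VisitMethodDeclaration(MethodDeclarationSyntax node)
        => node.WithIdentifier(SyntaxFactory.Identifier(
            node.Identifier.LeadingTrivia,
            node.Identifier.Text.ToUpperInvariant(),
            node.Identifier.TrailingTrivia));
}

static class PreStep
{
    public static CSharpCompilation Compile(string source)
    {
        var tree = CSharpSyntaxTree.ParseText(source);
        var rewritten = (CSharpSyntaxNode)new UpperCaseMethodRewriter().Visit(tree.GetRoot());

        // The compilation never knows a pre-step ran; it just sees trees.
        return CSharpCompilation.Create("Demo",
            new[] { CSharpSyntaxTree.Create(rewritten) });
    }
}
```

Nothing in this sketch needs compiler support: it is an ordinary consumer of the Roslyn API, which is the point being made.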

CyrusNajmabadi commented 6 years ago

The main thing I dislike about this proposal is that it provides limited support for code generation, and we may end up being stuck in the future because of these limitations (remember once you add a feature to the language it'll have to be supported forever). The approach simply doesn't feel clean to me. I'd prefer being able to replace anything I want in the AST at codegen time, because that's a more sustainable approach and it feels right.

But here's the thing. If you don't limit things, then you have the problems i've talked about en masse. Limiting things is a way of ensuring that you have some semblance of understanding of what the system is doing, and you can actually optimize and make things efficient for core scenarios.

And, as i mentioned above, if you want an "unlimited" system, you can have that today. Nothing is stopping you from just writing your own SyntaxTree-translation system that then gets compiled.

CyrusNajmabadi commented 6 years ago

You run it just after the "hardcoded" compilation, and right before the analysis.

Just to comment on this.

I have no idea what this means :) What is the "hardcoded" compilation, and what does "right before the analysis" mean? :)

The analysis is the compilation. If you run after the compilation, you've run after all the analysis happens. You could run before any analysis happens. And there are ways you could do that today.

xoofx commented 6 years ago

Could you clarify that bit? For example, if privates are generated into the containing type, then that would be accessible/intellisense-able by the user. If a new type is being generated, then that type will likely be accessible to the user, etc. etc. Thanks!

That's a potential option. However, if you don't generate it, you risk the user being able to see the intermediate state, i.e. where they expect something to be generated but, as far as the IDE is concerned, it doesn't exist. This may lead to an unacceptable user experience.

No. And the reason for that is simple. The compiler transformation cannot affect the intellisense experience. The compiler could do the current rewrite, or translate down to new IL intrinsics, or something else entirely, and it wouldn't affect what intellisense gets shown.

That is very different from Source generation/replacement. In that case, we are not limiting the feature to "only the changes that could not be perceived by the user program". It is very much "in scope" that the generator be able to make changes that are visible to the rest of the program.

Sorry, when I said private, I meant it in the sense of the async/await generated state-machine code. And it's good to know that async/await happens at a different stage, because all solutions using IL patching today are in this situation:

So the experience we have today is that we have to post-patch the IL. We don't have, of course, any kind of Intellisense, nor any debugging experience... but we also have a much slower compilation build pipeline, which can induce a 10x slowdown if you don't do anything about it. When you use Cecil, you are going to re-read all the assemblies that were written previously (plus all the ones coming from frameworks, etc.), patch the assemblies, and re-save them. To work around this, I had to develop a server that tried to mitigate JIT time, cache assemblies, etc. You can figure out quickly how much CPU, how many cache misses, and how much disk access are being put into this: this whole experience is exactly what you describe, in a different context, as "This may lead to an unacceptable user experience."

So, if we can first address this long-standing problem of having an integrated compiler plugin story that doesn't require any kind of debugging/Intellisense/project-system/IDE integration, that would already be a huge step forward. And I would concentrate my effort on this scenario first.

agocke commented 6 years ago

Opening up the compiler internal data structures to modification by arbitrary third parties is an explicit non-goal of this feature. The compiler is not implemented in a way to provide safe access to any part of the compiler pipeline aside from source trees. If this is what you want out of source generators, it's not happening.

xoofx commented 6 years ago

Opening up the compiler internal data structures to modification by arbitrary third parties is an explicit non-goal of this feature. The compiler is not implemented in a way to provide safe access to any part of the compiler pipeline aside from source trees. If this is what you want out of source generators, it's not happening.

Of course! That would not happen on internal compiler infrastructure, but through the external Roslyn API (e.g. Compilation, SyntaxTree, etc.), so it should be aligned, exactly like how diagnostic analyzers work, but running just a little bit before and right after the internal compiler steps (that is, what produces the public model).

agocke commented 6 years ago

The "public model" in this case is the Compilation. In essence, once you construct a Compilation, that's it. That's what the compiler sees as "the world." In all designs I've come up with, the focus is on allowing generators to run on a Compilation constructed of the existing source files, and emit new source which will be used to construct a second Compilation, which will be the one actually used by the compiler for Emit.

xoofx commented 6 years ago

Exactly, and that's what we would like to see: just a step that can transform this Compilation into a new Compilation, so we don't need a debugging/Intellisense/project-system/IDE integration story.

To demonstrate this point for the issue https://github.com/dotnet/roslyn/issues/19505 I even created a pseudo-prototype compilation-rewriter branch doing exactly this:

    public interface ICompilationRewriter
    {
        Compilation Rewrite(Compilation compilation);
    }
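A hypothetical implementation of that interface (the interface is repeated here for completeness) might add a generated tree through the public API, Compilation in, Compilation out:

```csharp
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

public interface ICompilationRewriter
{
    Compilation Rewrite(Compilation compilation);
}

// Hypothetical rewriter: injects a generated type, the way a serializer
// or AOP generator might, with no IDE involvement at all.
public class AddGeneratedTypeRewriter : ICompilationRewriter
{
    public Compilation Rewrite(Compilation compilation)
    {
        var generated = CSharpSyntaxTree.ParseText(
            "internal static class GeneratedSerializer { }");

        // AddSyntaxTrees returns a new immutable Compilation.
        return compilation.AddSyntaxTrees(generated);
    }
}
```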
agocke commented 6 years ago

Yup. That's the part where the previous design fell down because it couldn't guarantee "safe points" where it didn't have to re-run generators. I think any design needs to provide explicit notations of where the safe points are if it wants to get off the ground.

CyrusNajmabadi commented 6 years ago

so we don't need a debugging/Intellisense/project-system/IDE integration story.

If you don't need that story... you don't really need anything in Roslyn to make this possible :)

and that's what we would like to see, just a step that can transform this Compilation to a new Compilation step

You can already do this. Just write a tool that loads everything into a compilation, do your analysis, then generate a new compilation. You can then emit that compilation. :)
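Sketched concretely (a hedged example; a real tool would load the project's actual files and references rather than the literals used here):

```csharp
using System.IO;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

static class StandaloneGenTool
{
    // Load source -> compilation -> transform -> new compilation -> emit.
    public static bool Build(string source, Stream output)
    {
        var compilation = CSharpCompilation.Create(
            "App",
            new[] { CSharpSyntaxTree.ParseText(source) },
            new[] { MetadataReference.CreateFromFile(typeof(object).Assembly.Location) },
            new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary));

        // "Do your analysis, then generate a new compilation":
        var generated = CSharpSyntaxTree.ParseText("internal static class Generated { }");
        var newCompilation = compilation.AddSyntaxTrees(generated);

        // "You can then emit that compilation."
        return newCompilation.Emit(output).Success;
    }
}
```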

xoofx commented 6 years ago

Yup. That's the part where the previous design fell down because it couldn't guarantee "safe points" where it didn't have to re-run generators. I think any design needs to provide explicit notations of where the safe points are if it wants to get off the ground.

Yep, but in our case, we don't need a safe point. We just need to be integrated into csc in the end. I understand that the Intellisense/IDE story is a lot more complicated, but lots of codegen scenarios are not looking for this; they are solely looking for post-patching (moving what we are doing at the IL level directly into the compiler, to remove all the cost induced by re-reading and re-writing all the assemblies).

agocke commented 6 years ago

If you want this to ship, the Compilation seen by the IDE needs to be the Compilation emitted. Period.

CyrusNajmabadi commented 6 years ago

so it should be aligned, exactly like how diagnostic analyzers work, but running just a little bit before and right after the internal compiler steps

What do you mean by "running just a little bit before"? I can at least see what 'after' means: once there is a compilation, you do what you want and produce a new compilation. I'm not seeing what 'before' means. Can you clarify? Thanks!

xoofx commented 6 years ago

You can already do this. Just write a tool that loads everything into a compilation, do your analysis, then generate a new compilation. You can then emit that compilation. :)

Yes, but we want this to be integrated so that someone can pull a NuGet package and get this enhanced compilation story very easily (without using a fork of csc or whatever)

CyrusNajmabadi commented 6 years ago

Yes, but we want this to be integrated so that someone can pull a NuGet package and get this enhanced compilation story very easily (without using a fork of csc or whatever)

Why would you need to fork anything? Just make a build task of some sort. Your build task can use the entire roslyn API, and you can create a full compilation, do whatever you want with it, then do the normal csc task.

xoofx commented 6 years ago

Why would you need to fork anything? Just make a build task of some sort.

A build task of what? Re-running Roslyn? What we are looking for is to integrate into Roslyn so that we don't have to re-analyze everything. I don't understand how a build task could help here (we are already using build tasks for IL patching)

CyrusNajmabadi commented 6 years ago

A build task of what? Re-running Roslyn? What we are looking for is to integrate into Roslyn so that we don't have to re-analyze everything.

The moment you make a new compilation, everything needs to be reanalyzed. That's how compilations work. Except for the decl-tables, pretty much everything is rebuilt from your source**. And the decl-tables are really useful for the sorts of experiences we want for intellisense (i.e. diving into code to figure out what it means). For actual compilation/emitting, it doesn't really improve anything.

So you'll still have the full cost here.

--

** It has to be. Basically any edit can invalidate every bit of semantic analysis that has been done. So the compiler will re-analyze everything. Check for all the diagnostics all over again, and emit things all over again (if you choose to emit).

xoofx commented 6 years ago

If you want this to ship, the Compilation seen by the IDE needs to be the Compilation emitted. Period.

This is confusing, given what @CyrusNajmabadi said before in response to my question about async/await.

No. And the reason for that is simple. The compiler transformation cannot affect the intellisense experience. The compiler could do the current rewrite, or translate down to new IL intrinsics, or something else entirely, and it wouldn't affect what intellisense gets shown.

Why on earth does the IDE need to see the emitted compilation, then?

The moment you make a new compilation, everything needs to be reanalyzed. That's how compilations work. Except for the decl-tables, pretty much everything is rebuilt from your source. And the decl-tables are really useful for the sorts of experiences we want for intellisense (i.e. diving into code to figure out what it means). For actual compilation/emitting, it doesn't really improve anything. So you'll still have the full cost here.

What do you mean? If you modify a SyntaxTree and re-feed it into a Compilation, it is not going to re-read all the source from disk and re-construct the existing SyntaxTrees, right?

xoofx commented 6 years ago

I will provide a benchmark against a Mono.Cecil solution just to give you an idea about the cost of the current situation against the cost of having Compilation rewritten.

CyrusNajmabadi commented 6 years ago

Why on earth does the IDE need to see the emitted compilation, then?

Because, in this scenario where you are doing purely Compilation->Compilation, there is no way to know that the changes you are making do not affect the IDE. This is contrasted with things like async/await, where the compiler knows that the changes it is making will not affect it.

This ties into what Andy said: "I think any design needs to provide explicit notations of where the safe points are if it wants to get off the ground."

If you are doing Compilation->Compilation it would not have to be seen by the IDE if you only allowed invisible transformations. But if you allow arbitrary changes (which 'Compilation->Compilation' seems to indicate) then you have a problem.

CyrusNajmabadi commented 6 years ago

What do you mean? If you modify a SyntaxTree to re-feed a Compilation it is not going to re-read all the source from the disk, re-construct the existing SyntaxTree right?

If you don't have the actual source, then i'm not sure how debugging could work at all. For example, say that the generator literally just inserted a 100 line method at the top of a class. Literally all the debugging info will be wrong for the generated code, and it will be impossible to set a breakpoint or step through the code. So i assumed you'd at least be having the actual source somewhere so people would have a real artifact they could work with :)

CyrusNajmabadi commented 6 years ago

Why on earth the IDE needs to be the compilation emitted then?

@xoofx I think this conversation has been confusing because it sounds like you're trying to get a solution for a very specific use-case, but you've been addressing points and arguments made in the issue that is about providing a very general mechanism by which people could do code-gen.

Yes, if you want to limit things to a very narrow slice of allowable actions, i def believe it would be easier to create something. However, i also don't think that very narrowly sliced thing needs to come from roslyn. I think it would be fine if that came from a separate effort.

--

To me, doing this in Roslyn means going the distance and really trying to make a fully featured and powerful solution. This means that use cases like "i want to actually generate public stuff" are considered, and we actually have to consider things like editing/navigating/debugging/etc.

xoofx commented 6 years ago

If you don't have the actual source, then i'm not sure how debugging could work at all. For example, say that the generator literally just inserted a 100 line method at the top of a class. Literally all the debugging info will be wrong for the generated code, and it will be impossible to set a breakpoint or step through the code. So i assumed you'd at least be having the actual source somewhere so people would have a real artifact they could work with :)

No, and I'm confused, as I have been explaining above that we want a solution for the problem of IL patching. We don't need any kind of IDE/Intellisense integration, and we don't need debugging (because we don't have it today with IL patching and it is fine). That's why it is different from a source code generator. I came here only because it was related to codegen tools, but actually this is really not what we are looking for (IDE integration, etc.)

iam3yal commented 6 years ago

@xoofx

debugging

Set aside everything on the productivity side: anyone who has done anything fairly complex with T4 knows how important this is and how hard T4 templates are to debug. It's no joke, so if you're going to make the same mistakes, you might as well stop and do nothing.

agocke commented 6 years ago

Basically, if your translation is a semantics-preserving transformation, you're fine. But it won't be. Tons of proposed use cases are not semantics-preserving. And you also have no way to enforce this in the API. Are we going to bundle a dependently typed proof system where the generator author proves that their compilation is equivalent?

Otherwise, the IDE needs to err on the side of showing the user what's generated, not the source of the generation.

xoofx commented 6 years ago

@xoofx I think this conversation has been confusing because it sounds like you're trying to get a solution for a very specific use-case, but you've been addressing points and arguments made in the issue that is about providing a very general mechanism by which people could do code-gen.

Fair enough, sorry for coming here to relate to the problem of https://github.com/dotnet/roslyn/issues/19505

Indeed, it is a very specific use case, but it is used by many, many scenarios, so if we can at least get this, even if we never get the source code gen described here, that will be fine. 😉

xoofx commented 6 years ago

Basically, if your translation is just compilation, a semantic preserving transformation, you're fine. But it won't be. Tons of proposed use cases are not semantic preserving. And you also have no way to enforce this in the API. Are we going to bundle a dependently typed proof system where the generator author proves that their compilation is equivalent?

When you generate serializers in a code base, yes, you are adding code (done with IL patching again today), so you change the compilation. If you are implying that the Roslyn team will never allow a compiler plugin to add some auto-generated code to what was not written by hand, then that should be clearly stated in the issue https://github.com/dotnet/roslyn/issues/19505 as it means that we can definitively close the issue and look for a proprietary Roslyn compiler infrastructure (as we have started to do at Unity)

CyrusNajmabadi commented 6 years ago

We don't need any kind of IDE/Intellisense integration, we don't need debugging (because we don't have this today with IL patching and it is fine).

I have dealt with code-rewriters in the past, and i can say from first-hand experience that not being able to debug is a serious issue. Note: this is not just an inability to debug the mutated portions of the code. In the scenario i outlined in https://github.com/dotnet/csharplang/issues/107#issuecomment-398631142 you can't debug anything in that source file, because legitimately nothing matches between the source the user will have and the actual code/pdb that is generated.

Again, consider just the simple case of generating a 100 line method at the top of your class. You are SOL in terms of being able to debug, because absolutely nothing of your actual-source file matches anything of the generated-IL/pdb that you are actually debugging through. This is a pretty serious blow, and is not something i think would be acceptable in a full-fledged solution provided at the roslyn level.

CyrusNajmabadi commented 6 years ago

If you are implying that the Roslyn team will never allow a compiler plugin to add some auto-generated code to what was not written by hand, then that should be clearly stated in the issue dotnet/roslyn#19505 as it means that we can definitively close the issue and look for a proprietary Roslyn compiler infrastructure (as we have started to do at Unity)

@xoofx This conversation is very confusing because the requests and responses are jumping all over the place.

There are still a lot of questions i have that aren't clear to me. For example: https://github.com/dotnet/csharplang/issues/107#issuecomment-398629521

Right now your position is hard to generally discuss because it's hard to pin down what's being asked for. At times, it's just being able to generate new compilations (something def possible, though with lots of potential drawbacks). At other times it's very specific (but vague), i.e. "but running just a little bit before and right after the internal compiler steps" or "having an integrated compiler plugin story". These are very unclear asks.

it's really hard to state anything like "the Roslyn team will never allow a compiler plugin to add some auto-generated code " because that is so open-ended and unclear about what is actually being asked for vs what would certainly be out of scope (for the foreseeable future) given the compiler architecture.

agocke commented 6 years ago

If you are implying that the Roslyn team will never allow a compiler plugin to add some auto-generated code to what was not written by hand

Nope, just that that code must be part of the compilation shown to the IDE.

xoofx commented 6 years ago

@xoofx This conversation is very confusing because the requests and responses are jumping all over the place.

Yes, I will propose on https://github.com/dotnet/roslyn/issues/19505 that we plan a meeting with all the people involved in that feature request (not the one described here).

If you don't have the actual source, then i'm not sure how debugging could work at all. For example, say that the generator literally just inserted a 100 line method at the top of a class. Literally all the debugging info will be wrong for the generated code, and it will be impossible to set a breakpoint or step through the code. So i assumed you'd at least be having the actual source somewhere so people would have a real artifact they could work with :)

When you do AOP programming, using for example code generated from an attribute, yes, and that's fine; that's what people are already familiar with today. Take the example of checking [NotNull] for the arguments of a method: you would generate a few if (xxx == null) throw ... checks, on which you can't step, and that's fine. Source code gen doesn't make any sense here, because you want to debug your code, not the extra stuff provided by some contract attributes. That would be a lot more confusing for users.
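The [NotNull] expansion described here, written out by hand (the attribute name and shape are hypothetical, for illustration only):

```csharp
using System;

// Hypothetical marker attribute a rewriter would look for.
[AttributeUsage(AttributeTargets.Parameter)]
class NotNullAttribute : Attribute { }

class Greeter
{
    // The user writes the method body without the guard; the rewriter
    // would inject the 'if (... == null) throw' check shown below --
    // exactly the code nobody expects to step through in the debugger.
    public string Greet([NotNull] string name)
    {
        if (name == null) throw new ArgumentNullException(nameof(name));
        return "Hello " + name;
    }
}
```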

This is pretty serious blow and is not something i think would be acceptable in a full-fledged solution provided at the roslyn level.

Again, that's the world we are living in today. All IL patching solutions have this, and our customers have been living with it for years. But it's not completely true that you can't debug. What you can't debug is, of course, the code you insert, but you can debug the code that is called by the patched IL, whether it is in one of your libraries or even some code from a library provider. See for example this PostSharp example http://doc.postsharp.net/debugging-runtime : they are patching the IL, and you can't step into where the IL is modified, but you can at least step into your code being called by the patched IL.

it's really hard to state anything like "the Roslyn team will never allow a compiler plugin to add some auto-generated code " because that is so open-ended and unclear about what is actually being asked for vs what would certainly be out of scope (for the foreseeable future) given the compiler architecture.

My apologies for mixing up problems from https://github.com/dotnet/roslyn/issues/19505 here. I was trying to find out why it is blocking (that's why I started trying to understand/mitigate the problem of the IDE), and I think that with our discussion, I now understand why.

Today, most of our workarounds for codegen (that rely on information from the user code) are done with IL post-patching (AOP, serializers, ORM, etc.). These solutions are partially debuggable (and we and our customers are perfectly fine with this, rather than having no product at all or a more convoluted compilation pipeline). And this is where we want to find at least a bare-minimum solution for our case: an efficient alternative to IL codegen, by integrating these steps into Roslyn csc (after the Compilation has been created, before the analyzers run, for example, as pseudo-prototyped in the compilation-rewriter branch above). It sounds from all our discussions here that the full solution (described by this issue) is probably never going to happen (too complex, too many problems with IDE integration, etc.). Fair enough; but if we can't have the full thing, we would be perfectly fine with something a lot less ambitious that would still provide a better experience for our customers (compilation times, an integration story into the compiler, no custom targets, no multiple IL-patching executables waiting to process assemblies in a conga line, etc.). So I'm going to ask on the other issue if we can set up a meeting with the people involved to clarify these points and see if we can find a shared solution to our common problem.

agocke commented 6 years ago

Just to note, I have a significantly more constrained version of the original design that's focused only on adding new members which, I think, can address the "safe point" requirements.

I don't think the feature is dead, it would just require dropping a bunch of other stuff for people to work on it immediately.

CyrusNajmabadi commented 6 years ago

When you do AOP programming, using for example code generated from an attribute, yes, and that's fine; that's what people are already familiar with today. Take the example of checking [NotNull] for the arguments of a method: you would generate a few if (xxx == null) throw ... checks, on which you can't step, and that's fine.

Note: for now, i'm willing to ignore the problem of not being able to debug through generated code. This has been a problem i've personally experienced and have hated, but i'm willing to put it aside to just cover general debugging.

So let's take your example and flesh it out a bit more. You've added code into the syntax tree (i.e. the 'if statements'). Now that means that all code has been shifted because of those 'if statements'. So now what happens if the person tries to set a breakpoint on some of their actual code? The breakpoint will say something like "i want to break at the code that hits position 456 in the code". However, because of your tons of inserted 'if statements', position 456 means something entirely different, due to the PDBs being generated against your transformed syntax tree.

So i don't even get how you continue being able to debug the normal user code (again, completely ignoring trying to debug the generated code).

xoofx commented 6 years ago

So let's take your example and flesh it out a bit more. You've added code into the syntax tree (i.e. the 'if statements'). Now that means that all code has been shifted because of those 'if statements'. So now what happens if the person tries to set a breakpoint on some of their actual code? The breakpoint will say something like "i want to break at the code that hits position 456 in the code". However, because of your tons of inserted 'if statements', position 456 means something entirely different, due to the PDBs being generated against your transformed syntax tree.

Hm, isn't the SyntaxTree relying, in the end, on original lines coming from the syntax tokens? Where does this line information come from, then? If we insert a node in a syntax tree, does it really shift this information? If that's the case, we are in trouble indeed...

CyrusNajmabadi commented 6 years ago

Again, that's the world we are living today.

I don't see why that means it would be acceptable to be part of the core Roslyn system. :-/

That may be acceptable for your needs. But if limitations and issues are acceptable, then i'm not sure why you actually need a replacement system :)

To me, if we were to do this in roslyn, i would def expect at a bare minimum:

  1. people be able to debug their code without any issues.
  2. ideally, people could actually debug the generated code. After all, if the generated code barfs, you need something to be able to work with to get through it. Otherwise, we'll just be inundated with messages from people going "the program is null-ref'ing... but i have no idea where/why".

but if we can't have the full thing, we would be perfectly fine with something a lot less ambitious that would still provide a better experience for our customers (compilation times, an integration story into the compiler, no custom targets, no multiple IL-patching executables waiting to process assemblies in a conga line, etc.).

Here's the thing. You want this solution because there are issues you consider very important (like 'compilation times'). My feedback on all of this is that i find it extremely weird that compilation-time is a concern, and that's what needs to be optimized, but all the rest of these concerns are not.

Let's say we do this, and then we receive a ton of feedback and requests from people saying: uh... i don't get how i can possibly use this. Intellisense just doesn't work. I can't refactor things properly because the refactorings don't know about this generated code, and thus semantic analysis doesn't properly understand my code relations. I can't debug my crashing program properly.

If we respond to all of that with "it's not a problem. we provided a fast system to do compilation transforms," then we stand the chance of having a lot of upset users who are underserved by this system. And now we'll have the problem of trying to figure out how to potentially solve those issues after having shipped something that is overly flexible, and hard to constrain.

--

When the feature is external, people can set different expectations for what it can/can't do and how capable it is. When it's part of the main product, that changes things a lot.

CyrusNajmabadi commented 6 years ago

Hm, isn't the SyntaxTree relying in the end on original lines coming from syntax tokens?

No. :) This is a fundamental part of the Roslyn syntax model. It's completely accurate and real. So if you insert code somewhere in the tree, all subsequent nodes/tokens will be 'shifted'.

But that's not really even a good way of describing things. 'Shifting' makes it seem as if some system is coming in and updating the positions of these tokens and moving them somewhere else. In actuality, the position of a token is just an intrinsic part of what it is (in a platonic sense). A token's position is not something that could ever mutate. It's fundamentally determined by where the token is in the tree.

This is because a SyntaxTree 100% round-trips with source. So a token's position is literally defined as (at the same time) "where it showed up in the original source" and "where it would show up in source if you generated the source from the tree". There is no way for it to be otherwise (and even attempting anything like that would lead to massive breaking of roslyn invariants).

If we insert a node in a syntax tree, does it really shift this information?

See above :) Things aren't manually "shifted". Nothing goes and says "let me update the positions of all these tokens". Instead, the token simply "exists" at whatever position it is at. So if 500 chars get inserted before it, its position will necessarily be 500 characters later.

Invariants are a fundamental part of our trees, and core to how every higher layer looks at them. You cannot, for example, 'insert a node' in a tree but somehow have subsequent tokens keep the same 'position' as before. The position of a token is determined entirely by what comes before it: what tokens/nodes precede it, and what text constitutes them.

If that's the case, well, we are in trouble indeed...

Indeed :)
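The invariant described above is easy to observe with the public API (a small sketch; the method name and inserted member are hypothetical, and Microsoft.CodeAnalysis.CSharp is assumed):

```csharp
using System.Linq;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

static class PositionDemo
{
    // Returns the position of token 'M' before and after inserting a
    // member ahead of it: a token's position is simply where it sits
    // in the text, so inserting anything before it changes it.
    public static (int before, int after) Shift()
    {
        var root = CSharpSyntaxTree.ParseText("class C { void M() { } }").GetRoot();
        var before = root.DescendantTokens().First(t => t.Text == "M").SpanStart;

        var cls = root.DescendantNodes().OfType<ClassDeclarationSyntax>().First();
        var injected = SyntaxFactory.ParseMemberDeclaration("void Injected() { } ");
        var newCls = cls.WithMembers(cls.Members.Insert(0, injected));

        var after = newCls.DescendantTokens().First(t => t.Text == "M").SpanStart;
        return (before, after); // 'after' is necessarily larger
    }
}
```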

CyrusNajmabadi commented 6 years ago

Here's a good reference on Roslyn syntax trees: https://docs.microsoft.com/en-us/dotnet/csharp/roslyn-sdk/work-with-syntax#syntax-trees

xoofx commented 6 years ago

Let's say we do this, and then we receive a ton of feedback and requests from people saying: uh... i don't get how i can possibly use this. Intellisense just doesn't work. I can't refactor things properly because the refactorings don't know about this generated code, and thus semantic analysis doesn't properly understand my code relations. I can't debug my crashing program properly.

But we are in this situation today. We deliver a product that doesn't have this debuggable feature. Do you think that it is stopping our customers from using our product, or that they are so unhappy about it that they are looking for something else? No, they want to use our product because we are adding a lot more value than just generating some IL code.

If we respond to all of that with "it's not a problem. we provided a fast system to do compilation transforms," then we stand the chance of having a lot of upset users who are underserved by this system. And now we'll have the problem of trying to figure out how to potentially solve those issues after having shipped something that is overly flexible, and hard to constrain.

If an ORM product provides codegen at IL time and it is not debuggable, its users are in this situation already. If this ORM product starts to use this Roslyn compilation plugin, the code is not going to be debuggable, and that is fine. Why would these customers go back to Roslyn and tell them that this is unacceptable?

xoofx commented 6 years ago

This is because a SyntaxTree 100% roundtrips with source. So a token's position is literally defined as (at the same time) "where it showed up in the original source" or "where it would show up in source if you generated the source from the tree". There is no way for it to be otherwise (and even attempting anything like that would lead to massive breaking of roslyn invariants).

So you can't insert a SyntaxTree built with something similar to the #line pragma directive that C# supports?
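[Editor's note] For context on the question, `#line` is a real C# directive that remaps diagnostics and debugger sequence points to another file; it does not change where tokens sit in the syntax tree that contains it. A hypothetical generated file using it might look like:

```csharp
// Hypothetical generated file. The tokens' tree positions are still their
// positions in THIS file; #line only remaps diagnostics and debug info.
partial class Customer
{
    void Validate()
    {
#line 12 "Customer.template"   // report errors/step the debugger as if at
                               // line 12 of the (hypothetical) template file
        if (Name == null)
            throw new System.ArgumentNullException(nameof(Name));
#line default                  // restore normal line mapping
    }

    public string Name { get; set; }
}
```

This is essentially the mechanism Razor uses to map generated C# back to `.cshtml` files, which is why it comes up again later in the thread.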

CyrusNajmabadi commented 6 years ago

But we are in this situation today

You are in this situation today. Roslyn is not :)

If an ORM product provides codegen at IL time and it is not debuggable, its users are in this situation already. If this ORM product starts to use this Roslyn compilation plugin, the code is not going to be debuggable, and that is fine. Why would these customers go back to Roslyn and tell them that this is unacceptable?

This presumes that people will not expect and want this feature in roslyn to do more. There are a lot more scenarios than just the ones you've outlined. People do want to be able to generate more than just private/invisible impl details.

--

So, as has been said a bunch of times: perhaps, if we scoped things down to a very limited set of allowed operations, it would enable some scenarios while making it clear what wouldn't work.

This might be something tenable. However, it would likely be something very different than the current explorations into source-generation that are encompassed here.

xoofx commented 6 years ago

This presumes that people will not expect and want this feature in roslyn to do more. There are a lot more scenarios than just the ones you've outlined. People do want to be able to generate more than just private/invisible impl details.

I know (and I'm also looking at these scenarios; I actually have quite a few private projects doing this, with some ugly build tasks in the middle and a double compilation step 😉), but as I said, if we can at least get the bare minimum standardized as an "IL-like, undebuggable post-patching approach" (which covers most of what popular libraries are doing today, used by thousands of projects and products), that's better than staying with the status quo. I would love to see the full source-codegen feature described here come to reality, but it sounds unlikely to happen, because there will always be something less costly/complex/more prioritized to implement in Roslyn (I really don't blame you; I see the constraints, and that's perfectly understandable).

CyrusNajmabadi commented 6 years ago

but as I said, if we can at least get the bare minimum standardized as an "IL-like, undebuggable post-patching approach" (which covers most of what popular libraries are doing today, used by thousands of projects and products), that's better than staying with the status quo.

Yes, I can definitely see how that would help things out a lot. Honestly... I could even accept basically:

  1. hidden from intellisense.
  2. nothing to navigate to.
  3. no project system exposure.

The part that I think really does need a solution is debugging. I get that your customers have been OK with the limited debugging ability your existing solutions have. However, I go back to this being a min-bar:

  1. people must be able to debug their code without any issues.
    1. ideally, people could actually debug the generated code. After all, if the generated code barfs, you need something to be able to work with to get through it. Otherwise, we'll just be inundated with messages from people going "the program is null-ref'ing... but i have no idea where/why".

Note: I'm allowing everything else to potentially be out of scope, but I see any potential solution needing to at least solve those points. Note that this should already greatly lower the cost, and also address your concerns. But it would still need a thoughtful design to make sure the above was thought through and a suitable solution was found :)

Note: these are just my thoughts on the matter. Others may absolutely not agree. :)

m0sa commented 6 years ago

@CyrusNajmabadi basically, what Razor does, but more integrated, allowing 3rd parties to do the same thing more easily, as part of Roslyn itself?

xoofx commented 6 years ago

Note: these are just my thoughts on the matter. Others may absolutely not agree. :)

Ok, so let's do this meeting then 😉

m0sa commented 6 years ago

We already have a tool that replaces CscTool with a custom exe, which uses the public Roslyn API to parse the command line and then lets us modify the compilation before emitting it. It's been getting increasingly difficult to maintain, though, especially perf-wise (I've found shared compilation impossible to replace without totally reimplementing everything; most of that Roslyn code is internal, with hard-coded things in the MSBuild tasks), as well as getting it to run under dotnetcore (the command-line workspace is only available on desktop).

CyrusNajmabadi commented 6 years ago

@m0sa

@CyrusNajmabadi basically, what Razor does, but more integrated, allowing 3rd parties to do the same thing more easily, as part of the compilation itself?

Funny that you should mention Razor. :) That's a good example of:

  1. a massively complex system that has been hugely costly to support.
  2. something with very difficult issues that caused an enormously long tail of bugs.
  3. something that has morphed over time esp due to customer requests due to limitations.
  4. something that has occupied many man-decades (probably centuries at this point) of effort, with numerous devs having to maintain/fix/update it :)

Effectively, I would like to avoid the mistakes we've made in the past :) It's rarely the case that things will be as simple as people want. And, as time goes on, the problems continue to plague you and lead to customer unhappiness. Years later, you're still paying, and you are now trying to figure out how to do basic stuff customers are asking for, like "make refactorings work gosh-darnit!" :)

--

This is not to say that we should not be working toward better code-gen answers. Just that these sorts of examples demonstrate why it's so important not to hand-wave away the parts of the problem space one does not currently care about. These things come back to bite you, and you can commonly end up in the "ugh... why did we make such bad decisions... I wish I could burn it all down and start over again" place :)

CyrusNajmabadi commented 6 years ago

It's been getting increasingly difficult to maintain it, though, especially perf-wise (I've found the shared compilation impossible to replace without totally reimplementing everything; most of that Roslyn code is internal)

Sorry, I would need more details to understand better. Based on what @xoofx has been saying, it sounds like he primarily just wants a system where he can do syntactic transformations (and, of course, analyze them using the public Roslyn API). What things do you feel you need to reimplement? Thanks!

m0sa commented 6 years ago

If one wants to do syntactic transformations as a custom MSBuild step outside of Roslyn, you lose all the advantages and support that dotnet build servers offer, for starters:

xoofx commented 6 years ago

If one wants to do syntactic transformations as a custom MSBuild step outside of Roslyn, you lose all the advantages and support that dotnet build servers offer, for starters:

I concur, and I'm not looking for that, as it would fall into exactly the same problem we have with IL patching today.

PathogenDavid commented 6 years ago

My apologies for mixing up problems from dotnet/roslyn#19505 with this issue.

@xoofx You've said this a couple times, but I'd say they're very closely related. dotnet/roslyn#19505 even mentions this issue in the main issue text.

I feel like the discussion on the Roslyn issue focuses more on improving the tools we have today (leaning towards IL mutation without double compilation), while the discussion here focuses more on replacing those tools with something better entirely (namely, source generation). Both solutions can accomplish many of the same goals; it's just two different ways of looking at the issue.