AArnott / CodeGeneration.Roslyn

Assists in performing Roslyn-based code generation during a build.
Microsoft Public License
408 stars 59 forks source link

To MemberDeclarationSyntax or CompilationUnitSyntax #128

Closed mwpowellhtx closed 5 years ago

mwpowellhtx commented 5 years ago

Hello,

That is the question. Rather, I'll couch it in my use case.

First, what are the requirements for the trigger Attribute? In particular, I do not foresee needing to decorate anything concrete, per se, such as a class or an enum, but I do want to trigger as an [assembly: MyTriggerAttribute(...)].

Additionally, do the IEnumerable<MemberDeclarationSyntax> bits land in a CompilationUnitSyntax? For my use case, I would rather like to generate the CompilationUnitSyntax bits themselves, apply the customary .NormalizeWhitespace() and save those to the generated file(s).

In summary, potentially, I see my Code Generation Roslyn service triggering on an assembly attribute and generating a handful of separate modules (files).

In terms of the ICodeGenerator interface, I imagine this would be the moment when I do this code generation, I could yield break the expected return, for instance, while I also saved the units themselves?

Does this seem doable?

Thanks!

mwpowellhtx commented 5 years ago

Correction, or, rather, return an empty SyntaxList<MemberDeclarationSyntax>. Informed by a yield break or empty set of MemberDeclarationSyntax.

mwpowellhtx commented 5 years ago

Potentially separate issue apart from the question:

Actually, it might be interesting to change the interface, or add another interface, to support this: SyntaxList<SyntaxNode>, which could support either MemberDeclarationSyntax or CompilationUnitSyntax, or parts in between. Rinse, repeat, extend scope as needs be.

Pzixel commented 5 years ago

I already do it, I actually implemented assembly-level resolution for the project. See https://github.com/Pzixel/Solidity.Roslyn

mwpowellhtx commented 5 years ago

@Pzixel Just following up on this one, is that CG implementation to support that? Are you submitting a PR? Are you following/maintaining a fork apart from this one?

Pzixel commented 5 years ago

See #67

mwpowellhtx commented 5 years ago

@Pzixel True, so, CompilationUnitSyntax is not a MemberDeclarationSyntax for starters, whereas NamespaceDeclarationSyntax is. TypeDeclarationSyntax is a kind of BaseTypeDeclarationSyntax, which is also MemberDeclarationSyntax. Point being, if there was any sort of filtering going on expecting to generate any CompilationUnitSyntax instances, that was being effectively ignored. They are all, however, CSharpSyntaxNode.

Maybe this was mentioned in another issue, forum, blog; I do not recall precisely which. IRichCodeGenerator may support the sort of code generation I am aiming for? That is, generate code triggered by an Assembly Attribute?

Pzixel commented 5 years ago

I'm not quite get what's your point. You have to place an attribute which allows you enable codegen in the project. You can place it on anything: Type/Namespace or assembly itself. What kind of filtering you mention? I don't get it. You just get your object you places an attribute on and then you process it any way you want.

mwpowellhtx commented 5 years ago

@Pzixel Yes, but I can place the [assembly: MyCodeGenTriggerAttribute] at the assembly level, correct? Given the document-level code generation, that needs to be in an AssemblyInfo.cs or similar CSharp file?

I do not want to generate MemberDeclarationSyntax, but rather CompilationUnitSyntax, which I suppose my choices are to implement the IRichCodeGenerator?

// Because Namespace and Type Declarations are both a kind of Member Declaration.
var memberNodes = inputDocument
    .GetRoot()
    .DescendantNodesAndSelf(n => n is CompilationUnitSyntax || n is MemberDeclarationSyntax)
    .OfType<CSharpSyntaxNode>();

Also reviewing the CodeGeneration source, there is a possible improvement there. For performance reasons, I'm also considering a possible improvement deferring SyntaxList<...> (a struct) in favor of IEnumerable<...>, until the last possible moment when struct comprehension can be deferred. But this is another longer term issue, I think.

Pzixel commented 5 years ago

No, it shouldn't. See test in linked PR.

I do not want to generate MemberDeclarationSyntax, but rather CompilationUnitSyntax, which I suppose my choices are to implement the IRichCodeGenerator?

Currently generators should produce types, its true. So your suggestion is basically "relax generator output type to by anything". I never needed that so you probably need having some consultation with repo owner.

mwpowellhtx commented 5 years ago

@AArnott I have this scenario in which it does not make sense for me to decorate any class, or even namespace. Rather, I want to decorate the Assembly itself. Your thoughts? Thanks!

Pzixel commented 5 years ago

Omg. Sorry, but I explained it several times. I decorate an assembly, read how assembly attributes work. But I produce classes from that attribute. If you need to generate somethign else then current state doesn't fit it. If you just want to decorate some assembly then it's doable and I actually did it in several projects.

mwpowellhtx commented 5 years ago

@Pzixel Oh, okay. I'm still a bit confused myself. Well, my code generator may generate classes, but these are not extending any partial classes but rather generating completely new ones. It may also generate enum declarations. I may also declare "outer" as well as "inner" using directives, that sort of thing. So, CompilationUnitSyntax is the best way for me to deliver those nodes, I think.

Pzixel commented 5 years ago

Just try coding and you'l see everything yourself. The package can do it, I did it myself. just code it out.

mwpowellhtx commented 5 years ago

@Pzixel Too many pronouns. ICodeGenerator is not the right fit, I think. Maybe IRichCodeGenerator, but this seems geared to generate just one "rich" generated code for the context on which the decoration occurred. I have the scenario, One Attribute triggers -> multiple enum/class source code generated.

mwpowellhtx commented 5 years ago

@AArnott Here's what I'm proposing. I'm not sure it makes sense for IRichCodeGenerator to derive from ICodeGenerator, especially considering the confusion observed in the EnrichingCodeGeneratorProxy. Never mind all the pattern matching around concerns such as class, struct, and so on.

However, I am willing to overlook that, not least of which because it seems like a very entangled web to untangle, if I can simply introduce an API that supports the code generation author furnishing an arbitrary number of CompilationUnitSyntax instances. The Attribute trigger may happen anywhere, including an assembly attribute.

In my mind, that would be a clearer division of responsibilities; in other words, CodeGeneration.Roslyn would be there to simply offer the pipeline events and necessarily hooks into those events, whereas the code generation author provides whatever syntactical nodes are appropriate at the unit level.

Thoughts?

Pzixel commented 5 years ago

As a workaround you can generate a class with nested enums/classes/etc. It's not that clean but it may work with existing version.

mwpowellhtx commented 5 years ago

@Pzixel It's a fair point, I appreciate that. I'm just analyzing the overall architecture and wondering whether I could distill a more focused semi-fork derivation from this one. Especially wrapping my head around minimum units of work, possible points of extension, better division of labor between Code Generation scaffold and subscriber responsibilities, etc.

mwpowellhtx commented 5 years ago

It is fairly straightforward at the moment to adapt a next gen code generation solution from this one I think. So far so good. The only tricky bit I am finding is that the assumption was made that generated file names were based on, pretty much, a 1-1 relationship with the source files, the files in which the trigger attribute(s) were discovered. I'm not sure that any hash codes are that necessary, however, in fact, I am consider basing a naming convention on simple UUID, but I am just crossing that bridge now.

While I'm there, I also decided to "formalize" as it were code generation in a Descriptor. I pulled the preamble text into the same descriptor, which default draws from an embedded RESX resource, as well as other options such as whether to include a new line at the end of the generated code. The benefit of a preamble resource is that localization could be considered in future versions.

amis92 commented 5 years ago

@mwpowellhtx please try to keep issues contained. If you think up a new topic, I suggest to create new issue.


Summarized answer

In an attempt to shortly answer your question regarding CompilationUnitSyntax: it's a node that relates to a single document's content; so no, we won't support adding multiple CUSs to single-attribute generator results.

A core concept in this CG.R project is that a single .cs file can have 0-1 new generated .cs files. These files are "patched" up by stitching together results from all code generators that triggered on any attributes within a given source code file.

Of course, you can propose to extend it somehow and that's great idea, but currently, no, you can't have multiple generated source files for single input source file.

If this answers your original question, please consider closing the issue, or drill down on it with further questions. :)

amis92 commented 5 years ago

Also, for some quite extensive code generators you can refer to https://github.com/WarHub/wham/tree/master/src/WarHub.ArmouryModel.Source.CodeGeneration

AArnott commented 5 years ago

I'm also considering a possible improvement deferring SyntaxList<...> (a struct) in favor of IEnumerable<...>, until the last possible moment when struct comprehension can be deferred.

How is that an improvement? The work has to be done either way. How would doing it later speed anything up? Also, a generator method that returns IEnumerable<T> would risk that enumerating more than once would multiply the work incurred, so we'd then need to cache the result, all problems that SyntaxList avoids.

I'm not sure it makes sense for IRichCodeGenerator to derive from ICodeGenerator, especially considering the confusion observed in the EnrichingCodeGeneratorProxy

That's a perfectly fair point. It is an artifact of the evolution of the project and not the best design that we even have two generator models, probably.

Rather, I want to decorate the Assembly itself. Your thoughts?

As @pzixel said, we've already accepted a PR to make this possible.

The only tricky bit I am finding is that the assumption was made that generated file names were based on, pretty much, a 1-1 relationship with the source files, the files in which the trigger attribute(s) were discovered.

As for producing a whole new file or number of files, I think you're right in that we don't quite let you do that today. I wouldn't object to a PR to add that, but keep in mind the incremental generator check would need to be made more complicated to understand that one source file can produce multiple target files, each timestamp of which would need to be checked to see if they are stale, and if fewer files are generated on a subsequent build, incremental clean should delete the files that are no longer generated.

I am curious why generating multiple files from a single file is important to you. AFAIK C# syntax never requires separate files. NamespaceDeclarationSyntax derives from MemberDeclarationSyntax so you can literally define multiple namespaces within your single generated source file, and within each namespace block you could define usings and whatever else you want.

I'm not sure that any hash codes are that necessary, however, in fact, I am consider basing a naming convention on simple UUID, but I am just crossing that bridge now.

Because all the generated files go into one directory, but source files can be scattered everywhere. When two source files have the same filename but exist in unique directories, the generated files stomp on each other. So we need a way to keep them distinct in the obj directory, but keep their filenames deterministic so that each build overwrites (or keeps without regenerating) the same file.

In my mind, that would be a clearer division of responsibilities; in other words, CodeGeneration.Roslyn would be there to simply offer the pipeline events and necessarily hooks into those events, whereas the code generation author provides whatever syntactical nodes are appropriate at the unit level.

Perhaps. But as most/all the CG.R use cases so far wanted to respond to attributes, and wanted to generate members (whether they are namespaces, types, or methods) in response to them, offering a lower level framework would just make more work for the common case. Optimizing for both use cases may be possible, but would require more work and until now we haven't had a request for more control over what code emit allows. I'm hoping you can explain why absolute control over the file(s) generated is required.

mwpowellhtx commented 5 years ago

I am recasting the package a little bit, and I see a couple of steps involved generating code in order to purge such a Code Gen Manifest.

compilation.SyntaxTrees.Select(x => x.FilePath)
    .Where(x => compilationInputDescriptorSet.All(y => x != y.SourceFilePath)).ToList()
    .ForEach(x => compilationInputDescriptorSet.RemoveWhere(y => y.SourceFilePath == x))
    ;
compilationInputDescriptorSet.Select(x => x.SourceFilePath).ToArray()
    .Where(x => compilation.SyntaxTrees.Any(y => x == y.FilePath)
                && File.GetLastWriteTimeUtc(x) > assemblyLastWritten).ToList()
    .ForEach(x => compilationInputDescriptorSet.RemoveWhere(y => y.SourceFilePath == x))
    ;
compilation.SyntaxTrees.Select(x => x.FilePath)
    .Where(x => compilationInputDescriptorSet.All(y => x != y.SourceFilePath)).ToList()
    .ForEach(x => compilationInputDescriptorSet.Add(InputSyntaxTreeFileDescriptor.Create(x)))
    ;

After that I simply identify generated code by UUID, or simply <uuid/>.g.cs, documented in the manifest file.

I am also formalizing the assembly/generated manifests in terms of a first class JSON serializable domain entities.

AArnott commented 5 years ago

@mwpowellhtx I'm not sure what you're talking about here. I thought we were discussing documents vs. members in this issue.

amis92 commented 5 years ago

Mostly this issue seems to ask for a possibility to create additional files. Full extensibility proposal #137 would allow an Engine to be created which could also generate additional files, probably. I'll close this issue in favor of the mentioned issue. Also, if OP is unhappy about that, please open a specific feature request issue for additional generated files feature, however it probably will be marked as wontfix.