Proposal: logging APIs for source generators.

Tylertron1998 commented 3 years ago

It may be useful to log certain information in source generators. Currently, the only "way" to do that is via diagnostics, which does not feel "correct" in the sense that diagnostics should be for issues with code - where as there are many other causes for issues in source generators. Some form of logging would be really nice, to inform the user why the source generator may be behaving the way it is, without them digging through documentation or even the source code of the generator itself.

Some examples are:

Source generator did not generate X because of reason Y.
Source generator is generating X without Y features, because of Z.
Source generator has found X in source code, so it is doing Y.

Another useful reason would be for those who are writing a complicated source generator, and would like to see if a certain part is working without writing out everything - i.e. for a generator that generates certain code based on a attribute, instead of writing out both parts (the part that looks for the attributes, and the part that generates the code) you could firstly write out the part that looks for the attribute, log this, and then determine if it is working as intended.

Some questions arise with this proposal, mainly around the API. How would it look to use? I think the standard logger approach of logging a message & loglevel would be fitting. The loglevels could follow msbuild's loglevels, as I would expect that would be where the output would come. How about seperating out logs from different source generators - would it be a case of just formatting the message string differently, i.e. [MyCoolXMLGenerator] generating XML for class "CoolXMLClass"? Is there a better way of doing this?

I believe that the main thing this boils down to is, source generators are part of the build process, and therefore; should be able to log just like any other build process. This information is useful to the end user - being able to give them more insight into what is going on, and why.

jaredpar commented 3 years ago

I'm skeptical that we need a new API here. In general the compilation tools, analyzers for instance, can provide this information today via informational level diagonstics (or suggestions). Why are source generators special here that they can't use the tools that analyzers are using?

Tylertron1998 commented 3 years ago

Do diagnostics make sense for information? I always thought Diagnostics were used for problems in the consumers code - not for information about say, what or how the source generator is doing.

Edit: I'd like to note I'm very new to source generators & Roslyn API's in general. So please forgive me if I don't make sense or understand much!

jaredpar commented 3 years ago

Diagnostics can be informational level though. Also can have them disabled by default and ask customers to do enable them for diagnostics by using editorconfig.

In general though my mental approach here is roughly: why is this something that only source generators need? Basically we've had analyzers for 6+ years now and they have roughly all the same characteristics here and this request hasn't come up. Is this a real gap that we've missed or is there an existing approach we can re-use here?

Tylertron1998 commented 3 years ago

I think this could be best explained by my usecase. I am building a command library for the discord api, that uses source generators to do the lifting in place of where reflection would be used. Something I do is look for classes that implement an interface, which I considered a "command" - and then generate the appropriate handler accordingly. A feature I'd like to support is scanning referenced assemblies for type information to build commands from also - so the user can install a command package and have that work just the same. It would be very useful to log this information - that the generator is scanning X assembly or has found type Y to be constructed, or that it cannot construct a handler for a certain type because it's using an outdated api or is generally incompatible.

From some internal discussions I've had in the C# discord server, I understand that scanning dependencies isn't a great idea in the first place - but that's out of scope for this discussion, the main issue here is there is useful information I'd like to provide to the user about what is going on, but it doesn't really make sense to be in diagnostics.

There are a few more examples I can think of, such as:

'falling back to [less optimized method] due to [reason]' (user/customer wonders, "why is my build so slow?")
testing/debugging - while I understand that logging generally is not a good way to see what's going on, sometimes it can be easier (especially with how difficult actually using the debugger can be with source generators), it can be useful to log while building the source generator to see if your SymbolFilter is filtering the right things, etc.
more detail into what has gone wrong. Being able to view what is happening in every step of the process is invaluable, both for the user to understand if they have an issue with there setup or code, or to give the person who wrote the generator a bug report.

as I said - I'm very new to everything discussed here, and I might just have the wrong mentality about Diagnostics. To me, a Diagnostic is for information about the end users code - where as logging would be for information about the generators process itself. As for the question of why do generators need this, but not analyzers, I cannot comment there. I have never used or written an analyzer, so I'm completely unsure of how to answer.

I think to me, this all boils down to "more information is good" - for bug reports, understanding what is going on, tracking down issues; and these reasons don't really seem relevant for diagnostics.

thargy commented 2 years ago

BUMP! +1 for this. I have a source generator that builds a lot of source files, and would like to output diagnostic information as part of the build process - which is something I would typically do with any other build task. Currently, the only way is to save to a dummy source file in comments or save to an output log file. Whereas I was hoping the info diagnostics would be output in the build, just as warnings and errors are.

Skyppid commented 1 year ago

I do think this would be useful as well. We're using Source Generators to generate Model classes and Response classes in ASP.Net. Doing this manually would be painfully tedious. Yet some odd interactions result in the generated files suddenly disappearing. They don't get re-generated until I restart VS. Since one of ~40 files remains, I assume either there's a bug when generating them in general or there's something unpexected happening in the analyzer. Yet I don't get any feedback. It's like finding a needle in a haystack.

I'd prefer having a basic logger which at best outputs to the Build output directly than having to deal with the complexity and issues coming with Diagnostics just to trace some bugs and issues in my source generator. Not like having to restart VS for every change would be enough pain already finding and fixing bugs in it.

IanKemp commented 8 months ago

Why are source generators special here that they can't use the tools that analyzers are using?

I dunno, maybe because a source generator isn't an analyser, and as such the intended audience of generator logs is completely different to the intended audience for analyser diagnostics? It's as if the OP has asked you to build a tool to clean the floor and instead of agreeing to create a mop or broom, you've told them to use a hammer because the latter already exists... yeah, no.

When people are asking for more ways to log, especially in complicated components like source generators where logging is almost essential for debugging when a debugger isn't available, the correct response is not "use a hammer" - it's "let us think about how we can build that".

default-kramer commented 7 months ago

I'm skeptical that we need a new API here. In general the compilation tools, analyzers for instance, can provide this information today via informational level diagonstics (or suggestions). Why are source generators special here that they can't use the tools that analyzers are using?

Because informational level is not supported for source generators: https://github.com/dotnet/roslyn/issues/50574 Apparently that behavior is "by design" but it leaves source generators with no useful way to display anything less severe than a warning.

Addendum: For my use case, I don't need a new API, it would be enough if I could configure MSBuild to respect some/all Info-level diagnostics.

CyrusNajmabadi commented 7 months ago

SourceGenerators should not issue any sorts of diagnostics to begin with (it breaks incrementality). I'd recommend in all these cases to jsut use an analyzer @default-kramer . They already work well, are plugged in properly to every stage of the pipeline, do not affect incrementality, and then have no perf impact on higher layers getting compilation instances. Thanks!

default-kramer commented 7 months ago

@CyrusNajmabadi I don't understand. Are you suggesting that an analyzer can be used to generate code, or that I don't actually need to generate code?

default-kramer commented 7 months ago

In case it helps anyone else, I've found a workaround for my use case. I discovered you can emit the generated files like this:

<PropertyGroup>
    <EmitCompilerGeneratedFiles>true</EmitCompilerGeneratedFiles>
    <CompilerGeneratedFilesOutputPath>DesiredPathForGeneratedStuff</CompilerGeneratedFilesOutputPath>
</PropertyGroup>

This would allow you to "log" to a C# file (using // style comments is probably safest; #if FALSE might work okay too) which you could then see amongst the generated files. This approach still doesn't let you write to the MSBuild output, but it's good enough for me for now.

IanKemp commented 7 months ago

SourceGenerators should not issue any sorts of diagnostics to begin with (it breaks incrementality). I'd recommend in all these cases to jsut use an analyzer @default-kramer . They already work well, are plugged in properly to every stage of the pipeline, do not affect incrementality, and then have no perf impact on higher layers getting compilation instances. Thanks!

There is an impedance mismatch here.

We users of source generators are asking "why can't we log from source generators?" and you are saying "because it breaks incrementality". Which may be perfectly correct, but doesn't explain what "incrementality" is, why it's important, and how logging would break it. You have that context, but you're not sharing it with us to allow us to understand.

This is also true of your suggestion to "just use an analyser". We are trying to write a source generator, sowhy do we need an analyser too? Why is it necessary to go through the effort of writing an analyser, versus adding a logging API that e.g. transparently creates an analyser for you? Why is an analyser the "right" way to do logging from a source generator?

You have the answers to all these questions but you aren't sharing them with us. I don't believe it's due to any malice on your part, just the fact that you are a domain expert and understand these things implicitly and assume anyone here is also such an expert. But that assumption is incorrect - most of us are very much just getting our feet wet and as part of doing so, are expecting to be able to log things for debug purposes to build that understanding ("learn by doing"). That we can't is frustrating, that you aren't explaining why in ways that we can understand is frustrating, and that is the opposite of what you want if you want more people using source generators.

Please consider authoring a comprehensive MSDN article/blog post/file in this repo, looking at source generators from the viewpoint of novices like us, that answers the questions I've posed above and allows us to get some sort of log output that will enable us to further our understanding and grasp of this powerful tool.

CyrusNajmabadi commented 7 months ago

Which may be perfectly correct, but doesn't explain what "incrementality" is

Incrementality is the ability to have a change result in an updated entity far more efficiently. It works by reusing the prior computed result along with the change to intelligently get to the new result much quickly and usually with far less memory. The result you get should be the same as if you did all the work over again, just with drastically less cpu/memory.

As an example, thing about how parsing works in roslyn today. When a user opens a file, we parse the file getting the full syntax tree in memory. However, as they edit the file, we do not reparse the file from scratch. That would be too expensive (both in CPU and in memory). So we "incrementally" parse the file. In that world we pass in:

the old tree.
the new text
the 'change' indicating how the new text was created from the version that produced the old tree.

With this information, we can often reparse jsut the code taht changed (and a small section around it), forkign the old tree at the change point to contain the reparsed code, to get the final tree. The final tree is the same as the tree you would get if you reparsed the file fully, except that it is orders of magnitude cheaper to get, and takes up orders of magnitude less memory.

In real-world numbers, this takes parsing down from taking many milliseconds (which blows past nearly every feature budget we have), to microseconds (leaving most of the cpu timeslice available to features). Similarly, memory usage can often drop by 99.99% in cases like this. Which not only means less usage of user memory, but also far less pressure on the GC, which leads to things like user pauses as hosts need to garbage collect.

Incrementality is part and parcel of how you achieve an efficient and scalable analysis system (which is what Roslyn aims to be). Without it, you might as well just have a "Batch Compiler" where every change involves reanalysis of everything in the system. Note: the latter is how generators used to work. And, as you can imagine, was unbelievably slow and inefficient. We had an enormous number of cases where generators would literally reanalyze everything on effectively every edit, leading to non-stop CPU usage, and insane garbage churn (think upwards of 10GB of allocs a second).

Incremental generators change the equation by giving a framework for generators to operate incrementally as well. In our real world testing, and our real world adaptation of all of the runtime generators, this move to incremental generators dropped CPU usage over 99.9% and dropped memory usage over 99.99%. Those are real numbers, and indicate how critical incremental generation is to our story of fast and scalable analysis.

CyrusNajmabadi commented 7 months ago

Why is an analyser the "right" way to do logging from a source generator?

Because an analyzer consume a compilation, it does not produce it. Think of it this way. Practically every feature depends on compilations. Without it, they can't function at all. Prior to generators, the entire design of roslyn was that getting a compilation was fast (think microseconds) because we'd designed everything to be properly incremental. And everything was built with the expectation that this was virtually free. Note: when we say 'everything' we also mean all the first party, second part, and third party code out there. The entire ecosystem built on roslyn is built on the speed expectations we designed for and delivered since our initial prototypes.

Generators completely upended that. Now, a compilatino isn't something that is nearly free. A compilation takes as long as your slowest generator. In reality that means we were seeing sometimes on the order of 7+ orders of magnitude slowdown due to a single non-incremental generator. That's not an exaggeration again. This was practically what was occurring. This of course impacted the entire ecosystem which had all been built on the speed guarantees of the pre-generator world.

Incremental generators are a means to get much more closely back to where we started. Where compilations can be created quickly, and the ecosystem's existing investment continues to work. Without this, everyone would have to throw away almost everything they had, and would have to recreate most of it from scratch.

Now, why is an analyzer ok? Well, as i said above because unlike generators, nothing depends on them. They consume the compilation, and produce results that don't impact anyone else. It's the difference between a leaf feature with no dependencies, versus the most core layer that everyone depends on.

CyrusNajmabadi commented 7 months ago

You have the answers to all these questions but you aren't sharing them with us.

I have probably about a thousand things on my plate to juggle. I share as much as i can, and i'm active in our discord where you can come talk to us directly. You can also read https://github.com/dotnet/roslyn/blob/main/docs/features/incremental-generators.cookbook.md (and contribute to it) if you'd like. I'm sorry that i can't put in more time here, but this is one of an insane number of things the team needs to give attention to.

ust the fact that you are a domain expert and understand these things implicitly and assume anyone here is also such an expert.

I definitely do not assume that anyone is an expert. That said, my strong recommendation is to consider if SGs are the right tool for a particular job. They are an extremely complex and advanced system, and they require heavily on a deep understanding of your own problem space, the solution space generators are good for, and if those two mesh well. This is the nature of compiler-development (and writing SGs effectively makes you have to now understand that well in order to execute well). Do i wish we had better docs and ways of teaching this all to make that learning curve better? Definitely. But there are only so many hours in the day, and so much that needs to be worked on.

that you aren't explaining why in ways that we can understand is frustrating

Look, all i can say is that you are in very advanced waters here. If you want more info, come to our discord. We work heavily with the community there esp on generators.

Please consider authoring a comprehensive MSDN article/blog post/file in this repo,

Honestly, i really do not have hte time or inclination. SGs aren't even my area of ownership. I just have a ton of knowledge on them because i had to adapt so much code to use them effectively, and i ended up writing the v2 attribute-based api to solve so many perf issues we were seeing. A lot of that was work on my own time, which consumed just a tremendous amount of time and energy as well. I wish i had infinite energy and time to get to all that other stuff, but i really don't. Maybe someone else can invest there (maybe it can be you! :)).

ChristophHornung commented 7 months ago

@CyrusNajmabadi I understand both concepts but fail to see how that answers the question of this issue. This issue basically boils down to:

During our incremental, fast, but probably complex code generation, how do we log what is happening so we have a chance to diagnose issues?

An analyzer can analyze the output of our generation but does not give us insight into the generation itself unless we inefficiently add the log to the generated output somehow.

Ideally I would expect there to be a way to log output just like MSBuild does in the verbose setting e.g., to be able to somehow get more introspection into a problem in the generation that a user experiences.

fandrei commented 3 weeks ago

@ChristophHornung even if the developers hate us users, we still can use the old fashioned DebugView.

Skyppid commented 2 weeks ago

TL;DR: Logging would be nice even though diagnostics exist. I would say diagnostics are not the best tool to quickly identifiy problems. A simple log to the compiler output / whatever feeds the IDE's output during builds would be great. Console.Log isn't working reliably depending on the IDE afaik.

I currently upgrade our old primitive source generators to incremental generators. While diagnostics is an option I feel like a simple logger would be so much more convenient to use as a developer. I don't see diagnostics as debugging tool for me to write a generator but as a mean to show actual issues to the enduser of the generator (in this case also me, but still).

I don't get why one of the most basic principles in software engineering has been ignored here. Microsoft.Extensions.Logging has been around for probably the same time and is being actively used in all parts of .NET. I could see fit in Roslyn as well. But in the end it's a matter of preference. I mean I can still hack logging into it, but it would be nice if it was actually supported by the tools around it.

Source generators have significant flaws in a lot of places (mostly due to external factors as well, but it's a fever dream working with them). Starting at the limited .NET Standard 2.0 framework which results in issues like: enforced old coding style (no newer C# versions nor .NET features allowed to use), deprecated references in your projects when using the generators (still need to reference some ancient Microsoft.CodeAnalysis.CSharp version which interferes with other code that uses newer versions). And after that comes the sheer lack of support in Visual Studio. I have moved to Rider because Visual Studio is still not able to run my code generator. It just doesn't output anything. But nowhere does it communicate why it fails. Where we land back at the lack of logging.

So in the end, while I absolutely love the feature of Source Generators the overall implementation is a nightmare and hasn't improved much ever since they launched years ago. And please: To every single developer who worked on them, this is no front against you. I know how things work and lots of things fall behind due to other priorities. It's just a fact that source generators haven't been integrated very well from a developer perspective. It feels almost like a hack due to the poor support in IDEs (Rider currently being the only one that works which really isn't a good sign for Microsoft's own products).

CyrusNajmabadi commented 2 weeks ago

Source generators have significant flaws in a lot of places (mostly due to external factors as well, but it's a fever dream working with them). Starting at the limited .NET Standard 2.0 framework which results in issues like: enforced old coding style (no newer C# versions

This is not true. You can use the latest versions of c# just fine.

still need to reference some ancient Microsoft.CodeAnalysis.CSharp version which interferes with other code that uses newer versions).

This is not true. You can reference the latest CodeAnalysis libs.

And after that comes the sheer lack of support in Visual Studio. I have moved to Rider because Visual Studio is still not able to run my code generator. It just doesn't output anything. But nowhere does it communicate why it fails. Where we land back at the lack of logging.

File an issue with a repro please.

and hasn't improved much ever since they launched years ago.

We've completely changed the implementation. Improved performance over 1000x, and have redone the ide integration so that it can run without destroying performance.

In the latest release we updated them so that they can automatically reload when they change without having to restart vs.

fandrei commented 2 weeks ago

@CyrusNajmabadi

You can use the latest versions of c# just fine.

Aren't .NET Standard projects limited to C# 8?

CyrusNajmabadi commented 2 weeks ago

No. :-)

See Roslyn for example. We use the latest version of the language in our own generators.

fandrei commented 2 weeks ago

@CyrusNajmabadi this must be something new, because they definitely were. A freshly created .NET Standard project with <LangVersion>latest</LangVersion> doesn't even allow file-scoped namespaces. Does it require to cast some extra spells?

CyrusNajmabadi commented 2 weeks ago

Just set the language version to whatever version you want it to be. :)

fandrei commented 2 weeks ago

@CyrusNajmabadi i.e. latest doesn't really work in this case?

CyrusNajmabadi commented 2 weeks ago

No clue.

fandrei commented 2 weeks ago

In any case, logging is essential. It should be possible to write logs to the file, at very least.

CyrusNajmabadi commented 2 weeks ago

Why not just unit test these (which is our recommendation) and see what's going on?

fandrei commented 2 weeks ago

@CyrusNajmabadi why not both? Unit tests simply are not the real environment.

CyrusNajmabadi commented 2 weeks ago

Standard reasons: Costs, maintenance, complexity, etc.

Unit tests have served our needs for years, so investing in redundant systems isn't a good allocation of resources for us IMO.

ChristophHornung commented 2 weeks ago

@CyrusNajmabadi UnitTests are nice to test and verify locally. But if others are using my library they are not an option. And since no possibility for logging exist there is no easy way to diagnose what the issue might be. To give a real world example: I coded a very simple code generator that just generates code to access EmbeddedResources here e.g. but some problems with edge cases occurred only when others starting to use the generator. Having the ability to get a log like from Msbuild would have been massively helpful. What is your recommendation here? Do we agree on the usefulness of logs?

CyrusNajmabadi commented 2 weeks ago

What is your recommendation here?

Write a unit test with the scenarios their code is hitting.

fandrei commented 2 weeks ago

@CyrusNajmabadi logging is not "redundant". It helps to diagnose bugs that nobody expected when developing the code and writing the unit tests.

Skyppid commented 2 weeks ago

This is not true. You can use the latest versions of c# / CodeAnalysis library just fine.

Well, C# that's true. I phrased it badly. What I meant is that by .NET Standard 2.0 limitation you cannot use any library made with newer versions. Our generator is tightly coupled with our database library and needs to use some of it's types. Since it's .NET 8 though, I cannot use it and thus have to rely on unsafe string-based type lookups.

CodeAnalysis might work nowadays, but when I wrote the initial generator almost 2 years ago, it failed with a very well hidden log output saying that it cannot reference a newer one due to reasons.

We've completely changed the implementation. Improved performance over 1000x, and have redone the ide integration so that it can run without destroying performance.

I agree it has improved a lot. As I said, this was not meant as a front towards the developers working on it anyways. But let's focus back on the topic. What @ChristophHornung said is a valid point. We also want to open source our library (and thus the generator with it) and I can already sense issues with odd edge-cases. While you can write UTs once you identified them, logging is the step to get there finding out what's actually wrong. Logging is just as common a tool for devs as UTs are. It's not either or, they both have their advantages and disadvantages. Rather have their symbiosis I'd say.

@CyrusNajmabadi logging is not "redundant". It helps to diagnose bugs that nobody expected when developing the code and writing the unit tests.

Especially in an area where you often can't debug at all. The times I was irritated by an empty output or missing lines in my generated files because something was wrong and I just couldn't see where things went south.

dotnet / roslyn

Proposal: logging APIs for source generators. #50208