dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.56k stars 4.54k forks source link

Generate strings instead of Roslyn SyntaxNodes in interop source generators #95882

Open jkoritzinsky opened 6 months ago

jkoritzinsky commented 6 months ago

Since the start of the interop source generators, we've always decided to use Roslyn SyntaxNodes as both our source-of-truth as well as our go-to emit strategy. This choice has provided us a number of benefits:

However, it has also lead to a number of problems:

Additionally, the Roslyn team recommends using a string-based approach.

Finally, @Sergio0694 updated ComputeSharp to use an "indented text writer" approach and saw a ~460x performance improvement in CPU time and a ~36x Memory footprint improvement. Even if we only get a fraction of that improvement due to our desire to have an extensibility model, doing the work to rewrite the APIs seems worthwhile.

Additionally, with that amount of improvement, we may also see build-time improvements for assemblies with lots of LibraryImports, such as CoreLib.

ghost commented 6 months ago

Tagging subscribers to this area: @dotnet/interop-contrib See info in area-owners.md if you want to be subscribed.

Issue Details
Since the start of the interop source generators, we've always decided to use Roslyn `SyntaxNode`s as both our source-of-truth as well as our go-to emit strategy. This choice has provided us a number of benefits: - C# constructs are represented the same way as the C# compiler represents them. - We can easily define an API surface that specifies "this API returns a name/expression/statement" without building our own type hierarchy. - We can copy tokens from the source of truth to our emitted code. - We never try emitting code that is only understood by Roslyn compilers newer than we can safely confirm we'll run against as the APIs don't exist. - We could reuse the code in a code-fix to inline the source-generated code. However, it has also lead to a number of problems: - The SyntaxFactory APIs are very verbose. We've basically had to add comments above each block to explain what the code is doing. - `NormalizeWhitespace` (what we use to handle adding whitespace) is extremely slow. - We have at times accidentally added additional GC roots to the `SyntaxNode`s in the source `SyntaxTree`s due to how we structure some of our internal data types. - The source generator API surface requires us to pass strings to Roslyn, so we need to go from `SyntaxNode` to `string` for Roslyn to parse anyway. - We have a few places where we produce invalid syntax nodes because writing out the truly accurate node is a pain (in particular type names). - We never did the code-fix reuse idea and I doubt we'll ever do so (and we can't without additional work due to the above item). Additionally, the Roslyn team recommends using a `string`-based approach. Finally, @Sergio0694 updated ComputeSharp to use an "indented text writer" approach and saw a ~460x performance improvement in CPU time and a ~36x Memory footprint improvement. Even if we only get a fraction of that improvement due to our desire to have an extensibility model, doing the work to rewrite the APIs seems worthwhile. Additionally, with that amount of improvement, we may also see build-time improvements for assemblies with lots of `LibraryImport`s, such as CoreLib.
Author: jkoritzinsky
Assignees: -
Labels: `area-System.Runtime.InteropServices`, `source-generator`
Milestone: -
Sergio0694 commented 6 months ago

Love it! For context, here's my IndentedTextWriter, if you want to reuse or adapt this for the interop generators. It also has a couple of interpolated string handlers included (I'd imagine you already have polyfills for the necessary types, but if not you can either just link them from corelib or perhaps use PolySharp) 🙂

stephentoub commented 6 months ago

FWIW, the regex generator has been using the built-in IndentedTextWriter and it's worked fine. https://github.com/dotnet/runtime/blob/0fae364617f03d1003af0f0b49e57876e51875d4/src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.cs#L106

huoyaoyuan commented 6 months ago

In case if anyone is unaware, there is also an indented string builder proposed at roslyn side: https://github.com/dotnet/roslyn/issues/71162