Open CyrusNajmabadi opened 9 months ago
Generally speaking, i want us to start with something reasonable. And then drive additional capabilities as additional API requests here.
Love it. I have needed & have built such a utility several times. Having it in the platform would be great.
Some (small) design feedback. This pattern of signatures would appear to be biased towards C-based languages:
public Region EnterBlock()
=> EnterIndentedRegion("{", "}");
public Region EnterIndentedRegion()
=> EnterIndentedRegion("", "");
private Region EnterIndentedRegion(string open, string close)
...
I believe the final (private
) signature is probably more useful as a public
member.
Even if we limit ourselves to C#
we would likely be restricted without it. For example, I may be declaring a complex (multi-line) array, so need to use [ ]
, or a complex parameter set ( )
. In other language such as SQL
or VB
I may need BEGIN END
, or creating a SQL
table or procedure (directly or within a C#
string) may need ( )
. And even a large comment block may be served by using /* */
, and a verbatim string by """, """
. And so on.
Perhaps:
// Same, curly is default
public Region EnterIndentedRegion()
// Note public
public Region EnterIndentedRegion(string open, string close)
// Nice but not necessary
public Region EnterIndentedRegion(IndentKind kind) // Enum with common delimiters, eg Curly, Round, Square
This seems like something that would be better suited to individual source generators, and not a public API for Roslyn.
Some (small) design feedback. This pattern of signatures would appear to be biased towards C-based languages:
Correct. This is intended entirely for use by Roslyn source generators.
This seems like something that would be better suited to individual source generators, and not a public API for Roslyn.
We have found nearly all our partners needing this. As such, we should provide out so that we don't have N teams having to roll their own version.
Our existing strategy has simply not worked out, with teams seeing massive perf costs trying to do this themselves. We have to lead by example, with best in class libraries and examples they can leverage.
Effectively, we have to start digging ourselves out of the major hole we're in. Hoping for our partners and this parties to pick this up has proven ineffective.
I believe the final (private) signature is probably more useful as a public member.
Sure. This seems reasonable to me.
There's been some push from Roslyn that not all code-generation tools need to be Roslyn source generators. Would there be any appetite for shipping a high-performance intented text writer type outside of Roslyn (possibly in the BCL in an out-of-band assembly) and providing the integration with Roslyn types (in particular the public SourceText GetSourceTextAndClear()
method) as an extension method in Roslyn on top of the type?
We're looking at moving to an text writer model for the interop source generators and we'd love to use this type; however, many interop code-gen tools don't need to be source generators (and as per the Roslyn team's advice, will not be source generators). We'd like to share the code between the two scenarios without pulling in references to Roslyn for the non-SG scenarios.
As an extension, we likely could not get access to the internals needed to make GetSourceTextAndClear fast, which would be an issue.
Would there be any appetite for shipping a high-performance intented text writer type outside of Roslyn
I'd be ok shipping this as a source-package, with the roslyn bits coming in as an additional partial type.
Would there be any appetite for shipping a high-performance intented text writer type outside of Roslyn
Note: outside of roslyn, i don't think you need a high-performance writer. Roslyn needs it because we have to source-gen to get compilations, and any edit may end up rerunning generators. For normal apps that just write out code, any normal path is fine. Heck, use the roslyn syntax-model, with all the allocs/CPU that entails.
In other words, this is to get a generator down from hundreds of MS, to a handful of them, so that those costs don't add up as we run them continuously. For something that runs only on build, saving a few hundred ms is just not that big a deal :)
We'd like to share the code between the two scenarios without pulling in references to Roslyn for the non-SG scenarios.
Fair. Making this easily source-consumable would be nice.
I spoke offline with Cyrus and we came up with a possible alternative solution for the "define outside of Roslyn, consume within Roslyn efficiently" proposal of mine:
Instead of storing a SourceText
in the IndentingStringBuilder
, the builder would store/produce a ReadOnlySequence<char>
. We would have a ReadOnlySequence<char> GetTextAndClear();
method that would transfer out the existing data storage and reset it. Roslyn would provide an extension method SourceText GetSourceTextAndClear(this IndentingStringBuilder builder);
which would make a SourceText
around the ReadOnlySequence<char>
without copying it.
We though ReadOnlySequence<T>
would be a good fit as it allows us to represent the generated text like how SourceText
abstracts; it allows us to split a buffer to keep it under the LOH limit but still present a good API surface for reading like how StringText
vs LongText
does in Roslyn. By using a "Get and Clear" method like above, the extension method could assume that there will not be any more writers and as such could avoid an expensive copy.
There's been some push from Roslyn that not all code-generation tools need to be Roslyn source generators.
Having been the author/maintainer of a non-Roslyn source generator, I would not recommend this path even to someone I disliked. It is unpleasant on the best of days. The integration points are poorly defined, changed numerous times over the years, and are impossible to test to any level of confidence since there is no record of when/how things changed.
Background and Motivation
Today, many SG authors attempt to create code using the SyntaxModel. This is both arduous and extraordinarily slow. Code has to figure out how to properly create C# syntax constructs (non trivial in many cases like indentation, doc comments, and the like), or avoid trivia entirely, and then use NormalizeWhitespace to normalize it. This is extremely slow in practice, and consumes lots of memory with all the intermediary constructs.
Emperical testing with the real world ComputeSharp and MVVM toolkits show a real gain of nearly 100x in terms of CPU usage and 100x in terms of memory usage by having the generators write directly to a string builder.
However, directly using a stringbuilder is slightly unpleasant, as a user has to keep track of the most common construct that makes code legible: indentation.
Proposed API
The sketch of what i'm proposing is a dedicated IndentingStringBuilder that one could use to generate indented code, designed to be good at that job, and otherwise as bare metal as possible.
The major design points are:
{ ... }
and other indented regions of code.Note: We should move our own Syntax-generator to this, and work with community to try out our prototypes of this to help refine the final shape. As such, we should likely ship this as experimental so we have some time to revise.
Sketch:
Usage Examples
API draft here: https://github.com/dotnet/roslyn/pull/71163
You can also imagine helpers/extensions (or instance members) for common patterns. For example:
Which could then be used like so:
This would help for the common cases of wanting blank lines between things, but not at the start or end. And so on.
Other helpers the community has found useful are:
Which allows for common patterns of only writing out data if a condition holds, helping to avoid lots of
if (...)
tests in code before writing things out. It's likely we want this, especially as it woudl allow us to do the interpolation-handler form ourselves, without forcing users to have to provide that.