dotnet / roslyn

The Roslyn .NET compiler provides C# and Visual Basic languages with rich code analysis APIs.
https://docs.microsoft.com/dotnet/csharp/roslyn-sdk/
MIT License
19.04k stars 4.03k forks source link

Design a formatter extension API #31691

Open sharwell opened 5 years ago

sharwell commented 5 years ago

Currently the formatter is only configurable by a predefined set of options. We should design a public API allowing extensions to the formatter, so automatic formatting operations triggered by user actions in the IDE are better able to handle uncommon coding patterns. Projects using code formatting styles not supported by the current set of options could install a formatter extension to provide the missing functionality.

Related

CyrusNajmabadi commented 5 years ago

I'm interested in this (especially around the idea of easily being able to add formatting analyzers to ensure things like Roslyn/IDE consistency). When you do this design work, can you include me? Thanks!

sharwell commented 5 years ago

The first part of the evaluation will be determining if it will be sufficient to allow extensions to add items to this list:

https://github.com/dotnet/roslyn/blob/def06e4bd7f53ab7989bd0c86505aac6f80d9e95/src/Workspaces/CSharp/Portable/Formatting/CSharpSyntaxFormattingService.cs#L33-L46

CyrusNajmabadi commented 5 years ago

@sharwell Makes sense to me. I'm of two minds about it.

  1. it's a good idea since this is the same system the rest of hte formatter is built off of.
  2. it's not clear to me if the current approach to a formatting rule is one we want users to use. From my own experience trying to write a formatting rule, it's both exceptionally complex, and often very hard to diagnose what went wrong.

There's a part of me that just likes the idea of allowing someone to add just simple tree-transforms. We could even make it more narrow and just allow trivial updates. Or allow different types of "rules" for these more common cases.

lostmsu commented 5 years ago

@sharwell currently, IndentBlockFormattingRule handles all types of blocks internally. So to cover #24129 , it has to be refactored. Just adding a new rule to that list will cause conflicting processing.

JC3 commented 3 years ago

This sounds neat but... effectively it seems the same as adding better options to the editor settings then writing a .editorconfig file, except this is a lot more effort for the user who just wants to e.g. not indent namespaces.

The C++ editor has a mature and very satisfactory set of options, and could serve as a very good basis of options to provide. Then a .editorconfig file would basically be the same as a "formatter extension", except written in a different language (it'd express the same intents, as a settings document instead of a C# source file).

The formatter extension API seems cool but realistically how much usage is it going to get? It seems like it will primarily be users adding feature that are similar to the C++ editor and then, once somebody nails them all in one extension and makes that publicly available, that's about it for that -- there will be one or two "Better C# Formatting" plugins in the extensions marketplace and nobody will really need to touch the formatter extension API again (I hypothesize). It's a lot of extra layers between the current state and "add spaces around binary operators" or whatever.

Plus, I mean, progress on this doesn't exactly seem promising, does it?

CyrusNajmabadi commented 3 years ago

@JC3 It's unclear to me what you're asking for here.

sharwell commented 3 years ago

effectively it seems the same as adding better options to the editor settings then writing a .editorconfig file, except this is a lot more effort for the user who just wants to e.g. not indent namespaces.

No, there is a huge difference in quality expectations between the two approaches. In is algorithmically impossible for us to test all the combinations of options we have today, and every single option we add at minimum doubles the target test space. On the other hand, an extensibility layer means individual projects can design a single set of options with known values, bringing the test space down to one known and well-tested set. Also, bugs can be addressed by that team if/when they arise, and those fixes can be made quickly with understanding of the specific needs of a single project (or set of projects). Changing built-in options is nearly impossible, even in cases where most users would disagree with them.

The formatter extension API seems cool but realistically how much usage is it going to get?

StyleCop Analyzers is one project that would likely provide formatter extensions, so all projects using those analyzers would fall into the category of using formatter extensions, even if they aren't aware those extensions were created. Over time, it's reasonable to expect large numbers of projects using formatter extensions.

rjgotten commented 2 years ago

The formatter extension API seems cool but realistically how much usage is it going to get? It seems like it will primarily be users adding feature that are similar to the C++ editor and then, once somebody nails them all in one extension and makes that publicly available, that's about it for that -- there will be one or two "Better C# Formatting" plugins in the extensions marketplace and nobody will really need to touch the formatter extension API again (I hypothesize). It's a lot of extra layers between the current state and "add spaces around binary operators" or whatever.

FWIW even if there were to be only one or two big-name formatting plugins out there, I would (almost) kill for the C# equivalent to ESLint. The current formatter just smacks of limitations. Both in what rules and rule configurations it supports and in how much actual development effort goes into it.

Not having control over how the editor formats your code is already one of the top-ranking annoyances you can saddle developers with. Visual Studio in a way makes it worse by also not even giving you full control over if or when it formats your code. E.g. #51389 -- why are we only able to disable auto-mangling of aligned white-space for assignments within declaration assignments, but specifically not for the much, much more useful cases of object initializer syntax or pattern matching?

In any environment that wouldn't have monopolistic lock-in this kind of thing would build in annoyance to the point where it would viably become a reason to drop the offering and seek alternatives instead. But you don't get that with Visual Studio or VSCode. Because its Roslyn all the way down and the formatting API is clamped down.

At the very least, a succesful third party running dedicated maintenance time on a formatting plugin, means you can actually get shit done and offer actual solutions -- even to small nitpicks. Because small as they may seem to the Roslyn maintainers on their grand scale, they will build major frustration with developers that have no recourse but to deal with the fallout, day after day; week after week. Death of a thousand paper cuts.

One need only look at ESLint to see that the principle behind a dedicated team working on a formatter actually works. And in the small cases where it doesn't a small - maybe somewhat hacked - formatter can be used as a stop-gap. Better yet; separate pluggable formatters can (and in many cases are !!) used as proof-of-concepts before being formally adopted into the core ESLint rule sets.

jez9999 commented 1 year ago

Would like to see this get implemented. I'd like #pragmas to be indented along with my code (yeah, I know they're scopeless, but not indenting them still looks horrible).

tycobbb commented 1 year ago

please.

levicki commented 1 year ago

@sharwell I hope this extension API is going to allow for different vertical column alignment (i.e. with tabs and with tabs+spaces)?

void testfunc()
{
    int             a;
    DateTime            b;
    Dictionary<string, string>  c;
    int             i, j, k, l, m;

    i =  1;
    j =  2;
    k =  4;
    l =  8;
    m = 16;

    if ((i == 1) &&
        (j == 2) &&
        (k == 4)) {
        a = l + m;
    } else {
        a = l - m;
    }
}
jez9999 commented 10 months ago

IMHO this should be bumped up to high priority now. The formatting functionality is rather limited to a set of built-in options, everything bundled under the "IDE0055 Fix formatting" rule so you don't always know what exactly is wrong with the formatting, and the formatting functionality is failing to keep up with new C# language features properly. Two examples I encountered upon upgrading my project to .NET 8:

https://github.com/dotnet/roslyn/issues/71218 https://github.com/dotnet/roslyn/issues/71219

I mean, not even having a way to get the formatter to do [..list] instead of [.. list] with an awkward space in? Please, guys, overhaul the formatting engine. Make it much more flexible, don't group everything under one IDE0055 rule, and make it easy for plugins to customize!

CyrusNajmabadi commented 10 months ago

IMHO this should be bumped up to high priority now

This is not a question of priority. The main issue is that it is not apparent what the design would be for a formatting engine that was fully flexible like it want, while also being simple to extend, while also being fast enough.

and the formatting functionality is failing to keep up with new C# language features properly.

Formatting is always updated to handle new constructs.

to do [..list]

This is not a formatting style the team considers acceptable for collection expressions. Like with the corresponding pattern (x is [var start, .. var middle, var end]), it is very expected that the space be present here.

Make it much more flexible, ... and make it easy for plugins to customize!

The question is how to do this while also being fast enough, and having API be understandable and flexible enough.

As a starting point, how do you design such that any behavior be customizable?

jez9999 commented 10 months ago

This is not a formatting style the team considers acceptable for collection expressions. Like with the corresponding pattern (x is [var start, .. var middle, var end]), it is very expected that the space be present here.

Well sorry, but why should "the team" dictate what is "acceptable"? In JS, the standard for the spread operator doesn't use a space, I prefer it without a space, and it doesn't have a space on the "What's new in C# 12 page" examples:

https://learn.microsoft.com/en-us/dotnet/csharp/whats-new/csharp-12#collection-expressions

There are a million customizable options for where to put spaces in various circumstances, I see no reason why this shouldn't be another one.

CyrusNajmabadi commented 10 months ago

Well sorry, but why should "the team" dictate what is "acceptable"?

That's our job. We design the language and implement the features.

In JS,

We're not JS. That's a separate design group and ecosystem.

and it doesn't have a space on the "What's new in C# 12 page" examples:

Thanks. Ill fix.

jez9999 commented 10 months ago

Well sorry, but why should "the team" dictate what is "acceptable"?

That's our job. We design the language and implement the features.

That's not the same as dictating people's exact syntax though, is it? That's why there are a ton of options as to what kind of formatting will be used when it comes to spacing.

And yeah, you'll "fix" the page. Whatever. The fact that it didn't have a space there in the first place shows that I'm not the only one thinking that style looks best. Why would you want to take away the option to customize this? Please consider adding such an option.

Also, if it's such a bad idea to not have a space after the spread operator, why does C# allow it? Why didn't you design a forced space into the language?

CyrusNajmabadi commented 10 months ago

That's not the same as dictating people's exact syntax though, is it?

In a real sense yes. If you want to use these tools (which are optional). Since C# 1.0 we've had many features with no control over their formatting. That's been the normal case for C# for 20+ years now. It's the rare feature that gets an extension point. And generally only when there is enough feedback for it.

And yeah, you'll "fic" the page

Thanks. The change was taken. It'll be updated soon. Thanks for bringing this to our attention.

Why would you want to take away the option to customize this?

Generally, because customization adds complexity and maintenance costs. It also bifurcates the ecosystem. So, generally, the position is that if you want to use these (optional) tools, you're opt'ing into the formatting choices they make (like many linters). We totally get that that may not be desired, which is why it can be trivially disabled so you can format your code however you want.

CyrusNajmabadi commented 10 months ago

Please consider adding such an option.

We will definitely consider adding such an option. Thanks.

jez9999 commented 10 months ago

That's not the same as dictating people's exact syntax though, is it?

In a real sense yes. If you want to use these tools (which are optional). Since C# 1.0 we've had many features with no control over their formatting. That's been the normal case for C# for 20+ years now. It's the rare feature that gets an extension point. And generally only when there is enough feedback for it.

So why did you even allow the spread operator not to have a space after it? Why isn't [..list] invalid syntax? And who wrote that style in the "What's new in C# 12" page, apparently thinking it looked fine?

Generally, because customization adds complexity and maintenance costs. It also bifurcates the ecosystem. So, generally, the position is that if you want to use these (optional) tools, you're opt'ing into the formatting choices they make (like many linters).

This is ridiculous. Whether or not a space is there doesn't "bifurcate the ecosystem", people can obviously understand the code just fine whether the space is there or not.

We totally get that that may not be desired, which is why it can be trivially disabled so you can format your code however you want.

Only by putting a big ugly couple of #pragma lines around various individual lines of code, which is highly undesirable.

What do you think about this @sharwell ? Is it reasonable that C# formatting options already provides 22 different customizations for customizing where spaces should be placed because people have different C# styles, but that the formatter always forces a space after a spread operator with no customization?

CyrusNajmabadi commented 10 months ago

So why did you even allow the spread operator not to have a space after it?

Because the language doesn't care. There are almost no places where whitespace is relevant to hte langauge. However, the reverse is true of the formatter/linter. Where whitespace is relevant almost everywhere :)

Whether or not a space is there doesn't "bifurcate the ecosystem",

It definitely does. And it's why some ecosystems (like Go) just have removed options entirely from formatting with tools like gofmt. Precisely because this causes so many religious wars. Heck, just see the passion being demonstrated here :)

Is it reasonable that C# formatting options already provides 22 different customizations

Note: by my rough count, there are around 300 non customized places (or even more depending on how you want to count certain variations). Non-customization is the norm for the language.

jez9999 commented 10 months ago

Ability to have my own custom code style is actually one of the reasons I like C#. Attempts to move it towards One True Style are extremely frustrating and I am 100% opposed to that kind of prescriptivism.

Frankly, what this discussion demonstrates to me is that the code formatting should be a lot more easily customizable with plugins so that this kind of thing becomes a non-issue. The enforcement for spaces after the spread operator should be implemented with a plugin that I could just remove, or modify to have the behaviour I want. Why isn't that the end goal?

CyrusNajmabadi commented 10 months ago

Ability to have my own custom code style is actually one of the reasons I like C#.

You can 100% have your own custom code style. The language and tools allow for it. But the formatter itself is not extensible in this fashion (and we don't know of a design currently that would allow it to be). It's something we're interested in, but haven't found worthwhile to do yet. As such, for the last 20 years, that has meant the majority of the language just formats in a single way by default.

is that the code formatting should be a lot more easily customizable with plugins so that this kind of thing becomes a non-issue.

If you have a design that allows for that to happen, and can be fast and maintainable, def let ys know.

Why isn't that the end goal?

It is a goal. We just are unaware of a system that can effectively provide that in a cohesive fashion with enough performance. For example, how do you expression constraints between the different formatting plugins? How does one override another? How does one suppress another? How do you make sense of it when tehre are hundreds (or thousands of them in play). Note that thsi is the state of hte formatter today (which is internal). And even in this state it is near impossible to touch any part of it without breaking some formatting in some place in an unintended fashion. That's a deep challenge here.

jez9999 commented 10 months ago

Out of interest, why does your team think whitespace after the spread operator is a good idea? It's a unary operator, and every other symbolic unary operator in C# that I can think of is used without whitespace. This: if (! true) { ... } looks silly. Which is probably why I can find various examples scattered around MS Docs of the spread operator being used without whitespace; that's the natural style for people. IMHO you guys should rethink whether your preferred coding style is right.

CyrusNajmabadi commented 10 months ago

It's a unary operator,

It's not a unary operator. It's not an operator at all.

why does your team think whitespace after the spread operator is a good idea?

Spacing strongly indicates it's not binding to, or affecting the thing right next to it (like an operator, or unary operator would).

For example, if this was a unary operator, then:

.. x ? y z

should be thought of as (.. x) ? y : z

But that's 100% not the semantics or intuition for what this is. As above, it's not an operator. And that code example should be interpreted and thought of as if it was .. (x ? y : z). As such, our strong syntactic preference is not to have these abut as it's honestly very misleading.

Note that ! is, as you mentioned, an operator. And using the logic above should be next to the next hting. That because !x ? y : z should be thought of as (!x) ? y : z and not ! (x ? y : z).

jez9999 commented 10 months ago

Not an operator? Yikes. There are places all over MS Docs and tutorials websites calling it a "spread operator".

In your example, it seems like it is an operator, just one with low precedence. It's operating on ((x ? y : z)), presumably meaning that y and z are enumerable, and that it has lower precedence than the ternary operator. From the docs: "The operand of a spread operator is an expression that can be enumerated. The spread operator evaluates each element of the enumerations expression."

CyrusNajmabadi commented 10 months ago

Not an operator? Yikes

Definitely not 'yikes'. As per above, it would be very bad if it was. You'd have to write things out as .. (x ? y : z) instead of the desired .. x ? y : z (and the same with any other sort of expression case. It's not a unary operator. It never was, and we have no desire for it ever to be.

jez9999 commented 10 months ago

As I just edited my above comment to say, how is it not an operator with lower precedence than the ternary operator? It operates on an enumerable operand, no?

CyrusNajmabadi commented 10 months ago

just one with low precedence.

Even if that was how we viewed it (and we we don't), pperators with low precedence should not be formatted as if they tightly bind to an expression and thus have higher precedence than the code around them. That's exactly why spacing here is important. So even with that view, it woudl justify this spacing.

jez9999 commented 10 months ago

If it's not an operator, what is it?

CyrusNajmabadi commented 10 months ago

It operates on an enumerable operand, no?

It doesn't operate at all :) It's not an expression. It's just part of the collection syntax. It's like saying that , is an operator in a collection-expression that adds the next thing.

jez9999 commented 10 months ago

In that case there's a bunch of documentation out there that's very misleading.

CyrusNajmabadi commented 10 months ago

If it's not an operator, what is it?

It's a type of collection expression element. We currently have two types of elements, and we're planning on adding at least one more (but possibly two more).

We have in C# 12:

  1. Expression elements. Like [a, b, c]. That is just an expression.
  2. Spread elements. Like [.. a, .. b ? c : d]. It's a .. followed by any expression. There is no precedence or operating.

In c# 13 we intend to add:

  1. Dictionary elements. Like [a: b]. This is an expression followed by a colon, followed by an expression. As with .. there is no 'operating' on anything.

We are also potentially considering a "conditional element". Which would be something like [? expr] where expr would only be added if it was non-null. Similar to .. this is not operating on anything, and it would be expected that if you had a complex express on the right of this, that ? was distinct and not bound tightly like a unary operator.

CyrusNajmabadi commented 10 months ago

In that case there's a bunch of documentation out there that's very misleading.

Sounds like it. We'll fix it up. Thanks!

jez9999 commented 10 months ago

If it's not an operator, what is it?

It's a type of collection expression element. We currently have two types of elements, and we're planning on adding at least one more (but possibly two more).

From a programmer's point of view, that feels like a distinction without a difference, even if it means something to the compiler.

We have in C# 12:

  1. Expression elements. Like [a, b, c]. That is just an expression.
  2. Spread elements. Like [.. a, .. b ? c : d]. It's a .. followed by any expression. There is no precedence or operating.

There's obviously precedence because you need to decide whether it's (..b) ? c : d or .. (b ? c : d).

In c# 13 we intend to add:

  1. Dictionary elements. Like [a: b]. This is an expression followed by a colon, followed by an expression. As with .. there is no 'operating' on anything.

Yeah... this is why some people say you can have too much syntactic sugar. All this stuff and it's often not clear whether it's some kind of syntactic sugar or a core operator and what the implications of that are.

We are also potentially considering a "conditional element". Which would be something like [? expr] where expr would only be added if it was non-null. Similar to .. this is not operating on anything, and it would be expected that if you had a complex express on the right of this, that ? was distinct and not bound tightly like a unary operator.

Maybe it's time to stop and let the language be more verbose? This kind of obsession with conciseness is how Perl and write-only code came about.

By the way, here:

int[] single = [..row0, ..row1, ..row2];

... the spread operator fits very naturally right next to its "operands" (for want of a better word).

CyrusNajmabadi commented 10 months ago

From a programmer's point of view, that feels like a distinction without a difference

I can't help with that. Some things are operators, others are not. And we treat them accordingly. If it was an operator, you'd be able to use it anywhere were operators are allowed. That is def not ok or appropriate (especially as it would actually conflict with the real operator we have, the range .. operator).

If someone comes and tells me they think that commas are operators in our language, i'll tell them the same.

There's obviously precedence because you need to decide whether it's

No. Precedence literally doesn't exist here. Precedence occurs with constructs where the interpretation is affected by what comes afterwards. for example, if i have x + a * b then the precedence of + matters vs * because it will impact how a is treated (should it be pulled into the + or not). That literally doesn't occur here. In no way, shape, or form does the .. impact parsing of anything.

Yeah... this is why some people say you can have too much syntactic sugar.

Honestly, that's on them. Dictionaries are a core collection type (they make up nearly 20 of all collections made the last time we surveyed the ecosystem (including nuget). Being able to create them has always been part of hte collection-expression design. We just pulled it out of 12 for time and resourcing needs.

Maybe it's time to stop and let the language be more verbose?

As I said, ? is something we're considering, not something we're taking. If you want to give feedback on the design of the language, please give that in the design discussions happening on those topics at dotnet/csharplang.

CyrusNajmabadi commented 10 months ago

the spread operator fits very naturally right next to its "operands" (for want of a better word).

Then you're welcome to format your code that way. The language allows for it. :)

jez9999 commented 10 months ago

the spread operator fits very naturally right next to its "operands" (for want of a better word).

Then you're welcome to format your code that way. The language allows for it. :)

Yeah, but I shouldn't have to waste 2 lines wrapping it with #pragma just to stop the formatter complaining about it (thereby losing all other formatter stuff on that line as well), or even worse, disable all formatting analysis. If you're honestly suggesting I do that everywhere I use the spread operator, I've gotta assume you're being sarcastic. If the language designers saw fit to allow for it, the formatter should at least be able to be configured not to enforce spacing rules of any kind on the spread operator.

No. Precedence literally doesn't exist here. Precedence occurs with constructs where the interpretation is affected by what comes afterwards. for example, if i have x + a * b then the precedence of + matters vs * because it will impact how a is treated (should it be pulled into the + or not). That literally doesn't occur here. In no way, shape, or form does the .. impact parsing of anything.

What I mean by precedence is a more broad decision about what order things will be done given a block of code. You literally can't interpret that code without having the precedence of deciding what order things will be done in. So I don't know, maybe I meant to say "execution order", but I don't see why it should be separated from "precedence" because it's all about needing to know the order of operations. Another distinction without a difference.

CyrusNajmabadi commented 10 months ago

If you're honestly suggesting I do that everywhere I use the spread operator, I've gotta assume you're being sarcastic.

That is waht i'm suggesting. That's our suggestion for any of the hundreds of cases where we do not support customizing formatting and a user wants to do something custom.

(Our first suggestion is simply: do what the formatter does. But if you don't want that, then disabling is our recommendation).

What I mean by precedence is a more broad decision about what order things will be done given a block of code.

That's not really what these terms mean for the actual language though. And, as i pointed out above:

  1. it's def misleading/confusing to think of things those ways.
  2. even if you thought about things those ways, it would motivate formatting in the way we do.

Another distinction without a difference.

Fundamentally, we see these are extremely different and relevant. With your perspective, literally everything is an operator. Comma becomes an operator (because you need to know the thing before the comma executes before the thing after the comma), etc.

The language (and tools) are built out of real rules about how things are actually designed and composed. It's very relevant, for example, that we do not think of expressions as statements, or signatures as types, etc. etc. The distinctions are very relevant, even if some want to conflate them. Thanks! :)

jez9999 commented 10 months ago

Why isn't it reasonable to think of everything as an operator with precedence, though? Yeah, the comma could be considered that way. That's kind of how my brain thinks of code because it needs to know the order thing will execute:

var foo = bar.Baz(.. xyz(1 + 2 * 3), 456);

You need to know that the above is going to end up being:

(var foo) = (bar.Baz(.. (xyz((1 + (2 * 3))), 456)));

You could even make a case for that being the language syntax, though obviously it looks pretty unreadable (though Lisp kind of went that way...)

Interestingly, , seems to be considered an operator in C.

CyrusNajmabadi commented 10 months ago

Why isn't it reasonable to think of everything as an operator with precedence, though?

Because that's not how our language is designed. And it's not a goal to make all our tools work with any arbitrary interpretation of any particular user. You're welcome to think things work a particular way, but we're not going to necessarily support that (just like we would not support someone who think it works in different way diametrically opposed to your own interpretation).

You could even make a case for that being the language syntax

Sure. That would not be our language though. So...

Interestingly, , seems to be considered an operator in C.

Yes. But not ours. Intentionally. Different languages are different. That's why they exist. They have different designs, goals and approaches. You can attempt to use what you know from one to help intuit to anotehr. But that intuition may be vastly incorrect. For example, if i use what i know from C# and apply it to many other languages, i'll find myself incorrect in many cases. Intution and collapsing ideas can be ok to try to get by. But, fundamentally, if you want to work well with a particular language, you'll have to understand that language itself deeply and how it actually works.

CyrusNajmabadi commented 10 months ago

@jez9999 At this poitn, if you have feedback on language design, you should take it to dotnet/csharplang. I get that you think of hte language in a particular fashion, but there's not much we can do about that here. And it's certainly not going to make much difference here in the formatting engine extension API discussion.