New Analyzer: Warn on local/parameter reassignment

HaloFour commented 2 months ago

It seems that proposals to add readonly local/parameter support to C# has stalled. This seems largely due to the feature being overly noisy given it requires an additional modifier of some kind and it would likely apply in the vast majority of cases. Given that C# cannot (or is extremely unlikely to) go readonly by default, I would like to suggest an analyzer that would give developers the option to enforce default readonly-ness of locals and/or parameters to achieve the same benefits.

Tuning the strictness policies

Different developers likely prefer different degrees of reassignment protection based on the scenario. Rather than having a single policy to issue a diagnostic on any reassignment, I propose categorizing reassignment based on the situation and allowing the developer to adjust their severity. For example, a developer might want to warn on reassignment of parameters, but doesn't care about reassignment of locals. Or a developer might want to be much more cautious about potential reassignment of captures in lambdas or local functions.

The following is a list of possible reassignment scenarios:

Reassignment of non-ref/out parameter

void fn(string p) {
    p = "something else"; // diagnostic
}

Reassignment of local

var local = "foo";
local = "bar"; // diagnostic

Passing local/parameter as ref

var local = "foo";
fn(ref local); // diagnostic

Reassignment of captured local/parameter within lambda

var local = "foo";
var fn = () => {
    local = "bar"; // diagnostic
};

Reassignment of captured local/parameter outside of lambda

var local = "foo";
var fn = () => {
    Console.WriteLine(local);
};
local = "bar"; // diagnostic

Reassignment of primary constructor parameter

class Foo(string p) {
    void fn() {
        p = "foo"; // diagnostic
    }
}

This is not meant to be an exhaustive list. My intent is to drive conversation around the categorization of different scenarios of reassignment based on how "dangerous" or "undesirable" they may be considered by a developer. That would provide the policy options to make the analyzer as strict or lenient as desired.

A simpler alternative would be to mark all diagnostics the same, making this a one-size-fits-all analyzer.

Opting into mutability

I believe it would be desirable to provide a simple way to mark that a particular local/parameter is "mutable", suppressing any diagnostics on reassignment. Additionally, a diagnostic could be issued for a declaration marked as "mutable" which is never reassigned. For parameters, this could be implemented through an attribute that the analyzer recognizes by name:

void fn([Mutable] string p) {
    p = "foo"; // no diagnostic
}

For locals there are no existing ways to attach attributes or other custom metadata. In lieu of that, we have a couple of options we can explore to allow a developer to indicate that the local should allow reassignment:

Naming Convention (thanks @Richiban)

The name of the local could be used to indicate whether it should be mutable. Perhaps a prefix like m_ or mutable_ could be prepended and the analyzer would then not issue diagnostics on reassignment:

var mutable_local = "foo";

mutable_local = "bar"; // no diagnostic

Comments

A specially-formatted comment could be added to the declaration which the analyzer would interpret as declaring the local as mutable:

/*mut*/ var local = "foo";
// or
var local = "foo"; //mut

void fn(/*mut*/ string p) { }

Additionally, perhaps a reassignment could also be marked as mutable:

var local = "foo";
local = "bar"; // diagnostic
/*mut*/ local = "baz"; // no diagnostic

fn(/*mut*/ ref local); // no diagnostic

While a comment is a little messy and error prone, at least the developer would have to get it right in order to suppress the diagnostic in that case, and a fixer could probably be provided to automatically add the correct comment. In the future something like "source attributes" would definitely be a preferred metadata mechanism.

Alternatively, the diagnostic could be suppressed using standard C# pragmas.

/cc @RikkiGibson

HaloFour commented 2 months ago

This is a bit of a moon-shot, for the sake of discussion.

RikkiGibson commented 2 months ago

I will try and look in more depth later, but, as a first impression, the // mut comments remind me of // language=regex comments over string literals to force regex highlighting. blog post link to when that was introduced. I personally find it a reasonable solution for the analyzer.

I think marking the declaration with // mut to permit reassignment makes sense. Marking the assignment doesn't really make as much sense to me, as you're just asking for permission to do the thing you're already doing at that exact spot. If user is insistent that only this one place be allowed to reassign then I'd go with a pragma.

RikkiGibson commented 2 months ago

If local variable is not definitely assigned, then assigning it should be fine (not produce a diagnostic even under strict settings).

string str;
if (cond)
    str = GetString(); // ok
else
    GetString(out str); // ok

str = "b"; // warning

Maybe something similar would be used to make out parameter handling more strict. Not sure about the value as much in that case though.

M(out int x)
{
    x = 42; // ok
    // ...
    x = 43; // warning?
}

Further feedback: It feels simple enough to distinguish scenarios we want to be individually configurable by using different diagnostic IDs. I don't have a ton of input on the exact specifics of which to give their own IDs.

I would also say that it feels right to keep it oriented around readonly-by-default, and not build any kind of "mode" which would make mutable the default and opt-in with comments to readonly.

HaloFour commented 2 months ago

Excellent ideas. I like limiting out parameters to a single assignment.

HaloFour commented 2 months ago

I do mention it in the proposal but I do want to call out any potential controversy in using comments as a mechanism to mark locals or parameters as mutable/reassignable. Apart from being potentially clumsy, I don't know that there's much precedent in an analyzer using comments that could potentially affect the build, particularly if they are removed. There's also potential interactions with other tools, such as "minifiers", although I cannot fathom why anyone would want to "minify" C# prior to compilation.

Obviously if there were a better choice, such as source attributes, they would be the preferable option. However, those don't exist, and even if they did that wouldn't help with older versions of the language where this analyzer would still be useful. As an optional opt-out mechanism within an optional analyzer to suppress optional diagnostics, I personally don't find this to be terribly problematic.

Since parameters can support attributes, I think it's worthwhile to also (or only) support a custom attribute being applied to that parameter.

RikkiGibson commented 2 months ago

I would expect people to find the line-comment form of // mut to be more palatable than inline /* mut */. So, maybe attributes will be preferable for parameters, which are often declared multiple on the same line. e.g. void M([Mutable] int param, [Mutable] bool flag) { }.

This is the kind of thing which I would suggest prototyping prior to any commitments. We have to see how this actually feels when onboarding a big project to it. Are the ergonomics acceptable, and does it mostly meet the goals that a language-level readonly locals feature would meet.

HaloFour commented 2 months ago

I would expect people to find the line-comment form of ... to be more palatable than inline ...

I agree. I don't know if we'd want to keep both forms as the inline form could be applicable in more places, e.g.

int.TryParse(str, /*mut*/ out var i);

Although that can be revisited based on demand for such a feature in the future.

This is the kind of thing which I would suggest prototyping prior to any commitments.

👍

HaloFour commented 2 months ago

I'd also be fine if the mutability opt-in is left out out a first phase of such an analyzer. Perhaps it would be useful so infrequently that suppressing the diagnostic via pragma is sufficient? Feedback from people who would use the analyzer around the friction they experience with it could inform the shape such an opt-in feature might take.

IS4Code commented 2 months ago

I appreciate the discussion on the appropriateness of using comments for this. I would much rather prefer sticking with the existing "language" of attributes, used in all other places for code analysis, to keep it consistent with parameters (and potentially other storage locations). I also believe there is no need to bring in a whole new system of source-only attributes just to support this "properly"; just extending the existing attribute facilities to local variables could be enough ‒ see dotnet/csharplang#8460.

HaloFour commented 2 months ago

Even if C# gets source attributes (or some flavor of local attributes), this analyzer should be able to target previous versions of C# that wouldn't support either.

Richiban commented 2 months ago

Nice idea! Have you considered a naming convention for triggering mutability? E.g. num is immutable but m_num is mutable?

HaloFour commented 2 months ago

@Richiban

Have you considered a naming convention for triggering mutability?

Excellent idea, I'll add it to the proposal.

RikkiGibson commented 2 months ago

m_ prefix is used in some codebases for instance fields, so would want to consider some different options, but the general idea of a naming convention seems reasonable.

HaloFour commented 2 months ago

Added a comment that the analyzer could issue a diagnostic for a variable marked as "mutable" which is never reassigned.

DavidArno commented 1 month ago

@RikkiGibson, I do not like the idea of it not warning on after-declaration assignment other than for out params (which have to be allowed do to their nature; in fact I'd treat them as mutable by default). I also do not want such an analyzer to need to do semantics/flow analysis to determine if a re-assignment is occuring. It wants to be fast. I'll live with imprecise warnings instead. So I'd want you code example to behave like this:

string str; // warning - readonly variable is not set when declared
if (cond)
    str = GetString(); // warning - (possible) reassignment  of readonly variable
else
    GetString(out str); // warning - (possible) reassignment  of readonly variable

str = "b"; // warning - reassignment of readonly variable

getting around those first warnings will be trivial when we get expression blocks:

var str = cond ? GetString() : { GetString(out var s); return s; };

DavidArno commented 1 month ago

Nice idea! Have you considered a naming convention for triggering mutability? E.g. num is immutable but m_num is mutable?

Please, no. Hungarian notation has been (almost completely) consigned to the dustbin of history for very good reasons.

DavidArno commented 1 month ago

I would expect people to find the line-comment form of // mut to be more palatable than inline / mut /

That would surprise me greatly:


/* mut */ var x = something; // - it's clunky, but it makes sense and is unambiguous
var y = something; // mut      <- this just looks wrong. Why is control of mutability at the end of the line?
                   // And can that extra part of the comment work with it? Or did I make it readonly again by adding 
                   // stuff after <-
var z = something; // mutual something or other etc comment <- and what happens here if extra comments are allowed?

HaloFour commented 1 month ago

I also do not want such an analyzer to need to do semantics/flow analysis to determine if a re-assignment is occuring. It wants to be fast.

Why do you believe such analysis would be slow? I wouldn't rule it out unless it was demonstrated to be unusable, and I doubt it would even be noticeable.

Please, no. Hungarian notation has been (almost completely) consigned to the dustbin of history for very good reasons. That would surprise me greatly:

I think Hungarian notation is fine here. It encodes semantic meaning that the type system itself doesn't convey, for the purpose of making the variable stand out. However, I wouldn't ascribe one particular way to encode mutability in the name, and several options should be considered.

However, I don't think an analyzer would be as constrained as a language feature as to how it would need to be designed. I think it would be fine if multiple mechanisms are offered by the analyzer, and the developer can choose the one that they like the best.

RikkiGibson commented 1 month ago

The reassignment analysis would probably just use the Roslyn dataflow APIs, which are used all over the place in the IDE, for example, for greying out unnecessary assignments. They're certainly quite a bit less expensive than nullable analysis.

DavidArno commented 1 month ago

The reassignment analysis would probably just use the Roslyn dataflow APIs, which are used all over the place in the IDE, for example, for greying out unnecessary assignments. They're certainly quite a bit less expensive than nullable analysis.

Interesting. I didn't know that functionality existed as APIs. Are they documented anywhere, do you know?

RikkiGibson commented 1 month ago

The methods: https://learn.microsoft.com/en-us/dotnet/api/microsoft.codeanalysis.csharp.csharpextensions.analyzedataflow?view=roslyn-dotnet-4.9.0

This may actually be the wrong method for this use case because it doesn't tell you where something was read/written, just that it happened somewhere in the given region. Looking at usages of it, it's more suited for implementing 'extract method', etc. The right way forward will probably involve looking for the most similar existing IDE features and taking some inspiration from how they actually solve it.

To speak more to my original point, I wouldn't expect a flow analysis performed by this analyzer to incur a significant cost, which is not already "amortized" by existing analyzers, which are already causing things like IOperation trees etc. to be realized during compilation.

Basically, these warnings should show up at a similar cadence that "remove unused assignment" grey-outs appear in the IDE.

dotnet / roslyn-analyzers