`SimpleDiagnostic` class uses reference equality when comparing message arguments

dotnet / roslyn

The Roslyn .NET compiler provides C# and Visual Basic languages with rich code analysis APIs.

https://docs.microsoft.com/dotnet/csharp/roslyn-sdk/

MIT License

18.92k stars 4.02k forks source link

`SimpleDiagnostic` class uses reference equality when comparing message arguments #68291

Open eiriktsarpalis opened 1 year ago

eiriktsarpalis commented 1 year ago

Encountered the following issue when trying to debug an incremental source generator false positive:

Minimal reproduction

[Fact]
public void ReturnsEqualValuesForEqualArguments()
{
    var descriptor = new DiagnosticDescriptor(
        id: "0001",
        title: "title",
        messageFormat: "Prameterized format {0}.",
        category: "category",
        defaultSeverity: DiagnosticSeverity.Warning,
        isEnabledByDefault: true);

    string arg1 = new string('a', 5);
    string arg2 = new string('a', 5);
    Assert.Equal(arg1, arg2);
    Assert.NotSame(arg1, arg2);

    Diagnostic diag1 = Diagnostic.Create(descriptor, location: null, arg1);
    Diagnostic diag2 = Diagnostic.Create(descriptor, location: null, arg2);

    Assert.Equal(diag1, diag2); // Assertion failing
}

The issue appears to be caused by the following line:

https://github.com/dotnet/roslyn/blob/687921ffae8ad91a5464473cb1759fd463c345f2/src/Compilers/Core/Portable/Diagnostic/Diagnostic_SimpleDiagnostic.cs#L165

CyrusNajmabadi commented 1 year ago

Diagnostics do not have value-semantics (and arbritrary parts of our stack produce different instances of them which themselves are not comparable). You should store the data you collect to create the diagnostic in some value-record and store that in your model.

agocke commented 1 year ago

@CyrusNajmabadi This was not true in the compiler, as far as I know, which is why the abstract Diagnostic class implements IEquatable. If this isn't the case, what are the expected semantics for Diagnostic.Equals?

agocke commented 1 year ago

In particular, the observed behavior is caused by this line:

https://github.com/dotnet/roslyn/blob/59727ee4103c75edb799bb46e081427a2e1940ef/src/Compilers/Core/Portable/Diagnostic/Diagnostic_SimpleDiagnostic.cs#L165

I suspect that the use of == there, which always produces a reference-equality comparison of the message arguments, is a bug. The right thing to do was probably a.Equals(b), which would have done a virtual call.

This interpretation as a bug also lines up with the hash code implementation, which does a virtual call to GetHashCode. If only reference equality mattered, a hash code implementation specific to the reference (RuntimeHelpers.GetHashCode) could be used instead, but it wasn't.

agocke commented 1 year ago

OK, went back and looked at the pre-github source code. I'm confident this is a bug.

At first, we had arguments in SimpleDiagnostic. Then, we briefly decide to move them elsewhere in the hierarchy. Later, we decided to bring them back, in much the same form they are in right now. However, the contact added (a, b) => a == b where it wasn't there before, despite the semantics being equivalent to the previous design. I think this was a typo, and the behavior should have matched the original version, which effectively called object.Equals for each argument.

jasonmalinowski commented 1 year ago

Not entirely sure why this landed on me -- @mavasani or @agocke does this need to be moved to the compiler?

Klotzi111 commented 1 year ago

I just encountered this bug while developing my source generator. After some time investigating what was causing the bug in my source generator I found this bug via debugging. I had to use a silly Dictionary<string, string> that looks up the first instance of that string (in my used context at least) by equal strings so that the Equals method of SimpleDiagnostc would give me true for two "equal" instances.

333fred commented 12 months ago

Whether or not diagnostic arguments are compared sequentially is somewhat of a moot point from the perspective of source generators. Diagnostics know their Locations, and because Locations know what SyntaxTree they come from, that will change every time a file is edited. Making the message arguments compare correctly will not make Diagnostic safe to be used in a source generator model.

Arguably, we should never have made it possible to report diagnostics from a source generator itself, but instead forced the use of analyzers to report issues. It's basically impossible to do this correctly.

eiriktsarpalis commented 1 month ago

Whether or not diagnostic arguments are compared sequentially is somewhat of a moot point from the perspective of source generators.

Following up from https://github.com/dotnet/runtime/issues/92509#issuecomment-2277595397, the incremental source generator API has been shipped to customers with diagnostics support, so it stands to reason that a trivial bug like this one should be fixed. It's without question substantially less expensive compared to asking sg authors to migrate all their diagnostics to separate analyzers.