dotnet / roslyn

The Roslyn .NET compiler provides C# and Visual Basic languages with rich code analysis APIs.
https://docs.microsoft.com/dotnet/csharp/roslyn-sdk/
MIT License
18.92k stars 4.02k forks source link

Round Tripping Causes SyntaxNode.IsEquivalentTo() to Fail When There Are No Changes #32737

Open aolszowka opened 5 years ago

aolszowka commented 5 years ago

I have written a small utility to enforce some formatting rules using the Roslyn Formatter; after putting it in CI under our code base a single file kept popping out; the resulting format resulted in no changes which may indicate that there is an issue somewhere in SyntaxNode.IsEquivalentTo(SyntaxNode).

I stripped down our utility program to the bare minimum and ran the trouble file through Delta Debugging (https://www.st.cs.uni-saarland.de/dd/) using this utility which produced this nonsensical minimal test case, any removal of any character in this file will cause it to pass (I have also attached the file just in case this is something weird with the file format it is minifail.cs in the zip file).

e t
{
    I
    //

Below is the test program (also included in the ZIP file) that will help reproduce the issue

namespace RoslynRoundTripBug
{
    using System;
    using System.IO;
    using System.Linq;
    using Microsoft.CodeAnalysis;
    using Microsoft.CodeAnalysis.Formatting;
    using Microsoft.CodeAnalysis.Options;

    class Program
    {
        static void Main(string[] args)
        {
            int exitCode = 0;
            string targetFile = args.First();
            string tempFileName = Path.ChangeExtension(targetFile, ".tmp");
            (bool WasFormatted, bool WasIdentical) roundtripFormatFailed = FromFile(targetFile, tempFileName);

            if (roundtripFormatFailed.WasFormatted && roundtripFormatFailed.WasIdentical)
            {
                exitCode = 1;
            }

            Environment.Exit(exitCode);
        }

        internal static (bool WasFormatted, bool WasIdentical) FromFile(string pathToDocumentToFormat, string tempFile)
        {
            bool wasFormatted = false;
            bool wasIdentical = false;

            ProjectId projectId = ProjectId.CreateNewId();

            using (AdhocWorkspace adhocWorkspace = new AdhocWorkspace())
            {
                Solution solution =
                    adhocWorkspace
                    .CurrentSolution
                    .AddProject(projectId, "AdhocProject", "AdhocAssembly", LanguageNames.CSharp)
                    .AddDocument(DocumentId.CreateNewId(projectId), Path.GetFileName(pathToDocumentToFormat), new FileTextLoader(pathToDocumentToFormat, new System.Text.UTF8Encoding(false)));

                OptionSet formattingOptions = _GetFormattingOptions(adhocWorkspace);

                foreach (Project project in solution.Projects)
                {
                    foreach (Document document in project.Documents)
                    {
                        SyntaxNode formattedDocument = Formatter.Format(document.GetSyntaxRootAsync().Result, adhocWorkspace, formattingOptions);

                        // If there are no changes to the document then we would not format
                        SyntaxNode originalDocument = document.GetSyntaxRootAsync().Result;

                        if (formattedDocument.IsEquivalentTo(originalDocument))
                        {
                            wasFormatted = false;
                        }
                        else
                        {
                            wasFormatted = true;

                            // If its going to be formatted lets save it out
                            using (StreamWriter sw = new StreamWriter(tempFile, false, new System.Text.UTF8Encoding(true)))
                            {
                                formattedDocument.WriteTo(sw);
                            }

                            wasIdentical = CompareFiles(pathToDocumentToFormat, tempFile);
                        }
                    }
                }
            }

            return (wasFormatted, wasIdentical);
        }

        private static bool CompareFiles(string pathToDocumentToFormat, string tempFile)
        {
            return File.ReadAllText(pathToDocumentToFormat).Equals(File.ReadAllText(tempFile));
        }

        private static OptionSet _GetFormattingOptions(AdhocWorkspace adhocWorkspace)
        {
            return adhocWorkspace.Options
            .WithChangedOption(FormattingOptions.UseTabs, LanguageNames.CSharp, true);
        }
    }
}

RoslynRoundTripBug.zip

The only other reference to round tripping issues I can find that might be related are #30212 but that is on Linux (we're on Windows).

CyrusNajmabadi commented 3 years ago

I debugged through this and there's definitely something strange going on. I don't know the root cause, but teh symptom is that the formatter ends up taking the // trivia off of the EOF token and instead add's it to the end of the PropertyDeclaration node. This is an invalid tree that would not have been generated by the parser. as such, the formatter is violating our lexical/syntactic invariants.

That said, i don't know how important this is to fix. But we would certainly take a small fix here that addressed things if done properly.

sharwell commented 3 years ago

@aolszowka I'd recommend using Formatter.GetFormattedTextChanges instead of Formatter.Format. This will allow you to look at the actual changes instead of an intermediate state that isn't directly used.

It would still be good to have it produce the correct tree for this case though.

aolszowka commented 3 years ago

@sharwell Doesn't that only give me the list of changes that need to be made? In the above I need the ability to make the fixes and push them back up as a PR. Or are you saying do both?

CyrusNajmabadi commented 3 years ago

You can take the edits, tehn apply them to the text of the document to get the new document state that can be "push[ed] back up". :)

sharwell commented 3 years ago

It depends on the application. If you need syntax trees and you need them to be accurate, you can use an approach like this:

https://github.com/dotnet/roslyn-sdk/blob/5487d3f9122364427be1f77506cd9912d05f1563/src/Microsoft.CodeAnalysis.Testing/Microsoft.CodeAnalysis.Analyzer.Testing/CodeActionTest%601.cs#L169-L214

Specifically, you can start with an initialTree and obtain a truly-correct recreatedTree using this:

https://github.com/dotnet/roslyn-sdk/blob/5487d3f9122364427be1f77506cd9912d05f1563/src/Microsoft.CodeAnalysis.Testing/Microsoft.CodeAnalysis.Analyzer.Testing/CodeActionTest%601.cs#L181-L183

The recreatedTree would not fail the IsEquivalentTo test.

aolszowka commented 3 years ago

tbh we don't care about the syntax trees honestly only the final product.

The way it works in production is we run the tool, if it returns false, tell git to generate a PR based on the changes for that file.

Thank you for the links though, I'll dig on it!