dotnet / roslyn

The Roslyn .NET compiler provides C# and Visual Basic languages with rich code analysis APIs.
https://docs.microsoft.com/dotnet/csharp/roslyn-sdk/
MIT License
18.92k stars 4.01k forks source link

Endless loop in 'Microsoft.CodeAnalysis.ChildSyntaxList.ItemInternal(SyntaxNode, int)' #57343

Closed RalfKoban closed 2 years ago

RalfKoban commented 2 years ago

Version Used: 3.11.0

Steps to Reproduce: I've written a codefix provider that returns a new document which somehow gets handled.

However - unfortunately - I'm not able to provide a repro, but I don't call Microsoft.CodeAnalysis.ChildSyntaxList.ItemInternal() directly inside my code (it seems to get called via GetDocumentationCommentXml()).

The codefix provider works on a DocumentationCommentTriviaSyntax and is coping with XmlTextSyntax elements (replacing them), so it might be that it creates wrong syntax nodes.

Expected Behavior: Microsoft.CodeAnalysis.ChildSyntaxList.ItemInternal() quickly returns.

Actual Behavior: The call seems to never return. There is an endless loop inside the while(true) loop when attempting to find a slot.

According to the comment inside that method:

            // find a slot that contains the node or its parent list (if node is in a list)
            // we will be skipping whole slots here so we will not loop for long
            // the max possible number of slots is 11 (TypeDeclarationSyntax)
            // and typically much less than that
            //
            // at the end of this loop we will have
            // 1) slot index - slotIdx
            // 2) if the slot is a list, node index in the list - idx
            // 3) slot position - position
            while (true)
            {
                greenChild = green.GetSlot(slotIndex);
                if (greenChild != null)
                {
                    int currentOccupancy = Occupancy(greenChild);
                    if (idx < currentOccupancy)
                    {
                        break;
                    }

                    idx -= currentOccupancy;
                    position += greenChild.FullWidth;
                }

                slotIndex++;
            }

the loop should not take a long time because of the max. possible number of 11.

However, when I paused in the debugger I saw that the local variable slotIndex is already negative. As it was initially set to zero (0), it seems that there was an overflow.

When stepping through the code I saw that greenChild is set to null (as returned by green.GetSlot(slotIndex)), so the condition inside the if block will never be true and the complete while loop will therefore take forever.

RikkiGibson commented 2 years ago

Are you able to give us a complete stack trace for the scenario?

RalfKoban commented 2 years ago

I've made a screenshot, hopefully it'll help. parallelstacks

What I can say is that my codefix provider most probably messes-up the XML documentation in the code but I don't know exactly how the resulting code would look like.

The codefix provider works on following C# document:

/// <summary>
/// 
/// </summary>
public class TestMe
{
}

I assume (but I don't know due to the endless loop) that the result that my provider produces would look something like:

/// <summary>
<para/></summary>
public class TestMe
{
}

So there is definitely an issue in my provider I have to fix but the unit test that would show up the result never completes.

jcouv commented 2 years ago

@RalfKoban Thanks for the additional information. Might there be a way for you to share a dump from the situation as well? When you have the infinite loop in the debugger, you can pause the process and dump the process from the debugger menu. You can then share the file via some online drive and give us a link here or in email if there's anything confidential (jcouv@microsoft.com).

RikkiGibson commented 2 years ago

You can confidentially upload a crash dump using the How to report a problem documentation.

RalfKoban commented 2 years ago

Steps to reproduce:

  1. Fork from commit "https://github.com/RalfKoban/MiKo-Analyzers/commit/5c6260a2c46abe89d5d600cad5308c3b234ea78a"
  2. In MiKo-Analyzers/MiKo.Analyzer/Rules/Documentation/MiKo_2214_CodeFixProvider.cs replace the "GetReplacements" method with following code:

        private static IEnumerable<SyntaxNode> GetReplacements(XmlTextSyntax text)
        {
            var tokens = text.TextTokens;
    
            var replacements = new List<SyntaxNode>();
    
            var tokensForTexts = new List<SyntaxToken>();
    
            int i;
            var tokensCount = tokens.Count;
    
            for (i = 0; i < tokensCount - 1; i++)
            {
                var token = tokens[i];
                var nextToken = tokens[i + 1];
    
                if (token.IsKind(SyntaxKind.XmlTextLiteralToken) && nextToken.IsKind(SyntaxKind.XmlTextLiteralNewLineToken))
                {
                    if (token.ValueText.IsNullOrWhiteSpace())
                    {
                        // that's an issue, so add the already collected text as a new XML text
                        replacements.Add(XmlText(tokensForTexts));
    
                        // now add a <para/> tag
                        replacements.Add(SyntaxFactory.XmlEmptyElement(Constants.XmlTag.Para));
    
                        tokensForTexts.Clear();
                    }
    
                    tokensForTexts.Add(token);
                }
            }
    
            if (replacements.Count == 0)
            {
                // nothing to replace
                return new[] { text };
            }
    
            return replacements;
        }
  3. In MiKo.Analyzer.Tests\Rules\Documentation\MiKo_2214_DocumentationContainsEmptyLinesAnalyzerTests.cs run the test "Code_gets_fixed_for_empty_line_comment"

-> Test fails with a timeout.

jcouv commented 2 years ago

Re-assigned to load-balance

AlekseyTs commented 2 years ago

I think the problem is in the user code above https://github.com/dotnet/roslyn/issues/57343#issuecomment-951975628. It creates XmlTextSyntax node with no tokens, tokensForTexts is empty at the time. I think this isn't a right thing to do. A node cannot be based on absolutely nothing, it should have at least one token.

RalfKoban commented 2 years ago

From an API consumer's point of view I would like to be informed if I make a wrong usage of the API. If it is not allowed to provide no tokens when creating an XmlTextSyntax, I - as a proposal - would like to get an exception (such as an ArgumentException) thrown that points me on that invalid call so that I can fix it.

Unfortunately, in the current case I'll end in an endless loop later on inside Roslyn and I don't know where/what to fix.

CyrusNajmabadi commented 2 years ago

The rule of Roslyn is that you must make the same trees that the parser would produce. Any other tree results in undefined behavior.

Adding all the checks would be an enormous undertaking and would effectively mean duplicating all the behavior of the parser in another location.

Our advice is thus: if generating an equivalent tree is too much of a burden, don't. Instead just generate the right text and run it through the parser so you get the right result.

You can also do this like so: SyntaxFactory.Parse(your constructed node.ToFullString())