KirillOsenkov / XmlParser

A Roslyn-inspired full-fidelity XML parser with no dependencies and a simple Visual Studio XML language service
Apache License 2.0
328 stars 49 forks source link

Double closing slash creates null trivia #59

Closed grokys closed 8 months ago

grokys commented 8 months ago

The following XML:

"<a><b><//b></a>"

Creates a root element with leading trivia containing null (which incidentally seems to break the VS debugger ;) )

image image
grokys commented 8 months ago

Seems to be caused by SyntaxTriviaList.this[] failing to cast a SkippedTokensTriviaSyntax to SyntaxTrivia:

image
KirillOsenkov commented 8 months ago

Indeed ;)

image

grokys commented 8 months ago

I would like to have a go at fixing this but I'm not sure what the correct fix should be. SyntaxTriviaList enumerates SyntaxTrivia via either a SyntaxNode with IsList set, or by casting a SyntaxNode to a SyntaxTrivia. Should SkippedTokensTriviaSyntax be a SyntaxTrivia? Not clear to me how this should work.

grokys commented 8 months ago

Adding the following check to SyntaxTriviaList.ctor shows up a few places where this could potentially fail:

            if (!node.IsList && node is not SyntaxTrivia)
                throw new ArgumentException($"Node {node} must be a SyntaxTrivia");
KirillOsenkov commented 8 months ago

This is how it works in C#: https://github.com/dotnet/roslyn/blob/7ecf61f6d02b5605c97da1408e0440548f6bf863/src/Compilers/Core/Portable/Syntax/SyntaxTriviaList.cs#L138

KirillOsenkov commented 8 months ago

I have no idea about this space but I'm trying to figure it out.

grokys commented 8 months ago

Yeah, not really sure how this should work: the C# version is sufficiently different that I'm not sure what applies here and what doesn't.

KirillOsenkov commented 8 months ago

what I'm seeing is this cast returns null: https://github.com/KirillOsenkov/XmlParser/blob/a407b1a4d353fe22e641f5d6b49170c74128c85a/src/Microsoft.Language.Xml/Syntax/SyntaxTriviaList.cs#L738

_current is a valid SkippedTokensTriviaSyntax, which is a syntax node, not syntax trivia

KirillOsenkov commented 8 months ago

Completely unrelated, but making sure you know about https://xmlsyntaxvisualizer.azurewebsites.net

KirillOsenkov commented 8 months ago

Published https://www.nuget.org/packages/GuiLabs.Language.Xml/1.2.91, try it out and see if it works?

I'm not 100% that this is the right fix, but conceptually I think when you iterate to get trivia from the list it needs to be "flattened", since skipped tokens is stored as a structured tree with nodes, but trivia pretty much seems to be glorified plaintext.

I'll try and double-check later with the Roslyn compiler team to see if this makes sense. But at least this is better than crashing for now and I added a test that verifies that if you iterate over all tokens and trivia you get the exact original text back.

grokys commented 7 months ago

Thanks @KirillOsenkov - that seems to fix the null reference exception at least!

KirillOsenkov commented 2 months ago

FYI @grokys I got an explanation from @cyrusnajmabadi about structured trivia in Roslyn, hope you find this educational: https://github.com/KirillOsenkov/Bliki/wiki/Roslyn-Structured-Trivia