DevToys-app / DevToys

A Swiss Army knife for developers.
https://devtoys.app/
MIT License

XML formatter should accept some invalid XML #358

Status: Open. Opened by BreeceW 2 years ago.

BreeceW commented 2 years ago

What's the Problem?

The XML formatter refuses to format input that has any well-formedness problem, such as an undeclared namespace prefix or multiple root elements. For example, I was trying to format an Office ribbon configuration file, which looks like this:

<mso:cmd app="Word" dt="1" />
<mso:customUI
    xmlns:mso="http://schemas.microsoft.com/office/2009/07/customui">
    <mso:ribbon>
        <mso:qat>
            <mso:sharedControls>
                <mso:control idQ="mso:RedoOrRepeat" visible="true"/>
            </mso:sharedControls>
        </mso:qat>
        <mso:tabs>
            <mso:tab idQ="mso:TabHome">
                <mso:group idQ="mso:GroupUndo" visible="false"/>
            </mso:tab>
        </mso:tabs>
    </mso:ribbon>
</mso:customUI>

This isn’t strictly well-formed XML. There are two root elements (<mso:cmd> and <mso:customUI>), and the mso prefix is used on <mso:cmd> even though it is only declared later, on <mso:customUI>. However, some online formatters format this correctly anyway.
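To make the two problems concrete, here is a small Python check. Python's standard-library ElementTree parser stands in for .NET's XmlDocument here (DevToys itself is C#), the sample is an abbreviated copy of the ribbon file above, and `parse_error` is a hypothetical helper name. A strict, namespace-aware parser rejects the undeclared prefix, and even with the prefix declared it still rejects a second top-level element.

```python
import xml.etree.ElementTree as ET

# Abbreviated version of the Office ribbon file from this issue.
RIBBON = """<mso:cmd app="Word" dt="1" />
<mso:customUI xmlns:mso="http://schemas.microsoft.com/office/2009/07/customui">
    <mso:ribbon/>
</mso:customUI>"""


def parse_error(text):
    """Return the strict parser's error message, or None if the text parses."""
    try:
        ET.fromstring(text)
        return None
    except ET.ParseError as exc:
        return str(exc)


# Problem 1: 'mso' is used on <mso:cmd> before any xmlns:mso declaration.
print(parse_error(RIBBON))

# Problem 2: with the prefix declared, a second top-level element still fails.
print(parse_error('<a xmlns:mso="urn:example"/><b/>'))
```

Both calls print an error message instead of `None`, which matches the all-or-nothing behaviour described in this issue.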

Solution/Idea

The XML formatter (and really the formatters in general) should accept some malformed content and format it as well as they can, rather than requiring strictly correct input.

Alternatives

Perhaps the formatter should indicate when the XML is not well-formed, but still format as much of it as possible.
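One way to combine both ideas is a purely lexical re-indenter that never rejects input, paired with a strict parse whose error is only reported as a warning. The Python sketch below is an assumption about how that could look, not DevToys code; `lenient_format` and `TOKEN` are hypothetical names. It is deliberately simple: it does not handle `>` inside attribute values or DOCTYPE internal subsets, emits tags verbatim, and puts text content on its own line.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical token pattern: comments, CDATA sections, <?...?>/<!...>
# declarations, and ordinary start/end/empty-element tags.
TOKEN = re.compile(r"<!--.*?-->|<!\[CDATA\[.*?\]\]>|<[?!][^>]*>|<[^>]+>", re.S)


def lenient_format(text, indent="    "):
    """Re-indent markup without requiring well-formedness.

    Returns (formatted_text, warning); warning is None when a strict
    parse succeeds, otherwise it describes the first problem found.
    """
    warning = None
    try:
        ET.fromstring(text)
    except ET.ParseError as exc:
        warning = f"Not well-formed XML: {exc}"  # reported, but not fatal

    lines, depth, pos = [], 0, 0
    for match in TOKEN.finditer(text):
        between = text[pos:match.start()].strip()
        if between:  # text content goes on its own line
            lines.append(indent * depth + between)
        tag = match.group(0)
        if tag.startswith("</"):
            depth = max(depth - 1, 0)  # tolerate stray closing tags
            lines.append(indent * depth + tag)
        elif tag.startswith(("<?", "<!")) or tag.endswith("/>"):
            lines.append(indent * depth + tag)  # does not open a new level
        else:
            lines.append(indent * depth + tag)
            depth += 1  # start tag opens a new indentation level
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        lines.append(indent * depth + tail)
    return "\n".join(lines), warning


formatted, warning = lenient_format("<a><b>x</b><c/></a><d/>")
print(formatted)
print(warning)
```

Here the two top-level elements are still indented, and the warning carries the strict parser's complaint so the UI could show it next to the formatted output.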

Priorities

| Capability | Priority |
| --- | --- |
| This proposal will allow developers to format XML with undefined namespaces | Must |
| This proposal will allow end users to format XML with multiple root elements | Must |
| This proposal will allow developers to format XML with some syntax errors | Should |

DevToys Version

Version 1.0.2.0 | X64 | RELEASE | b972462 | b972462

Comments

[Screenshot: the editor displaying the message “'mso' is an undeclared prefix. Line 1, position 2.”]

jwfxpr commented 2 years ago

I agree. In general, I think the job of validating XML is separate from the task of formatting XML. Currently, the input string is loaded directly into an empty, unconfigured XmlDocument:

https://github.com/veler/DevToys/blob/6c5fe5583557b952932763c565e80dc2520e9a93/src/dev/impl/DevToys/Helpers/XmlHelper.cs#L64-L65

I think a better approach would be to read the input with an XmlReader configured with the loosest possible validation settings, so that we stick to formatting reasonably well-formed XML fragments as best we can instead of validating them.
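As a rough illustration of "loosest possible settings", here is a Python analogue using the standard-library expat parser with namespace processing turned off, so an undeclared prefix such as `mso:` is just part of an element name. The input is wrapped in one synthetic root so several top-level elements parse as a fragment, which is roughly what `ConformanceLevel.Fragment` allows on the .NET side. This is a sketch under those assumptions, not the DevToys implementation; `format_fragment` and `WRAPPER` are hypothetical names. It drops comments and processing instructions, and an XML declaration would have to be removed before wrapping.

```python
import xml.parsers.expat
from xml.sax.saxutils import escape, quoteattr

WRAPPER = "fragment-root"  # hypothetical synthetic root; never emitted


def format_fragment(text, indent="    "):
    """Pretty-print XML that may have several top-level elements and undeclared prefixes."""
    # No namespace_separator: namespace processing is off, so "mso:cmd" is
    # just an element name and an undeclared prefix is not an error.
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # deliver contiguous text in one callback
    top = {"children": []}  # sentinel that receives the wrapper element
    stack = [top]

    def start(name, attrs):
        node = {"name": name, "attrs": attrs, "children": []}
        stack[-1]["children"].append(node)
        stack.append(node)

    def end(name):
        stack.pop()

    def chars(data):
        if data.strip():  # drop whitespace-only formatting text
            stack[-1]["children"].append(data.strip())

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    # One synthetic root lets several top-level elements parse as a fragment.
    # Real syntax errors (e.g. mismatched tags) still raise ExpatError.
    parser.Parse(f"<{WRAPPER}>{text}</{WRAPPER}>", True)

    lines = []

    def emit(node, depth):
        pad = indent * depth
        if isinstance(node, str):
            lines.append(pad + escape(node))
            return
        attrs = "".join(f" {k}={quoteattr(v)}" for k, v in node["attrs"].items())
        if not node["children"]:
            lines.append(f"{pad}<{node['name']}{attrs}/>")
            return
        lines.append(f"{pad}<{node['name']}{attrs}>")
        for child in node["children"]:
            emit(child, depth + 1)
        lines.append(f"{pad}</{node['name']}>")

    wrapper = top["children"][0]
    for child in wrapper["children"]:
        emit(child, 0)
    return "\n".join(lines)


print(format_fragment(
    '<mso:cmd app="Word" dt="1" />'
    '<mso:customUI xmlns:mso="urn:example"><mso:ribbon/></mso:customUI>'
))
```

The design choice here is to loosen only the two checks this issue is about (namespace binding and the single-root rule) while still failing on genuinely broken markup.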

This has been a busy week, but I might have a chance to experiment with this on the weekend if nobody else takes a stab at it. It's probably wisest to wait for PR #364 to be settled first, though, since the changes will conflict.