Kotlin / kotlinx.html

Kotlin DSL for HTML
Apache License 2.0

SAXParseException when trying to add an unclosed, raw `<link>` tag into a `head { ... }` block #247

Closed bitspittle closed 3 months ago

bitspittle commented 9 months ago

Specifically, org.xml.sax.SAXParseException; The element type "link" must be terminated by the matching end-tag "</link>".


Expected: kotlinx.html accepts unclosed <link> tags inside the <head> element.

Actual: kotlinx.html throws a parse exception on the <link> element, even though it is a void element that does not need a closing tag in valid HTML (and, in fact, kotlinx.html itself generates it unclosed).

Repro steps

Here's a very simple example to show the issue:

println(createHTML().head {
    link {
        rel = "stylesheet"
        href = "https://example.com/fake.css"
    }
})

which outputs:

<head>
  <link rel="stylesheet" href="https://example.com/fake.css">
</head>

Writing kotlinx.html code that passes that same <link> tag back in as raw text...

document {
    append {
        head {
            unsafe {
                +"<link rel=\"stylesheet\" href=\"https://example.com/fake.css\">"
            }
        }
    }
}

results in the unexpected stack trace:

org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 74; The element type "link" must be terminated by the matching end-tag "</link>".
        at java.xml/com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:261)
        at java.xml/com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:339)
        at kotlinx.html.dom.HTMLDOMBuilder$UnsafeImpl$1.unaryPlus(dom-jvm.kt:98)
        at ...

qwertukg commented 5 months ago

You should close the tag like this: "<link rel=\"stylesheet\" href=\"https://example.com/fake.css\"/>" (note the / before the >).

bitspittle commented 5 months ago

@qwertukg ah, I think you're missing the point. My example shows that the very HTML that kotlinx.html itself generates, which is valid HTML, turns into a parse error when you feed it back into kotlinx.html.

The reason I ran into this is because I had a pipeline where one part generated html from kotlinx and then another part consumed it. For the second part of the pipeline, the output of the first part is opaque, and its consumption automatic. You can't just edit it by hand because there's no human in the process.

I worked around it in an ugly way long ago but this should not be a parse exception.

severn-everett commented 3 months ago

The issue is that you're not quite "feeding it back into itself". The document() function that you're using in the second example, along with createHTMLDocument(), is the Java-based builder for creating a full XML structure, so passing in a raw string that contains an unclosed tag causes the exception in your reproduction code. When building the HTML structure directly, this is not a problem, because the code in this library reconciles the differences between HTML and XML; passing a string in directly bypasses that code and goes straight to the SAX parser, hence the exception. Given that the unsafe() function is primarily for dealing with JavaScript and CSS code, this issue might be out of the scope of the library.
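
To illustrate the distinction, here is a minimal sketch that uses only the JDK's XML parser, with no kotlinx.html involved. The helper name parsesAsXml and the <root> wrapper element are assumptions made for this illustration; they are not part of the kotlinx.html API.

```kotlin
import java.io.StringReader
import javax.xml.parsers.DocumentBuilderFactory
import org.xml.sax.InputSource
import org.xml.sax.SAXParseException

// Hypothetical helper: returns true if the JDK XML parser accepts the
// fragment as well-formed XML once it is wrapped in a single root element.
fun parsesAsXml(fragment: String): Boolean {
    val builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    // The <root> wrapper is an assumption for this sketch; it just gives the
    // fragment one enclosing element, as XML requires.
    val wrapped = "<root>$fragment</root>"
    return try {
        builder.parse(InputSource(StringReader(wrapped)))
        true
    } catch (e: SAXParseException) {
        // Thrown when the input is not well-formed XML.
        false
    }
}

fun main() {
    // Valid HTML (void element, no end tag), but not well-formed XML.
    val unclosed = "<link rel=\"stylesheet\" href=\"https://example.com/fake.css\">"
    // Self-closed form, which is well-formed XML.
    val selfClosed = "<link rel=\"stylesheet\" href=\"https://example.com/fake.css\"/>"

    println(parsesAsXml(unclosed))   // false
    println(parsesAsXml(selfClosed)) // true
}
```

The parser may also print a "[Fatal Error]" line to stderr for the unclosed case; that comes from the JDK's default error handler and does not affect the result.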

bitspittle commented 3 months ago

@severn-everett That's fair. I'll go ahead and mark the issue closed. I'm sure the team is busy, but you can of course reopen if the team wants to look into it more.

It's too bad SAX can't be configured to be more flexible about void elements (or if it can, it's probably really tricky to do :). I remember fighting with SAX a decade ago, and I'm not advocating that the kotlinx.html team spend any time fighting it, because this doesn't seem to be a common problem :)

At some point, our codebase switched to using Jsoup to parse the incoming html text, which seems like a clean enough workaround to recommend in case anyone else comes across this thread in the future.
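
For anyone landing here, a rough sketch of what the Jsoup route can look like. It assumes the org.jsoup:jsoup dependency is on the classpath; the function name stylesheetHrefs is made up for this example, and this is not necessarily how our codebase does it.

```kotlin
import org.jsoup.Jsoup

// Hypothetical helper: parses HTML leniently with Jsoup (void elements such
// as <link> need no end tag) and returns the href of every stylesheet link.
fun stylesheetHrefs(html: String): List<String> {
    val doc = Jsoup.parse(html)
    return doc.select("link[rel=stylesheet]").map { it.attr("href") }
}

fun main() {
    // The same shape of markup that createHTML() produced in the repro above.
    val html = """
        <head>
          <link rel="stylesheet" href="https://example.com/fake.css">
        </head>
    """.trimIndent()

    println(stylesheetHrefs(html)) // [https://example.com/fake.css]
}
```

Because Jsoup implements an HTML parser rather than an XML one, the unclosed <link> is read as a normal void element instead of raising an exception.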