andybalholm / cascadia

CSS selector library in Go
BSD 2-Clause "Simplified" License
703 stars 65 forks

Error that prevents parsing RSS. #22

Closed Mansiper closed 7 years ago

Mansiper commented 8 years ago

This selector test case passes:

    {
        `<item><link>Any link</link><title>Any title</title></item>`,
        "item link:empty",
        []string{
            "<link>",
        },
    },

All other tags work fine; only link always comes back with empty text. For example, "item title:empty" returns 0 elements.

andybalholm commented 7 years ago

The link element is a void element in HTML. It can't have any content.

RSS is not HTML; it is XML. Parse it with an XML parser, not an HTML parser.

Trying to parse it as HTML gives you the equivalent of:

    <item><link/>Any link<title>Any title</title></item>
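As a rough illustration of the XML-parser route suggested above, here is a minimal sketch that decodes the sample item with Go's standard encoding/xml package. The Item struct, its field names, and the parseItem helper are hypothetical, introduced only for this example; they are not part of cascadia, and real feeds will need more fields and error handling.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Item mirrors the two child elements in the issue's sample.
// The type and its field names are hypothetical, chosen for illustration.
type Item struct {
	Link  string `xml:"link"`
	Title string `xml:"title"`
}

// parseItem decodes a single <item> element with an XML parser, so <link>
// is an ordinary element with text content rather than an HTML void element.
func parseItem(src string) (Item, error) {
	var it Item
	err := xml.Unmarshal([]byte(src), &it)
	return it, err
}

func main() {
	it, err := parseItem(`<item><link>Any link</link><title>Any title</title></item>`)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("link=%q title=%q\n", it.Link, it.Title)
}
```

Because the XML decoder has no notion of void elements, the text inside link survives here, whereas an HTML parser moves it outside the element.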