Closed bjesus closed 2 months ago
The HTML spec only allows <td> elements inside tables. So a spec-compliant HTML parser ignores <td> tags if they aren't inside a table. Try opening an HTML file like that in your browser, and inspecting it with the developer tools, and you'll probably get the same results.
I understand. Is this part of net/html or cascadia? I wonder if there's a way around it? Perhaps to parse my DOM as if it's just XML without deliberately removing elements because they're not HTML compliant?
It's part of net/html. Depending on what you're trying to do, ParseFragment might be what you need. Give it a table or tr node as context for what you want to parse.
Got it! I'll figure out some workaround then. Thank you very much for the help, and for awesome cascadia!
hey, I'm working on a scraper that uses cascadia.
I recently noticed that cascadia ignores <td>s if they appear without a wrapping <table> element:

why is that? is there any workaround for this? and are there other elements that are ignored in certain situations? it seems like naming the element foo works completely fine, so it doesn't even have to be a real HTML element, but for some reason tds fail.