Closed mooreryan closed 1 year ago
Just for reference, I tested it out with Nokogiri (a Ruby XML/HTML parser):
require 'nokogiri'
data = File.open ARGV.first
xml_doc = Nokogiri::XML data
xml_doc.css('a > bb').each do |x|
  puts x[:which]
end
and got the first and second nodes selected as expected.
This is almost certainly because you are having Lambda Soup read the input as HTML. Part of an HTML parser's job is error recovery, which the HTML spec requires.
The <a> tag can have nested <b> tags, but a <bb> tag is something that triggers error recovery and gets rotated outside the <a> tag, per the spec, changing the structure of the loaded DOM.
For the example, you might be able to replace <a> with something else. But you probably need to parse the input as XML. See here.
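As a minimal sketch of the XML route, assuming the `lambdasoup` and `markup` opam packages are available: Markup.ml can parse the input as XML and hand the resulting signal stream to `Soup.from_signals`, so no HTML error recovery is applied. The XML string and the helper names `parse_as_xml` and `which_values` are hypothetical, made up for illustration; the `which` attribute is taken from the Nokogiri snippet above.

```ocaml
(* Sketch only: assumes the lambdasoup and markup packages are installed. *)
open Soup

(* Hypothetical input, not the file from the original report. *)
let xml =
  {|<a><bb which="first"><c><bb which="first-of-first"/></c></bb><bb which="second"/></a>|}

(* Parse a string as XML (not HTML) and build a Lambda Soup document. *)
let parse_as_xml (s : string) : soup node =
  Markup.string s |> Markup.parse_xml |> Markup.signals |> from_signals

(* Collect the "which" attribute of every element matching the selector. *)
let which_values soup selector : string list =
  soup $$ selector
  |> to_list
  |> List.filter_map (fun node -> attribute "which" node)

let () =
  parse_as_xml xml
  |> (fun soup -> which_values soup "a > bb")
  |> List.iter print_endline
```

Because the tree is built from XML signals, the `bb` elements stay where the markup puts them, and the child combinator only sees the direct children of `a`.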
Ahhh okay I see the problem, makes sense. Thanks!!
Not sure if this is a bug or me doing something weird.
First, here is some Lambda Soup code. It reads the soup from standard input, then defines a tiny function that prints the name and an attribute of each node matching a given selector. Finally, a driver at the bottom runs it with two different selectors.
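A minimal sketch of a program matching that description, not the reporter's exact code: the helper names `describe_matches` and `print_matches`, the `which` attribute, and the two selectors `a b` and `a > b` are assumptions chosen for illustration.

```ocaml
(* Sketch only: assumes the lambdasoup package is installed. *)
open Soup

(* Build one "name attribute" line per element matching the selector. *)
let describe_matches soup selector : string list =
  soup $$ selector
  |> to_list
  |> List.map (fun node ->
       let which = Option.value (attribute "which" node) ~default:"(none)" in
       Printf.sprintf "%s %s" (name node) which)

(* Print the selector followed by the matching nodes. *)
let print_matches soup selector =
  Printf.printf "selector: %s\n" selector;
  List.iter print_endline (describe_matches soup selector)

(* Driver: read the document from standard input and parse it as HTML. *)
let () =
  let soup = read_channel stdin |> parse in
  print_matches soup "a b";
  print_matches soup "a > b"
```

Note that `parse` treats the input as HTML, which is the behavior the rest of this thread turns on.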
Given this XML file
running that code would give this:
Looks good: when using the CSS child combinator (>) I don't get the first-of-first b node, as it is under a c node.
Now, the weird thing is, if I change the b nodes to bb (or anything with more than one character), and then adjust the selector accordingly, I get this:
Only the first bb node is printed and not the second one.