Open Jared-Sprague opened 7 months ago
I'm just a user, not a maintainer here, but I can at least answer and say that there isn't a good way to achieve this.
Everything in lol-html revolves around it being a streaming HTML transformer, and as a result it doesn't hold on to that stream for very long as it passes through. set_inner_content is trivial, since it can just ignore its stream for a while and use the provided content, and likewise remove_and_keep_content doesn't block the stream since it just removes some of the values passing through.
Anything like get_inner_content, though, would require buffering an arbitrary amount of data from the stream, which would inherently stop it from streaming. If you called get_inner_content on the html element itself, lol-html would cease to be a streaming parser and would have to store the whole document.
There are some solutions for inner text, by adding a text! handler and appending the text to a buffer yourself, but there isn't an equivalent for HTML as far as I'm aware. You would have to create an element! handler and re-serialize the tag/attributes yourself, alongside the text! handler for the text nodes.
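To give a feel for what re-serializing a tag yourself involves, here is a minimal sketch in plain Rust using only the standard library. The function name serialize_start_tag is hypothetical and not part of lol-html. It assumes you have already collected the tag name and the attribute name/value pairs inside your element! handler, and it only covers the start tag, not end tags or text.

```rust
/// Hypothetical helper (not part of lol-html): rebuild a start tag from a
/// tag name and a list of (attribute name, attribute value) pairs.
fn serialize_start_tag(name: &str, attrs: &[(&str, &str)]) -> String {
    let mut out = String::new();
    out.push('<');
    out.push_str(name);
    for (key, value) in attrs {
        out.push(' ');
        out.push_str(key);
        out.push_str("=\"");
        // Escape the two characters that would break a double-quoted value.
        for ch in value.chars() {
            match ch {
                '&' => out.push_str("&amp;"),
                '"' => out.push_str("&quot;"),
                _ => out.push(ch),
            }
        }
        out.push('"');
    }
    out.push('>');
    out
}
```

For example, serialize_start_tag("article", &[("id", "main-content")]) yields <article id="main-content">. You would still need to emit the matching end tags and escape the text nodes yourself, which is why this route gets tedious quickly.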
The only viable path I can think of would be to use the el.prepend() and el.append() methods to insert some delimiters into the stream, then process the output yourself afterward to extract the innerHTML between the delimiters.
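The post-processing half of the delimiter idea could look roughly like this. It is a sketch using only the standard library. The function name extract_between and the marker strings are made up for illustration, and it assumes the rewritten output is already in memory as a string, with one marker placed at the start of the element's content and one at the end.

```rust
/// Hypothetical helper: return the slice between the first `start` marker
/// and the next `end` marker after it, or None if either one is missing.
fn extract_between<'a>(haystack: &'a str, start: &str, end: &str) -> Option<&'a str> {
    // Position just past the opening marker.
    let from = haystack.find(start)? + start.len();
    // Offset of the closing marker, relative to `from`.
    let len = haystack[from..].find(end)?;
    Some(&haystack[from..from + len])
}
```

With markers such as <!--START--> and <!--END-->, calling extract_between(&output, "<!--START-->", "<!--END-->") gives back everything in between. Pick markers that cannot occur in the content itself, and note that this simple version stops at the first end marker, so it would not handle nested elements that match the same selector.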
For anything more involved, I'd look at using kuchiki(ki) instead.
Related #40 #78
Hello!
I'm trying to use lol_html as an HTML parser to extract all the content within an element, i.e. its inner HTML. However, I haven't figured out the right methods for this yet. Here is what I want to do, given the following HTML snippet:
I want to extract all the content within the element matching the CSS selector article#main-content and store it in a String named content. After this runs, the value of content will be equal to:
What would be perfect is a method on the element called get_inner_content(). There seem to be methods for manipulating the inner content, such as the following, but none for actually getting it:
set_inner_content
remove_and_keep_content
Maybe I'm thinking about this wrong. Any help with just getting the inner content of an element would be so helpful!