Closed by sharmi 6 years ago
Hello,
Thanks for the kind words! Regarding the Text() behaviour, you're right, and this is something that comes up quite a bit. However, I think there is no one right way to solve this in goquery (see my comment on this topic in this closed PR: https://github.com/PuerkitoBio/goquery/pull/239#issuecomment-372057754).
I think a neat way to solve this would be to implement a pretty-printer/HTML formatter package. I have started working on something, though it's too early to say whether it will be finished and released. Basically, it would take an *html.Node and format the tree based on a config (e.g. pretty-print HTML, minify HTML, or print only the text, with newlines for block elements and spaces for inline elements).
In the meantime, for a quick and simple solution that just inserts a space between text nodes, you can recursively process the text-type *html.Node values of the *goquery.Selection, writing the text to a bytes.Buffer (or a strings.Builder if you're on Go 1.10 or later) and adding a space after each write. Of course, you may end up with multiple spaces if there was already whitespace around the text, but if that's a problem you can trim the text before writing it.
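The recursive approach described above can be sketched roughly as follows. This is only an illustration under some assumptions, not part of goquery: to keep it free of external dependencies, Node and NodeType are minimal stand-ins for the fields of html.Node (from golang.org/x/net/html) that the walk relies on, and textWithSep is a hypothetical helper name. It trims each text node and writes the separator between nodes rather than after each write, which avoids a trailing separator.

```go
package main

import "strings"

// NodeType and Node are stand-ins (assumptions for this sketch) that mirror
// only the html.Node fields used below: Type, Data, FirstChild, NextSibling.
type NodeType int

const (
	TextNode NodeType = iota + 1
	ElementNode
)

type Node struct {
	Type        NodeType
	Data        string
	FirstChild  *Node
	NextSibling *Node
}

// textWithSep is a hypothetical helper: it walks the tree rooted at n
// depth-first and joins the trimmed, non-empty text nodes with sep.
func textWithSep(n *Node, sep string) string {
	var sb strings.Builder
	var walk func(*Node)
	walk = func(n *Node) {
		if n.Type == TextNode {
			// Trim first so existing whitespace does not double up with sep.
			if t := strings.TrimSpace(n.Data); t != "" {
				if sb.Len() > 0 {
					sb.WriteString(sep)
				}
				sb.WriteString(t)
			}
		}
		// Recurse into children in document order.
		for c := n.FirstChild; c != nil; c = c.NextSibling {
			walk(c)
		}
	}
	walk(n)
	return sb.String()
}
```

With the real packages, the same walk should work on *html.Node and html.TextNode directly, called once for each node in the Nodes field of the *goquery.Selection. Note that because of the trimming, passing an empty separator is not guaranteed to reproduce Text() exactly.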
Hope this helps, Martin
Hi, thank you for this wonderful library. It is one of the building blocks of the go-colly crawler, which I use.
I have been running into a corner case of late. Consider this piece of HTML:
<span>The Item Name<p>Some Content about the item.</span>
Although there is no space between the two pieces of text, they are rendered with proper spacing in the browser because of the p tag. Unfortunately, the Text() function returns them merged without a space, like this:
The Item NameSome Content about the item.
There are multiple cases where the spacing between pieces of text is not present in the markup but is rendered properly in the browser because of HTML tags or CSS.
Would it be possible to add another function, TextWithSeparator(sep string), which takes a separator as input and appends that separator after each node.Data? The Text() function could then be rewritten as a call to TextWithSeparator with an empty string as input:
Text() = TextWithSeparator("")
I am a golang novice but I am willing to implement this if you agree. If there is a better way to handle this, I would like to know about it too.
Thank you