Closed: AwolDes closed this issue 6 years ago
@AwolDes thanks for your feedback. This behavior comes from the goquery package and I found it intuitive - perhaps it isn't. Goquery returns the text of all the descendants of the matched element.
Maybe we should add an e.ChildAttrs() function which returns a list of all matching elements' attributes. What do you think?
@asciimoo I think it would be good to add the e.ChildAttrs() function so that an HTML snippet like the following:
<div class="block-elem">
  <div class="container">
    <span class="span-class">Text</span>
  </div>
  <div class="container">
    <span class="span-class">Text2</span>
  </div>
</div>
could be easily parsed with something like:
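c.OnHTML("div.block-elem", func(e *colly.HTMLElement) {
    // Collect the class attribute of every matching span, not just the first.
    spanClasses := e.ChildAttrs("span", "class")
    fmt.Println(spanClasses)
})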
to get all the span classes, instead of just the first match.
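For anyone who wants to see the difference between "first match" and "all matches" without pulling in Colly, here is a rough stdlib-only sketch. It is not how Colly or goquery implement this: it uses encoding/xml as a stand-in parser (the snippet above happens to be well-formed XML), and the helper names childAttr and childAttrs are hypothetical, chosen only to mirror e.ChildAttr() and the proposed e.ChildAttrs(). It also scans the whole document by tag name rather than evaluating a CSS selector.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// snippet is the HTML from the comment above; it is also well-formed XML.
const snippet = `<div class="block-elem">
  <div class="container">
    <span class="span-class">Text</span>
  </div>
  <div class="container">
    <span class="span-class">Text2</span>
  </div>
</div>`

// childAttrs is a hypothetical stand-in for the proposed e.ChildAttrs():
// it returns the named attribute of every element called tag, in document order.
func childAttrs(doc, tag, attr string) ([]string, error) {
	var out []string
	dec := xml.NewDecoder(strings.NewReader(doc))
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		start, ok := tok.(xml.StartElement)
		if !ok || start.Name.Local != tag {
			continue
		}
		for _, a := range start.Attr {
			if a.Name.Local == attr {
				out = append(out, a.Value)
			}
		}
	}
}

// childAttr mirrors the first-match behaviour described for e.ChildAttr().
func childAttr(doc, tag, attr string) (string, error) {
	all, err := childAttrs(doc, tag, attr)
	if err != nil || len(all) == 0 {
		return "", err
	}
	return all[0], nil
}

func main() {
	first, err := childAttr(snippet, "span", "class")
	if err != nil {
		panic(err)
	}
	all, err := childAttrs(snippet, "span", "class")
	if err != nil {
		panic(err)
	}
	fmt.Println(first)    // one value: the first span's class
	fmt.Println(len(all)) // one entry per matching span
}
```

With the snippet above, the first-match helper yields a single "span-class" while the all-matches helper yields one entry for each of the two spans.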
Added in ac6587e
[Not really an issue] Hey mate
I've been using Colly for a small scraping project and I've come across a weird bit of behaviour.
The e.ChildText() function returns the text in all of the children as one string. However, using e.ChildAttr() only returns the first match. I read through the code in colly.go and understand this is the intended behaviour, but I was wondering why you wouldn't want to return all child attributes?
Loving this package though, it's been a lot of fun to use. Thank you for keeping it up to date! Cheers
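To make the "text of all children as one string" behaviour concrete, here is a small stdlib-only sketch. It is an approximation under stated assumptions, not Colly's or goquery's code: encoding/xml stands in for the HTML parser, the helper name childText is hypothetical, elements are matched by tag name only, and it does not try to reproduce goquery's handling of nested matches. The input is the HTML snippet from the discussion in this thread.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// snippet is the HTML example from this thread; it is also well-formed XML.
const snippet = `<div class="block-elem">
  <div class="container">
    <span class="span-class">Text</span>
  </div>
  <div class="container">
    <span class="span-class">Text2</span>
  </div>
</div>`

// childText is a hypothetical stand-in for e.ChildText(): it concatenates the
// character data found inside every element called tag and trims the result.
func childText(doc, tag string) (string, error) {
	var sb strings.Builder
	depth := 0 // greater than zero while inside a matching element
	dec := xml.NewDecoder(strings.NewReader(doc))
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return strings.TrimSpace(sb.String()), nil
		}
		if err != nil {
			return "", err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			if depth > 0 || t.Name.Local == tag {
				depth++
			}
		case xml.EndElement:
			if depth > 0 {
				depth--
			}
		case xml.CharData:
			if depth > 0 {
				sb.Write(t)
			}
		}
	}
}

func main() {
	text, err := childText(snippet, "span")
	if err != nil {
		panic(err)
	}
	fmt.Println(text) // prints "TextText2"
}
```

Both spans contribute to a single string, which is why the text helper feels different from an attribute helper that stops at the first match.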