causal-agent / scraper

HTML parsing and querying with CSS selectors
https://docs.rs/scraper
ISC License
1.79k stars 98 forks source link

Make element traversal more convenient #157

Closed Caellian closed 8 months ago

Caellian commented 8 months ago

I don't like how scraper requires me to check whether each node is an element while I'm traversing the tree when all I want to do is traverse elements. 70% of the time tree traversion occurs on Element, and there's very few convenience methods for doing so.

For instance, a very common case where you'd like to select the first element child of an element requires:

head_row.children().filter(|it| it.value().is_element()).next()

because head_row.first_child() might be a Text node.

Suggestion

I suggest adding convenience methods for traversal which would significantly reduce this boilerplate:

NodeRef<'a, T>::elements(&self) -> impl Iterator<Item = ElementRef<'a>> + '_ {
  self.children().filter_map(ElementRef::wrap)
}

#[inline]
NodeRef<'a, T>::first_child_element(&self) -> Option<ElementRef<'a>> {
  self.elements().next()
}
// Same as for NodeRef::first_child_element except reversed
NodeRef<'a, T>::last_child_element(&self, f: F) { ... }

<F: Fn(&Element) -> bool> NodeRef<'a, T>::filter_descendant(&self, f: F) -> impl Iterator<Item = NodeRef<'a>> + '_ {
  self.descendants().filter(|it| it.value().as_element().map(f).unwrap_or_default())
}
// Same as for NodeRef::filter_descendant except it uses NodeRef::children()
NodeRef<'a, T>::filter_children(&self, f: F) { ... }

// Same as for NodeRef...
ElementRef<'_>::elements() { ... }
ElementRef<'_>::first_child_element() { ... }
// ... the rest ...
adamreichold commented 8 months ago

I'd like to add that if we do this, I would argue that they should be functions (because they don't need to macros) and they should return iterators over ElementRef instead of NodeRef, i.e. their implementation should use ElementRef::wrap, e.g.

self.children().filter_map(ElementRef::wrap)
Caellian commented 8 months ago

their implementation should use ElementRef::wrap

Thanks, that's better. Updated the issue.