fimad / scalpel

A high level web scraping library for Haskell.
Apache License 2.0

(#28) Fixes bug with nested selectors #36

Closed fimad closed 8 years ago

fimad commented 8 years ago

This commit fixes two bugs that occur when using the // operator:

1) Using any of the pluralized scrapers with a selector built from one or more // operators may return duplicate tags. This occurs when a given tag can satisfy the selector in more than one way.

2) The // operator does not force a descent from the current context. For example, if one attempts to match the nested div in <div id=a><div id=b></div></div> with the selector "div" // "div", the outer div will be matched.
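
A minimal sketch of how the second case might be exercised, assuming a recent scalpel release in which selectors can be written as string literals under OverloadedStrings. The names `nested` and `nestedIds` are illustrative and do not come from this change.

```haskell
{-# LANGUAGE OverloadedStrings #-}

module Main where

import Text.HTML.Scalpel (Scraper, attrs, scrapeStringLike, (//))

-- The nested markup from the example above.
nested :: String
nested = "<div id=a><div id=b></div></div>"

-- Collect the id attribute of every tag selected by "div" // "div".
-- attrs is one of the pluralized scrapers, so it is also where
-- duplicate results (the first bug) would show up.
nestedIds :: Scraper String [String]
nestedIds = attrs "id" ("div" // "div")

main :: IO ()
main = print (scrapeStringLike nested nestedIds)
```

With the fix described here, the outer div should no longer be selected, so the result is expected to contain only the id b, and only once.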

To fix these bugs without regressing on performance, the entire selection engine had to be rewritten. The engine now matches tags in linear time, as opposed to the probably exponential prior implementation.
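
To illustrate how a single pass can force a descent and report each tag at most once, here is a toy model. It is not scalpel's actual engine: `Node`, `Chain`, and `select` are hypothetical names, and selectors are reduced to plain tag names joined by descendant steps.

```haskell
module Main where

-- A toy HTML tree: a tag name, an id, and child nodes.
data Node = Node
  { nodeTag  :: String
  , nodeId   :: String
  , children :: [Node]
  }

-- A toy selector: tag names joined by descendant steps, outermost first.
-- ["div", "div"] plays the role of "div" // "div".
type Chain = [String]

-- Return the ids of all matching nodes in document order.
-- Each node is visited exactly once, and the part of the chain still
-- to be matched is passed down instead of being re-derived per node.
select :: Chain -> Node -> [String]
select chain root = go chain root []
  where
    go :: Chain -> Node -> [String] -> [String]
    go [] _ acc = acc
    go pending@(s : rest) node acc
      -- Last step matches here: report the node once, keep looking below.
      | null rest && hit = nodeId node : descend pending node acc
      | null rest        = descend pending node acc
      -- An earlier step matches: only strict descendants may match the rest.
      | hit              = descend rest node acc
      | otherwise        = descend pending node acc
      where
        hit = nodeTag node == s

    -- Visit the children left to right, threading the accumulator.
    descend :: Chain -> Node -> [String] -> [String]
    descend pending node acc = foldr (go pending) acc (children node)

main :: IO ()
main = do
  let doc = Node "div" "a" [Node "div" "b" [Node "div" "c" []]]
  print (select ["div", "div"] doc) -- prints ["b","c"]
  print (select ["div"] doc)        -- prints ["a","b","c"]
```

In the first call the innermost div satisfies the selector in two ways (under a and under b) yet appears once, and the outermost div is never reported because nothing above it matches the first step.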