Closed WithoutPants closed 4 years ago
htmlquery package will auto remove duplicated text since #20.
Except that it doesn't consistently.
Take the source string <sup>24</sup><sup>24</sup><sup>24</sup>
Perform a QueryAll using //sup/text().
Expected output would be:
24
24
24
Actual output is the same - it works as intended.
Now perform QueryAll using //sup/text()|/div/text(). The only difference here is the extra alternative after the |, which matches nothing in this source. The expected output should be the same.
Actual output for this one is:
24
So at the very least there is inconsistency between the behaviour of the two queries, and my expectation would be that it is the second case that is performing incorrectly.
This is a bug. I have fixed it in https://github.com/antchfx/xpath/commit/a015dcb7b81e05da303d5e5a91ef80688d3b6515. Thanks for your feedback.
Thank you. I can confirm that this has fixed the issue.
When are we likely to see a new release with this fix?
Using a modified version of the html sample in query_test.go (with an extra Tokyo li element added).
Using xpath "//nav/ul/li/a/text()|//div/ul/li/a/text()" (the second condition after the | is just an example and not meant to find anything additional), I would expect four elements to be returned: London, Paris, Tokyo, Tokyo. Instead, it returns three, dropping the second Tokyo element. Removing the | character and second condition returns four elements. Similarly, if I remove the text() part, then it also returns four elements.
Small unit test illustrating the issue:
Gives the following results: