Unfortunately, I don't think position would help with that example, since there is currently no way to select bare text nodes. One of the assumptions scalpel makes is that anything you'd want to select is between <tags>.
It's also not immediately clear how to expose bare text selection in a way that would be backwards compatible. My current thinking is to add a new SelectNode value for text nodes. That would let you do something like the following to grab the second text node under an <h2>:
    chroot "h2" $
      chroots textSelector $ do
        p <- position
        guard (p == 1)
        text textSelector
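For comparison, the same position/guard pattern can be sketched against ordinary tags, which are selectable today. This is only an illustration under assumptions: the names secondParagraph and sample and the sample markup are made up here, and it assumes a scalpel release that exports position.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Monad (guard)
import Text.HTML.Scalpel

-- Hypothetical sample markup, not taken from the original issue.
sample :: String
sample = "<div><p>one</p><p>two</p></div>"

-- Keep only the match at position 1 (positions are zero-indexed),
-- i.e. the second <p> under the first <div>.
secondParagraph :: Scraper String [String]
secondParagraph =
  chroot "div" $
    chroots "p" $ do
      p <- position
      guard (p == 1)
      -- anySelector here selects the current root, i.e. the matched <p>.
      text anySelector

main :: IO ()
main = print (scrapeStringLike sample secondParagraph)
```

Matches for which the guard fails are dropped by chroots, so the result list should contain only the second paragraph's text.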
With an API like the one proposed in #21, you could do something even snazzier, like text ("h2" /// textSelector), to grab just the text nodes that are direct children of the <h2>.
The potential issue here, though, is that allowing selection of bare text nodes would be a breaking change to the behavior of anySelector. For example, scrapeStringLike "<a>text</a>" $ texts anySelector currently returns Just ["text"], but if we treated each text node as selectable then it would return Just ["text", "text"]: one entry for the <a> tag and one for the text node inside it.
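A minimal program to check the current behavior described above might look like the following. The name currentTexts is made up for this sketch; the expression itself is the one from the example.

```haskell
import Text.HTML.Scalpel

-- The expression from the example above, with the input pinned to String.
currentTexts :: Maybe [String]
currentTexts = scrapeStringLike "<a>text</a>" $ texts anySelector

main :: IO ()
main = print currentTexts
-- With only tags selectable, this is reported above to be Just ["text"].
```

Running this before and after any change to text node selection would make the behavioral difference easy to spot.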
This might be an OK breaking change, though, since I think the main use of anySelector is to select the current root node in a chroot block, as in the examples in the README.
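As a rough sketch of that usage (the names link and sample and the markup are invented for illustration, not copied from the README), anySelector inside a chroot refers to the node the chroot matched:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Text.HTML.Scalpel

-- Hypothetical sample markup.
sample :: String
sample = "<p>See <a href=\"/docs\">Docs</a> for details.</p>"

-- Inside the chroot, anySelector matches the current root (the <a> tag),
-- so both the attribute and the inner text come from that same node.
link :: Scraper String (String, String)
link = chroot "a" $ do
  href  <- attr "href" anySelector
  label <- text anySelector
  return (href, label)

main :: IO ()
main = print (scrapeStringLike sample link)
```

This usage only ever looks at the first match, which is the current root, so it should not be affected by text nodes also becoming selectable.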
Originally posted by @fimad in https://github.com/fimad/scalpel/issues/48#issuecomment-462009620