Open bitparity opened 2 years ago
Two differences:
descendant-or-self:: is correct, because // can also find the root node, not only descendants of it;
node() is technically correct, but irrelevant in practice. Among other things it allows the XPath to match nodes other than elements, but I don't think attributes, text nodes, processing instructions or comments will ever have child nodes, at least not in the kind of XML we're likely to need to work with.
I think that the two definitions are functionally equivalent though. Can you find an example of an XPath match for /descendant-or-self::node()/element that gives different results or counts from /descendant::element?
So I've managed to draw up a test example illustrating the issue.
Sample xml:
<body>
<p lang="la" id="p-1">
<s id="s-1">sent 1</s>
<s id="s-2">sent 2</s>
</p>
<p lang="en" id="p-2">para 2</p>
</body>
The goal is to find all elements that have an @id attribute where the parent or self element has a @lang attribute.
The two XPath expressions below are identical, as per the aforementioned definitions of // in the XQH book and the workshop YouTube video. However, neither of them returns the <p> element, which has both @lang and @id attributes:
.//*[./@lang = "la"]/descendant-or-self::node()/*[./@id != ""]
.//*[./@lang = "la"]//*[./@id != ""]
returns
<s id="s-1">sent 1</s>
<s id="s-2">sent 2</s>
The XPath expression below DOES return the <p> element with both @lang and @id attributes, which shows that it is not equivalent to the two above.
.//*[./@lang = "la"]/descendant-or-self::*[./@id != ""]
returns
<p lang="la" id="p-1">
<s id="s-1">sent 1</s>
<s id="s-2">sent 2</s>
</p>
<s id="s-1">sent 1</s>
<s id="s-2">sent 2</s>
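The two result sets above can be reproduced with a rough sketch in Python's standard library. This is only an illustration, not the setup used in this thread: ElementTree supports a limited XPath subset with no descendant-or-self:: axis, so Element.iter() stands in for it here, and the predicate [@id] (attribute present) stands in for [./@id != ""], which gives the same result on this sample.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<body>
  <p lang="la" id="p-1">
    <s id="s-1">sent 1</s>
    <s id="s-2">sent 2</s>
  </p>
  <p lang="en" id="p-2">para 2</p>
</body>"""

root = ET.fromstring(SAMPLE)

# Context step shared by all three queries: .//*[./@lang = "la"]
latin = root.findall(".//*[@lang='la']")

# ctx//*[@id]: only elements strictly below each context element.
double_slash = [e.get("id") for ctx in latin for e in ctx.findall(".//*[@id]")]

# ctx/descendant-or-self::*[@id != ""]: Element.iter() yields the context
# element itself first, then its descendants in document order.
desc_or_self = [e.get("id") for ctx in latin for e in ctx.iter() if e.get("id")]

print(double_slash)  # ['s-1', 's-2']
print(desc_or_self)  # ['p-1', 's-1', 's-2']
```

The only difference between the two lists is the context element p-1 itself, which matches the outputs shown above.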
I'm sure most of the time, this is just theoretical, but this is a specific instance where it affected one of my queries. I agree thinking of // as descendant:: is easier, which is why I was puzzled by the XQH book's full definition of /descendant-or-self::node()/, which appears to be both true AND confusing (since it apparently cancels the self part out).
Interestingly, this looks like it has just proved that when you want descendant-or-self::* you can't just use .//*, which in practice means descendant::*.
So while I have no doubt the XQH definition is correct, it doesn't look like ours is wrong after all…
I think I realized what the problem was, from p.53 of the Walmsley XQuery book (which also gave the same node definition for //).
Whenever you type the name of an element after a /, it is technically child::element.
So //element is technically /descendant-or-self::node()/child::element, which forces the search for <element> down to the descendants but not the self of the context, making it different from /descendant-or-self::element.
I think, anyway.
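To make the expansion concrete, here is a minimal sketch that spells out the two steps by hand in Python. The helper names descendant_or_self and child are hypothetical, made up for this illustration, and the tiny document is not the sample from earlier in the thread. The helpers only walk element nodes; text nodes, comments and the like have no children, so leaving them out should not change what the child step returns.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring('<p id="p-1"><s id="s-1"/><s id="s-2"/></p>')

def descendant_or_self(ctx):
    """Hypothetical helper: the context element plus all elements below it."""
    return list(ctx.iter())

def child(nodes, name="*"):
    """Hypothetical helper: the child::name step applied to each node in turn."""
    return [c for n in nodes for c in n if name == "*" or c.tag == name]

# ctx//*  expands to  ctx/descendant-or-self::node()/child::*
# The trailing child step can never return ctx itself, because ctx is not
# a child of anything inside its own subtree.
expanded = child(descendant_or_self(doc))

# ctx/descendant-or-self::*  has no child step, so ctx stays in the result.
direct = descendant_or_self(doc)

print([e.get("id") for e in expanded])  # ['s-1', 's-2']
print([e.get("id") for e in direct])    # ['p-1', 's-1', 's-2']
```

Under these assumptions the "self" in descendant-or-self::node() is not cancelled out; it is what lets the child step reach the direct children of the context.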
So at 11:42 of "Advanced Digital Editing: Introduction to XPath II", it says ./descendant::head is the same as //head, which I do find to be the case. But in the book XQuery for Humanists (p.62) it says "A double slash (//) stands for /descendant-or-self::node()/".
I know from having debugged a problematic query that /descendant-or-self::node()/head is not the same as /descendant-or-self::head (particularly when it comes to looking for attributes within /head, which I think are technically siblings, not descendants), but I don't know why, especially since functionally it seems to just make // equivalent to, as mentioned in the video, ./descendant::head.
Can you possibly explain the difference between the two definitions (yours and XQH's) for //?