Open rhdunn opened 6 years ago
I think it is useful to have both a named tuple type based on maps like you are suggesting and have implemented in Saxon, and a sequence based tuple type.
Regarding the examples I have mentioned in this proposal, some of them may be better represented as named tuples (e.g. the point and complex/rational numbers). It depends on the library implementor on what design choices they make, like the choice of sequence or array (both of which have different advantages and disadvantages).
In addition to being able to better type-check existing code that makes use of tuple sequences (like the MarkLogic math:modf
example), a vendor may choose to optimise these in assembly -- either specific instructions (like the sincos example) or help vectorise operations it knows can be packed into SSE and related types (such as a matrix library).
Having to specify these functions as named/map tuples would make it harder to convert to efficient assembly instructions.
Regarding the disadvantages you mention:
(a) I don't see being limited to single items is a disadvantage here -- the examples I gave in this proposal do not have variable numbers of items;
(b) Again, I don't see only being able to refer to the items by position instead of name a disadvantage here, especially when you allow the possibility of optimising the code the query runs;
(c) Again, I don't see being able to replace items as a disadvantage -- they are an immutable value, so the modified version will be a new value returned from the expression/function. For example:
declare function swap(
$pair as (xs:anyAtomicValue, xs:anyAtomicValue)
) as (xs:anyAtomicValue, xs:anyAtomicValue) {
$pair[2], $pair[1]
};
One other disadvantage of representing complex numbers (say) as sequences is that you then can't have a sequence of complex numbers. I agree that for very simple tuples such as complex numbers, positional access to components works just as well as named access. But (a) named access is more scaleable to more complex problems, and (b) if you're going to use positional access, then surely using arrays rather than sequences gives much more flexibility.
These are all design decisions that a developer writing XPath/XQuery code will need to make regardless of whether or not this proposal is accepted. This proposal does not change the behaviour of sequences with regards to flattening. It is designed to provide better static checking in two cases:
math:modf
example).In the current XPath/XQuery syntax, a user can already specify the common type of each item in a sequence and the cardinality (zero or one, zero or more, only one, one or more). This proposal is just expanding on that so an XPath/XQuery processor can provide better static checking.
I think that with the XQuery 1.0 / XPath 2.0 data model this would be a useful enhancement, but I think the use cases for it are obsoleted by maps and arrays.
There's an ambiguity in the grammar that needs to be sorted out. In 3.1, (xs:integer)
is a legal instance of ItemType, and under this proposal it becomes a legal instance of SequenceType. You could argue that isn't a problem because it has the same semantics whichever way you parse it, but we shouldn't have this kind of ambiguity in the EBNF. One solution is to require a ParenthesizedSequenceType to contain at least two ItemTypes, but that still leaves you needing indefinite lookahead to resolve the ambiguity (indefinite, because an ItemType can be arbitrarily long).
I am not opposed to this, but I wonder if we need to so far as adding explicit types - https://github.com/expath/xpath-ng/pull/8#issuecomment-432582158
@michaelhkay I'm open to having a different syntax that does not conflict. Maybe something like:
let $c as xs:double+ sequence of (xs:double, xs:double) := (1.0, 2.0)
I'm thinking also in terms of supporting another proposal for union item types:
let $json-doc as ( function(*) union of (map(*), array(*)) )? := load-json("test.json")
This would have the advantage of allowing sequence types in the union of
syntax, should be easy to parse without infinite lookahead, and not have the ItemType
restriction from ParenthesizedItemType
. I'll leave the discussion of the union of
syntax to a separate proposal, or is it worth including here?
It may also be possible to drop the initial sequence type:
let $c as sequence of (xs:double, xs:double) := (1.0, 2.0)
let $json-doc as union of (map(*), array(*))? := load-json("test.json")
Using your predicate type syntax, these would be something like:
declare function math:modf($v as xs:double) as xs:anyAtomicValue+[
count(.) = 2 and
.[1] instance of xs:integer and
.[2] instance of xs:double
] external;
let $c as xs:double+[ count(.) = 2 ] := (1.0, 2.0)
let $json-doc as function(*)[ . instance of map(*) or . instance of array(*) ]? :=
load-json("test.json")
For a union of sequence types it would be neat to reuse the SequenceTypeUnion syntax from typeswitch -- specifically
SequenceTypeUnion ::= SequenceType ("|" SequenceType)*
Because a SequenceType can appear in "treat as", it has to follow some syntactic constraints and this is best achieved by sticking to the "keyword(details)" convention. I think the term "union" is strongly associated with XSD union types, so I would go for the syntax "anyOf(" SequenceTypeUnion ")"
I like the idea of the anyOf
(and correspondingly for this, a sequenceOf
) syntax. One consideration is that currently there are no keyword tokens in any of the XPath or XQuery specifications (core, full text, updating, scripting) that separate words using capital letters -- instead, they use -
like in boundary-space
, attribute-or-self
, and empty-sequence
. As such, I would propose using any-of
and sequence-of
, i.e.:
"any-of" "(" SequenceTypeUnion ")"
I've updated this proposal based on feedback. The text for this version is at https://github.com/expath/xpath-ng/blob/4cac38e54646fe6486b85b2635168d681a5fba5a/restricted-sequences.md.
I'll add a separate proposal for the any-of
construct once I have updated my other proposals.
Thanks for this. Sorry if the comments are fairly lengthy, but it tends to be the case that the closer you get to agreement in principle, the more reviewers start noticing the edge cases.
Comments on latest version. Substantive comments where further technical work (rather than editorial work) is needed highlighted in bold.
sequence-of(xs:integer, xs:integer)
and
an sequence of two xs:untypedAtomic
values is supplied, are they converted? (I think they should be; similarly atomization and type promotion from integer to double).https://www.w3.org/Bugs/Public/show_bug.cgi?id=30302
which I found on reading the existing specification text. Perhaps the subtype(A,B) rules can be refactored as follows.
A sequence type constrains:
(a) the permitted cardinality of the sequence, as a (possibly infinite) set of non-negative integers. For empty-sequence(), the permitted cardinality is {0}, for xs:error the permitted cardinality is {}, for any other item type it depends on the occurrence indicator: {1}, {0..infinity}, {1..infinity}, or {0,1}; for a restricted sequence type it is the number of component item types.
(b) the required item type of the Nth item in the sequence. For empty-sequence() and xs:error this is xs:error (for all N), for a restricted sequence type it is the Nth component item type, for an item type with occurrence indicator, it is the item type (for all N).
subtype(A,B) is true if and only if both of the following are true:
(a) the permitted cardinality of A is a subset of the permitted cardinality of B
(b) Let Ai be the required item type of item i in A and let Bi be the required item type of i in B; for all i in the permitted cardinality of A, itemtype-subtype(Ai, Bi) is true.
Of course, because this involves infinite sets, this doesn't provide a computable algorithm for determining subtype(A, B), but I believe it provides an adequate mathematical specification.
Thanks for your detailed response. Regarding point 6, that was intended for things like:
let $x := (1, 2.0)
here $x
has no explicit type. An XQuery processor may infer the type of $x
as either item()*
(if it does not support inferring types), xs:decimal+
or similar (if it only supports general cardinality/type inherence), or sequence-of(xs:integer, xs:decimal)
(if it supports inference of restricted sequences).
Would it make sense to write up something like below as a separate proposal (so the new sequence type and subtype
judgement semantics are defined in a single place), or document the parts separately in the different proposals?
I wonder if it makes sense to revise how these rules are structured to account for all the new types. Maybe adapting and extending something like https://github.com/rhdunn/xquery-intellij-plugin/blob/master/docs/XQuery%20IntelliJ%20Plugin%20Data%20Model.md#214-part-4-sequences, which I have been documenting to define how I am intending on approaching static analysis in my plugin. This would have the advantage of removing the subtype
table and making the rules more like the subtype-itemtype
rules.
Those rules define Sequences as Part 4 of the type system (section 2.7.4 of the XQuery and XPath Data Model) as a lower bound, upper bound, and item type, where the upper and lower bounds form the cardinality. I then define 3 type operations:
any-of
proposal). NOTE: I am using union type in that text to refer to the XML Schema union type.This proposal could then be implemented such that the upper bound of the resulting sequence type is the number of ItemType
s specified in the subtype, and the ItemType
of that sequence is the item type union of the specified ItemType
s. The lower bound would be 0 if ?
was used after sequence-of(...)
, otherwise it would be 0.
Yes, I understand what you're saying about inferring types, but it doesn't need to be said. It's no different from inferring a type for let $x := 3
, and the specification doesn't say anything about that; it doesn't need to.
Yes, in reviewing your other proposal I also came to the conclusion that it's worth refactoring the section on subtype judgements. Looking at your IntelliJ page, you've defined the cardinality constraint in terms of a range of integers, whereas I did it as a set of integers; the advantage of using a set of integers is that it allows the empty set, which is what xs:error gives you. With "restricted sequence types" we have the new concept that the required item type becomes dependent on the position of the item in the sequence, and I think the best way to handle that is to make it this dependency a general property of sequence types, and then make the dependency trivial for existing sequence types.
Note that once we establish the idea that a SequenceType determines a required item type for the Nth item in a matching sequence, we can use this concept in the function conversion rules.
Actually this isn't as easy as I first thought. Suppose the required type is sequence-of(xs:integer, xs:double)
and the supplied value is @readings
and the type annotation of @readings
is list(xs:positiveInteger)
. To make this work, we first have to atomize the node @readings, giving a list of two xs:positiveInteger
values, then we have to promote one of these to xs:double
.
But what are the conditions under which we atomize, given that the required type could also be sequence-of(xs:integer, attribute())
? One approach is to say that we only atomize if all the component item types are (generalized) atomic. Messy, but probably workable.
@michaelhkay I have updated the proposal to address your suggestions. See https://github.com/expath/xpath-ng/blob/b9800b9882ded23d812641ef82209cc59861d6cc/restricted-sequences.md for the non-diff version of the latest changes (version 3 of the proposal).
I have updated the subtype(A,B)
judgement to use our combined terminology as we were thinking more or less along the same lines, and use your more precise cardinality notation using sets. The main difference with my XQuery IntelliJ plugin text is to make the upper/lower bounds optional to support xs:error (i.e. representing the empty set), and to allow any value for the upper bound (allowing the restricted sequences in this proposal).
I haven't looked at your last comment regarding attributes and atomization yet.
This adds the tuple sequence type concept to
SequenceType
.