Proposal for Sequence, Map, and Array Decomposition

ChristianGruen commented 5 years ago

+1 for the proposal, looks good.

If tuple arrays are returned, I would be in favor of having the array syntax. This would make it easier to process sequences of arrays:

let ($array1, $array2) := ([1,2], [1,2])
return ...

I guess it shouldn’t apply to context item declarations? I’m not sure either if it makes sense for group by clauses.

rhdunn commented 5 years ago

I plan on defining a separate proposal for defining the type of a tuple sequence using the formal semantics style syntax -- as (xs:string, xs:string). This specific proposal is about sequence/array decomposition, which is separate from defining specific types for them -- i.e. either proposal can be accepted or rejected, and other additional proposals can be introduced like a proposal for defining a syntax for tuples based on maps.

I don't think that just because maps are available, syntax and extensions to support sequences and arrays should not be proposed. There are existing functions that make use of fixed length sequence values, and it may be easier to write functions that accept/return sequences/arrays than maps. I've listed some examples in my proposal (sincos, muldiv, points, and complex/rational numbers).

What I meant by "fixed length" sequence/array is where the size of the sequence/array does not change depending on context. For example, a 2D point will always have two items. A counterexample would be a function that doubles the values of a sequence -- the length of the sequence here is variable (not fixed). I'm happy to use different terminology if the terms in this proposal are confusing.

Having said that, provided that there are a matching number of items in the sequence/array and variables being assigned to that, it should not matter.

The behaviour of assigning more variables than there are items in the sequence/array is defined in the proposal. The proposal should define the behaviour of assigning fewer variables than there are items in the sequence/array.

Providing a proposal for decomposing map based tuples (named tuples?) is something I would be interested in, but should be a separate proposal. As a rough idea, using a map-like syntax analogous to declaring maps, something like:

let { x: $x, y: $y as xs:double } := { x: 2.0, y: 3.0 }
return ...

adamretter commented 5 years ago

I am watching this with interest.

From my perspective we don't necessarily need tuple sequence types or tuple array types, rather I see the decomposition as just syntactic sugar.

I do like the idea of varying syntax for sequence and array, e.g.:

let ($x, $y) := (1.1, 2.2)

let [$x, $y] := [1.1, 2.2]

ChristianGruen commented 5 years ago

I agree with Adam’s point of view: I regard the extension of the syntax as a nice addition, but I could live without new tuple types.

rhdunn commented 5 years ago

To be clear, this specific proposal is not intending to add any new types. It is just about the decomposition of sequence and array values. I will update the proposal to make this clearer, and to add a section for decomposition of map values. I'll also rename the file and pull request to reflect these changes.

adamretter commented 5 years ago

@michaelhkay how do you feel about this PR just being syntactic sugar for the time being?

michaelhkay commented 5 years ago

I think the let () := and let [] := syntax is fine in principle.

Need to see the detailed semantics, e.g. for the case where the sequence/array has a different number of items from the number of variables.

There are also some syntax details to sort out:

adding "let" to the list of reserved function names (so that let() works).
let[$x, $y] := is another case that requires infinite lookahead, because let[$x, $y] is a valid XPath expression in its own right.
And it seems odd to resolve let() by reserving the function name while relying on lookahead to resolve let[].

The extension to maps/tuples doesn't work for me. The proposed syntax offers no benefits over let $x := $m?x, $y := $m?y return .... And in any case, the lookup syntax is sufficiently terse that I don't think you often need to bind variables to each component of a map/tuple in this way.

michaelhkay commented 5 years ago

It's not very pretty, but the following would parse more cleanly:

let $(x, y, z) := 1 to 3 return ...
let $[x, y, z] := array{1 to 3} return ...

and then perhaps map/tuple assignment could be

let ${x, y, z} := $map (binding named components of the tuple to variables of the same name)

adamretter commented 5 years ago

@michaelhkay I actually prefer your new syntax, less $'s to type

michaelhkay commented 5 years ago

Here's a suggestion for the semantics:

let $(a, b, c, ...) := EXPR return EXPR2

Amend the existing text:

If a let clause contains multiple variables, it is semantically equivalent to multiple let clauses, each containing a single variable. In particular:

(a) the clause

let $x := $expr1, $y := $expr2

is semantically equivalent to the following sequence of clauses:

let $x := $expr1
let $y := $expr2

(b) a sequence-decomposition let $(x, y, z, ...) := expr is equivalent to the following sequence of clauses:

let $x := expr[1]
let $y := expr[2]
let $z := expr[3]
...

(but the expression expr is only evaluated once)

If the sequence contains more items than the number of variables being bound, excess items are ignored. If the sequence contains fewer items than the number of variables being bound, excess variables are bound to an empty sequence.
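For comparison, rule (b) can be written out in XQuery 3.1 without the proposed syntax. This is only a sketch of the stated equivalence: `$seq` is a hypothetical helper variable standing in for "expr is evaluated once", and the literal values are illustrative.

```xquery
xquery version "3.1";

(: Rough XQuery 3.1 equivalent of the proposed  let $(x, y, z) := (1, 2) :)
(: $seq is a hypothetical helper so the expression is evaluated only once. :)
let $seq := (1, 2)
let $x := $seq[1]
let $y := $seq[2]
let $z := $seq[3]   (: there is no third item, so $z is the empty sequence :)
return (exists($x), exists($y), empty($z))
```

A positional filter past the end of a sequence yields the empty sequence rather than an error, which is why the "fewer items than variables" case needs no special handling under this rule.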

(c) an array-decomposition let $[x, y, z, ...] := expr is equivalent to the following sequence of clauses:

let $x := expr?1
let $y := expr?2
let $z := expr?3
...

(again, the expression expr is only evaluated once)

A type error [XPTY0004] is raised if the result of evaluating expr is not an array. A dynamic error [FOAY0001] is raised if the array contains fewer members than the number of variables being bound. If the array contains more members than the number of variables being bound, the excess members are ignored.
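Rule (c) can likewise be sketched in XQuery 3.1 as it stands today. The variable `$arr` and the sample arrays are assumptions made for illustration. The `err` prefix is declared explicitly in case a processor does not predeclare it.

```xquery
xquery version "3.1";

declare namespace err = "http://www.w3.org/2005/xqt-errors";

(: Rough XQuery 3.1 equivalent of the proposed  let $[x, y] := [1, 2, 3] :)
let $arr := [1, 2, 3]
let $x := $arr?1
let $y := $arr?2
(: the third member is never looked up, so it is ignored :)

(: Too few members: looking up position 3 in a two-member array fails. :)
let $short :=
  try { [1, 2]?3 }
  catch err:FOAY0001 { "array has fewer members than variables" }
return ($x, $y, $short)
```

The error in the second case is not something the decomposition rule adds. It falls out of the existing array lookup semantics, which reject an index outside the array bounds.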

(d) a map-decomposition let ${x, y, z} := expr is equivalent to the following sequence of clauses, in the case where x, y, and z are simple NCNames.

let $x := expr?x
let $y := expr?y
let $z := expr?z
...

(again, the expression expr is only evaluated once)

In the case where the variable name is a QName q, the equivalence is let $q := expr?(xs:QName("q")).

A type error [XPTY0004] is raised if the result of the expression is not a map. [[Assuming map-based tuples are introduced], a type error [XPTY0004] MAY be raised if the processor is able to establish that the static type of expr is a tuple type and that x (etc) is not one of the permitted key names for that tuple type.] In other cases, if the map does not contain an entry with the specified key, the corresponding variable is bound to an empty sequence. Unreferenced entries in the map are ignored.
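For simple NCName keys, rule (d) corresponds to the following XQuery 3.1, again as a sketch. The variable `$m` and the map contents are hypothetical.

```xquery
xquery version "3.1";

(: Rough XQuery 3.1 equivalent of the proposed  let ${x, y, z} := $m :)
let $m := map { "x": 2.0, "y": 3.0, "unused": "ignored" }
let $x := $m?x
let $y := $m?y
let $z := $m?z   (: no "z" entry, so $z is the empty sequence :)
return ($x + $y, empty($z))
```

This is the same lookup form cited earlier in the thread as the existing alternative to a dedicated map-decomposition syntax. The "unused" entry is simply never referenced.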

ChristianGruen commented 5 years ago

This looks sound and solid.

Just one thing: Maybe we should not simply ignore returned values that cannot be bound but rather raise an error. Swallowed data may result in erroneous code.

Thinking more about this, maybe we should indeed find different solutions for sequences, arrays and maps, as the three data structures have different semantics anyway:

For sequences, it would feel more natural to me to bind all remaining items to the last variable, and never raise any errors.
For arrays, which have fairly strict boundary semantics in XQuery, I would expect an error if too few or too many items are returned.
For maps, unreferenced values could indeed be ignored, as the proposed solution resembles a map lookup.

michaelhkay commented 5 years ago

Yes, I toyed with allowing let $(head, tail) := sequence which certainly has some nice use cases. This also means that let $(x) := expr means the same as let $x := expr which is logical. But should arrays work the same way? That's tricky because you want let $[x, y] := [1,2] to set $x=1, $y=2, not $x=1, $y=[2]. So you end up with an asymmetry between sequences and arrays. (You suggested requiring the number of variables to exactly match the array size. That feels a bit severe to me.)

ChristianGruen commented 5 years ago

Maybe we can think about use cases for which ignoring returned results is a better solution than raising an error? In other words, when does a user create results and expect parts of it to be ignored?

I have some sympathy for the asymmetry between arrays and sequences, as the data structures are asymmetric one way or the other (mostly because of the decision in XQ31 that a supplied array index must be larger than 0 and must not exceed the array size). Moreover, my impression is that arrays and sequences are used quite differently in practice. As arrays are not implicitly flattened, it would possibly come as a surprise if we created something like a tail result for arrays.

However, we could also provide explicit semantics for binding the tail of a sequence or even an array to the last variable. In Python, *var is used; in JavaScript it seems to be ...var. The function parameter syntax that we are discussing in parallel may be a better choice for us:

let $(head as xs:string, tail as xs:string*...) := ('head', 't', 'a', 'i', 'l')
return string-join($tail)
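The intended effect of that head/tail binding can be approximated in XQuery 3.1 with the existing fn:head and fn:tail functions. This is a sketch only: `$seq` is a hypothetical helper variable, and the proposed `...` marker has no direct counterpart here.

```xquery
xquery version "3.1";

(: Rough XQuery 3.1 equivalent of binding the first item and the remainder. :)
let $seq := ('head', 't', 'a', 'i', 'l')
let $head as xs:string := head($seq)
let $tail as xs:string* := tail($seq)
return string-join($tail)
```

Here `$tail` holds the four remaining strings, so the query returns "tail".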

rhdunn commented 5 years ago

I have updated the proposal to address the above feedback, use the new syntax, and add a possible grammar. The revised text is viewable at https://github.com/expath/xpath-ng/blob/bc6cb1b579d688ba0088abfe0e73b7e633f964aa/sequence-map-array-decomposition.md (this includes the TypeDeclaration parsing fix pushed below).

Proposal for Sequence, Map, and Array Decomposition #8