Proposal (3 options) for concise inline function syntax

michaelhkay commented 5 years ago

Proposes several options for a simplified syntax for declaring inline functions, and makes recommendations.

rhdunn commented 5 years ago

The text for this is a copy of https://github.com/expath/xpath-ng/pull/3.

michaelhkay commented 5 years ago

Hopefully corrected now. I'm still getting used to this way of working.

ChristianGruen commented 5 years ago

I would clearly be in favor of having options from this proposal accepted. My thoughts on…

Option 3

If we started from scratch, this would probably be the clear winner: Users can name their variables as they like, and the syntax is well-established.

I had expected that the need for unbounded look-ahead would make it a clear no-go. It’s good to have your assessment; maybe I’ll give it a try and gain some experience.

Option 2

One criticism of this approach could be that it is not as general-purpose as some users may believe at first sight. If the last parameter of a function is not referenced in its body, the call cannot be refactored into the short form, since the arity is inferred from the highest $N that is referenced:

Example:

```
declare function local:inc-filter($numbers, $filter) {
  for $n in $numbers
  let $i := $n + 1
  where $filter($i)
  return $i
};
local:inc-filter((0 to 3), function($n) { boolean($n mod 2) }),
local:inc-filter((0 to 3), function($n) { true() })
```

Rewritten function calls:

```
local:inc-filter((0 to 3), -> boolean($1 mod 2)),
local:inc-filter((0 to 3), function($n) { true() }) (: no rewriting possible :)
```

On the other hand, in some cases, this may indicate to a user that there is a better way of writing the original code.
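The restriction above can be checked mechanically. The following is a minimal Python sketch, not part of the proposal. It takes the parameter names and body text of an existing inline function and tries to produce the Option 2 form with $1..$N. The helpers `to_short_form` and `_ref` are hypothetical. They work on raw text and assume the body contains no string literals or comments that mention the parameter names. The sketch returns None exactly when the last declared parameter is unused, which is the case that cannot be rewritten.

```python
import re


def _ref(name):
    """Pattern for a reference to $name. XQuery names may contain '-' and
    '.', so '$n-1' refers to a variable called 'n-1', not to '$n'."""
    return re.compile(r"\$" + re.escape(name) + r"(?![\w.\-])")


def to_short_form(params, body):
    """Rewrite function($p1, ..., $pN) { body } as '-> body' using $1..$N.

    Returns None when the short form cannot express the function: the
    arity of '-> body' is the highest $N referenced in the body, so the
    last declared parameter must actually be referenced.
    Assumption: body has no string literals or comments naming the params.
    """
    if params and not _ref(params[-1]).search(body):
        return None
    for position, name in enumerate(params, start=1):
        # Replace every reference to the named parameter with its position.
        body = _ref(name).sub("$" + str(position), body)
    return "-> " + body
```

Applied to the two calls in the example, the first filter becomes "-> boolean($1 mod 2)". The second, `function($n) { true() }`, yields None because $n is never referenced.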

Apart from this restriction, I really like the option: It’s even more concise than Option 3, and…

Option 1

…I would appreciate it if we could use the same syntax for focus functions.

adamretter commented 5 years ago

In general I support the idea of a simpler syntax for inline functions.

I am still chewing through the details of the proposal, but I have some comments and questions:

Focus Functions

Personally I don't like the fn{ EXPR } approach, only because the fn prefix is already associated with the namespace of the standard function library. I worry it could lead to user confusion. However, something like f{ EXPR } would be fine for me.
I really like -> EXPR. Very neat :-)
When I read the syntax -> EXPR, the narrator in my head reads it as "apply". Perhaps that is a better name than "Focus Functions"?
Are there any rules for empty sequences? Is an empty sequence on the LHS passed to the function on the RHS? If not, then this actually seems to me to be the classic functional programming map operator, as implemented in Lisp, Haskell, Scala, etc. If so, we should perhaps consider whether what we actually want here is a map operator; if not, perhaps we want a map operator in addition.
Like other constructs that rely on the context item, it breaks down if you want to do more complex things like joins. It's a simple syntax for simple cases.

How about also implicitly binding the context item to some variable or symbol (e.g. $$)? This would allow the context item to be used both implicitly and explicitly through $$. Otherwise, a form which allows it to be bound explicitly, like $x -> ($x + 1), looks like your Arrow Syntax with Declared Parameters proposal anyway.

Short Inline Functions

In terms of both parsing and static analysis, I am wondering how the function signature of the inline function is established. As there is no parameter list, it seems that I would need to either:
1. parse the body of the inline function, which could be substantial, or
2. examine each pass-by-reference or application of the inline function and infer the parameters from that.

Either way seems laborious. @michaelhkay I guess there is an obvious simpler approach which I have missed?

Syntax with Declared Parameters

Likely I haven't had enough coffee yet, but why do we need unbounded lookahead here? Is it because we are using the same syntax as a sequence constructor for the parameter list?
So this seems like the obvious syntax to me. I like it very much, but that is probably because I am familiar with it.
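The ambiguity described above can be illustrated with a small Python sketch over a pre-tokenized input. The token format and both helper functions are hypothetical and not taken from any real XQuery parser. A "(" can already start a parenthesized expression, so the parser cannot tell whether ($x, $y) is a parameter list until it reaches the matching ")" and checks whether "->" follows. That ")" can be arbitrarily far away.

```python
def classify_paren(tokens, start):
    """Decide what the "(" at tokens[start] introduces.

    Returns (kind, distance): kind is "parameter-list" if the matching ")"
    is immediately followed by "->", otherwise "parenthesized-expr";
    distance is how many tokens past the "(" the matching ")" was, i.e.
    how far the parser had to look before it could decide.
    """
    if tokens[start] != "(":
        raise ValueError("expected '(' at start")
    depth = 0
    for i in range(start, len(tokens)):
        if tokens[i] == "(":
            depth += 1
        elif tokens[i] == ")":
            depth -= 1
            if depth == 0:
                arrow_follows = i + 1 < len(tokens) and tokens[i + 1] == "->"
                kind = "parameter-list" if arrow_follows else "parenthesized-expr"
                return kind, i - start
    raise ValueError("unbalanced parentheses")


def params_then(n, after):
    """Hypothetical token stream: ($p1, ..., $pn) followed by `after`."""
    inner = []
    for k in range(1, n + 1):
        if k > 1:
            inner.append(",")
        inner.append(f"$p{k}")
    return ["("] + inner + [")"] + after
```

For example, `classify_paren(params_then(2, ["->", "$p1"]), 0)` and `classify_paren(params_then(2, ["+", "1"]), 0)` see identical tokens up to the closing ")". The distance grows linearly with the number of parameters. This is why parser technologies with a fixed lookahead (such as LL(k) for a fixed k) cannot make this decision directly.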

I took a quick look at the languages I use the most (and a couple that I respect) to see how their syntax represents such things:

Java

parameter -> expression
(parameters) -> expression
```
(parameters) -> {
    body
}
```

Scala

parameter => expression
(parameters) => expression
```
(parameters) => {
    body
}
```

C++

```
[captures](parameters) -> returnType {
    body
}
```

Slightly different from the others! You have to specify any captures: these are the variables that are captured and used inside the closure. I suspect this offers better support for strong static analysis during compilation. The returnType may be omitted.

Haskell

\parameters -> expression

A nice piece of trivia: the \ was chosen because it is meant to remind users of the Greek letter lambda, λ.

Lisp

(lambda (parameters) body)

Rust

|parameters| expression
|parameters| {body}

I understand that you can also place -> returnType after the |parameters|, in which case the body must be a block: |parameters| -> returnType {body}.

michaelhkay commented 5 years ago

Thanks for the detailed comments. Some observations:

reading -> as "apply" seems like a misreading. It's all about declaring the function, not about applying it. Some of your other comments, e.g. regarding empty sequences, also seem to be about function application rather than declaration.
Inferring the signature of a "short inline function". I don't think this is particularly difficult. You read "->" so you know you're expecting an expression in which $N parameter references are meaningful; you parse the expression with that knowledge, perhaps using a variant of the standard code for resolving variable references; when you've parsed that expression, you know what parameter references are present; you find the maximum value of $N, and then you have a function signature of function($_1 as item()*, $_2 as item()*, ... $_N as item()*) as item()*.
My inclination is that any syntax (like ($x, $y) -> ($x+$y)) that involves declaring the argument names probably isn't a big enough improvement on what we have today to be worth including. If we did it, this is probably the syntax I would go for, despite the problems of arbitrary look-ahead in the parsing. But the lookahead could cause problems for some parser technologies. (I've observed that syntax-directed editors for Java struggle with it too.)
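The signature inference described in the second point can be sketched in a few lines of Python. This version scans the body as raw text with a regular expression, which is a simplification. A real parser would collect the $N references while resolving variable references, as suggested above. It would also have to handle string literals, comments, and nested short functions, which this sketch does not. The name `infer_signature` is hypothetical.

```python
import re

# A parameter reference in the proposed short syntax: "$" followed by digits.
PARAM_REF = re.compile(r"\$(\d+)")


def infer_signature(body):
    """Infer the signature of the short inline function "-> body".

    The arity is the maximum N among the $N references in the body (0 if
    there are none); every parameter and the result get type item()*.
    """
    arity = max((int(n) for n in PARAM_REF.findall(body)), default=0)
    params = ", ".join(f"$_{i} as item()*" for i in range(1, arity + 1))
    return f"function({params}) as item()*"
```

Under this rule, "-> $3" gets three parameters even though $_1 and $_2 are unused. "-> true()" gets arity 0, which is why the second filter in the earlier inc-filter example could not be rewritten.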

While forming these ideas I did consider mechanisms from other languages (though my survey wasn't as wide as yours). The idea of "focus functions" is influenced by the ability to write _+1 as a function in Scala, which seems neat in the case of single-argument functions, though I really don't like the idea of _+_ as a 2-argument function (I've never seen a satisfactory explanation of the rules). I think $1+$2 is much clearer for that case.

Another thing I looked at is dropping the "->" prefix, so we recognize $1+1 and $1+$2 as functions merely because of the presence of the parameter references $1 and $2. But I don't think that works: how do we know whether ($1, $1) is a function that returns its first argument repeated twice, rather than an expression that returns a sequence of two functions? The only way to do this seems to be context-sensitivity, whereby we recognize an expression as a function by virtue of the fact that it appears in a context where a function is required. That's very alien to the XQuery/XPath tradition, so I didn't pursue it further.

adamretter commented 5 years ago

> reading -> as "apply" seems like a misreading. It's all about declaring the function, not about applying it. Some of your other comments, e.g. regarding empty sequences, also seem to be about function application rather than declaration.

Hmm, I can see your point here. I don't think I am communicating my ideas very well on this issue.

Given the example:

filter(//employee, ->@salary gt 20000)

I can see that the thing on the RHS is going to be evaluated once for each thing on the LHS. Perhaps "with" is a better verb than "apply".

It also reminds me very much of the Simple Map Operator... How bad would a syntax like filter(//employee, !@salary gt 20000) be? ...runs and ducks ;-)

benibela commented 5 years ago

Another language is

Kotlin

{ parameters -> expression }

{ expression }

If the parameters are omitted, a single-parameter lambda gets an implicit parameter named it, for which we would probably use . instead. And a lambda that is the last argument can be written after the call, i.e., function({...}), function() {...} and function {...} are all the same.

That syntax would work well in XQuery (unless {...} were to become an abbreviation for map {...}; the map prefix is really pointless). The above example would be filter(//employee, { @salary gt 20000 }) or filter(//employee, { foo -> foo/@salary gt 20000 }) or perhaps //employee => filter { foo -> foo/@salary gt 20000 }

adamretter commented 5 years ago

Lately I have been writing a lot of nested anonymous functions for a research paper.

I realised that there may be a shortcoming with the "Short Inline Functions" form. How would we handle nested inline functions?

Here is an example in which the argument is an inline function that itself returns an inline function. I wonder what this would look like when rewritten as "Short Inline Functions":

```
declare function other:something($f as function(xs:string, xs:integer) as function(xs:QName, xs:QName) as xs:string) as xs:integer external;

other:something(function($a, $b) {
  function($c, $d) {
     "place holder"
  }
})
```

michaelhkay commented 5 years ago

We have a syntax that works for big, complicated functions; we don't have one that works for trivial inline functions like filter($seq, {a gt 2}).

The syntax for the trivial inline functions needs to be unambiguous, and it needs to be very usable for little functions, but it doesn't need to have high usability for complex cases.

The "." variable itself is optimized for simple cases; with only one variable, you can't do joins. That was a design choice that kept XPath simple and accounted for a lot of its success. Optimize for simple and common cases.

Michael Kay Saxonica

(Replied by email to Adam Retter's comment of 16 Jan 2019, 03:03: https://github.com/expath/xpath-ng/pull/5#issuecomment-454634424.)

ChristianGruen commented 5 years ago

I still like the 2 proposed options a lot.

Option 3 would surely be the most general one. It seems we can handle it in our implementation, but I don’t know whether that will be true for everyone else who might be interested in supporting this feature in the future.

adamretter commented 5 years ago

@michaelhkay Okay understood, and I agree. I wasn't trying to criticise, rather I was trying to understand if there was a clever way it could be achieved with the "Short Inline Functions" form, that I hadn't understood... I guess not.

adamretter commented 5 years ago

@ChristianGruen I would certainly be interested in whatever options we can agree on :-)

rhdunn commented 5 years ago

I like the idea of using the Kotlin-style syntax as a variant of option 3, replacing Kotlin's implicit it parameter with the context item (.). This resolves the issue of unbounded lookahead, as the concise function is introduced by a {. This is also unambiguous in the current syntax, as a bare {...} is only allowed in direct element content, and other uses require a type indicator (such as array { ... }).

Simple case (both equivalent):

{ 1 }
{ -> 1 }

Implicit context item -- single parameter function:

sort(employee, { @salary })
{ . * 2 }
{ -> . * 2 }

Explicit (named) context item -- single parameter function:

sort(employee, { $e -> @salary })
{ $n -> $n * 2 }

Multi-parameter function:

{ $k, $v -> $k }
{ ($k, $v) -> $k }
{ $k as xs:string, $v -> $k }

Combined with sequence, map, and array decomposition:

{ ${key, value} -> $k }
{ $[key, value] -> $k }
{ $(key, value) -> $k }

Multiple parameters combined with sequence, map, and array decomposition:

{ $x, ${key, value} -> $k }
{ ($x, $(key, value)) -> $k }
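The explicit-parameter forms above map directly onto the existing XQuery 3.1 inline function syntax. The following Python sketch performs that rewrite at the string level. It is an illustration, not a parser, and `desugar_braces` is a hypothetical helper. It assumes the whole input is a single brace function whose parameters are separated from the body at the first "->". It treats an empty parameter list as zero-arity, as in { -> 1 }. It deliberately leaves out the context-item forms (such as { . * 2 } or { -> . * 2 }) and the decomposition forms, which would need more than a textual rewrite.

```python
def desugar_braces(source):
    """Rewrite an explicit-parameter brace function, e.g. '{ $k, $v -> $k }',
    into XQuery 3.1 inline function syntax, e.g. 'function($k, $v) { $k }'.

    Assumptions (hypothetical, for illustration only): the input is exactly
    one brace function; the first '->' separates parameters from the body;
    the parameter list may be wrapped in one pair of parentheses.
    Context-item and decomposition forms are not handled.
    """
    text = source.strip()
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError("not a brace function")
    inner = text[1:-1].strip()
    params, arrow, body = inner.partition("->")
    if not arrow:
        raise ValueError("no '->': context-item forms are not handled here")
    params = params.strip()
    if params.startswith("(") and params.endswith(")"):
        params = params[1:-1].strip()  # { ($k, $v) -> ... } form
    return f"function({params}) {{ {body.strip()} }}"
```

Typed parameters pass through unchanged: { $k as xs:string, $v -> $k } becomes function($k as xs:string, $v) { $k }, which is valid XQuery 3.1.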

michaelhkay commented 5 years ago

There is of course competition for the scarce resource of bare-brace expressions. Other contenders include:

map syntax, à la JavaScript
execution blocks in XQuery scripting

and there's an argument against using it at all, because of confusion with the use of braces in attribute value templates (AVTs).

It's because of this competition that we've always prefixed "{" with something else, e.g. in EQName syntax and in map syntax.

(Replied by email to Reece H. Dunn's comment of 16 Jan 2019, 12:12: https://github.com/expath/xpath-ng/pull/5#issuecomment-454757486.)
