JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

headless anonymous function (->) syntax #38713

Open rapus95 opened 3 years ago

rapus95 commented 3 years ago

Edit 3: Still sold on the original simple idea with optional extensions, as in the previous edits, or combined with https://github.com/JuliaLang/julia/pull/53946. The previous edits highlight and explore different ideas to stretch into, each adding its own value to different parts of the ecosystem. For a glimpse of the simple approach, have a look at the description after all the edited cross-references.


Edit 2: again newly progressed state of this proposal: https://github.com/JuliaLang/julia/issues/38713#issuecomment-1436118670


Edit 1: current state of this proposal: https://github.com/JuliaLang/julia/issues/38713#issuecomment-1188977419


Since https://github.com/JuliaLang/julia/pull/24990 has stalled on the question of what the right amount of tight capturing is, here is an alternative.

Idea

I want to propose a headless -> variant which has the same scoping mechanics as (args...)-> but automatically collects all not-yet-captured underscores into an argument list. EDIT: Nesting will follow the same rules as variable shadowing, that is, the underscore binds to the tightest headless -> it can find.

Before                          After
lfold((x,y)->x+2y, A)           lfold(->_+2_, A)
lfold((x,y)->sin(x)-cos(y), A)  lfold(->sin(_)-cos(_), A)
map(x->5x+2, A)                 map(->5_+2, A)
map(x->f(x.a), A)               map(->f(_.a), A)
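For comparison, here is a sketch in today's (runnable) syntax of what the right-hand column would lower to; `f1`–`f3` are hypothetical names, and `lfold` in the table presumably stands for `foldl`:

```julia
# Today's syntax (left column) and what the proposed headless forms
# (right column) would lower to:
f1 = (x, y) -> x + 2y           # proposed: ->_+2_
f2 = (x, y) -> sin(x) - cos(y)  # proposed: ->sin(_)-cos(_)
f3 = x -> 5x + 2                # proposed: ->5_+2

f1(1, 2)    # 5
f2(0.0, 0.0) # -1.0
f3(2)       # 12
```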

Advantage(s)

In small anonymous functions, underscores as variables can increase readability, since they stand out much more than ordinary letters. For multiple-argument cases, like anonymous functions for reduce/lfold, it can even save a decent number of characters. Overall it reads very intuitively as: start here, and whatever arguments you get, just drop them into the slots from left to right.

      -> ---| -----|
            V      V
lfold(->sin(_)-cos(_), A)

Sure, some more complex options like reordering ((x,y)->(y,x)), splatting ((x...)->x), and probably some other cases won't be possible, but if everything were possible in the headless variant we wouldn't have introduced the head in the first place.

Feasibility

1) Both a leading -> and an _ on the right-hand side (value position) error on 1.5, so this shouldn't be breaking. 2) Since it uses the well-defined scoping of ordinary anonymous functions, it should be easy to 2a) switch between both variants mentally and 2b) reuse most of the current parser code, just extending it to collect/replace underscores.
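As a rough illustration of point 2b (a hypothetical user-level macro, not the proposed parser change; `@headless` and `collect_underscores!` are made-up names), collecting and replacing underscores can be sketched like this:

```julia
# Sketch: approximate the headless `->` by walking the expression and
# turning each `_` into its own fresh positional argument, left to right.
function collect_underscores!(ex, args::Vector{Symbol})
    if ex === :_
        s = gensym(:arg)
        push!(args, s)          # each `_` becomes its own argument
        return s
    elseif ex isa Expr          # recurse; a real implementation would also
        ex.args = map(a -> collect_underscores!(a, args), ex.args)  # stop at nested lambdas
        return ex
    else
        return ex               # literals and other symbols stay as-is
    end
end

macro headless(ex)
    args = Symbol[]
    body = collect_underscores!(ex, args)
    return esc(Expr(:->, Expr(:tuple, args...), body))
end

f = @headless sin(_) - cos(_)   # behaves like (x, y) -> sin(x) - cos(y)
```

This works today because `_` is only rejected by lowering, not by the parser, so a macro sees it as an ordinary symbol and can rewrite it away.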

Compatibility with #24990

It shouldn't clash with the result of #24990 because that focuses more on ~tight single argument~ very tight argument cases. And even if you are in a situation where the headless -> consumes an underscore from #24990 unintentionally, it's enough to just put 2 more characters (->) in the right place to make that underscore once again standalone.

xitology commented 3 years ago

So far that looks good, but it isn't compatible with broadcasting, which is a showstopper IMO

@rapus95 What compatibility issues are you concerned about? You could use . to build a function, and then use . to broadcast it over an array, for example:

julia> @show (it .> 5).(1:10)
(it .> 5).(1:10) = Bool[0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
julia> @show (it[1] .+ it[2]).(1:10, 11:20)
(it[1] .+ it[2]).(1:10, 11:20) = [12, 14, 16, 18, 20, 22, 24, 26, 28, 30]

clarkevans commented 3 years ago

In ->g(_, a, h(_), _, f) as opposed to (x,y,z)->g(x,a,h(y),z,f) I, to be honest, find the first case way more readable.

You've chosen to bind _ to different inputs over scope of the expression -- I don't find it to be very intuitive at all. Moreover, it is quite special-purpose, where the arguments on the RHS happen to exactly correspond to what is on the LHS. That's a rather specific alignment of the stars, no?

Regarding maintainability, I don't see where that would need any maintenance.

I'm talking about maintenance in the regular sense -- where someone unfamiliar with the program, and perhaps even with the programming language (Julia is often used by data scientists and other accidental programmers), has to look at a piece of code and make sure the logic is doing what is expected, perhaps making adjustments to comply with the realities of an ever-changing world.

If you want to use _, perhaps one could use it to represent the entire input. So, for simple lambdas, it represents the 1st argument since there is only one argument.

x -> x^2 => -> _^2

but... let's say the input is a tuple....

(x,y) -> x^2 => ->_₁^2 (or perhaps just -> ₁^2 or -> _[1]^2)...

In the case of tuple input, -> _^2 would give something like... ERROR: MethodError: no method matching ^(::Tuple{...}, ::Int64)
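The tuple mismatch can be reproduced with today's syntax by writing the "whole input" reading as an explicit lambda (`sq_first` and `sq_whole` are hypothetical names):

```julia
# The "whole input" reading, written explicitly: `t` plays the role of `_`.
sq_first = t -> t[1]^2   # the proposed -> _[1]^2
sq_whole = t -> t^2      # the proposed -> _^2

sq_first((3, 4))         # 9: indexes into the tuple first
# sq_whole((3, 4))       # MethodError: no `^` method for Tuple
```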

Regardless, I dislike the whole idea.

rapus95 commented 3 years ago

@xitology it wouldn't be compatible with #24990: _.+5 is where both proposals collide.

Also, I don't like that the meaning of the dot depends on subsequent code. The whole subjective beauty of my proposal was that it's enough to scan once from left to right.

mod.(_, 10) reads as if it allowed using a collection as the argument. I don't like that a placeholder which is meant to be used as such transforms the meaning of the surrounding code. It feels like an anti-feature somehow.

On the other hand, I feel like there are two possible implementations for that proposal: a) syntactic -- in that case, if we have to create new parsing behavior anyway, why restrict broadcasting behavior by stealing its syntax? b) by using certain objects which automatically propagate through the extensible broadcasting machinery -- in that case I don't see the conflict with the original proposal, except that using an underscore would require assigning a proper object to it, which we intentionally prohibited.

rapus95 commented 3 years ago

@clarkevans

You've chosen to bind _ to different inputs over scope of the expression

Yes, that was the exact purpose. Don't think of _ as a variable (since variables reference certain objects, which we disallowed for the underscore) but as a slot, almost physically. It's meant to be read as slots into which you drop the arguments from left to right. It's like having a stack of objects and a group of people: you can't give everyone the same object, but you assign objects to people in the order in which you take them off your argument list.

Moreover, it is quite special-purpose, where the arguments on the RHS happen to exactly correspond to what is on the LHS. That's a rather specific alignment of the stars, no?

Since I'm usually in charge of supplying the arguments to my functions myself I don't find that very special-purpose.

curried=->f(_,g,_,2)
r=curried(3,h)
return curried(8,r)
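Spelled out in today's syntax (with hypothetical stand-ins for `f`, `g` and `h` so the example runs), the sketch above means:

```julia
# Hypothetical stand-ins, purely so the example is self-contained:
f(a, b, c, d) = (a, b, c, d)
g, h = :g, :h

# curried = ->f(_, g, _, 2) would lower to:
curried = (x, y) -> f(x, g, y, 2)
r = curried(3, h)   # (3, :g, :h, 2)
curried(8, r)       # (8, :g, (3, :g, :h, 2), 2)
```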

where someone unfamiliar with the program, and perhaps even the programming language (Julia is used often by data scientists and other accidental programmers) has to look at a piece of code and make sure the logic is doing what is expected

That could also be used as an argument to forbid all types of reflection and metaprogramming -- heck, to not come up with anything more complex than natural language at all. And I hope that no one relies on a language foreigner for proofreading.

I feel like many here are arguing from the point of "we must not allow operator overloading because code can become non-trivial", while I argue from the point of "some cases of lambda usage (i.e. -> usage) could feel more concise if we deduplicated argument occurrence". That's also the reason why I'm still only proposing the very basic syntax, which consists of a single -> and arbitrary _s, where each underscore represents its own argument in order. I'm proposing a somewhat intuitive short syntax for -> (which is also a reason why I'm not sure one would use the broadcasting dot in place of something that already has the known and wanted meaning of constructing functions).

bramtayl commented 3 years ago

Because I've grown to quite like it, here are all of the anonymous functions I sampled from Base, in the bare subscript syntax. Apologies for any mistakes.

-> ₂ * (₁ - ₃)
-> prepend!(₂, ₁)
-> ₁.x = f(₂...)
-> itr.results[₁] = itr.f(₂...)
-> ₁ | ~₂
-> ~₁ | ₂
-> ~xor(₁, ₂)
-> ~₁ & ₂
-> ₁ & ~₂
-> ₁ .+ ₂
-> Dict{₁, ₂}
-> :($₁ || ($expr == $₂))
-> ₂.status == "503"
-> isa(₂, IOError)
-> IdDict{₁, ₂}
-> ₁:₂
-> ₁:₂:₃
-> first(₁)-first(₂)
-> isless(₁[2], ₂[2])
-> lt(by(₁), by(₂))
-> (₂; print(₁, idx > 0 ? lpad(cst[₃], nd+1) : " "^(nd+1), " "); return "")
-> (print(rpad(string(₁) * "  ", $maxlen + 3, "─")); Base.time_print(₂ * 10^9); println())
-> ₁(₂)
-> wait(Threads.@spawn ₁(₂))
-> f(₂) ? (₁..., ₂) : ₁
-> WeakKeyDict{₁, ₂}
-> ₂; true
-> printer(₁, ₂, ₃ > 0 ? code.codelocs[₃] : typemin(Int32))
-> _findin(I[₁], ₁ < n ? (1:sz[₁]) : (1:s)
-> I[₁][_findin(I[₁], ₁ < n ? (1:sz[₁]) : (1:s))]
-> firstindex(A,₁):firstindex(A,₁)-1+@inbounds(halfsz[₁])
-> ₁ == dims[1] ? (mid:mid) : (firstindex(A,₁):lastindex(A,₁))
-> idxs[₁]==first(tailinds[₁])
-> string("args_tuple: ", ₁, ", element_val: ", ₁[1], ", task: ", tskoid()), input)
-> (batch_refs[₁[1]].x = ₁[2]), enumerate(results))
-> Symbol(₁[1]) => ₁[2]
-> !₁.from_c && ₁.func === :eval
-> convert(fieldtype(T, ₁), x[₁])
-> (₁.filename, ₁.mtime)
-> !(₁ === empty_sym || '#' in string(₁))
-> ₁ == dims ? UnitRange(1, last(r[₁]) - 1) : UnitRange(r[₁])
-> ₁ == dims ? UnitRange(2, last(r[₁])) : UnitRange(r[₁])
-> getfield(sym_in(₁, bn) ? b : a, ₁)
-> !isempty(₁) && ₁ != "."
-> iperm[perm[₁]] == ₁
-> ₁ == k ? 1 : size(A, ₁)
-> ₁ == k ? Colon() : idx[₁]
-> ₁ isa Integer ? UInt64(₁) : String(₁)
-> !isa(₁, DataType) || !(₁ <: Tuple) || !isknownlength(₁)
-> at.val[₁] isa fieldtype(t, ₁)
-> !isa(₁, SSAValue) || !(₁.id in intermediaries)
-> last_stack[₁] != stack[₁]
-> (₁ = new_nodes_info[₁]; (₁.pos, ₁.attach_after))
-> ₁ != 0 && !(₁ in bb_defs)
-> !(isa(₁, LineNumberNode) || isexpr(₁, :line))

Observations:

yurivish commented 3 years ago

Is it the case that each subscript appears as a single character but takes four keystrokes to type? \_2<tab>

bramtayl commented 3 years ago

Yes, I suppose so, but copy-paste can help quite a bit (as with any variable name I suppose)

rapus95 commented 3 years ago

The number of characters saved should be the same for the subscript and the original proposal in all cases where both are applicable. Regarding the number of characters needed, I'd rather count the number that needs to be read than the number that needs to be typed, since the former is the better metric for mental load IMO. So I also like the subscript variant as an always-applicable alternative, but I again want to note that this and the original proposal are orthogonal. As such they can coexist ☺️ (and I still favor the underscore variant because it takes argument-searching off my brain). So I'd be fine with having both in the end!

bramtayl commented 3 years ago

Maybe the bare subscripts are too easy to confuse with numbers? _, _₂, _₃ etc. might be nice. For the common use case of one argument, it's still just one ASCII character.

-> _findin(I[_], _ < n ? (1:sz[_]) : (1:s)
-> I[_][_findin(I[_], _ < n ? (1:sz[_]) : (1:s))]
-> firstindex(A,_):firstindex(A,_)-1+@inbounds(halfsz[_])
-> _ == dims[1] ? (mid:mid) : (firstindex(A,_):lastindex(A,_))
-> idxs[_]==first(tailinds[_])
-> string("args_tuple: ", _, ", element_val: ", _[1], ", task: ", tskoid()), input)
-> (batch_refs[_[1]].x = _[2]), enumerate(results))
-> Symbol(_[1]) => _[2]
-> !_.from_c && _.func === :eval
-> convert(fieldtype(T, _), x[_])
-> (_.filename, _.mtime)
-> !(_ === empty_sym || '#' in string(_))
-> _ == dims ? UnitRange(1, last(r[_]) - 1) : UnitRange(r[_])
-> _ == dims ? UnitRange(2, last(r[_])) : UnitRange(r[_])
-> getfield(sym_in(_, bn) ? b : a, _)
-> !isempty(_) && _ != "."
-> iperm[perm[_]] == _
-> _ == k ? 1 : size(A, _)
-> _ == k ? Colon() : idx[_]
-> _ isa Integer ? UInt64(_) : String(_)
-> !isa(_, DataType) || !(_ <: Tuple) || !isknownlength(_)
-> at.val[_] isa fieldtype(t, _)
-> !isa(_, SSAValue) || !(_.id in intermediaries)
-> last_stack[_] != stack[_]
-> (_ = new_nodes_info[_]; (_.pos, _.attach_after))
-> _ != 0 && !(_ in bb_defs)
-> !(isa(_, LineNumberNode) || isexpr(_, :line))
-> _₂ * (_ - _₃)
-> prepend!(_₂, _)
-> _.x = f(_₂...)
-> itr.results[_] = itr.f(_₂...)
-> _ | ~_₂
-> ~_ | _₂
-> ~xor(_, _₂)
-> ~_ & _₂
-> _ & ~_₂
-> _ .+ _₂
-> Dict{_, _₂}
-> :($_ || ($expr == $_₂))
-> _₂.status == "503"
-> isa(_₂, IOError)
-> IdDict{_, _₂}
-> _:_₂
-> _:_₂:_₃
-> first(_)-first(_₂)
-> isless(_[2], _₂[2])
-> lt(by(_), by(_₂))
-> (_₂; print(_, idx > 0 ? lpad(cst[_₃], nd+1) : " "^(nd+1), " "); return "")
-> (print(rpad(string(_) * "  ", $maxlen + 3, "─")); Base.time_print(_₂ * 10^9); println())
-> _(_₂)
-> wait(Threads.@spawn _(_₂))
-> f(_₂) ? (_..., _₂) : _
-> WeakKeyDict{_, _₂}
-> _₂; true
-> printer(_, _₂, _₃ > 0 ? code.codelocs[_₃] : typemin(Int32))

bramtayl commented 3 years ago

Also, according to Wikipedia,

Sometimes, subscripts can be used to denote arguments. For example, we can use subscripts to denote the arguments with respect to which partial derivatives are taken.

rapus95 commented 3 years ago

Once again, please don't intermix the individual underscore with a more complex notation. I see why you still try to push the "all individual underscores refer to the same argument" concept, but I thought we already explained why it doesn't scale.

Also, for my own feeling, I don't count like "", "2", "3", .... So I'd either count everywhere or not count at all; that way it feels more consistent to me. Also, that way, we have two different syntaxes to define independently, which is nice.

But again, if I could, I'd veto intermixing single-underscore with anything that is not single-underscore. For orthogonal design reasons.

clarkevans commented 3 years ago

How about Unicode circled numbers, e.g. ① ② ③ ④ ⑤ ⑥ ⑦ ⑧ ⑨? They could be given a nice shortcut \o1 == ①.

-> printer(①,②,③ > 0 ? code.codelocs[③] : typemin(Int32))

bramtayl commented 3 years ago

Oooh, I like that!

stevengj commented 3 years ago

I think this proposal saves so little typing, and adds so little or no clarity, compared to explicit x -> ... that it doesn't seem worth it.

The only point of a new syntax here is to abbreviate (and add clarity) for anonymous functions in the common case of very short expressions. After all of this discussion, and repeatedly finding myself wanting an underscore syntax in real-world cases, I keep circling back to the conclusion that Scala's rule is best — consume a single function call, and anything else can use x -> .....

rapus95 commented 3 years ago

@stevengj I feel like you mixed up both (albeit orthogonal) anonymous-function issues. The capture scope is only relevant in the other issue, while this issue is meant to focus on short lambdas with multiple arguments. There the largest savings in characters are for two arguments, where it cuts the number of characters needed for the lambda syntax in half, i.e. 4 instead of 9, and as Stefan showed in https://github.com/JuliaLang/julia/issues/38713#issuecomment-740092621 it is at the same time capable of shifting the focus to where it belongs: the actual action that is carried out. Anything that goes beyond a single underscore is out of scope of this proposal because it is orthogonal to it.

The only point of a new syntax here is to abbreviate (and add clarity) for anonymous functions in the common case of very short expressions.

And that's exactly what the proposal is capable of. ->mapreduce(_,+,_,init=0) IMO is VERY clear and effectively reduces boilerplate to zero, because it only has the lambda indicator and a single character per argument, which visually stands out. Do you see any way to reduce the number of characters needed any further without becoming ambiguous or less clear?

stevengj commented 3 years ago

And that's exactly what the proposal is capable of. ->mapreduce(_,+,_,init=0) imo is VERY clear

The other PR (#24990) already supports mapreduce(_,+,_,init=0) with no -> at all, which is even more compact and clear.

The only reason to have a -> fence is to capture nested calls, but at that point the expressions are becoming complicated enough that the savings vs. x-> or (x,y)-> are not really worth a new syntax in my opinion.

bramtayl commented 3 years ago

I know I mentioned this above, but maybe this is a better way to put it. Querying is not a domain-specific application, but a handy syntax style that can be used for a variety of problems. You can't really use tight currying for querying. That is, df |> filter(_.age > 50, _) would not transform into df |> x -> filter(y -> y.age > 50, x). On the other hand, df |> filter(-> ①.age > 50, ①) would do the job nicely.
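A rough runnable sketch of the comparison, using a vector of named tuples in place of a DataFrame (assumption: no packages, and the explicit-lambda spelling stands in for the proposed forms):

```julia
# Named tuples stand in for DataFrame rows:
df = [(name = "a", age = 60), (name = "b", age = 40), (name = "c", age = 70)]

# Today's syntax, corresponding to the proposed df |> filter(-> ①.age > 50, ①):
older = df |> x -> filter(y -> y.age > 50, x)
# two of the three rows pass (ages 60 and 70)
```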

stevengj commented 3 years ago

In that example the syntax of this PR saves you at most one character over df |> filter(x-> x.age > 50, _) ...

bramtayl commented 3 years ago

I'm not sure I'm following? If you compare:

df |> x -> filter(y -> y.age > 50, x) with df |> filter(-> ①.age > 50, ①) I see four fewer characters (but, I think more importantly, much less trying to figure out what x and y mean)

clarkevans commented 3 years ago

In that example the syntax of this PR saves you at most one character over df |> filter(x-> x.age > 50, _) ...

I have absolutely no idea what this could/should mean. The main problem isn't the number of characters to be typed, but code maintenance -- often by someone who didn't author the original work (and for me, all it takes is a few months, and I could swear I wasn't the implementer... till I look at the commit records and wonder what could have possibly possessed me).

stevengj commented 3 years ago

If you compare df |> filter(-> ①.age > 50, ①) to df |> filter(x-> x.age > 50, _) (tight underscore currying) you are saving 1 character.

I have absolutely no idea what this could/should mean

I was referring to the Scala-like tight underscore currying in the other PR. It's a matter of taste, but it doesn't seem hard to learn that f(x,_) is an anonymous function y -> f(x,y).

(To me, filter(-> ①.age > 50, ①) is confusing.)

stevengj commented 3 years ago

Put another way, I don't see much point in discussing a new completely general anonymous-function syntax, which is what this -> ①... syntax is on the verge of becoming (it handles 99.9% of the cases where -> is used). We already have such a syntax and we're not going to get rid of it.

clarkevans commented 3 years ago

(To me, filter(-> ①.age > 50, ①) is confusing.)

I agree; and so are the other implicit options.

stevengj commented 3 years ago

To me, there is no point in debating anything other than an implicit option, because explicit "fenced" options (like in this PR) will inevitably overlap too much with the current -> syntax and add too little improvement.

bramtayl commented 3 years ago

Hmm, well, actually, we already have two syntaxes for anonymous functions:

function (x)
    x + 1
end
x -> x + 1

Seems like not too big of a deal to add a third one:

-> ① + 1

Note that, as you move up the list, the syntax becomes more verbose but only slightly more powerful

rapus95 commented 3 years ago

I was proposing a cleaner style for a certain subset of lambdas which can't be handled by the fence-less proposal of the other issue; hence my considerations about orthogonality. Sure, the other proposal has some overlap where both can be used, but I don't see any lambda proposal that gets near the clarity/conciseness of ->f(2_)+_

clarkevans commented 3 years ago

->f(2_)+_

What would this mean?

rapus95 commented 3 years ago
-> --| ---|
     V    V
->f(2_) + _

as such (x, y) -> f(2x) + y

simeonschaub commented 3 years ago

Seems like not too big of a deal to add a third one:

-> ① + 1

I have to agree with @stevengj here. This only saves one character over the current syntax, is more of a pain to type and IMHO also harder to read. I think it's also generally a good idea to try to stick to ASCII characters for language features, since Unicode is not always well supported in all editing environments.

rapus95 commented 3 years ago

Even more important: circled numbers are, once again, an orthogonal issue. Can we please stick to the individual-underscore variant which was proposed? Increasing the number of different approaches discussed in a single issue never did anything but stall the actual proposal... Also, I don't want to complicate the proposal, since if we make it even a little more complicated it isn't easier or clearer than an ordinary lambda anymore. And please stop focusing your "calculations" on single-argument lambdas, because those are handled in the other proposal, so you wouldn't use the fenced approach for them anyway; it wouldn't provide any benefit over the other proposal.

So please keep this focused on simple (nested) multi-argument lambdas.

stevengj commented 3 years ago

Also, please stop focusing your "calculations" on single argument lambdas because these are handled in the other proposal

Multi-argument lambdas are handled in the other proposal (or rather, in the working PR) as well. The new thing here is nested calls, using a headless -> fence, where you are saving typing the x (single-arg) or (x,y,...) (multi-arg) head.

clarkevans commented 3 years ago

Following Aaron's observation that the interpretation of multi-argument lambdas is in #24990, I've commented there instead of here. The skinny is, I think that using the Nth underscore for the Nth argument is not supported by the examples @bramtayl provided. With regard to nested lambdas, as someone who has to maintain other people's scripts, I think it's confusing. That said, it's not something I have a strong opinion about.

StefanKarpinski commented 3 years ago

Personally, I think the "character counting" criterion for this feature is misguided. For me at least, this is not mainly about saving typing. What it is about, as I described above, is expressing an operation in a way that is syntactically focused on the operation and not on the arguments that are beside the point. Writing (x, y) -> x[y] puts the focus on x and y, whereas -> _[_] puts the focus on the [ ] operation; admittedly not as much as just _[_], but automatically deciding how much of the expression to consume seems hard after all the discussion in #24990 (although I thought we had gotten to a pretty good rule towards the end). Note that we can introduce this explicitly delimited version and later add a rule for implicitly inserting the delimiter.

tpapp commented 3 years ago

Writing (x, y) -> x[y] puts the focus on x and y

Not for me — I think my eyes have learned to move automatically to the body of lambda expressions. This is also something that an editor can handle easily (gray out/visually de-emphasize the (x, y) ->).

I think that this kind of visual focus happens automatically with most notation after a bit of exposure, e.g. in ∫ f(x) dx most people look at the f(x) first. A lot of mathematical notation is seemingly "redundant" this way, but it serves an important purpose: clarity and readability.

Personally I prefer to write out the arguments (x, y) -> in exchange for not having to think about how the _ expansion works. While this proposal is the clearest of all the similar ones, it still involves locating and counting the _s (e.g. to determine arity).

stevengj commented 3 years ago

Put another way, "saving typing" is a proxy for the observation that for very short expressions, x -> and -> add visual noise that impedes clarity. Compare all(_ > 2, x) with all(y -> y > 2, x) or even all(-> _ > 2, x). Or, for the multi-arg case, compare reduce((x,y) -> f(x,y,data), a) with reduce(f(_,_,data), a). While, for anything that's not a very short expression, our current syntax is fine.

@tpapp, "locating and counting" the underscores is no longer a chore if all the underscores are required to be arguments of a single function call. (But it seems clear that the most common use cases of an implicit "headless" lambda syntax will be single-argument lambdas. See also https://github.com/JuliaLang/julia/pull/24990#issuecomment-752110486 for a survey of single-call lambdas in Base.)

knuesel commented 3 years ago

Regarding the readability by focusing on operations, I think it's not that clear cut: sometimes _ really helps to avoid meaningless names, but names can often be chosen judiciously to make code more readable. Compare this:

map(-> _[_], arrays, indices)   # looks like some Perl got copy-pasted here :)

with this:

map((A,i) -> A[i], arrays, indices)

The second one is arguably more readable or beginner-friendly.

But I think this proposal introduces serious issues in the interaction with #24990. The original comment says

And even if you are in a situation where the headless -> consumes an underscore from #24990 unintentionally, it's enough to just put 2 more characters (->) in the right place to make that underscore once again standalone.

It's nice that a fix is only 2 characters, but the main problem is about readability and reliability of behavior:

Pipe issues

-> doesn't play very nicely with pipes, as mentioned on Discourse:

1:2 .|> x->x^2 |> sum |> inv    # Result may not be what you expect
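The gotcha can be checked directly: with today's precedence, `->` binds looser than `|>`, so the lambda swallows the rest of the chain.

```julia
# Parses as 1:2 .|> (x -> (x^2 |> sum |> inv)), NOT as
# ((1:2 .|> x -> x^2) |> sum) |> inv.
r = 1:2 .|> x -> x^2 |> sum |> inv
# r == [1.0, 0.25], not inv(sum((1:2).^2)) == 0.2 as one might expect
```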

#24990 alone helps to side-step this problem:

1:2 .|> _^2 |> sum |> inv    # Probably what you expect

A headless -> however has the same issue as x->x^2, and the combination with #24990 brings additional troubles:

value |>
  -> _^2 |>
  log(3, _)

This would actually mean value |> x -> (x^2 |> log(3, x)). Thankfully this one would give an error rather than the "wrong" result.

The precedence of a headless -> could be made different from that of normal -> but that would be even worse (more confusing) I think.

Parser instability

With this proposal, something like map(log(3,_), A) takes a completely different meaning when copy pasted here:

f(A) = -> _ .+ map(log(3,_), A)

I find it more readable with an explicit name for the -> lambda:

f(A) = B -> B .+ map(log(3,_), A)

(The = -> _ .+ ASCII jumble was unintended but could be used to make another point :) )

The first point I want to make here is that this proposal enables larger expressions with _ in different places, and it quickly becomes hard to tell which _ are the same value and which are not.

By contrast, with #24990 alone, the nameless placeholder always has local effect which is great for readability.

My second point: it's unsettling that a "big" expression like map(log(3,_), A) gets parsed differently when -> is inserted higher in the AST. This is what macros do! When I see a big round @ I know that code is getting rewritten. For me this is a strong reason to prefer @_ for this behavior.

rapus95 commented 3 years ago

Comparison of Visual Penalties

Writing (x, y) -> x[y] puts the focus on x and y

Not for me — I think my eyes have learned to move automatically to the body of lambda expressions. This is also something that an editor can handle easily (gray out/visually de-emphasize the (x, y) ->).

Well then, ditch the head and tell me which the bound variables are (i.e. the ones that will be filled in).

(a,h,m) -> somelargefunctionname(f, inv(a), g, h, x, b, kw=m, okw=r)

        -> somelargefunctionname(f, inv(_), g, _, x, b, kw=_, okw=r)

Now, how often did you move your eyes back to check the order of the arguments and which arguments are bound at all? For the underscore case it's enough to scan the line once to know both, since you know for sure that only underscores will be bound, and they will be filled in order of appearance.

I think that this kind of visual focus happens automatically with most notation after a bit of exposure, eg in ∫ f(x) dx most people look at the f(x) first. A lot of mathematical notation is seemingly "redundant" this way, but it serves an important purpose: clarity and readability.

That actually is an argument in favor of having -> as the indicator for a lambda since it will be handled intuitively in the same way as the integral sign in your example.

Personally I prefer to write out the arguments (x, y) -> in exchange for not having to think about how the _ expansion works. While this proposal is clearest of all of the similar ones, it still involves locating and counting the _s (eg to determine arity).

Underscore case: locating, determining order, and counting can be done in a single pass, i.e. you don't have to go back and forth. Identifier case: locating, determining order, and counting can take multiple passes, because you need to look up whether an identifier name is bound or not, and even if you spotted them right away you need to "unshuffle" the ordering if they don't appear in the same order as in the head. Since you can't rely on the head being in the order of occurrence, you have to check it.


What to measure and what not

Put another way, "saving typing" is a proxy for the observation that for very short expressions, x -> and -> add visual noise that impedes clarity. Compare all(_ > 2, x) with all(y -> y > 2, x) or even all(-> _ > 2, x). Or, for the multi-arg case, compare reduce((x,y) -> f(x,y,data), a) with reduce(f(_,_,data), a). While, for anything that's not a very short expression, our current syntax is fine.

Sure, since #24990 started to let multiple underscores refer to multiple arguments, and thus has more cases in common with this proposal, those cases can't be counted toward the advantage of this proposal anymore. But doing the comparison only on the cases where both are applicable, and where #24990 thus leads, is fairly unfair. So let's look at the cases that are left over, since they only work with this proposal; for the shared cases one wouldn't consider using this one anyway, so let's ignore them (though both proposals must keep the same behavior on these shared cases). Thus, only the slightly nested cases remain here. Treat this proposal as a generalization of #24990 but still a strict subset of the ordinary lambda, since we can't reuse arguments.

@tpapp, "locating and counting" the underscores is no longer a chore if all the underscores are required to be arguments of a single function call. (But it seems clear that the most common use cases of an implicit "headless" lambda syntax will be single-argument lambdas. See also #24990 (comment) for a survey of single-call lambdas in Base.)

Well, locating underscores should generally not be difficult, given that they stand out a lot in average code lines. If that's still too hard, adding them to the highlighter will let them burn your eyes 😁


Regarding the readability by focusing on operations, I think it's not that clear cut: sometimes _ really helps to avoid meaningless names, but names can often be chosen judiciously to make code more readable.

Well, yes, if you want to write ugly code, you can; nothing will change that. We already stated that a lot of cases are better handled by actually written-out ordinary lambdas. The argument (I'll refer to it as the bad-style measurement) is the same as "we must not allow operator overloading because people could go nuts with it". It's true, but we can't enforce good coding style anyway, so let's focus on the opportunities that arise if we don't include bad-style measurements in our decision making.

Claims about incompatibilities between #24990, piping and this proposal

But I think this proposal introduces serious issues in the interaction with #24990. The original comment says

And even if you are in a situation where the headless -> consumes an underscore from #24990 unintentionally, it's enough to just put 2 more characters (->) in the right place to make that underscore once again standalone.

It's nice that a fix is only 2 characters, but the main problem is about readability and reliability of behavior:

Pipe issues

-> doesn't play very nicely with pipes, as mentioned on Discourse:

1:2 .|> x->x^2 |> sum |> inv    # Result may not be what you expect

So your argument is that, because ordinary lambdas don't play nicely with pipes, we have a design issue in a shorter lambda notation that should not behave differently (as you conclude yourself further down)? I also think we should stick to equivalent behavior here and rather think about overhauling the pipe precedence in Julia 2.0, because that's the actual problem underlying the argument.

24990 alone helps to side-step this problem:

1:2 .|> _^2 |> sum |> inv    # Probably what you expect

Of course! Because we have chosen the strict consume-1-call variant. If it resembled generic lambdas, it would induce the same problems. So that's not a special feature of the fence-less lambda but rather a lucky coincidence of the capturing rules. And to be fair I'm not sure if I'd consider values |> map(_<1, _) to be strictly more concise than values |> ->map(->_<1, _) or, if |> becomes special-cased as a properly scoped lambda fence, values |> map(->_<1, _). But since I said I don't want to include bad style measurements in judging the quality of a feature, I won't consider that to be a disadvantage. It's just that, to me the last one is the most concise and reusable one because it clearly shows that there are different scopes, and it allows modifying and moving the code around without breaking it.
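As a sanity check, here is what the spellings above are all meant to behave like in today's syntax (the underscore and headless forms are proposals; `xs` and the threshold are made-up illustration values):

```julia
xs = [0.5, 1.5, 0.2]

# values |> map(_<1, _)        -- #24990 alone, if special-cased for |>
# values |> ->map(->_<1, _)    -- headless -> for both scopes
# values |> map(->_<1, _)      -- headless -> inner, |> as lambda fence
# All three would desugar to:
result = xs |> (v -> map(x -> x < 1, v))
```
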

A headless -> however has the same issue as x->x^2, and the combination with #24990 brings additional troubles:

value |>
  -> _^2 |>
  log(3, _)

This would actually mean value |> x -> (x^2 |> log(3,x)). Thankfully this one would give an error rather than the "wrong" result.

it would mean value |> (x,y) -> (x^2 |> log(3,y)) because different underscores become different arguments. Still, it would error. But once again, that's a problem of the way lambdas and pipes interact, i.e. this problem as well boils down to the bad precedence interaction of the pipe operator. Having the pipe operator act as a headless lambda fence with proper scoping for unbound underscores would solve that issue for the general case (i.e. in more cases than #24990). E.g. values |> map(->_^2, _) |> map(f, _) but at the same time cost us the ability to use #24990 in pipes. So in order not to lose the ability to use #24990 with the pipe operator syntax, we could move that headless lambda meaning, combined with tuple splatting, into |>> which allows for really cool stuff like this:

(f, A, B) |>> (map(_, _), _) |>> + |> extrema |>> (2_+_)/3

which doesn't even have a compact written-out form. But since it's a new operator, that can be decided at a later point (i.e. it's kind of orthogonal to this issue).
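Under the stated reading of |>> (splat the left-hand tuple into the underscores on the right), the chain could be desugared by hand like this; f, A, B and the concrete values are made up for illustration:

```julia
f, A, B = (x -> x^2), [1, 2], [10, 20]

# (f, A, B) |>> (map(_, _), _) |>> + |> extrema |>> (2_+_)/3
s1 = ((g, x, y) -> (map(g, x), y))(f, A, B)  # ([1, 4], [10, 20])
s2 = +(s1...)                                # elementwise sum: [11, 24]
s3 = extrema(s2)                             # (11, 24)
s4 = ((a, b) -> (2a + b) / 3)(s3...)         # (2*11 + 24)/3
```
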

The precedence of a headless -> could be made different from that of normal -> but that would be even worse (more confusing) I think.

exactly, keep them behaving exactly the same.

Parser instability

With this proposal, something like map(log(3,_), A) takes a completely different meaning when copy pasted here:

f(A) = -> _ .+ map(log(3,_), A)

As I said earlier #24990 isn't meant to be moved around and reused, since even the slightest modification can break it. And if you want to move it around blindly, you will most probably do so by copy-pasting, so the size of the snippet with an extra -> is not relevant. And blindly copying code around is known to be quite bug-prone anyway. So if you want to be able to move your code around or make it a template, you generally should rely on closed explicit forms like the ordinary lambda or the proposal here.

I find it more readable with an explicit name for the -> lambda:

f(A) = B -> B .+ map(log(3,_), A)

(The = -> _ .+ ASCII jumble was unintended but could be used to make another point :) )

Again, a bad-example measurement. Too much ASCII jumble? Go for the ordinary form, as you would have to if this proposal didn't exist.

The first point I want to make here is that this proposal enables larger expressions with _ in different places, and it quickly becomes hard to tell which _ are the same value and which are not.

Oh that one is easy! None are the same value since they all resolve to different arguments. And even then it's nothing more difficult than finding which x are which in x->map(x->x<2, x). So that again is a bad example measurement.
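The shadowing analogy can be made concrete: the proposed ->map(->_<2, _) would desugar exactly like the named version, with each underscore bound to the tightest enclosing -> (the names x, outer and inner below are arbitrary):

```julia
# Named nesting, today: the inner x shadows the outer x.
g = x -> map(x -> x < 2, x)

# Proposed ->map(->_<2, _) would mean the same thing, i.e.:
h = outer -> map(inner -> inner < 2, outer)
```
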

By contrast, with #24990 alone, the nameless placeholder always has local effect which is great for readability.

Within cluttered code you probably are right. But here's the trick: if the advantage of fenceless lambdas is so high that it would be an argument for not using a headless lambda (as designed in this proposal), then, well, don't use a headless lambda in that scope.

My second point: it's unsettling that a "big" expression like map(log(3,_), A) gets parsed differently when -> is inserted higher in the AST. This is what macros do! When I see a big round @ I know that code is getting rewritten. For me this is a strong reason to prefer @_ for this behavior.

I wouldn't call that "unsettling" but "strictly wanted behaviour". It's the same as scoping/shadowing if you suddenly introduce a local definition for a locally unbound variable somewhere before it in the AST. Result: it gets bound. If you want to avoid binding an unbound variable, then do a quick scan for an unbound occurrence of that variable. If you want to avoid binding an unbound underscore, do a quick scan for an unbound underscore. Here the highlighter is also a valid way to assist in spotting these, by highlighting unbound underscores differently from bound ones. Also: you have to be careful about copying underscores around anyway, given that there are multiple packages with macros that rewrite underscores. So this one won't change much about having to be careful.

knuesel commented 3 years ago

so your argument is that because ordinary lambdas are not playing nicely with pipes we have a design issue in a shorter lambda notation that should not behave differently (as you conclude yourself further down)?

My argument is that when #24990 can be used, it provides a welcome workaround around this problem with ordinary lambdas and pipes, but this headless proposal breaks it.

If it'd resemble generic lambdas then it would induce the same problems. So that's not a special feature of the fence-less lambda but rather a lucky coincidence of the capturing rules.

It doesn't matter: #24990 works well with pipes, but this proposal breaks it.

to be fair I'm not sure if I'd consider values |> map(_<1, _) to be strictly more concise than values |> ->map(->_<1, _) [...] to me the last one is the most concise and reusable one [...]

Is this not overselling your proposal a bit? 😊 I'd hope we can at least agree that the first one is more concise...

Having the pipe operator act as a headless lambda fence with proper scoping for unbound underscores would solve that issue for the general case (i.e. in more cases than #24990). E.g. values |> map(->_^2, _) |> map(f, _) but at the same time cost us the ability to use #24990 in pipes.

Special-casing the |> operator is an option. I'm not sure what to think of it. It has obvious advantages but it makes the |> and _ features less orthogonal, and it's one more rule to learn and to think of when trying to make sense of underscores. When it was discussed in #24990, there was some push-back asking what then of <| and how it scales in the future with further additions. See https://github.com/JuliaLang/julia/pull/24990#issuecomment-600268361 and https://github.com/JuliaLang/julia/pull/24990#issuecomment-600414318.

So in order not to lose the ability to use #24990 with the pipe operator syntax, we could move that headless lambda meaning, combined with tuple splatting, into |>> which allows for really cool stuff like this:

(f, A, B) |>> (map(_, _), _) |>> + |> extrema |>> (2_+_)/3

What I like most about #24990 is the simplicity and readability. For me this goes in the other direction (powerful but dense, almost opaque syntax; I'd rather support it with a macro than in the core language).

As I said earlier #24990 isn't meant to be moved around and reused since even the slightest modification can break it.

It's this proposal that breaks it. On its own, #24990 can be moved around and reused without issue, a really nice property! I think it would be a shame to lose it.

if the advantage of fenceless lambdas is so high that it would be an argument for not using a headless lambda (as designed in this proposal), then, well, don't use a headless lambda in that scope.

My concern is not about writing code but reading it (maybe not my own code). With #24990 it's trivial to interpret any _. This is lost when introducing headless lambdas.

Also, of course we can always say "just don't do that" (reminds me of C++) but it's nice when the language design helps to avoid pitfalls.

It's the same as scoping/shadowing if you suddenly introduce a local definition for a locally unbound variable somewhere before it in the AST. Result: it gets bound.

Indeed, a source of pain (for beginners at least). It would be nice to avoid adding another one.

You have to be careful about copying underscores around anyway given that there are multiple packages with macros that rewrite underscores.

Yes, but that's the whole point of macros: they rewrite code. And they come with this distinctive @ symbol. I think headless lambdas are a great fit for a macro. Do you have a strong objection to using @_ rather than ->?

rapus95 commented 3 years ago

My argument is that when #24990 can be used, it provides a welcome workaround around this problem with ordinary lambdas and pipes, but this headless proposal breaks it.

I don't see where this proposal breaks it. If you can use #24990 to work around the problem of ordinary lambdas with pipes, then why would you use a lambda? It doesn't matter whether it's headless or not. You're conflating two things here: unexpected precedence and unintended underscore consumption. Your argument uses #24990 to work around the unexpected precedence, and then you say that this proposal breaks it. If that's true, then ordinary lambdas break YOUR proposal, because an ordinary lambda would break it in the same way. What you are actually trying to make a point of is, again, the unintended consumption of an underscore. (I'll say more about it further down.)

It doesn't matter: #24990 works well with pipes, but this proposal breaks it.

Only as long as your function is almost trivial, because in every other case #24990 isn't even applicable. So yes, there are a few cases where #24990 works well with pipes, but in a lot of cases it isn't even applicable. And in those cases you will always have unexpected precedence, since you will have to fall back to ordinary lambdas. But with this proposal you can still often have a more concise short lambda syntax at hand, which btw works exactly like ordinary lambdas, even across multiple pipes, if you don't intermix both proposals. Because behaving the same as an ordinary lambda, just with shorter and, when properly used, more concise syntax, is the intention of this proposal. Only changing the precedence or introducing an additional pipe operator with fixed precedence will fix that.

It's this proposal that breaks it. On its own, #24990 can be moved around and reused without issue, a really nice property! I think it would be a shame to lose it.

I only think it would be a shame to encourage mindlessly copying around code you don't understand. Call, don't copy. Or understand your code. I don't get why you focus so much on the copy case. For the record: reuse is the opposite of copying, i.e. high reusability means not having to copy code around but being able to just call the functions where they already exist.

TRIGGER-WARNING: Given that I don't think we should base syntactical decisions about shorter-for-conciseness variants on developers who write their code by copy-pasting without understanding it (it always reminds me of Scratch, btw a nice tool for playing around with code without having to care about scopes or the AST too much), I would suggest finally making the copy-metric a non-metric.

24990 is ~perfect~ nice and this proposal breaks it

No. No. No. We don't break functionality per se; the two just can't be used together mindlessly. In other words: you don't like this feature? Fine, don't use it in your code! You won't get into any trouble. You won't even experience any differences, and it will be as if this feature never existed. That's it. I also think it will be easy to decide line by line whether to use one proposal or the other, since almost all use cases of both proposals are one-liners. For large multi-liners neither will be used much, and they most certainly wouldn't be that concise anyway.

Anyway, you are trying to make the proposals mutually exclusive while they actually aren't. If they were, I would have to ask you why you think either of the two proposals has more right to exist than the other, and why we should exclusively take the less capable variant. But that's not the case. They are not mutually exclusive.

Here are our distinct mindsets once again: you try to restrict based on what could go wrong, while I try to allow multiple syntactical approaches with different tradeoffs. In Julia we have a lot of TMTOWTDI (there's more than one way to do it). For example, have a look at comprehensions <-> maps <-> iterators. If we went for "nah, we only want a single way to do it", then, well, #24990 wouldn't be a thing either, since we already have lambdas. But we want TMTOWTDI! We can have both features, as with all other "redundant" features that also embody different tradeoffs. As long as the behaviour is deterministic, which it would be, thanks to scoping, there's no reason to select one exclusively here.


to be fair I'm not sure if I'd consider values |> map(_<1, _) to be strictly more concise than values |> ->map(->_<1, _) [...] to me the last one is the most concise and reusable one [...]

Is this not overselling your proposal a bit? 😊 I'd hope we can at least agree that the first one is more concise...

I don't agree, given that you're intentionally using underscores in close positions, even in the same scope, that resolve to different scopes. So it's not overselling but a matter of taste. The variant with the single -> instead shows that there are different scopes by explicitly introducing a new scope, and it probably won't strike anyone as surprising that lambdas are involved 😄. That's why I find it more concise. I'd agree yours is the shortest, but not the most concise. (And since the -> binds the underscore in the first argument, it should be a good thing for you, since it is more robust to being copied around. But still, that's a non-metric.)

Having the pipe operator act as a headless lambda fence with proper scoping for unbound underscores would solve that issue for the general case (i.e. in more cases than #24990). E.g. values |> map(->_^2, _) |> map(f, _) but at the same time cost us the ability to use #24990 in pipes.

Special-casing the |> operator is an option. I'm not sure what to think of it. It has obvious advantages but it makes the |> and _ features less orthogonal, and it's one more rule to learn and to think of when trying to make sense of underscores. When it was discussed in #24990, there was some push-back asking what then of <| and how it scales in the future with further additions. See #24990 (comment) and #24990 (comment).

Yes, that's why I suggested creating a new operator for it. Then it remains entirely orthogonal. Btw, #24990 is also one more rule to learn. So again, it's a matter of taste which rule is more worthy, not an objective fact to use as an argument. If I were to use it as an argument, I'd say: ditch #24990 and go with this proposal plus special-casing of |>. Then there would be only a single extra rule about underscores, namely that lone underscores on the RHS bind to the next pipe operator or headless ->. That should be in your favor, since it covers more cases with still only a single rule to learn about underscores. Especially, it would work flawlessly with pipes! But I don't want to ditch #24990. I want both. (Or, using some famous words, "We want it all" 😏, though that "we" may only refer to myself, since I don't know much about other people's minds.)

So in order not to lose the ability to use #24990 with the pipe operator syntax, we could move that headless lambda meaning, combined with tuple splatting, into |>> which allows for really cool stuff like this:

(f, A, B) |>> (map(_, _), _) |>> + |> extrema |>> (2_+_)/3

What I like most about #24990 is the simplicity and readability. For me this goes in the other direction (powerful but dense, almost opaque syntax; I'd rather support it with a macro than in the core language).

I still feel like you use "shortness", "readability", "simplicity" and "conciseness" interchangeably, in which case you would have to like my proposal about |>>. Even assuming you don't, I don't see where it hurts readability, given that each pipe (|>) effectively marks a border of evaluation (and thus reduces scan range and mental load a lot), and the number of > plainly shows whether the object is inserted as a single argument (one >) into the provided function or splatted across multiple arguments/underscores (multiple >). And given that it uses a new operator, it should be easy not to confuse the current mental model, and it definitely won't break code. Once again, you are free not to use it. And it's definitely less cumbersome than wrapping something in Base.splat.

if the advantage of fenceless lambdas is so high that it would be an argument for not using a headless lambda (as designed in this proposal), then, well, don't use a headless lambda in that scope.

My concern is not about writing code but reading it (maybe not my own code). With #24990 it's trivial to interpret any _. This is lost when introducing headless lambdas.

My concern is also about reading code: reading code that can't be handled by #24990 (meanwhile there actually are examples where this proposal would be concise while being shorter than an ordinary lambda). And just for the record, interpreting a lone underscore wouldn't be trivial anyway. It depends on LHS/RHS and has subtle nuances: changing _*M to -_*M will break the code. Same for 5*_+2 or -_-5, but not for -5-_. Doing an AST analysis (i.e. working out in which order the calls occur) doesn't feel that much easier to me than looking for a ->, since it requires an intrinsic understanding of the precedences etc., as opposed to just finding two characters.

Also, of course we can always say "just don't do that" (reminds me of C++) but it's nice when the language design helps to avoid pitfalls.

Sure; my approach to proper language design is to argue from the possibility of concise usage, rather than from worst-case analysis. Because, once again, there are cases for #24990 which are ugly and which would justify not adding it if "it's better to avoid pitfalls by design". Heck, a lot of the syntactic sugar we have and love probably has cases where "avoiding pitfalls" as a strategy would have prevented it from coming into existence. (It also leans a little towards the TMTOWTDI mindset.)

It's the same as scoping/shadowing if you suddenly introduce a local definition for a locally unbound variable somewhere before it in the AST. Result: it gets bound.

Indeed, a source of pain (for beginners at least). It would be nice to avoid adding another one.

A source of pain? Again, only if you mindlessly copy snippets around or try to edit code you didn't read. In all other cases you have different scopes (for example by calling instead of copying) or know which variables are taken. The copy-metric is a non-metric.

And even if you really had to combine this proposal and #24990, having the wrong arity will lead to an immediate error, thanks to Julia's strict method dispatch and thus the proper combination of functions. And if you understand the code you write (which is important), you'll easily be able to find which argument should be a function (i.e. an unbound underscore).

Yes, but that's the whole point of macros: they rewrite code. And they come with this distinctive @ symbol. I think headless lambdas are a great fit for a macro. Do you have a strong objection to using @_ rather than ->?

Well yes, I guess the same as you: I don't want some external syntax for a feature so close to Base, even though it could be done as a macro. Otherwise, stop #24990, since it does AST restructuring, which is kind of worse than just renaming nodes, and which already makes #24990 struggle: a new node is interleaved somewhere into the AST that doesn't have an underscore as its immediate subexpression, and thus it manipulates distant AST expressions further up in the tree. For this proposal we only automatically infer the argument list. More precisely, we add an element to the -> node without changing the structure of the AST (i.e. the parents of all immediate or indirect subexpressions won't change) and rename a few underscore nodes (the structure still doesn't change). The node, expressed by ->, is already in the right place in the AST, and the structure won't change.

Having said that all I again want to emphasize that I want #24990 and this proposal to coexist, each with its own distinct spot of usefulness. If for whatever reason both proposals are useful in the same place the only thing to remember is to decide for one. And if it can be fully handled by #24990 go for it. If it can't, here is the feature that can.

StefanKarpinski commented 3 years ago

Even though many people want an anonymous function syntax that doesn't require the leading -> that's proposed here, it should be noted that these are not incompatible and the syntax without the leading -> can be considered a further abbreviation where you have some rule that tells you where to insert the ->.

rapus95 commented 3 years ago

Which, IMO, would be the better way anyway, since that would make this proposal the generic case and the fence-less lambda just a special case of it that could be implemented by a simple AST insertion. It would also make macros for fence-less lambdas trivial to implement, since it's kind of like "one out, insert fence". Julia accumulates so much elegance/expressiveness/performance by combining perfectly chosen abstractions/specializations. 🙈 IMO the headless lambda fits in perfectly well.

rapus95 commented 2 years ago

Having opened a duplicate of this myself, I seem to really want this feature. 😄 Now that #24990 is stalling at having no clear path forward, I want to sum up my current position.

Proposal A: Introduce argument-less -> as a collecting fence/barrier/boundary for unbound underscores

Aim:

Formally, this introduces -> as a unary/prefix operator variant, which (unlike other unary operators) has the same precedence as the infix variant of ->.

It acts as a middle spot between implicit scoping and no syntactic sugar at all, preventing a false dilemma. Particularly useful in cases where operators are defined but accessor-functions are used (-> the outer call is not an operator, thus #24990 doesn't work) as well as scenarios where operators aren't defined at all, as in a lot of non-mathematical contexts.

Examples

->sqrt(_^2+_^2) # euclidean distance
map(->exp(_*im), 0:10) # euler formula
map(->abs(_+_), x, y) # pairwise abs of sums
filter(->abs(_.offset)>4, x)
filter(->!ismissing(_.v), x) # filter all where property v is not missing
filter(->real(_^2)>0, x) # filter all where the square has positive real part
reduce(->merge(_,normalize(_)), x, init=EmptyOne()) #custom types that don't use/define operators
->(x->_+_*x+_*x^2+_*x^3) # cubic polynomial builder

# some more cases that also work with #24990 where the only benefit here is the explicit scoping for those who prefer it
->_[_] #indexing (outer scope) (also works with #24990)
->_(_) #application (outer scope) (also works with #24990)
->_(_...) #splatting (inner and outer scope) (=Splat(_)(_), might work with #24990?)
->_[1] #indexing (outer scope) (also works with #24990)
->A[_] #indexing (inner scope) (also works with #24990)
->_(5) #application (outer scope) (also works with #24990)
->f(_) #application (inner scope) (also works with #24990)
->_(B...) #splatting (outer scope) (also works with #24990)
->f(_...) #splatting (inner scope) (won't work with #24990 due to nesting)

This allows for more specific(/ugly/concise) usage. But explicit scoping ideally assists understanding what is happening nevertheless.
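For clarity, a few of the examples above, hand-desugared into current syntax with the underscores collected left to right (the argument names are arbitrary):

```julia
dist  = (a, b) -> sqrt(a^2 + b^2)   # ->sqrt(_^2+_^2)
phase = t -> exp(t * im)            # ->exp(_*im), as in map(->exp(_*im), 0:10)
cubic = (a, b, c, d) -> (x -> a + b*x + c*x^2 + d*x^3)
                                    # ->(x->_+_*x+_*x^2+_*x^3)
```
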

Benefits

Proposal B: Introduce a new pipe-variant |>> as syntactic sugar with proper precedence and argument splatting included

It's syntactic sugar for an ordinary pipe which by expansion works as a fence for underscores in the same way as ->. And it also acts as if the following function was wrapped in a Splat (which will be introduced and exported in Julia 1.9, hopefully together with this proposal!).

Expansion Example

x |>> (map(_, _), _)
#expanding `|>>` to
x |> Splat(->(map(_, _), _))
#expanding headless `->` to
x |> Splat((a1, a2, a3)->(map(a1, a2), a3))

By expanding it in this way, this also solves the disadvantageous precedence between (ordinary) lambdas and piping for free, since it encapsulates the lambda into the Splat call which prevents -> from consuming (multiple) pipe operators.
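The expansion can be reproduced today with Base.splat (the splat function exists in current Julia; the |>> spelling in the comment is the proposed sugar, and the concrete tuple is made up for illustration):

```julia
x = (abs, [-1, 2], 7)

# Proposed:  x |>> (map(_, _), _)
# Per the expansion above, written with Base.splat:
y = x |> Base.splat((a1, a2, a3) -> (map(a1, a2), a3))
# y == ([1, 2], 7)
```
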

Example for mixed usage of all approaches & how it is evaluated

(f, A, B) |>> (map(_, _), _) |> Splat(+) |> extrema |>> isodd(_+_)
              (map(f, A), B) |> Splat(+) |> extrema |>> isodd(_+_) #C=map(f, A)
                                     C+B |> extrema |>> isodd(_+_) #D=C+B
                                         extrema(D) |>> isodd(_+_) #(E,F)=extrema(D)
                                                        isodd(E+F)
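The trace can be verified step by step in current syntax (f, A and B are made-up placeholder values):

```julia
f, A, B = abs, [-1, 2, -3], [1, 1, 1]

C = map(f, A)      # step 1: [1, 2, 3]
D = C + B          # step 2: [2, 3, 4]
E, F = extrema(D)  # step 3: (2, 4)
isodd(E + F)       # step 4: isodd(6), i.e. false
```
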

Benefits

LilithHafner commented 2 years ago

If it weren't for 24990, I would support this proposal, but if https://github.com/JuliaLang/julia/pull/24990 merges with the "one function call and zero or more operator calls" semantic, then I see little advantage of this proposed syntax. In the listed examples of why -> is useful, only three cases are not doable with the implicit capture of #24990. The example usage of |>> ((map(f, A), B) |> Splat(+) |> extrema |>> (_+_)/2) is also more simply expressed under #24990 as (map(f, A), B) |> Splat(+) |> extrema |> (_+_)/2.

I think this is an alternative to 24990, not an orthogonal proposal. I would not like to see both merge because that would introduce too much sugar for the same thing and the marginal utility of one proposal once the other merges is very slight.

Existing (x,y,z) -> y(f(z) + x^2) syntax handles complex cases well, 24990 handles simple filter(_.value > 0, x) cases well, and I don't think there is enough middle ground to warrant the extra syntactic complexity of this proposal in addition to the ->-less version.

The best case I can see is replacing map(x -> exp(x*im), v) with map(->exp(_*im), v), and I don't find it compelling.

rapus95 commented 2 years ago

If it weren't for 24990, I would support this proposal, but if #24990 merges with the "one function call and zero or more operator calls" semantic, then I see little advantage of this proposed syntax. In the listed examples of why -> is useful, only three cases are not doable with the implicit capture of #24990. The example usage of |>> ((map(f, A), B) |> Splat(+) |> extrema |>> (_+_)/2) is also more simply expressed under #24990 as (map(f, A), B) |> Splat(+) |> extrema |> (_+_)/2.

I'll update the examples to be more outstanding (mostly, just replacing operators with functions is enough to make #24990 non-usable). The pipe example is primarily for showing how it would work, not for showing that there are no other ways to write it. Also, how did you get to (map(f, A), B)? The input was a tuple, so either you need to do tuple destructuring beforehand or you cannot do it that way.

I think this is an alternative to 24990, not an orthogonal proposal. I would not like to see both merge because that would introduce too much sugar for the same thing and the marginal utility of one proposal once the other merges is very slight.

Different nesting layers just don't work with the other proposal. And just because the other syntax works in some shared cases, doesn't make them automatically particularly readable. In particular, I really like it if I have access to a concise and explicit syntax which doesn't force me into making assumptions about the actual scoping in each case just because I stumbled over an underscore. In other words, what about the people who want to use the non-explicit variant only in very simple single-argument cases to reduce mental load?

Existing (x,y,z) -> y(f(z) + x^2) syntax handles complex cases well, 24990 handles simple filter(_.value > 0, x) cases well, and I don't think there is enough middle ground to warrant the extra syntactic complexity of this proposal in addition to the ->-less version.

To me, just because both are applicable, that's not necessarily a middle ground. It's more a matter of taste whether you want implicit or explicit scoping when both do the right thing. Also, it's only that easy if the corresponding type doesn't use a comparator function and accessors; both are considered good Julian style, and #24990 is just heavily restricted to operators. So yeah, have a look at the side notes. There are some things that are certainly more concise in the other syntax approaches. And that's why both proposals aren't mutually exclusive. We often have TMTOWTDI in Julia, and they complement each other quite well. Both relate to underscores, one being implicit, the other explicit. And the explicit variant has an obvious relation to anonymous functions. So I'd consider it well-rounded.

oscardssmith commented 1 year ago

Triage approves Proposal A and wants to defer Proposal B (cause we're tired) but likes the concept. (of https://github.com/JuliaLang/julia/issues/38713#issuecomment-1188977419)

bramtayl commented 1 year ago

That's cool! Did triage discuss syntax for reusing arguments like in #46916?

rapus95 commented 1 year ago

Triage approves Proposal A and wants to defer Proposal B (cause we're tired) but likes the concept. (of #38713 (comment))

Whose remit would this kind of code change/addition fall under? Aka, who can be asked for insights regarding an implementation?

That's cool! Did triage discuss syntax for reusing arguments like in #46916?

No, they/we didn't 🙈

gbaraldi commented 1 year ago

The person to ask here is @JeffBezanson, because it would be parsing/lowering. Maybe @c42f as well.

aplavin commented 1 year ago

The readability and (un)intuitiveness of multiargument handling are pretty worrying even after reading the arguments above... Both general concerns, like "using the same symbol to mean two different variables", and specific potential use cases like filter(-> !ismissing(_) && isfinite(_), A) (plus more discussed above).

Are actual real-life examples (from Base or packages) where either multiarg option improves clarity gathered somewhere? In the abstract, stuff like ->_[_] looks less readable than (A,i) -> A[i], while -> !ismissing(_) && isfinite(_) or -> _.num/_.denom seem unambiguous and intuitive.

Seelengrab commented 1 year ago

IIRC the general feeling about "reusing input arguments" was that people really should use a named function or a proper anonymous function instead. Such a function clearly has a special meaning, so giving it a name seems appropriate.