JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

headless anonymous function (->) syntax #38713

Open rapus95 opened 3 years ago

rapus95 commented 3 years ago

Since https://github.com/JuliaLang/julia/pull/24990 has stalled on the question of what the right amount of tight capturing is, here is an alternative approach.

Idea

I want to propose a headless -> variant which has the same scoping mechanics as (args...)-> but automatically collects all not-yet-captured underscores into an argument list. EDIT: Nesting will follow the same rules as variable shadowing, that is, the underscore binds to the tightest headless -> it can find.

Before | After
foldl((x,y)->x+2y, A) | foldl(->_+2_, A)
foldl((x,y)->sin(x)-cos(y), A) | foldl(->sin(_)-cos(_), A)
map(x->5x+2, A) | map(->5_+2, A)
map(x->f(x.a), A) | map(->f(_.a), A)

Advantage(s)

In small anonymous functions, underscores as variables can increase readability, since they stand out much more than ordinary letters. For multi-argument cases, such as anonymous functions passed to reduce/foldl, it can even save a decent number of characters. Overall it reads very intuitively: "start here, and whatever arguments you get, just drop them into the slots from left to right":

      -> ---| -----|
            V      V
foldl(->sin(_)-cos(_), A)

Sure, some more complex cases such as reordering ((x,y)->(y,x)), slurping varargs ((x...)->x), and probably some others won't be possible. But if everything were possible in the headless variant, we wouldn't have introduced the head in the first place.

Feasibility

1) Both a leading -> and an _ on the right-hand side (value side) are errors on 1.5, so this shouldn't be breaking.
2) Since it uses the well-defined scoping of ordinary anonymous functions, it should be easy to
  2a) switch between both variants mentally, and
  2b) reuse most of the current parser code, just extending it to collect/replace underscores.
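To make the intended semantics concrete, here is a rough sketch that emulates Proposal A with an ordinary macro in current Julia. The macro name `@holes` is hypothetical (it is not part of Base or of the proposal). It replaces each `_` with a fresh argument in the order the parser stores them, which is usually, but not always, source order, and it does not handle nesting.

```julia
# Hypothetical emulation of Proposal A ("individual holes"): each `_` in the
# body becomes its own positional argument, numbered in parse-tree order.
# Nesting is NOT handled: an outer @holes also captures any inner `_`s.
macro holes(body)
    args = Symbol[]
    function replace_holes(ex)
        if ex === :_
            s = gensym("arg")    # fresh name for this hole
            push!(args, s)
            return s
        elseif ex isa Expr
            return Expr(ex.head, [replace_holes(a) for a in ex.args]...)
        else
            return ex            # literals, QuoteNodes, LineNumberNodes, other symbols
        end
    end
    newbody = replace_holes(body)
    # Build `(arg1, arg2, ...) -> newbody` and return it unhygienically.
    return esc(Expr(:->, Expr(:tuple, args...), Expr(:block, newbody)))
end

add_twice = @holes _ + 2 * _     # same as (x, y) -> x + 2y
@assert add_twice(1, 3) == 7
@assert foldl(@holes(_ + 2 * _), [1, 2, 3]) == 11
@assert map(@holes(5 * _ + 2), [1, 2]) == [7, 12]
@assert map(@holes(_.a), [(a = 1,), (a = 2,)]) == [1, 2]
```

The sketch writes `2 * _` instead of `2_` to avoid any ambiguity with digit separators in numeric literals.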

Compatibility with #24990

It shouldn't clash with the result of #24990, because that focuses more on very tight argument cases. And even if you are in a situation where the headless -> unintentionally consumes an underscore from #24990, adding just 2 more characters (->) in the right place makes that underscore standalone again.

Further Explorations

This proposal can optionally be combined with https://github.com/JuliaLang/julia/pull/53946. Additionally, the following links to comments further down explore different ideas to stretch into, all adding their own value to different parts of the ecosystem. Alternative explorations: https://github.com/JuliaLang/julia/issues/38713#issuecomment-1436118670 https://github.com/JuliaLang/julia/issues/38713#issuecomment-1188977419

aplavin commented 1 year ago

Does this argument (no pun intended :) ) work differently for multi-arg lambdas? Maybe only a single underscore should be allowed, as a start?..

rapus95 commented 1 year ago

That's cool! Did triage discuss syntax for reusing arguments like in #46916?

I just now came to this conclusion: if you're fine with using _1 for the first argument to reuse it, then _1->_1 + _1 is already very close to what you want! You would only save 2 characters again (though writing length shouldn't be the metric anyway).
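For reference, numbered names already work today: `_1` is an ordinary identifier (only all-underscore names such as `_` are write-only), so the following is valid Julia. The variable names are chosen only for illustration.

```julia
# `_1` and `_2` are ordinary identifiers, unlike the write-only `_`,
# so numbered "holes" with an explicit head are already valid syntax.
twice = _1 -> _1 + _1
@assert twice(21) == 42

affine = (_1, _2) -> 2 * _1 + _2
@assert affine(3, 4) == 10
```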

rapus95 commented 1 year ago

Does this argument (no pun intended :) ) work differently for multi-arg lambdas? Maybe only a single underscore should be allowed, as a start?..

it just drops in the arguments in order of the holes, no matter how many holes you insert! (At least that's what I think about it)

aplavin commented 1 year ago

You would only save 2 characters again (though writing length shouldn't be the metric anyway)

Sure, it isn't; that's why I'm looking at it from a readability PoV and not code length. And using the same _ to mean different args does read confusingly, IMO.

it just drops in the arguments in order of the holes, no matter how many holes you insert! (At least that's what I think about it)

Is there any prior art on this in Julia? All macro packages that do similar stuff reuse the single underscore for the same argument, AFAIK.

gbaraldi commented 1 year ago

It's not so much about saving characters, but about not having to give them names.

bramtayl commented 1 year ago

I'd be happy enough with a rule like _ can be used once and only once, the least common denominator.

aplavin commented 1 year ago

Is there any prior art on this in Julia? All macro packages that do similar stuff reuse the single underscore for the same argument, AFAIK.

Btw, this is also the case in (some?) other languages: a single placeholder means a single thing, e.g. https://reference.wolfram.com/language/ref/Slot.html.

c42f commented 1 year ago

who can be asked for insights regarding an implementation

If you wanted, you could start by implementing the parsing parts in JuliaSyntax.jl. And also implement a prototype of lowering there as part of JuliaSyntax's Expr conversion. (This isn't the correct place to do lowering, but it's enough to try out the syntax in the REPL and package code.) The following PR is an example of how to write such prototypes: https://github.com/JuliaLang/JuliaSyntax.jl/pull/148

This should be fairly easy and accessible to someone who knows Julia code, rather than hacking at the scheme parser and lowering, which can be quite a learning curve. It will give a working prototype to play with, which is good: it's one thing to write a proposal, but another thing to try it out in practice.

Once that's all working to your satisfaction you could implement the same thing in the scheme code? (Or, if we get JuliaSyntax.jl into Base shortly (:crossed_fingers:) we'd potentially use that implementation for the parser parts. Very much depends on stabilization timeline which I can't promise anything on right now, alas.)

Implementation notes:

Ok... honestly... writing the above description was most of the work in implementing a prototype so I just did it. Here it is:

https://github.com/JuliaLang/JuliaSyntax.jl/pull/199

It works:

julia> data = [(a=1,b=2), (a=3,b=4)]
2-element Vector{NamedTuple{(:a, :b), Tuple{Int64, Int64}}}:
 (a = 1, b = 2)
 (a = 3, b = 4)

julia> filter(->_.a > 2, data)
1-element Vector{NamedTuple{(:a, :b), Tuple{Int64, Int64}}}:
 (a = 3, b = 4)

c42f commented 1 year ago

I'd be happy enough with a rule like _ can be used once and only once, the least common denominator

Having written a data filtering example instinctively, I'd want _ to mean a single thing in that case.

So I think that's the big tension here (edit: it's not so bad). The tension between:

The following alternative proposal solves the latter issue https://github.com/JuliaLang/JuliaSyntax.jl/pull/148

c42f commented 1 year ago

By the way, an alternative to Proposal B is pipefirst/pipelast style operators, as prototyped in

https://github.com/JuliaLang/JuliaSyntax.jl/pull/148

rapus95 commented 1 year ago

I'd be happy enough with a rule like _ can be used once and only once, the least common denominator

Having written a data filtering example instinctively, I'd want _ to mean a single thing in that case.

So I think that's the big tension here. The tension between:

  • The desire to write predicates of a single data argument (a single row in a tabular data source)

For a single argument that's reused, that means having

x->x.A + x.B * x.X
#vs
->_.A + _.B * _.X
#if _ was an ordinary variable it would look like this:
_->_.A + _.B * _.X

So it's only _ vs x plus dropping a single char, which doesn't count as an argument IMO (readability, not writability).

And for reusing multiple arguments, it doesn't increase readability, since I would still have to scan the entire function to see which argument appears where; arguments can appear in arbitrary order, just as in ordinary lambdas. Any combination of reuse and multiple arguments is basically the same as an ordinary lambda, just with different names and a shorter notation. It doesn't reduce complexity (you can transform any ordinary lambda into that style).

The proposal I made above reduces the complexity and therefore cannot represent all possible anonymous functions, as it forbids reuse entirely!

From a certain point of view, my proposal just extends the underscore's meaning ("don't care about the name") in a new direction. Consider a function like this:

(a, b, c, d) -> a^b + c*d
#applying _ in its meaning would make this:
(_, _, _, _) -> _^_ + _*_

And since we dropped all naming/referability information, as intended (that's the sole purpose of the underscore), the only remaining information to use is position. There, the simplest approach is to go in order. Then the argument list by itself carries no information aside from the number of arguments, which we can also infer from the function body. Thus, we drop the argument list.

I visualize it as "pick all arguments on caller side and drop them one after another into the holes of the callee". As the underscore means "we won't refer to it again", using a single underscore in multiple places to refer to the same thing heavily goes against this meaning of not referring to the same again. And in general, being able to assign values to names exists for being able to refer to the same thing in a later position, possibly multiple times! So we've got assignment (aka names for arguments) for exactly that purpose of wanting to reuse things.

rapus95 commented 1 year ago

But to be clear, I would also like to have a simple way to operate on transforms of a single object. Maybe I can come up with some clever objects that give the intended results when combined with my proposal, but only for reusing a single variable, never for reusing multiple arguments! For that we have ordinary lambdas 😄 I'll ponder it for a bit 😊

rapus95 commented 1 year ago

As a first shot, switching to a column-based approach solves the problem in many cases:

filter(row->row.a + row.b > 4, data)
#becomes
filter(->_+_>4, data.a, data.b)

This is also in line with extracting the mathematical "action" (here: checking whether a sum is greater than 4). Note: I'm not sure filter supports multiple data arguments, but I see that you want to avoid duplicating the data here. I'll ponder it further.
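In current Julia, the row-based and column-based forms can be compared with plain lambdas. This sketch assumes `data` is a plain `Vector` of `NamedTuple`s, so the columns are extracted with `getproperty.`; as far as I know Base `filter` accepts only one collection, so the two-argument predicate is applied with `map` to build a mask instead.

```julia
data = [(a = 1, b = 2), (a = 3, b = 4)]

# Row-based: one argument, fields accessed by name
rowwise = filter(row -> row.a + row.b > 4, data)

# Column-based: extract the columns, then apply a two-argument predicate
col_a = getproperty.(data, :a)
col_b = getproperty.(data, :b)
mask = map((a, b) -> a + b > 4, col_a, col_b)   # Bool vector, one entry per row
colwise = data[mask]

@assert rowwise == colwise == [(a = 3, b = 4)]
```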

rapus95 commented 1 year ago

In case of DataFrames.jl it would work already:

subset(df, Cols(:A, :B)=>ByRow(->_+_>4))
filter([:A, :B]=> ->_+_>4, df)

tbh to me that looks very concise and idiomatic!

rapus95 commented 1 year ago

something along the lines of

filter(Splat(->_.a+_.b>4), nzip(data, 2))

would also work conceptually, but for that to be less cumbersome than an ordinary lambda, there's a long way to go...

I like the following concept:

filter(Mirror(->_.a+_.b>4), data)  

even though right now I don't know exactly how we'd make calling the Mirror object call the headless lambda with the right number of arguments

c42f commented 1 year ago

The proposal does work nicely with the existing DataFrames idioms for columns :+1: This is good!

(Side note: To be honest I've never been 100% satisfied with the DataFrames APIs for selecting columns. I do kind of wish for something more along the lines of SplitApplyCombine.jl because the tooling in there feels more composable. But DataFrames is a lot more expressive within its domain, and it's hard to argue with just how usable and flexible it is there.)

I think I need to read back over this whole thread more carefully. The case with a single left-hand-side argument is probably not really a problem, as a normal lambda with a single-char variable will do. (I was thinking back to proposals where the -> isn't necessary, and there you do save a few more characters.)

It would be interesting to trawl the open source Julia packages on GitHub, digest the syntax from all of them, and test this syntax proposal more widely. There's some tooling in JuliaSyntax.jl for downloading all of General and parsing it; it would be easy to repurpose that to look for uses of lambdas. Perhaps we could extend that to code not within General as well. I feel that released packages might be statistically rather different from end-user code in how they use lambda syntax. For example, data cleaning code which uses DataFrames is largely going to be random end-user scripts, not packages.

One thing about implementing only Proposal A (and leaving B for later) is that I feel concise lambda syntax isn't entirely independent of the "what to do about piping" issue. These are so commonly needed in combination that I feel they need to be considered together, even though they can be used separately. Pipefirst/pipelast operators are quite an interesting point in the design space for the piping part (https://github.com/JuliaLang/JuliaSyntax.jl/pull/148).

aplavin commented 1 year ago

So it's only _ vs x plus dropping a single char, which doesn't count as an argument IMO (readability, not writability).

It does help readability in practice. With underscores being very "local" by their nature, it's immediately visible that the lambda is self-contained. Meanwhile, x -> x.A + x.B * x.C requires carefully reading the expression to see that only x is used there, and not some y from the outer scope.

I visualize it as "pick all arguments on caller side and drop them one after another into the holes of the callee".

I don't think this is familiar/intuitive for the majority of users. At least it was very unexpected for me: when I first read it, I thought I was misunderstanding something; I couldn't believe that a single symbol would mean different variables in the same context :)

More objectively: All (?) existing Julia packages that use underscore in similar contexts reuse it to mean the same argument multiple times! A non-exhaustive list of such packages: Accessors, Chain, DataPipes, Hose, Pipe, Underscores. Also, the same behavior is used in some other languages, such as Mathematica. This seems a nice path to follow :)

This exactly means "I don't care about the name, so call it _, but remain consistent, and it means the same in the whole expression".

And again, maybe limit the implementation to the totally unambiguous case first, and see how it actually goes in practice? A single underscore, used only once. This already helps many of the simplest cases.

aplavin commented 1 year ago

as a first shot, switching to column based solves the problem in many cases:

filter(row->row.a + row.b > 4, data)

becomes

filter(->_+_>4, data.a, data.b)

Is there any actual "table" type where this works, or could conceivably work? It's not clear at all what multi-arg filter can mean. Meanwhile, filter(-> _.a + _.b > 4, data) works for many tables immediately, including base Julia vectors, stuff like StructArrays or TypedTables, and even DataFrames.

(a, b, c, d) -> a^b + c*d

applying _ in its meaning would make this:

(_, _, _, _) -> _^_ + _*_

Sure, four-arg lambdas where arguments are used in order become simpler here. But: maybe, if there are that many arguments, give them actual names?

tpapp commented 1 year ago

I agree with @aplavin above. I think it would be best to keep this syntax excessively simple, because for complex cases we already have the fully general alternatives ((args...) -> ..., do blocks, or, heaven forbid, defining an actual function).

The single-argument version (ie headless -> allows you to use _, all instances of which refer to the same and only argument) already covers a lot of use cases and has very simple semantics. This would handle stuff like

_ version | equivalent
filter(-> _.a > 2 && _.b == 3, itr) | filter(x -> x.a > 2 && x.b == 3, itr)
map(-> _[4], itr) | map(x -> x[4], itr)
sum(-> calculate(a, b; _.c, _.d), rows) | sum(row -> calculate(a, b; row.c, row.d), rows)

For (a,b,c,d)->(x->a+b*x+c*x^2+d*x^3) and (x,y)->sqrt(x^2+y^2)
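The single-argument semantics described here can likewise be approximated with a macro in current Julia. `@it` is a hypothetical name used only for this sketch: every `_` in the body refers to the same argument, and nesting is again not handled.

```julia
# Hypothetical emulation of the single-argument ("referencing holes") reading:
# every `_` in the body is replaced by the same, only argument.
macro it(body)
    arg = gensym("arg")
    function replace_hole(ex)
        if ex === :_
            return arg
        elseif ex isa Expr
            return Expr(ex.head, [replace_hole(a) for a in ex.args]...)
        else
            return ex
        end
    end
    # Build `(arg,) -> body` with every `_` pointing at the same `arg`.
    return esc(Expr(:->, Expr(:tuple, arg), Expr(:block, replace_hole(body))))
end

rows = [(a = 1, b = 3), (a = 3, b = 3), (a = 5, b = 2)]
@assert filter(@it(_.a > 2 && _.b == 3), rows) == [(a = 3, b = 3)]
@assert map(@it(_[2]), [(1, 2), (3, 4)]) == [2, 4]
```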

bramtayl commented 1 year ago

Writing the above description was most of the work in implementing a prototype so I just did it.

Yay!

Headless -> allows you to use _

What if you don't use any _? Might seem trivial but could be important if it gets extended to chaining:

-1 |> abs

JeffBezanson commented 1 year ago

The single-argument version (ie headless -> allows you to use _, all instances of which refer to the same and only argument) already covers a lot of use cases and has very simple semantics.

This is a good point actually; 1-argument functions are by far the most common and useful in cases like this. There are some nice examples for 2 arguments, but once you get to 3 or 4 arguments I would argue it's more useful to reference a single argument 3 or 4 times. It also has the nice properties (1) same symbol refers to the same thing (2) parse tree reordering doesn't matter. After all in Real Functional Programming™ functions all have one argument anyway.

rapus95 commented 1 year ago

From here on, I'll call the proposals "individual holes" and "referencing holes", for the underscores representing individual values and referring to the same value, respectively.

(Side note: To be honest I've never been 100% satisfied with the DataFrames APIs for selecting columns. I do kind of wish for something more along the lines of SplitApplyCombine.jl because the tooling in there feels more composable. But DataFrames is a lot more expressive within its domain, and it's hard to argue with just how usable and flexible it is there.)

Can you give an example of how you use SplitApplyCombine.jl? Then I can have a look at the interactions with this proposal.

With underscores being very "local" by their nature, it's immediately visible that the lambda is self-contained. Meanwhile, x -> x.A + x.B * x.C requires carefully reading the expression to see that only x is used there, and not some y from the outer scope.

You'll always need to look carefully at what's within the ->'s scope. But for individual holes, it's as easy as reading from left to right. For a single-argument lambda, it's also as easy as reading from left to right; you're just spotting x's instead of _'s.

I thought I was misunderstanding something; I couldn't believe that a single symbol would mean different variables in the same context

Having it mean the same everywhere makes it behave like an ordinary variable whose naming was skipped in _->. I feel like we lose options going that route. Because for more than one argument that's reused, I still think, ordinary lambdas would be better since order on caller side is entirely unrelated to order on callee side.

But yes I see the "need" for something like the referencing holes.

I'll soon make a more theory-focused comment on why I would choose one over the other/what I'm missing in your ideas that hopefully clarifies intentions behind each of them.

What if you don't use any _?

This should create a 0-arg lambda that behaves like Returns(val)

It also has the nice properties (1) same symbol refers to the same thing

but that refers to ordinary variable behavior, while underscores are meant to be special cased in their usage.

JeffBezanson commented 1 year ago

This should create a 0-arg lambda that behaves like Returns(val)

Except, presumably, that the computation is deferred?!

rapus95 commented 1 year ago

This should create a 0-arg lambda that behaves like Returns(val)

Except, presumably, that the computation is deferred?!

I guess so, to the extent of what the compiler infers as "pure"/precomputable; so yes, deferred, aside from optimization. In that sense it encourages the use of closures (anonymous functions capturing implicit state), which might lead to more boxing issues.
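The eager-versus-deferred distinction raised here can be observed in current Julia by comparing `Returns` (available since Julia 1.7) with an explicit zero-argument lambda. The counter and function names are illustrative only.

```julia
# `Returns(val)` evaluates `val` once, at construction, and ignores any
# arguments; a zero-argument closure re-evaluates its body on every call.
calls = Ref(0)
compute() = (calls[] += 1; 42)

eager = Returns(compute())   # compute() runs here, exactly once
@assert calls[] == 1
@assert eager() == 42 && eager(1, 2) == 42
@assert calls[] == 1         # calling `eager` does not recompute

deferred = () -> compute()   # nothing runs yet
@assert calls[] == 1
@assert deferred() == 42 && deferred() == 42
@assert calls[] == 3         # the body ran once per call
```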

JeffBezanson commented 1 year ago

but that refers to ordinary variable behavior, while underscores are meant to be special cased in their usage.

Underscore being somewhat different from a normal variable doesn't mean it's better for it to be as different as possible. The same visual thing changing meaning every time it occurs would be deliberately going against properties that are generally considered good.

Also I would argue the 0-argument case is not so important since (1) it's much rarer, (2) the syntax ()->... already spares you needing to come up with a variable name.

rapus95 commented 1 year ago

Dataflow

We've got 4 variants of dataflow:

  1. one->one
       1a) direct transform
       1b) extract/split and combine (hidden 1->many->1, a mixture of 2 and 3)
  2. one->many: extract (and transform) data into multiple things
  3. many->one: combine/merge multiple data into one
  4. many->many: intertwined transformation of multiple data

Note: DataFrames.jl's API splits up (forces you to split up) 1b cases into a combination of 2 and 3, since transformations are passed as [columns] => transform, which describes the 2nd variant on the left-hand side and the 3rd variant on the right-hand side.
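The four variants can be illustrated with ordinary lambdas in current Julia; the names and data are made up for this sketch. The second assertion shows the note's point that a 1b function can be expressed as a 2 (extract) followed by a 3 (combine).

```julia
row = (a = 1, b = 2)

direct     = x -> 2x                  # 1a: one -> one, direct transform
split_comb = r -> r.a + r.b           # 1b: one -> (many) -> one, extract then combine
extract    = r -> (r.a, r.b)          # 2:  one -> many, extract into several values
combine    = (a, b) -> a + b          # 3:  many -> one, merge several values
intertwine = (a, b) -> (a + b, a - b) # 4:  many -> many, intertwined transformation

@assert direct(3) == 6
@assert split_comb(row) == combine(extract(row)...) == 3   # 1b as 2 followed by 3
@assert intertwine(3, 1) == (4, 2)
```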

Proposals analyzed

Proposal A (individual holes)
  • concept: each occurrence refers to its own value on the caller side, strictly in order
  • supported dataflow variants: focuses on 3; allows 1a, 3
  • benefits/drawbacks: pushes to explicit, more separated extraction; many higher-order functions don't support multiple data arguments (-> requires zip/splat); leans towards mapping of column-based data; works well with DataFrames.jl
  • escalation/worst case: constant need to transform row-based into column-based data in longer pipelines (e.g. through Splat/Zip interaction)

Referencing spots/single argument
  • concept: each occurrence references the same single argument on the caller side
  • supported dataflow variants: focuses on 1b; allows 1a, 1b, 2
  • benefits/drawbacks: pushes to implicit extraction; combines many steps (split, apply, combine) into a single statement (often a single line); leans towards mapping row-based data; complements DataFrames.jl/acts as an alternative approach
  • escalation/worst case: named-tuple pipelines that make arbitrarily intertwined transformations in each step

Single hole
  • concept: at most one occurrence, for at most one argument on the caller side
  • supported dataflow variants: focuses on 1a; allows 1a
  • benefits/drawbacks: simplicity of scope
  • escalation/worst case: ?

UPDATE

Further down in this issue, there's the idea of enabling both proposals with a slightly different, more individually tailored syntax:

-> _ + _ #Proposal A, stands for (a,b)->a+b
#and
-> $a + $b #referencing spots, stands for x->x.a+x.b

->$a
#to me feels even more straight to the point and better serving the intention than
->_.a

This brings a solution to #22710 and to #24990 by preceding the special symbols with some explicit function indicator (the headless ->) that explicitly binds these symbols locally to the argument(s).

That way, we have both variants available and tailored to their respective use cases, focusing on what's needed.

The current direction reminds me of the missing/nothing situation, where the best and by itself very good solution was not to try to bunch multiple approaches into the same thing.

So the new proposal would be to add a new headless lambda syntax, that has tailored meanings for $ and _ within its own scope. This feels like multiple dispatch on a syntactical level! Shared meaning, new specializations!

Transforming between proposals

Proposal A (individual holes) -> referencing spots/single argument

  • on caller side: put all arguments into a named tuple
  • on callee side: extract using these names
  • explanation: in some sense it emulates naming of variables. Regarding dataflow, it manually merges data (2) first, so it's now variant 1b and the syntax can be used.

referencing spots/single argument -> Proposal A (individual holes)

  • on caller side: pre-extract the needed data into multiple arguments (repeat as needed)
  • on callee side: drop extractors/accessors
  • explanation: forces manual extraction outside the lambda for all data needed. Regarding dataflow, it enforces splitting cases of 1b into 2 & 3, so the syntax can be used for variant 3.
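Both transformations can be tried with ordinary lambdas today, each standing in for the proposed syntax named in its comment; the function names are hypothetical.

```julia
two_arg = (a, b) -> a + b        # stands in for Proposal A's `-> _ + _`
by_name = x -> x.a + x.b         # stands in for the referencing form `-> $a + $b`

# Proposal A -> referencing: the caller packs its arguments into a NamedTuple
a, b = 1, 2
@assert by_name((a = a, b = b)) == two_arg(a, b) == 3

# referencing -> Proposal A: the caller pre-extracts the fields it needs
row = (a = 3, b = 4)
@assert two_arg(row.a, row.b) == by_name(row) == 7
```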

Further thoughts

Note: I may update this post as needed based on comments

JeffBezanson commented 1 year ago

I believe being able to reference an argument more than once has strictly higher expressiveness than being able to accept multiple arguments. To compare, see what it takes for each one to emulate the other:

  1. ->_[1] + _[2] # 2 arguments in terms of one argument. generalizes easily to e.g. ->_[1] + _[3]
  2. ->(x=_; x.a + x.b) # multiple reference. splat(->_ + _) works in the special case where you want to access components 1 and 2

IOW, the only way to recover multiple reference in general is to introduce a name anyway, giving even longer syntax than we have now. So on reflection I think I'd rather have multiple reference.
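The two emulations can be checked with ordinary lambdas in current Julia, where each lambda stands in for the proposed form noted in its comment. `Base.splat` is the existing function that wraps `f` so that it takes one collection and splats it into `f`.

```julia
# 1. Multiple arguments in terms of one argument: index into it.
f_one = t -> t[1] + t[2]         # stands in for `->_[1] + _[2]`
g_one = t -> t[1] + t[3]         # stands in for `->_[1] + _[3]`
@assert f_one((10, 5)) == 15
@assert g_one((1, 2, 3)) == 4

# 2. Multiple reference via multiple arguments: splatting only covers the
#    special case where components 1 and 2 are used, in order.
f_two = (x, y) -> x + y          # stands in for Proposal A's `->_ + _`
@assert Base.splat(f_two)((a = 1, b = 2)) == 3
```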

rapus95 commented 1 year ago

If we had ->$name as the syntax for an accessor on the single argument fed into the lambda (quite an old idea that never came to life because the proposal was to use $name alone, IIRC #22710), then this would complement the current Proposal A quite well IMO and directly serve the intended purpose of the referencing-hole proposal, which is about extraction from a single variable. It would allow, for example, ->$a+$b. It can't get more expressive IMO.

It is also, IMO, a logical abstraction of the current $ usage. $ is used for interpolation: it extracts values from an outer state, so $name stands for outerstate.name. If used within a headless ->, it could simply bind the interpolation to the argument, just as it could bind underscores to the arguments.

In a headless lambda, $name extracts onlyarg.name.

Using $ for direct access/extraction is also well known from other languages AFAIK (IIRC this was even brought up earlier as an argument for using _ to reference the same thing multiple times).

Both approaches would benefit each other quite well IMO, since they are tailored to different situations (1b/2 vs 3).

So maybe we have a similar case as with nothing/missing, where the best solution is to provide two different approaches tailored to different, entirely valid situations, instead of trying to bunch multiple things into one and making them compete as "the right approach"? The general meaning of the headless -> would then become to appropriately bind special symbols like $ and _ to its argument(s).

From a certain point of view, #22710 and #24990 are quite similar. They tried to create functions out of syntax that didn't indicate a function, and both were rejected. But with a headless -> as a function indicator that makes those symbols locally bound/operating on arguments, we have the missing piece. Sure, it costs typing an extra ->, but that's a fair price for clarity/readability.

To me, ->$a really feels more expressive and concise than x->x.a and ->_.a. (and it even is shorter)

rapus95 commented 1 year ago
_ version | $ version | equivalent
(none more concise) | filter(-> $a > 2 && $b == 3, itr) | filter(x -> x.a > 2 && x.b == 3, itr)
map(-> _[4], itr) | (maybe map(-> $[4], itr))* | map(x -> x[4], itr)
(none more concise) | sum(-> calculate(a, b; $c, $d), rows) | sum(row -> calculate(a, b; row.c, row.d), rows)

* it would make sense as the accessor that resolves to getindex(singlearg, 4)

This would make the rules as follows:
  • ->_ + _ stands for arg1 + arg2
  • ->$a stands for getproperty(onlyarg, :a)
  • ->$[a] stands for getindex(onlyarg, a)
  • quote $a end stands for the currently available interpolation of the outer scope
Optionally extending ("extrapolating") this further, $a could stand for getglobal(:a), which is basically (->$a)(globalstate). This would then provide a syntax for the new suggestion to access globals via getglobal and make global usage more obvious.
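The proposed `$` expansions map onto existing functions. In this sketch, plain lambdas spell out what `->$a` and `->$[2]` would stand for under these rules (the `$` forms themselves are not valid Julia today); the last line shows that `x.a` already parses with `a` as a quoted symbol.

```julia
row = (a = 1, b = 2)
dollar_a   = onlyarg -> getproperty(onlyarg, :a)   # proposed meaning of `->$a`
dollar_idx = onlyarg -> getindex(onlyarg, 2)       # proposed meaning of `->$[2]`

@assert dollar_a(row) == row.a == 1
@assert dollar_idx(row) == row[2] == 2

# `x.a` parses to a `.` expression whose field name is a quoted symbol
@assert Meta.parse("x.a") == Expr(:., :x, QuoteNode(:a))
```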

JeffBezanson commented 1 year ago

I think further overloading $ like that is questionable. If the multi-arg underscore design needs to be fixed up by adding that, it's a significant weakness. It doesn't bother me that _.a is one character longer than $a since it is clearer and doesn't have quote capturing issues.

rapus95 commented 1 year ago

I think further overloading $ like that is questionable. If the multi-arg underscore design needs to be fixed up by adding that, it's a significant weakness. It doesn't bother me that _.a is one character longer than $a since it is clearer and doesn't have quote capturing issues.

I wouldn't call it fixing, since it serves its own purpose in an overall concept that consists of a newly designed headless lambda syntax with tailored meanings for $x and _. And multiple dispatch, being the core concept of Julia, basically means to define a thing in multiple ways that share a meaning, not an implementation, doesn't it? That's certainly the case for $ when captured by a ->. Basically it's something very natural to have names acting relative to the scope, they're used in. It's like using globals and locals. No one should be surprised that -> creates a local scope since that is what it's meant to do... And we already have localized behavior for $ since it acts differently in certain code blocks and macros while trying to preserve a shared meaning... So this doesn't add a new meaning, but a new specialization to a place where it was unused before (since that place didn't even exist). We sort of added a new implementation to an existing concept.

JeffBezanson commented 1 year ago

This is not the same as multiple dispatch, since it is a scoping and surface syntax issue. Looking at a + b, a and b might be any types of numbers but you don't have to know which to understand what it does. But in quote ->$a end the question is whether arg.a is evaluated at run time or a is evaluated at macro expansion time, which are not the same meaning. It's even more confusing because in the proposed $a syntax, a is not evaluated but is a quoted symbol. And if we had $[i] as well, by analogy you would think $.a would work. For me adding $ sends the fairly simple -> and _ proposal off the rails.

Yes, local scopes can override variables from outer scopes. But unfortunately $ is not a normal scoped identifier and does not work like that. E.g. you can't do something like let ($) = foo and disable $-interpolation inside that block. For example

macro m(x); QuoteNode(x); end
x = 42
:(@m $x)  # result: :(@m 42)

(The point is that you might expect the macro to see the $x argument syntax, but the outer quote "captures" it.)
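The capture in this example can be inspected directly. The sketch below checks the structure of the quoted expression: inside the quote, `@m` is not expanded (it need not even be defined), and `$x` has already been replaced by the value of `x`. Parsing the same text without an enclosing quote keeps the `$x` syntax.

```julia
x = 42
ex = :(@m $x)                    # quoted, so `@m` is never expanded here
@assert ex.head === :macrocall
@assert ex.args[end] == 42       # `$x` was already replaced by the value of x

parsed = Meta.parse("@m \$x")    # no enclosing quote: the parser keeps `$x` as syntax
@assert parsed.args[end] == Expr(:$, :x)
```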

And we already have localized behavior for $ since it acts differently in certain code blocks

Example?

rapus95 commented 1 year ago

This is not the same as multiple dispatch, since it is a scoping and surface syntax issue. Looking at a + b, a and b might be any types of numbers but you don't have to know which to understand what it does.

I see; I didn't mean multiple dispatch but shadowing/capturing, which should be a familiar concept to folks.

But in quote ->$a end the question is whether arg.a is evaluated at run time or a is evaluated at macro expansion time, which are not the same meaning.

Since it's ALWAYS possible not to use a headless lambda in meta-programming (easiest to stick to ordinary lambdas there, if you don't want to learn the resolution order), that's not a showstopper for me. There simply isn't a better solution than to say, if you code in a way where your expressions shadow each other or raise the question of capture order, you have to learn the mechanics of how it's resolved, or you're most probably screwed.

If we don't want to just forbid headless lambdas in quote blocks and macro calls (which still would be a sensible approach IMO) I'd probably say to have the headless lambda follow the nesting level for capturing $. Even in quote blocks. Because that way is the easiest to evaluate visually and doesn't need more structure than nesting level to be resolved. But I'm also fine with not allowing it in quote blocks at all and having macros see the resolved anonymous function.

If you have a concrete example where being able to nest this proposal in a quote block is very important, I'm happy to think further about how I'd progress there! But I can't imagine them appear in ordinary data transforming pipelines for the user.

It's even more confusing because in the proposed $a syntax, a is not evaluated but is a quoted symbol.

I don't find that confusing if it's communicated that $a is meant to access a of the argument (aka expanding to onlyarg.a). To me, it's exactly as confusing as onlyarg.a having the a appear quoted (getproperty(onlyarg, :a)). Interpolation always represents the value behind a quoted name in some scope. How is this different? In quote block it accesses the value behind the name a of the outer scope, here it accesses the value behind the name a of the argument onlyarg.

And if we had $[i] as well, by analogy you would think $.a would work.

Following the previous point, if $a accesses a of the argument, then $[i] accessing [i] of the argument (aka expanding to onlyarg[i]) comes quite naturally to me. It's no stranger than the fact that I wouldn't infer onlyarg.[i] from onlyarg.a as the correct way of indexing. There are two ways to extract: by name (-> getproperty) and by index (-> getindex). If you append a name, it does the former; if you append indexing, it does the latter.
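For contrast with the property case, indexing keeps the index as an ordinary expression that is evaluated at run time. The two extraction routes described here, written as today's one-argument lambdas (byname and byindex are illustrative names, not part of the proposal):

```julia
# Indexing: the index is an ordinary expression, evaluated at run time.
ex = Meta.parse("onlyarg[i]")
@assert ex.head == :ref
@assert ex.args == Any[:onlyarg, :i]   # :i is a variable, not a quoted name

# The two extraction routes as plain lambdas:
byname  = x -> x.a      # getproperty(x, :a)
byindex = x -> x[2]     # getindex(x, 2)

v = [(a = 10,), (a = 20,)]
@assert map(byname, v) == [10, 20]
@assert byindex([7, 8, 9]) == getindex([7, 8, 9], 2) == 8
```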

Yes, local scopes can override variables from outer scopes. But unfortunately $ is not a normal scoped identifier and does not work like that. E.g. you can't do something like let ($) = foo and disable $-interpolation inside that block. For example

macro m(x); QuoteNode(x); end
x = 42
:(@m $x)  # result: :(@m 42)

(The point is that you might expect the macro to see the $x argument syntax, but the outer quote "captures" it.)
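If you do want the inner macro to receive the literal $x, one workaround (a sketch, not something proposed in this thread) is to interpolate an explicit Expr(:$, :x) node, which the outer quote then leaves alone. In a :macrocall expression, args[3] is the first macro argument, after the macro name and a LineNumberNode; @m is never expanded here, so it need not be defined.

```julia
x = 42

# The outer quote consumes $x: the macro call receives the value 42.
captured = :(@m $x)
@assert captured.head == :macrocall
@assert captured.args[3] == 42

# Interpolating an explicit :$ node preserves the $x syntax for the inner macro.
preserved = :(@m $(Expr(:$, :x)))
@assert preserved.args[3] == Expr(:$, :x)
```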

That's a good example of how "if you code in a way where syntax shadows each other, understand how it works, or you might get screwed" is already part of the language. More generally, I wouldn't fine-tune this feature for meta-programming: when you're explicitly writing lambdas for higher-order functions (which is what this is tailored to), you most probably won't be generating them through meta-programming. That's just far too small a fraction of cases.

tpapp commented 1 year ago

I apologize for sounding like a broken record, but again: coming up with a complicated DSL full of special cases for writing anonymous functions is not something we should contemplate, since we already have various syntaxes for anonymous functions.

My impression is that people have two problems with the current syntax:

  1. verbosity, i.e. x -> x[i] seems excessive at 7+2 (space) characters. So maybe we can shave off 1+1 with -> _[i], which is kind of nice, if one is into these things.

  2. having to name arguments which play no semantic role. There are conventions for this (e.g. x), but I can see the appeal.
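The character arithmetic in point 1 can be checked directly (spaces included):

```julia
# Character counts from point 1, spaces included.
@assert length("x -> x[i]") == 9   # 7 characters + 2 spaces
@assert length("-> _[i]")   == 7   # drops 1 (the x) + 1 (a space)
```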

However, the less trivial your anonymous functions are, the less important these things become. Shaving off a few characters from a function that takes up half a line is no longer that important. And if your function has 2-3 arguments, maybe you should start naming them (sure, there are trivial examples where you don't want to, but generally it is better style).

We should be focusing on a syntax that is very simple and makes 90% of the trivial applications of -> a bit simpler, with the understanding that users can (and should) just fall back to the full -> syntax when necessary.

[Personally I see no reason to add anything to Julia, I am happy with the current syntax, but people have been asking for this and the one-argument headless option is something that fits nicely into Julia and presumably does not cause any major headaches.]

MasonProtter commented 1 year ago

Agreed, turning this into a complicated DSL with complex, unpredictable behaviours one has to memorize would be a bad outcome.

I must say, though, I find it very frustrating that this syntax has only one real reason to even be considered: macros bind less tightly than commas. If it weren't for https://github.com/JuliaLang/julia/issues/36547, we could simply write @_ or whatever instead of -> and not need to modify the parser and lowering at all.
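The parsing behaviour behind #36547 can be seen with Meta.parse (parsing does not require the macro to exist): inside a call, a macro written without parentheses swallows the remaining comma-separated arguments as one tuple, while wrapping the macro call in parentheses restores the expected argument list. @m and f are placeholder names.

```julia
# Unparenthesized: @m takes `a, b` as one tuple argument, so f gets only one argument.
ex = Meta.parse("f(@m a, b)")
@assert ex.head == :call && length(ex.args) == 2   # :f plus the macro call
@assert ex.args[2].head == :macrocall
@assert ex.args[2].args[3] == Expr(:tuple, :a, :b)

# Parenthesized: the macro call ends at `)`, so f gets two arguments.
ex2 = Meta.parse("f((@m a), b)")
@assert length(ex2.args) == 3
```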

aplavin commented 1 year ago

->$a

to me feels even more straight to the point and better serving the intention than

->_.a

This basically proposes $ instead of _ as the single argument placeholder, right? It's a perfectly reasonable suggestion, after all neither $ nor _ carry any meaning by themselves as characters. However, I believe there are strong arguments to prefer _ here.

Having it mean the same everywhere makes it behave like an ordinary variable whose naming was skipped in _->.

And... that's a good thing! :)

I feel like we lose options going that route, because for more than one argument that's reused, I still think ordinary lambdas would be better, since the order on the caller side is entirely unrelated to the order on the callee side.

Sure, if a lambda has multiple arguments and reuses them, better give them some names...

clarkevans commented 1 year ago

Thomas and Mason highlight an excellent policy. For every line of code written, it is read at least a dozen times. Hence, any improvement must necessarily make reading comprehension of an expression easier, not harder by adding more semantic complexity. I don't mind providing special meaning to the underscore, if it's got a simple interpretation -- it's a parameter for a single argument. The other proposals here seem to require too much of my mental space.

clarkevans commented 1 year ago

What if we use Unicode open circle characters, e.g. ①, ..., ⑳? We could drop -> in this case, so that ① means (args...) -> args[1] and ①+② would mean (args...) -> args[1]+args[2], etc. To make writing convenient, \1TAB would insert ①, etc. I'm less fond of providing special meaning to _, since this character is often used for arguments that are ignored, e.g. (_,x,_) -> x. Alas, I know how many are resistant to using Unicode as part of Julia's syntax. Arguing against this proposal myself, there are documentation conventions that use black circled numbers (e.g. ❶,..., ⓴) as footnote labels, although the open circle is light enough not to draw as much visual attention.

expression | interpretation?
filter(① > 2 && ② == 3, itr...) | filter(_ -> _[1] > 2 && _[2] == 3, itr)
map(④, itr...) | map(_ -> _[4], itr)
sum(calculate(a, b; ②, ④), rows...) | sum(_ -> calculate(a, b; _[2], _[4]), rows)

It seems even this "simple" proposal is less than trivial. To follow Aaron's test cases, the iterators would have to be expanded into tuples. Moreover, by getting rid of -> there's no clear boundary where the anonymous function goes; there is probably no sane rule that would provide the last interpretation. I guess this proposal also entails too much magic. Easier to read, but even harder to comprehend. It was fun to consider, though. I'll join the downvote ;)
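The tuple-expansion point can be illustrated in plain Julia: a varargs lambda receives one element from each iterator as separate arguments, whereas the "interpretation" column indexes into a single tuple, which for several iterators means zipping them first. The data and the names f and g are illustrative only.

```julia
xs = [1, 5]
ys = [3, 3]

# Varargs form, closest to the proposed ①/② reading: one element per iterator.
f = (args...) -> args[1] > 2 && args[2] == 3
@assert map(f, xs, ys) == [false, true]

# Tuple form from the "interpretation" column: iterators zipped into tuples first.
g = t -> t[1] > 2 && t[2] == 3
@assert map(g, zip(xs, ys)) == [false, true]
```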

rapus95 commented 1 year ago

It seems I'm missing core aspects of what makes the proposal difficult for users to understand. Maybe someone can help me see it. I'm assuming an unbiased user who didn't follow all the different ideas about how these lambdas could work, and so won't get lost among all the possible meanings suggested in the past, but instead only learns about this through the following description:

Added a new syntax for lambdas in which the argument list is omitted. It is tailored to different situations:

1) Accessing/extracting data from a single (first and only) argument. Within a headless lambda that receives exactly one argument (onlyarg), the interpolation syntax $prop refers to onlyarg.prop and $[i] refers to onlyarg[i]. Both access the first and only argument of the lambda. (solves #22710 and part of #24990)

Examples:

further examples:

tpapp commented 1 year ago

I'm missing core aspects on what makes the proposal difficult to understand for the user.

From my perspective it is not that the proposal is difficult to understand per se. Julia is a powerful language, which comes with a certain amount of complexity, and users manage that just fine.

I think the key issue is the gain in function vs the added complexity, and tradeoffs between various alternatives (the multislot and the single argument versions are of course mutually exclusive).

Also, I think that a _ stands out more visually than a $.

(Incidentally, I find it confusing to switch syntaxes in the middle of a proposal like this.)

MasonProtter commented 1 year ago

->$a

to me feels even more straight to the point and better serving the intention than

->_.a

This basically proposes $ instead of _ as the single argument placeholder, right?

@aplavin no, if you look at the code you quoted they are suggesting that -> $a means x -> x.a, not that $ is used as an alternative for _.
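For reference, this is what the two rules from the quoted description correspond to in today's syntax, written as ordinary one-argument lambdas. The proposed forms appear only in comments because a leading -> does not parse in current Julia; the rows data is illustrative.

```julia
rows = [(name = "a", vals = [1, 2]), (name = "b", vals = [3, 4])]

# Proposed: map(->$name, rows)   would mean   map(onlyarg -> onlyarg.name, rows)
@assert map(onlyarg -> onlyarg.name, rows) == ["a", "b"]

# Proposed: map(->$[2], vs)      would mean   map(onlyarg -> onlyarg[2], vs)
vs = [r.vals for r in rows]
@assert map(onlyarg -> onlyarg[2], vs) == [2, 4]
```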


It seems I'm missing core aspects of what makes the proposal difficult for users to understand. Maybe someone can help me see it. I'm assuming an unbiased user who didn't follow all the different ideas about how these lambdas could work, and so won't get lost among all the possible meanings suggested in the past, but instead only learns about this through the following description:

@rapus95 Personally, I have no problem with adding more handy syntax, because I've already learned our current syntax, so this is just a small, bite-sized addition for me to learn. However, that's not the case for everyone, especially new users.

I think it's important to not consider new syntax in isolation, but to consider the entire pile of special syntax we already have in addition to the proposed new syntax. We should think about this from the perspective of beginners learning the language, not from the perspective of experienced users. Julia's syntax is already quite complicated, and adding new syntax rules will make learning our syntax even harder for new users.

The more special syntax we have, the less willing we should be to add more special syntax on top of it.

MasonProtter commented 1 year ago

Actually, I just realized that we don't really need to solve https://github.com/JuliaLang/julia/issues/36547: we can replace this -> syntax with a macro pretty trivially. The key is to slurp up the extra arguments that end up inside the macro call and then spit them back out.

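using MacroTools

# Defines a macro named _, used as @_.
@eval macro $(:_)(ex)
    @gensym x
    # Inside a call, `@_ body, rest...` reaches the macro as one tuple (#36547):
    # split off the body and keep the swallowed trailing arguments.
    if ex isa Expr && ex.head == :tuple
        pre_body, rest... = ex.args
    else
        pre_body = ex
        rest = ()
    end
    # Replace every `_` in the body with the fresh argument name.
    body = MacroTools.postwalk(pre_body) do node
        node == :_ ? x : node
    end
    λ = :($x -> $body)
    if length(rest) == 0
        esc(λ)
    else
        # Splat (λ, rest...) back into the enclosing call's argument list.
        esc(:(($λ, $(rest...))...))
    end
end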

Behold:

julia> map(@_ _[1], [[1,2,3], [4,5,6]])
2-element Vector{Int64}:
 1
 4

No parser changes required.

bramtayl commented 1 year ago

Macros kind of work "outside in" but parsing kind of works "inside out". In this case, if this was done by the parser, _ would "find" the innermost -> to "attach" to. And it would be good for this syntax to be nestable, so people could do: df |> filter(-> _.a .> 1, _). There are messy ways to work around this with a macro, but especially since this is such a commonly demanded feature, I think parser support is the way to go.
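Under the binding rule described here (the inner _ attaches to the innermost ->, while the outer _ is a #24990-style placeholder for the piped value), the example would mean roughly the following in today's syntax. This is an interpretation, not settled semantics; rows is an illustrative vector of NamedTuples standing in for df.

```julia
rows = [(a = 0,), (a = 2,), (a = 5,)]

# df |> filter(-> _.a .> 1, _), assuming the inner _ binds to -> and the outer _ to the pipe:
result = rows |> (d -> filter(r -> r.a .> 1, d))
@assert result == [(a = 2,), (a = 5,)]
```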

MasonProtter commented 1 year ago

Parsing also definitely works "outside in", and has to take the same care that a macro would have to take to attach _ to the right fence.

bramtayl commented 1 year ago

Hmm, maybe I meant symbol resolution works inside out?
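Ordinary variable resolution does work from the innermost scope outward, and that is the rule the proposal adopts for _ (an underscore binds to the tightest headless -> around it). A plain-Julia illustration of the shadowing rule:

```julia
# The inner x shadows the outer one: inside the inner lambda, x is its own argument.
outer = x -> (x -> x + 1)
@assert outer(100)(1) == 2

# To use both values, the inner lambda needs a different name.
both = x -> (y -> x + y)
@assert both(100)(1) == 101
```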

MasonProtter commented 1 year ago

This macro would simply work by macroexpanding any macros it finds inside itself. It'd be the same as the proposed syntax here unless I'm missing something.
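A sketch of that "expand inner macros first" variant, assuming a hand-written tree walk in place of MacroTools.postwalk so it runs without extra packages (replace_underscore is a hypothetical helper name). Apart from the macroexpand call it follows the slurp-and-respread trick of the macro above; expanding nested @_ calls before substituting lets each inner @_ claim its own underscores, so nesting resolves innermost-first.

```julia
# Replace every bare `_` in an expression tree with `sym` (quoted symbols are left alone).
replace_underscore(ex, sym) = ex === :_ ? sym : ex
replace_underscore(ex::Expr, sym) =
    Expr(ex.head, (replace_underscore(a, sym) for a in ex.args)...)

@eval macro $(:_)(ex)
    # Inside a call, `@_ body, rest...` arrives as one tuple (see #36547).
    if ex isa Expr && ex.head == :tuple
        pre_body, rest = ex.args[1], ex.args[2:end]
    else
        pre_body, rest = ex, Any[]
    end
    # Expand nested macros (including inner @_) before substituting,
    # so each inner @_ consumes its own underscores first.
    expanded = macroexpand(__module__, pre_body)
    @gensym x
    λ = :($x -> $(replace_underscore(expanded, x)))
    # Re-spread the swallowed trailing arguments into the enclosing call.
    isempty(rest) ? esc(λ) : esc(:(($λ, $(rest...))...))
end

@assert map(@_ _[1], [[1, 2, 3], [4, 5, 6]]) == [1, 4]
@assert map(@_ map(@_ _ + 1, _), [[1, 2], [3]]) == [[2, 3], [4]]
```

In the nested call, the trailing _ swallowed by the inner @_ is re-emitted into the inner map call and only then replaced by the outer lambda's argument.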

JeffBezanson commented 1 year ago

I think it's important to not consider new syntax in isolation, but to consider the entire pile of special syntax we already have in addition to the proposed new syntax. We should think about this from the perspective of beginners learning the language, not from the perspective of experienced users. Julia's syntax is already quite complicated, and adding new syntax rules will make learning our syntax even harder for new users.

💯 I would go further, though. Less noise is better for everybody, not just somebody in their first week of learning Julia.

bramtayl commented 1 year ago

This macro would simply work by macroexpanding any macros it finds inside itself. It'd be the same as the proposed syntax here unless I'm missing something.

Ok, but what if someone else writes a new macro that uses _ and they don't play well together?

MasonProtter commented 1 year ago

Someone could also quite easily write a macro that doesn't play well with this PR in the same way.

MasonProtter commented 1 year ago

Okay, I've made https://github.com/MasonProtter/SimpleUnderscores.jl. @bramtayl, or anyone else interested in this syntax, please feel free to poke around with it and see if it fails in any obvious ways.