Open rapus95 opened 3 years ago
Does this argument (no pun intended :) ) work differently for multi-arg lambdas? Maybe only a single underscore should be allowed as a start?
That's cool! Did triage discuss syntax for reusing arguments like in #46916?

I just now came to this conclusion: if you're fine with using `_1` for the first argument to reuse it, then `_1 -> _1 + _1` is already very close to what you want! You would only save 2 characters again (though writing length shouldn't be the metric anyway).
> Does this argument (no pun intended :) ) work differently for multi-arg lambdas? Maybe only a single underscore should be allowed as a start?

It just drops in the arguments in order of the holes, no matter how many holes you insert! (At least that's what I think about it.)
> You would only save 2 characters again (though writing length shouldn't be the metric anyway)

Sure, it isn't; that's why I'm looking from a readability PoV and not the code length. And using the same `_` to mean different args does read confusingly, IMO.
> It just drops in the arguments in order of the holes, no matter how many holes you insert! (At least that's what I think about it.)

Is there any prior art on this in Julia? All macro packages that do similar stuff reuse the single underscore for the same argument, AFAIK.

It's not that much about saving characters, but about saving having to give them names. I'd be happy enough with a rule like: `_` can be used once and only once, the least common denominator.
> Is there any prior art on this in Julia? All macro packages that do similar stuff reuse the single underscore for the same argument, AFAIK.

Btw, this is also the case in (some?) other languages: a single placeholder means a single thing. E.g. https://reference.wolfram.com/language/ref/Slot.html.
> who can be asked for insights regarding an implementation
If you wanted, you could start by implementing the parsing parts in JuliaSyntax.jl, and also implement a prototype of lowering there as part of JuliaSyntax's `Expr` conversion. (This isn't the correct place to do lowering, but it's enough to try out the syntax in the REPL and package code.) The following PR is an example of how to write such prototypes: https://github.com/JuliaLang/JuliaSyntax.jl/pull/148

This should be fairly easy and accessible to someone who knows Julia code, rather than hacking at the scheme parser and lowering, which can be quite a learning curve. It will give a working prototype to play with, which is good: it's one thing to write a proposal, but it's another thing to try it out in practice.

Once that's all working to your satisfaction you could implement the same thing in the scheme code. (Or, if we get JuliaSyntax.jl into Base shortly (:crossed_fingers:) we'd potentially use that implementation for the parser parts. Very much depends on the stabilization timeline, which I can't promise anything on right now, alas.)
Implementation notes:

- `->` is parsed here: https://github.com/JuliaLang/JuliaSyntax.jl/blob/764597acc87265c9c96bc5fe52a0ce11a94c6d4a/src/parser.jl#L1389
- `->` binds very tightly on the left (precedence above `^`) and very loosely on the right (precedence below `=`). So it's got "weird precedence" :-)
- `_` with strict left-to-right ordering of anon function arguments requires care because `Expr` does not preserve source order in general. For example, in `f(_+x; a=_+y)`, the `y` occurs before the `x` if you traverse the `Expr` depth first in the order of the `Expr` `args` list.
- The `Expr` representation for prefix `->` is just `Expr(:->, rhs)`, which is consistent with other unary syntactic ops like `<: x`. I think this would be fine.

Ok... honestly... writing the above description was most of the work in implementing a prototype so I just did it. Here it is:
https://github.com/JuliaLang/JuliaSyntax.jl/pull/199
It works:

```julia
julia> data = [(a=1,b=2), (a=3,b=4)]
2-element Vector{NamedTuple{(:a, :b), Tuple{Int64, Int64}}}:
 (a = 1, b = 2)
 (a = 3, b = 4)

julia> filter(->_.a > 2, data)
1-element Vector{NamedTuple{(:a, :b), Tuple{Int64, Int64}}}:
 (a = 3, b = 4)
```
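The source-order caveat from the implementation notes is easy to check in current Julia: in a `:call` `Expr`, the keyword `:parameters` block is stored *before* the positional arguments in `args`, so a naive depth-first traversal visits the keyword values first.

```julia
# Keyword args (the :parameters block) precede positional args in Expr.args,
# so depth-first traversal of args sees `y` before `x` here.
ex = Meta.parse("f(_ + x; a = _ + y)")

@show ex.head          # :call
@show ex.args[1]       # :f
@show ex.args[2].head  # :parameters  (holds a = _ + y)
@show ex.args[3]       # :(_ + x)
```

A prototype assigning holes in strict source order therefore cannot just walk `args` naively; it has to visit `:parameters` blocks last.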
> I'd be happy enough with a rule like `_` can be used once and only once, the least common denominator

Having written a data filtering example instinctively, I'd want `_` to mean a single thing in that case. So I think that's the big tension here (edit: it's not so bad).

By the way, an alternative to Proposal B is pipefirst/pipelast style operators, as prototyped in https://github.com/JuliaLang/JuliaSyntax.jl/pull/148; that alternative proposal solves the latter issue.
> I'd be happy enough with a rule like `_` can be used once and only once, the least common denominator
>
> Having written a data filtering example instinctively, I'd want `_` to mean a single thing in that case. So I think that's the big tension here. The tension between:
>
> - The desire to write predicates of a single data argument (a single row in a tabular data source)
For a single argument that's reused, that means having

```julia
x -> x.A + x.B * x.X
# vs
-> _.A + _.B * _.X
# if _ were an ordinary variable it would look like this:
_ -> _.A + _.B * _.X
```

so it's only having `_` vs `x` and dropping a single char, which doesn't count as an argument IMO (readability, not writability).
And for reusing multiple arguments, it doesn't increase readability, since I still would have to scan the entire function to see which argument appears where, since it can be in an arbitrary order just as in ordinary lambdas. Any combination of reuse and multiple arguments is basically the same as an ordinary lambda, just with different names and a shorter write style. But it doesn't reduce complexity (you can transform any ordinary lambda into that style).

The proposal I made above reduces the complexity, and because of that it cannot represent all possible anonymous functions, as it forbids reusing entirely!
From a certain point of view my proposal just completes the meaning of the underscore ("the name doesn't matter") in a new direction. Consider a function like:

```julia
(a, b, c, d) -> a^b + c*d
# applying _ in its meaning would make this:
(_, _, _, _) -> _^_ + _*_
```

And since we dropped all naming/referability information as intended (that's the sole purpose of the underscore), the only remaining information to be used is positioning. And there the easiest approach is to go in order. And then the argument list by itself doesn't carry any information aside from the number of arguments, which we can also infer from the body of the function. Thus, we drop the argument list.
I visualize it as "pick all arguments on the caller side and drop them one after another into the holes of the callee". As the underscore means "we won't refer to it again", using a single underscore in multiple places to refer to the same thing goes heavily against this meaning of not referring to the same thing again. And in general, being able to assign values to names exists for being able to refer to the same thing in a later position, possibly multiple times! So we've got assignment (aka names for arguments) for exactly that purpose of wanting to reuse things.
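A minimal sketch of these "individual holes" semantics can be written as a macro in current Julia (the `@holes` name and the macro approach are my own illustration; the real proposal would live in the parser and lowering). Each `_` becomes a fresh positional argument, in the order produced by walking the `Expr` (which, per the `Expr`-traversal caveat noted earlier, is not always source order once keyword arguments are involved):

```julia
# Hypothetical sketch of the "individual holes" proposal as a macro.
# Every `_` in the body becomes its own fresh argument, in Expr order.
function fill_holes(ex, args)
    if ex === :_
        a = gensym(:arg)     # a fresh name for this hole
        push!(args, a)
        return a
    elseif ex isa Expr
        return Expr(ex.head, (fill_holes(x, args) for x in ex.args)...)
    else
        return ex
    end
end

macro holes(body)
    args = Symbol[]
    body = fill_holes(body, args)
    esc(:(($(args...),) -> $body))
end

f = @holes _^_ + _*_        # behaves like (a, b, c, d) -> a^b + c*d
@assert f(2, 3, 4, 5) == 2^3 + 4*5   # 28
```

This is enough to play with the semantics, but it also demonstrates the ordering problem: the hole order is the `Expr` traversal order, not necessarily the source order.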
But to be clear here, I also would like to have a simple way to operate on transforms of a single object. Maybe I can come up with some clever objects that give the intended results when combined with my proposal. But then only for reusing a single variable, never reusing multiple arguments! For that we have ordinary lambdas 😄 I'll ponder it for a bit 😊
As a first shot, switching to column-based solves the problem in many cases:

```julia
filter(row -> row.a + row.b > 4, data)
# becomes
filter(-> _ + _ > 4, data.a, data.b)
```

This also is in line with extracting the mathematical "action" (here: checking if a sum is greater than 4). Note: not sure if `filter` supports multiple data arguments.
But I see that you want to drop the duplication of the `data` here. I'll ponder it further.
In case of DataFrames.jl it would work already:

```julia
subset(df, Cols(:A, :B) => ByRow(-> _ + _ > 4))
filter([:A, :B] => (-> _ + _ > 4), df)
```

Tbh, to me that looks very concise and idiomatic!
Something along the lines of

```julia
filter(Splat(-> _.a + _.b > 4), nzip(data, 2))
```

would also work conceptually, but for that to be less cumbersome than an ordinary lambda, there's a long way to go...
I like the following concept:

```julia
filter(Mirror(-> _.a + _.b > 4), data)
```

even though I don't exactly know right now how we'd be able to make calling the `Mirror` object call the headless lambda with the right number of arguments.
The proposal does work nicely with the existing DataFrames idioms for columns :+1: This is good!
(Side note: To be honest I've never been 100% satisfied with the DataFrames APIs for selecting columns. I do kind of wish for something more along the lines of SplitApplyCombine.jl because the tooling in there feels more composable. But DataFrames is a lot more expressive within its domain, and it's hard to argue with just how usable and flexible it is there.)
I think I need to read back over this whole thread more carefully. The case with a single left-hand-side argument is probably not really a problem, as a normal lambda with a single-char variable will do. (I was thinking back to proposals where the `->` isn't necessary, and there you do save a few more characters.)
It would be interesting to trawl the open source Julia packages on GitHub, digest the syntax from all of them, and test this syntax proposal more widely. There's some tooling in JuliaSyntax.jl for downloading all of General and parsing it; it would be easy to repurpose that to look for uses of lambdas. Perhaps we could extend that to code not within General as well. I feel that released packages might be statistically rather different from end-user code in terms of how they use lambda syntax. For example, data cleaning code which uses DataFrames is largely going to be random end-user scripts, not packages.
One thing about implementing only proposal A (and leaving B to later) is that I feel that concise lambda syntax isn't entirely independent from the "what to do about piping" issue. These are so commonly needed in some combination that I feel like they need to be considered together. Even though they can be used separately. pipefirst/pipelast operators are quite an interesting point in the design space for the piping part (https://github.com/JuliaLang/JuliaSyntax.jl/pull/148)
> so it's only having _ vs x and dropping a single char which doesn't count as argument IMO (readability not writability)

It does help readability in practice. With underscores being very "local" by their nature, it's immediately visible that the lambda is self-contained. Meanwhile, `x -> x.A + x.B * x.C` requires carefully reading the expression to see that only `x` is used there, and not some `y` from the outer scope.
> I visualize it as "pick all arguments on caller side and drop them one after another into the holes of the callee".

I don't think this is familiar/intuitive for the majority of users. At least it was very unexpected for me: on first reading I thought I didn't understand something; I couldn't believe that a single symbol is taken to mean different variables in the same context :)
More objectively:

All (?) existing Julia packages that use underscore in similar contexts reuse it to mean the same argument multiple times! A non-exhaustive list of such packages: Accessors, Chain, DataPipes, Hose, Pipe, Underscores.

Also, the same behavior is used in some other languages, such as Mathematica. This seems a nice path to follow :)

This exactly means "I don't care about the name, so call it `_`, but remain consistent, and it means the same in the whole expression".
And again, maybe limit the implementation to the totally unambiguous case first, and see how it actually goes in practice? Single underscore, used only once. This already helps many of the simplest cases.
> as a first shot, switching to column based solves the problem in many cases:
>
> `filter(row -> row.a + row.b > 4, data)` becomes `filter(-> _ + _ > 4, data.a, data.b)`

Is there any actual "table" type where this works, or could conceivably work? It's not clear at all what multi-arg `filter` can mean.

Meanwhile, `filter(-> _.a + _.b > 4, data)` works for many tables immediately, including base Julia vectors, stuff like StructArrays or TypedTables, and even DataFrames.
> `(a, b, c, d) -> a^b + c*d`
>
> applying _ in its meaning would make this:
>
> `(_, _, _, _) -> _^_ + _*_`

Sure, four-arg lambdas where arguments are used in order become simpler here. But maybe, if there are that many arguments, give them actual names?
I agree with @aplavin above. I think it would be best to keep this syntax excessively simple, because for complex cases we already have the fully general alternatives (`(args...) -> ...`, `do` blocks, or, heaven forbid, defining an actual function).

The single-argument version (i.e. headless `->` allows you to use `_`, all instances of which refer to the same and only argument) already covers a lot of use cases and has very simple semantics. This would handle stuff like
| `_` version | equivalent |
|---|---|
| `filter(-> _.a > 2 && _.b == 3, itr)` | `filter(x -> x.a > 2 && x.b == 3, itr)` |
| `map(-> _[4], itr)` | `map(x -> x[4], itr)` |
| `sum(-> calculate(a, b; _.c, _.d), rows)` | `sum(row -> calculate(a, b; row.c, row.d), rows)` |
For more complex functions like `(a,b,c,d) -> (x -> a + b*x + c*x^2 + d*x^3)` and `(x,y) -> sqrt(x^2 + y^2)`, the fully general syntax is still there.
> Writing the above description was most of the work in implementing a prototype so I just did it.

Yay!
> Headless `->` allows you to use `_`

What if you don't use any `_`? Might seem trivial but could be important if it gets extended to chaining: `-1 |> abs`
> The single-argument version (ie headless `->` allows you to use `_`, all instances of which refer to the same and only argument) already covers a lot of use cases and has very simple semantics.

This is a good point actually; 1-argument functions are by far the most common and useful in cases like this. There are some nice examples for 2 arguments, but once you get to 3 or 4 arguments I would argue it's more useful to reference a single argument 3 or 4 times. It also has the nice properties that (1) the same symbol refers to the same thing, and (2) parse tree reordering doesn't matter. After all, in Real Functional Programming™ functions all have one argument anyway.
In what follows, I'll call the proposals "individual holes" resp. "referencing holes", for having the underscores represent individual values resp. refer to the same value.
> (Side note: To be honest I've never been 100% satisfied with the DataFrames APIs for selecting columns. I do kind of wish for something more along the lines of SplitApplyCombine.jl because the tooling in there feels more composable. But DataFrames is a lot more expressive within its domain, and it's hard to argue with just how usable and flexible it is there.)

Can you make an example of how you use SplitApplyCombine.jl? Then I can have a look at the interactions with this proposal.
> With underscores being very "local" by their nature, it's immediately visible that the lambda is self-contained. Meanwhile, `x -> x.A + x.B * x.C` requires carefully reading the expression to see that only `x` is used there, and not some `y` from the outer scope.

You'll always need a careful look at what's within the `->`'s scope. But for the individual holes, it's as easy as reading from left to right. For a single-argument lambda, it's also as easy as reading from left to right; you're just spotting `x`s instead of `_`s.
> I thought I didn't understand something, couldn't believe that a single symbol is taken to mean different variables in the same context

Having it mean the same everywhere makes it behave like an ordinary variable whose naming was skipped in `_->`. I feel like we lose options going that route, because for more than one argument that's reused, I still think ordinary lambdas would be better, since order on the caller side is entirely unrelated to order on the callee side.

But yes, I see the "need" for something like the referencing holes. I'll soon make a more theory-focused comment on why I would choose one over the other/what I'm missing in your ideas, which hopefully clarifies the intentions behind each of them.
> What if you don't use any `_`?

This should create a 0-arg lambda that behaves like `Returns(val)`.
> It also has the nice properties (1) same symbol refers to the same thing

But that refers to ordinary variable behavior, while underscores are meant to be special-cased in their usage.
> This should create a 0-arg lambda that behaves like `Returns(val)`

Except, presumably, that the computation is deferred?!
> > This should create a 0-arg lambda that behaves like `Returns(val)`
>
> Except, presumably, that the computation is deferred?!

I guess to the extent of what the compiler infers as "pure"/precomputable, so yes, deferred aside from optimization. In that sense it pushes the use of closures (anonymous functions using implicit state), which might lead to more boxing issues.
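The eager/deferred distinction can be demonstrated with today's `Returns` (available since Julia 1.7) against an explicit 0-arg closure:

```julia
# Returns captures its value eagerly, when constructed;
# a zero-argument closure re-evaluates its body on every call.
calls = Ref(0)
g() = (calls[] += 1; 42)

r = Returns(g())        # g runs once, right here
@assert calls[] == 1

l = () -> g()           # nothing evaluated yet
@assert calls[] == 1

@assert r() == 42 && r() == 42
@assert calls[] == 1    # Returns never re-runs g

@assert l() == 42 && l() == 42
@assert calls[] == 3    # the closure runs g on each call
```

So a headless lambda with no `_` would, like any closure, defer (and repeat) the computation, whereas `Returns` freezes a value.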
> but that refers to ordinary variable behavior, while underscores are meant to be special cased in their usage.

Underscore being somewhat different from a normal variable doesn't mean it's better for it to be as different as possible. The same visual thing changing meaning every time it occurs would be deliberately going against properties that are generally considered good.

Also, I would argue the 0-argument case is not so important since (1) it's much rarer, and (2) the syntax `() -> ...` already spares you needing to come up with a variable name.
We've got 4 variants of dataflow:

Note: DataFrames.jl's API forces splitting up 1b cases into a combination of 2 and 3, since transformations are passed as `[columns] -> transform`, which describes the 2nd variant on the left-hand side and the 3rd variant on the right-hand side.
| | Proposal A (individual holes) | referencing spots/single argument | single hole |
|---|---|---|---|
| concept | each occurrence refers to its own value on the caller side, strictly in order | each occurrence references the same single argument on the caller side | there's at max one occurrence for at max one argument on the caller side |
| supported dataflow variants | focuses 3; allows 1a, 3 | focuses 1b; allows 1a, 1b, 2 | focuses 1a; allows 1a |
| benefits/drawbacks | pushes to explicit, more separated extraction; many higher-order functions don't support multiple arguments for the data (requires zip/splat); leans towards mapping column-based data; works well with DataFrames.jl | pushes to implicit extraction; combines many steps (split, apply, combine) into a single statement (often single-line); leans towards mapping row-based data; complements DataFrames.jl/acts as an alternative approach | simplicity of scope |
| escalation/worst case | constant need for transforming row- to column-based data in longer pipelines (e.g. through Splat/Zip interaction) | named-tuple pipelines which make arbitrarily intertwined transformations in each step | ? |
Further down in this issue, there's the idea to enable both proposals with a slightly different and more individually tailored syntax:

```julia
-> _ + _    # Proposal A, stands for (a,b) -> a+b
# and
-> $a + $b  # referencing spots, stands for x -> x.a + x.b
```

`->$a` to me feels even more straight to the point and better serves the intention than `->_.a`.
This brings a solution to #22710 and to #24990 by preceding the special symbols with an explicit function indicator (the headless `->`) that explicitly binds these symbols locally to the argument(s). That way, we have both variants available and tailored to their respective use cases, focusing on what's needed.
`->$a` directly hints that there's only a single argument, which will be used as the data source/accessed: `$` just becomes an interpolation/extraction from the argument, while the `_`s just become slots for the arguments to drop into.

The current direction reminds me of the missing/nothing situation, where the best (and by itself very good) solution was not to try to bunch multiple approaches into the same thing.
So the new proposal would be to add a new headless lambda syntax that has tailored meanings for `$` and `_` within its own scope. This feels like multiple dispatch on a syntactical level! Shared meaning, new specializations!
- On the caller side: put all arguments into a named tuple. On the callee side: extract using these names. Explanation: in some sense this emulates naming of variables. Regarding dataflow, it manually merges the data (2) first; thus it's now variant 1b and the syntax can be used.
- On the caller side: pre-extract the needed data into multiple arguments (repeat as needed). On the callee side: drop extractors/accessors. Explanation: this forces manual extraction outside the lambda for all data needed. Regarding dataflow, it enforces splitting cases of 1b into 2 & 3, so the syntax can be used for variant 3.
- (`|>>`, pipe with included splatting) also gets approved, since that motivates piping an extraction step (variant 2, one->many) through a splatting pipe into a combining step (variant 3, many->one).

Note: I may update this post as needed based on comments
I believe being able to reference an argument more than once has strictly higher expressiveness than being able to accept multiple arguments. To compare, see what it takes for each one to emulate the other:

```julia
-> _[1] + _[2]         # 2 arguments in terms of one argument. generalizes easily to e.g. -> _[1] + _[3]
-> (x = _; x.a + x.b)  # multiple reference
```

`splat(-> _ + _)` works in the special case where you want to access components 1 and 2. IOW, the only way to recover multiple reference in general is to introduce a name anyway, giving even longer syntax than we have now. So on reflection I think I'd rather have multiple reference.
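For comparison, the "multiple reference"/single-argument semantics also has a straightforward macro sketch in current Julia (the `@single` name is my own, purely illustrative; the real proposal would be parser-level):

```julia
# Hypothetical sketch of the "referencing holes" proposal as a macro:
# every `_` in the body refers to the same, single argument.
function ref_hole(ex, arg)
    ex === :_ && return arg
    ex isa Expr && return Expr(ex.head, (ref_hole(x, arg) for x in ex.args)...)
    return ex
end

macro single(body)
    arg = gensym(:arg)
    esc(:($arg -> $(ref_hole(body, arg))))
end

f = @single _.a + _.b        # behaves like x -> x.a + x.b
@assert f((a = 1, b = 2)) == 3
```

Note how much simpler this is than the ordered multi-hole variant: there is no ordering question at all, so `Expr` traversal order never matters.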
If we had `->$name` as the syntax for an accessor for a single argument (quite an old idea that never made it to life because the proposal was to use `$name` alone, IIRC, #22710) that's fed into the lambda, then this would complement the current Proposal A quite well IMO, and it directly serves the intended purpose of the referencing-hole proposal, which is about extraction from a single variable. It would allow, for example: `->$a+$b`. It can't get more expressive IMO.

It also is, IMO, a logical abstraction of the current `$` usage. `$` is used for interpolation; it extracts values from an outer state. The `$name` stands for `outerstate.name`. If used within a headless `->`, this could simply bind the interpolation to the argument, just as it could bind underscores to the arguments. In a headless lambda, `$name` extracts `onlyarg.name`.
Using `$` for direct access/extraction is also well-known from other languages AFAIK (IIRC it was even cited before as an argument to use `_` to reference the same thing multiple times). Both approaches would benefit each other quite well IMO, since they are tailored to different situations (1b/2 vs 3).

So maybe, is it possible that we have a similar case as for nothing/missing, where the best solution is providing two different approaches tailored to different, entirely valid situations, instead of trying to bunch multiple things into one and making them contest each other as "the right approach"? The general meaning of the headless `->` would then become to appropriately bind special symbols like `$` and `_` to its argument(s).
From a certain point of view, #22710 and #24990 are quite similar. They tried to create functions out of a syntax that didn't hint at a function. Both were rejected. But with a headless `->` as a function indicator that makes those symbols locally bound/operating on arguments, we have the missing piece. Sure, with an extra `->`-typing effort, but that's a fair price for clarity/readability.
To me, `->$a` really feels more expressive and concise than `x->x.a` and `->_.a` (and it even is shorter).
| `_` version | `$` version | equivalent |
|---|---|---|
| (none more concise) | `filter(-> $a > 2 && $b == 3, itr)` | `filter(x -> x.a > 2 && x.b == 3, itr)` |
| `map(-> _[4], itr)` | (maybe `map(-> $[4], itr)`)* | `map(x -> x[4], itr)` |
| (none more concise) | `sum(-> calculate(a, b; $c, $d), rows)` | `sum(row -> calculate(a, b; row.c, row.d), rows)` |
\* it would make sense as the accessor that resolves to `getindex(singlearg, 4)`

This would make the rule as follows:

- `->_ + _` stands for `arg1 + arg2`
- `->$a` stands for `getproperty(onlyarg, :a)`
- `->$[a]` stands for `getindex(onlyarg, a)`
- `quote $a end` stands for the currently available interpolation of the outer scope

Optionally, extending/"extrapolating" this further, `$a` could stand for `getglobal(:a)`, which basically is `(->$a)(globalstate)`. (This would then provide a syntax for the new suggestion to access globals via `getglobal` and make global usage more obvious.)
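To get a feel for the `$` variant, the `getproperty`/`getindex` rules above can also be sketched with a macro (the `@dollar` name is hypothetical; a real implementation would need parser support and would have to resolve the quote-capturing questions raised in the replies). This works because `$x` and `$[i]` already parse outside of quotes, so a macro sees them as `Expr(:$, ...)` nodes:

```julia
# Hypothetical sketch of the `->$a` / `->$[i]` accessor rules:
# `$name` becomes getproperty(arg, :name), `$[i]` becomes getindex(arg, i).
function subst_dollar(ex, arg)
    ex isa Expr || return ex
    if ex.head === :$ && length(ex.args) == 1
        inner = ex.args[1]
        if inner isa Symbol                            # $a
            return :(getproperty($arg, $(QuoteNode(inner))))
        elseif inner isa Expr && inner.head === :vect  # $[i]
            return :(getindex($arg, $((subst_dollar(x, arg) for x in inner.args)...)))
        end
    end
    return Expr(ex.head, (subst_dollar(x, arg) for x in ex.args)...)
end

macro dollar(body)
    arg = gensym(:arg)
    esc(:($arg -> $(subst_dollar(body, arg))))
end

f = @dollar $a + $b       # behaves like x -> x.a + x.b
@assert f((a = 1, b = 2)) == 3

g = @dollar $[2] * 10     # behaves like x -> x[2] * 10
@assert g([5, 6]) == 60
```

This is only a toy: as soon as such a form is nested inside a `quote`, the outer quote captures the `$` first, which is exactly the conflict discussed next.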
I think further overloading `$` like that is questionable. If the multi-arg underscore design needs to be fixed up by adding that, it's a significant weakness. It doesn't bother me that `_.a` is one character longer than `$a`, since it is clearer and doesn't have quote capturing issues.
> I think further overloading `$` like that is questionable. If the multi-arg underscore design needs to be fixed up by adding that, it's a significant weakness. It doesn't bother me that `_.a` is one character longer than `$a` since it is clearer and doesn't have quote capturing issues.
I wouldn't call it fixing, since it serves its own purpose in an overall concept that consists of a newly designed headless lambda syntax with tailored meanings for `$x` and `_`. And multiple dispatch, being the core concept of Julia, basically means defining a thing in multiple ways that share a meaning, not an implementation, doesn't it? That's certainly the case for `$` when captured by a `->`.

Basically, it's something very natural to have names acting relative to the scope they're used in. It's like using globals and locals. No one should be surprised that `->` creates a local scope, since that is what it's meant to do... And we already have localized behavior for `$`, since it acts differently in certain code blocks and macros while trying to preserve a shared meaning. So this doesn't add a new meaning, but a new specialization in a place where it was unused before (since that place didn't even exist). We sort of added a new implementation to an existing concept.
This is not the same as multiple dispatch, since it is a scoping and surface syntax issue. Looking at `a + b`, `a` and `b` might be any types of numbers, but you don't have to know which to understand what it does. But in `quote ->$a end` the question is whether `arg.a` is evaluated at run time or `a` is evaluated at macro expansion time, which are not the same meaning. It's even more confusing because in the proposed `$a` syntax, `a` is not evaluated but is a quoted symbol. And if we had `$[i]` as well, by analogy you would think `$.a` would work. For me, adding `$` sends the fairly simple `->` and `_` proposal off the rails.
Yes, local scopes can override variables from outer scopes. But unfortunately `$` is not a normal scoped identifier and does not work like that. E.g. you can't do something like `let ($) = foo` and disable `$`-interpolation inside that block. For example:

```julia
macro m(x); QuoteNode(x); end
x = 42
:(@m $x)  # result: :(@m 42)
```

(The point is that you might expect the macro to see the `$x` argument syntax, but the outer quote "captures" it.)
> And we already have localized behavior for `$` since it acts differently in certain code blocks

Example?
> This is not the same as multiple dispatch, since it is a scoping and surface syntax issue. Looking at `a + b`, `a` and `b` might be any types of numbers but you don't have to know which to understand what it does.

I see that I didn't mean multiple dispatch but meant shadowing/capturing, which should be a known concept to folks.
> But in `quote ->$a end` the question is whether `arg.a` is evaluated at run time or `a` is evaluated at macro expansion time, which are not the same meaning.

Since it's ALWAYS possible not to use a headless lambda in meta-programming (easiest to stick to ordinary lambdas there if you don't want to learn the resolution order), that's not a showstopper for me. There simply isn't a better solution than to say: if you code in a way where your expressions shadow each other or raise the question of capture order, you have to learn the mechanics of how it's resolved, or you're most probably screwed.

If we don't want to just forbid headless lambdas in quote blocks and macro calls (which still would be a sensible approach IMO), I'd probably say to have the headless lambda follow the nesting level for capturing `$`, even in quote blocks. That way is the easiest to evaluate visually and doesn't need more structure than nesting level to be resolved. But I'm also fine with not allowing it in quote blocks at all and having macros see the resolved anonymous function.

If you have a concrete example where being able to nest this proposal in a quote block is very important, I'm happy to think further about how I'd proceed there! But I can't imagine them appearing in ordinary data-transforming pipelines for the user.
> It's even more confusing because in the proposed `$a` syntax, `a` is not evaluated but is a quoted symbol.

I don't find that confusing if it's communicated that `$a` is meant to access `a` of the argument (aka expanding to `onlyarg.a`). To me, it's exactly as confusing as `onlyarg.a` having the `a` appear quoted (`getproperty(onlyarg, :a)`). Interpolation always represents the value behind a quoted name in some scope. How is this different? In a quote block it accesses the value behind the name `a` of the outer scope; here it accesses the value behind the name `a` of the argument `onlyarg`.
> And if we had `$[i]` as well, by analogy you would think `$.a` would work.

Following the previous: if `$a` accesses `a` of the argument, then `$[i]` accessing `[i]` of the argument (aka expanding to `onlyarg[i]`) comes quite naturally to me. At least as naturally as I wouldn't infer `onlyarg.[i]` from `onlyarg.a` as the correct way of indexing. There are two ways to extract: via name (-> `getproperty`) and via index (-> `getindex`); if you append a name, it does the former, and if you append indexing, it does the latter.
> Yes, local scopes can override variables from outer scopes. But unfortunately `$` is not a normal scoped identifier and does not work like that. E.g. you can't do something like `let ($) = foo` and disable $-interpolation inside that block. For example `macro m(x); QuoteNode(x); end; x = 42; :(@m $x)  # result: :(@m 42)`
>
> (The point is that you might expect the macro to see the `$x` argument syntax, but the outer quote "captures" it.)

That's a good example of how "if you code in a way where syntax shadows each other, understand how it works, or you might get screwed" is already part of the language. More generally speaking, I wouldn't fine-tune this feature for meta-programming, because whenever you're explicitly writing lambdas for higher-order functions (which this is tailored to), you most probably won't generate them through meta-programming. It's just a way too small fraction of cases.
I apologize for sounding like a broken record, but again: coming up with a complicated DSL full of special cases for writing anonymous functions is not something we should contemplate, since we already have various syntaxes for anonymous functions.

My impression is that people have two problems with the current syntax:

- verbosity, i.e. `x -> x[i]` seems excessive, 7+2 (space) characters. So maybe we can shave off 1+1 with `-> _[i]`, which is kind of nice, if one is into these things.
- having to name arguments which play no semantic role. There are conventions for this (e.g. `x`), but I can see the appeal.

However, the less trivial your anonymous functions are, the less important these things become. Shaving off a few chars of a function that takes up half a line is no longer that important. And if your function has 2-3 arguments, maybe you should start naming them (sure, there are trivial examples when you don't want to, but generally it is better style).

We should be focusing on a syntax that is very simple and makes 90% of the trivial applications of `->` a bit simpler, with the understanding that users can (and should) just fall back to the general syntax when necessary.

[Personally I see no reason to add anything to Julia, I am happy with the current syntax, but people have been asking for this and the one-argument headless option is something that fits nicely into Julia and presumably does not cause any major headaches.]
Agreed, turning this into a complicated DSL with complex unpredictable behaviours one has to memorize would be a bad outcome
I must say though, I find it very frustrating that this syntax has only one real reason to even be considered: macros bind less tightly than commas. If it weren't for https://github.com/JuliaLang/julia/issues/36547, we could simply write `@_` or whatever instead of `->` and not need to modify the parser and lowering at all.
> `->$a` to me feels even more straight to the point and better serving the intention than `->_.a`

This basically proposes `$` instead of `_` as the single argument placeholder, right?
It's a perfectly reasonable suggestion; after all, neither `$` nor `_` carries any meaning by itself as a character. However, I believe there are strong arguments to prefer `_` here:

- Many packages use `_`, and they use it to refer to the same value within a single scope - see the list above. Literally no package (right?) decided to reuse `_` for several values, or to use `$` as the placeholder! Also, Mathematica's `#` placeholder behaves the same.
- `_` is just like a variable, but with a name we don't care about. No extra rules unique to this situation, and no confusion with the already existing "interpolation" behavior of `$`.
- Having it mean the same thing everywhere makes it behave like an ordinary variable whose naming was skipped in `_ ->`. And... that's a good thing! :)
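The interpolation point above can be checked directly in plain Julia: `$` already has a well-established meaning inside quoted expressions and strings, so reusing it as an argument placeholder would overload that. A minimal illustration (no proposed syntax involved):

```julia
y = 2

# `$` already interpolates into quoted expressions...
ex = :(x + $y)
println(ex)        # prints: x + 2

# ...and into strings.
s = "value is $y"
println(s)         # prints: value is 2
```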
I feel like we lose options going that route. Because for more than one argument that's reused, I still think, ordinary lambdas would be better since order on caller side is entirely unrelated to order on callee side.
Sure, if a lambda has multiple arguments and reuses them, better give them some names...
Thomas and Mason highlight an excellent policy. For every line of code written, it is read at least a dozen times. Hence, any improvement must necessarily make reading comprehension of an expression easier, not harder by adding more semantic complexity. I don't mind providing special meaning to the underscore, if it's got a simple interpretation -- it's a parameter for a single argument. The other proposals here seem to require too much of my mental space.
What if we use Unicode open-circle characters, e.g. `①`, ..., `㊿`? We could drop `->` in this case, so that `①` means `(args...) -> args[1]` and `①+②` would mean `(args...) -> args[1]+args[2]`, etc. To make writing convenient, `\1` TAB would insert `①`, etc. I'm less fond of giving special meaning to `_`, since this character is often used for arguments that are ignored, e.g. `(_,x,_) -> x`. Alas, I know how many are resistant to using Unicode as part of Julia's syntax. Arguing against this proposal myself, there are documentation conventions that use black circled numbers (e.g. ❶, ..., ⓴) as footnote labels; although the open circle is light enough as to not draw as much visual attention.
| expression | interpretation? |
|---|---|
| `filter(① > 2 && ② == 3, itr...)` | `filter(_ -> _[1] > 2 && _[2] == 3, itr)` |
| `map(④, itr...)` | `map(_ -> _[4], itr)` |
| `sum(calculate(a, b; ②, ④), rows...)` | `sum(_ -> calculate(a, b; _[2], _[4]), rows)` |
It seems even this "simple" proposal is less than trivial. To follow Aaron's test cases, the iterators would have to be expanded into tuples. Moreover, by getting rid of `->`, there's no clear boundary where the anonymous function ends; there is probably no sane rule that would produce the last interpretation. I guess this proposal also entails too much magic: easier to read, but even harder to comprehend. It was fun to consider, though. I'll join the downvote ;)
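For comparison, the single-iterator versions of the first two table rows are already compact in today's syntax. The tuple data below is made up for illustration (and slot 2 is used rather than 4, to keep the tuples short):

```julia
itr = [(1, 3), (5, 3), (2, 9)]

# roughly today's spelling of `filter(① > 2 && ② == 3, itr)`
kept = filter(t -> t[1] > 2 && t[2] == 3, itr)    # [(5, 3)]

# roughly today's spelling of `map(②, itr)`: extract a fixed slot
slots = map(t -> t[2], [(10, 20), (30, 40)])      # [20, 40]
```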
It seems I'm missing core aspects of what makes the proposal difficult for users to understand. Maybe someone can help me get it. I'm assuming an unbiased user who didn't follow all the different ideas of how these lambdas could work, and thus won't get lost in all the possible meanings suggested in the past, but instead only gets to know it through the following description:
Added a new syntax for lambdas for which the argument list is skipped. It is tailored to different situations:

1) Accessing/extracting data from a single (first and only) argument

Within a headless lambda that gets exactly one argument (`onlyarg`), the interpolation syntax `$prop` refers to `onlyarg.prop` and `$[i]` refers to `onlyarg[i]`. Both access the first and only argument of the lambda. (solves 22710 and part of 24990)
Examples:
- `filter(->$colA+$colB>5, data)` as a short form of `filter(x->x.colA+x.colB>5, data)`
- `accessor = ->$a` (hence `accessor(x) == x.a`)
- `fifth = ->$[5]` (hence `fifth(x) == x[5]`)
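For reference, the expanded forms are all valid today; with a made-up vector of NamedTuples they behave like this:

```julia
data = [(colA = 1, colB = 2), (colA = 3, colB = 4)]

# expanded form of `filter(->$colA+$colB>5, data)`
big = filter(x -> x.colA + x.colB > 5, data)   # [(colA = 3, colB = 4)]

# expanded forms of the accessor examples
accessor = x -> x.a
fifth = x -> x[5]
accessor((a = 42,))            # 42
fifth([10, 20, 30, 40, 50])    # 50
```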
2) Combining/transforming multiple arguments (as needed for higher-order functions)

Within a headless lambda, underscores refer to the different arguments in order, based on position. (solves part of 24990)

Examples:
- `reduce(->2*_+_, data)` as a short form of `reduce((a,b)->2*a+b, data)`
- `map(->2*sin(_), data)` as a short form of `map(x->2*sin(x), data)`

Further examples:
- `mapreduce(->$a, ->2*_+_, data)` as a shorthand for `mapreduce(x->x.a, (s,n)->2*s+n, data)`
- `subset(df, [colA, colB] => ByRow(->_+_>5))` as a shorthand for `subset(df, [colA, colB] => ByRow((a,b)->a+b>5))`
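The expanded forms of the first two examples run as-is. In the sketch below, `foldl` stands in for `reduce` because the operation is non-associative and `foldl` guarantees left-to-right application; `data = [1, 2, 3]` is made up for illustration:

```julia
data = [1, 2, 3]

# expanded form of `reduce(->2*_+_, data)`, with a guaranteed fold order
r = foldl((a, b) -> 2a + b, data)   # 2*1+2 = 4, then 2*4+3 = 11

# expanded form of `map(->2*sin(_), data)`
m = map(x -> 2sin(x), data)
```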
I'm missing core aspects of what makes the proposal difficult for users to understand.
From my perspective it is not that the proposal is difficult to understand per se. Julia is a powerful language, which comes with a certain amount of complexity, and users manage that just fine.
I think the key issue is the gain in function vs the added complexity, and tradeoffs between various alternatives (the multislot and the single argument versions are of course mutually exclusive).
Also, I think that a `_` stands out more visually than a `$`.
(Incidentally, I find it confusing to switch syntaxes in the middle of a proposal like this.)
> `->$a` to me feels even more straight to the point and better serving the intention than `->_.a`
>
> This basically proposes `$` instead of `_` as the single argument placeholder, right?

@aplavin no, if you look at the code you quoted, they are suggesting that `-> $a` means `x -> x.a`, not that `$` is used as an alternative for `_`.
It seems I'm missing core aspects of what makes the proposal difficult for users to understand. Maybe someone can help me get it. I'm assuming an unbiased user who didn't follow all the different ideas of how these lambdas could work, and thus won't get lost in all the possible meanings suggested in the past, but instead only gets to know it through the following description:
@rapus95 I have no problem personally, with adding more handy syntax because I've already learned our current syntax, so this is just a small bite sized addition for me to learn. However, that's not the case for everyone, specifically new users.
I think it's important to not consider new syntax in isolation, but to consider the entire pile of special syntax we already have in addition to the proposed new syntax. We should think about this from the perspective of beginners learning the language, not from the perspective of experienced users. Julia's syntax is already quite complicated, and adding new syntax rules will make learning our syntax even harder for new users.
The more special syntax we have, the less willing we should be to add more special syntax on top of it.
Actually, I just realized that we don't really need to solve https://github.com/JuliaLang/julia/issues/36547; we can just replace this `->` syntax with a macro pretty trivially. The key is to slurp up, and then spit back out, any extra arguments that might end up in the macro.
```julia
using MacroTools

# Define a macro named `_` (so it is invoked as `@_`).
@eval macro $(:_)(ex)
    @gensym x
    # Because macros bind less tightly than commas, `@_ f(_), xs` arrives
    # as a tuple: the first element is the body, the rest are extra args.
    if ex isa Expr && ex.head == :tuple
        pre_body, rest... = ex.args
    else
        pre_body = ex
        rest = ()
    end
    # Substitute every `_` in the body with the gensym'd argument name.
    body = MacroTools.postwalk(pre_body) do ex
        ex == :_ ? x : ex
    end
    λ = :($x -> $body)
    if length(rest) == 0
        esc(λ)
    else
        # Splat the lambda plus the slurped arguments back into the call.
        esc(:(($λ, $(rest...))...))
    end
end
```
Behold:
```
julia> map(@_ _[1], [[1,2,3], [4,5,6]])
2-element Vector{Int64}:
 1
 4
```
No parser changes required.
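A dependency-free sketch of the same single-argument idea, without MacroTools and without the comma-slurping trick (the macro name `@ph` and the helper `substitute` are made up for illustration):

```julia
# Recursively substitute every occurrence of `from` (here the symbol `:_`)
# in an expression tree with `to`.
substitute(ex, from, to) = ex == from ? to : ex
substitute(ex::Expr, from, to) =
    Expr(ex.head, (substitute(a, from, to) for a in ex.args)...)

macro ph(ex)
    x = gensym(:arg)   # fresh, collision-free argument name
    esc(:($x -> $(substitute(ex, :_, x))))
end

firsts = map(@ph(_[1]), [[1, 2, 3], [4, 5, 6]])   # [1, 4]
```

Since the macro sees the expression before lowering, the usual "all-underscore identifiers cannot be read" restriction never fires: `_` is replaced with a real name first.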
Macros kind of work "outside in" but parsing kind of works "inside out". In this case, if this were done by the parser, `_` would "find" the innermost `->` to "attach" to. And it would be good for this syntax to be nestable, so people could do: `df |> filter(-> _.a .> 1, _)`. There are messy ways to work around this with a macro, but especially since this is such a commonly demanded feature, I think parser support is the way to go.
Parsing also definitely works "outside in", and has to take the same care that a macro would have to take to attach `_` to the right fence.
Hmm, maybe I meant symbol resolution works inside out?
This macro would simply work by macroexpanding any macros it finds inside itself. It'd be the same as the proposed syntax here unless I'm missing something.
I think it's important to not consider new syntax in isolation, but to consider the entire pile of special syntax we already have in addition to the proposed new syntax. We should think about this from the perspective of beginners learning the language, not from the perspective of experienced users. Julia's syntax is already quite complicated, and adding new syntax rules will make learning our syntax even harder for new users.
:100: I would go farther though. Less noise is better for everybody, not just somebody in their first week of learning Julia.
This macro would simply work by macroexpanding any macros it finds inside itself. It'd be the same as the proposed syntax here unless I'm missing something.
Ok, but what if someone else writes a new macro that uses `_` and they don't play well together?
Someone could also quite easily write a macro that doesn't play well with this PR in the same way.
Okay, I've made https://github.com/MasonProtter/SimpleUnderscores.jl, @bramtayl or anyone else interested in this syntax please feel free to poke around with it and see if it fails in any obvious ways.
Since https://github.com/JuliaLang/julia/pull/24990 stalls on the question of what the right amount of tight capturing is...

Idea

I want to propose a headless `->` variant which has the same scoping mechanics as `(args...)->` but automatically collects all not-yet-captured underscores into an argument list. EDIT: Nesting will follow the same rules as variable shadowing, that is, the underscore binds to the tightest headless `->` it can find.
it can find.lfold((x,y)->x+2y, A)
lfold(->_+2_,A)
lfold((x,y)->sin(x)-cos(y), A)
lfold(->sin(_)-cos(_), A)
map(x->5x+2, A)
map(->5_+2,A)
map(x->f(x.a), A)
map(->f(_.a),A)
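The left-hand column runs today with small substitutions: `foldl` stands in for the hypothetical `lfold`, and the concrete `f`/`A` below are made up for illustration:

```julia
A = [1, 2, 3]

foldl((x, y) -> x + 2y, A)    # 1+2*2 = 5, then 5+2*3 = 11
map(x -> 5x + 2, A)           # [7, 12, 17]

f(v) = v + 1
map(x -> f(x.a), [(a = 1,), (a = 2,)])   # [2, 3]
```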
Advantage(s)

In small anonymous functions, underscores as variables can increase readability since they stand out much more than ordinary letters. For multiple-argument cases like anonymous functions for reduce/lfold it can even save a decent amount of characters. Overall it reads very intuitively as "start here, and whatever arguments you get, just drop them into the slots from left to right".
Sure, some more complex options like reordering (`(x,y)->(y,x)`), ellipsing (`(x...)->x`), and probably some other cases won't be possible, but if everything were possible in the headless variant, we wouldn't have introduced the head in the first place.

Feasibility
1) Both a leading `->` and an `_` as the right-hand side (value side) error on 1.5, so this shouldn't be breaking.
2) Since it uses the well-defined scoping of ordinary anonymous functions, it should be easy to 2a) switch between both variants mentally and 2b) reuse most of the current parser code, just extending it to collect/replace underscores.

Compatibility with #24990

It shouldn't clash with the result of #24990 because that focuses more on ~tight single argument~ very tight argument cases. And even if you end up in a situation where the headless `->` unintentionally consumes an underscore from #24990, it's enough to put 2 more characters (`->`) in the right place to make that underscore standalone again.

Further Explorations
This proposal can optionally be combined with https://github.com/JuliaLang/julia/pull/53946. Additionally, the following links to comments further down explore different ideas to stretch into, all adding their own value to different parts of the ecosystem. Alternative explorations: https://github.com/JuliaLang/julia/issues/38713#issuecomment-1436118670 https://github.com/JuliaLang/julia/issues/38713#issuecomment-1188977419