Closed: shelakel closed this issue 3 years ago
The reason this may be desirable in Base is twofold:
1. We may want to encourage pipelining as being the Julian Way -- arguments can be made that it is more readable.
2. Things like Lazy.jl, FunctionalData.jl, and my own Pipe.jl require a macro to wrap the expression they act on -- which makes it less readable.
I feel the answer may lie in having infix macros, and defining `|>` as such.
I'm not certain `|>` (or its cousin, the do block) belongs in core at all. But the tools don't exist to define them outside of the parser.
The ability to have that sort of pipelining syntax seems very nice. Could just that part be added to Base, i.e. `x |> y(f) = y(f, x)`, that Lazy.jl, FunctionalData.jl, and Pipe.jl could use? :+1:
Having looked at code that uses the various implementations of this out in packages, I personally find it unreadable and very much un-Julian. The left-to-right pipeline pun doesn't help readability, it just makes your code stand out as backwards from the rest of the perfectly normal code that uses parentheses for function evaluation. I'd rather discourage a syntax that leads to two different styles where code written in either style looks inside-out and backwards relative to code written in the other. Why not just settle on the perfectly good syntax we already have and encourage making things look more uniform?
@tkelman Personally, I see it from a somewhat utilitarian point of view. Granted, maybe if you're doing something simple then it isn't necessary, but if you're writing a function say, that does something fairly complicated, or long winded, (off the top of my head: data manipulation e.g.), then I think that's where pipeline syntax shines.
I understand what you mean though; it would be more uniform if you had one function-call syntax for everything. Personally though, I think it's better to make it easier to write [complicated] code that can be easily understood. Granted, you have to learn the syntax and what it means, but, IMHO, `|>` is no harder to grasp than how to call a function.
@tkelman I'd look at it from a different point of view. Obviously, there are people who prefer that style of programming. I can see that maybe you'd want to have a consistent style for the source code to Base, but this is only about adding the parser support for their preferred style of programming their Julia applications. Do julians really want to try to dictate or otherwise stifle something other people find beneficial? I've found pipelining stuff together very useful in Unix, so even though I've never used a programming language that enabled it in the language, I'd at least give it the benefit of the doubt.
We do have `|>` as a function piping operator, but there are implementation limitations to how it's currently done that make it pretty slow at the moment.
Piping is great in a unix shell where everything takes text in and text out. With more complicated types and multiple inputs and outputs, it's not as clear-cut. So we have two syntaxes, but one makes a lot less sense in the MIMO case. Parser support for alternate styles of programming or DSL's is not usually necessary since we have powerful macros.
OK, thanks, I was going by @oxinabox's comment:
But the tools don't exist to define them outside of the parser.
Is it understood what would be done to remove the implementation limitations you referred to?
Some of the earlier suggestions could potentially be implemented by making `|>` parse its arguments as a macro instead of as a function. The former command-object piping meaning of `|>` has been deprecated, so this might actually be freed up to do something different with, come 0.5-dev.
However this choice reminds me quite a bit of the special parsing of `~`, which I feel is a mistake for reasons I've stated elsewhere.
Parsing `~` is just insane; it's a function in Base. Using `_`, `_1`, `_2` seems more reasonable (esp. if you raise an error when these variables are defined elsewhere in scope). Still, until we have more efficient anonymous functions, this seems like it's not going to work...
implemented by making |> parse its arguments as a macro instead of as a function
Unless you do that!
Parsing ~ is just insane, it's a function in base
It's a unary operator for the bitwise version. Infix binary `~` parses as a macro, ref https://github.com/JuliaLang/julia/issues/4882, which I think is a strange use of an ascii operator (https://github.com/JuliaLang/julia/pull/11102#issuecomment-98477891).
@tkelman
So we have two syntaxes, but one makes a lot less sense in the MIMO case.
Three syntaxes, kind of: pipe-in, normal function call, and do-blocks. Debatably even four, since macros use a different convention as well.
For me, having read order (i.e. left to right) match application order makes SISO function chains a lot clearer.
I do a lot of code like this (using Iterators.jl and Pipe.jl):

```julia
loaddata(filename) |> filter(s -> 2 <= length(s) <= 15, _) |> take!(150, _) |> map(eval_embedding, _)
results |> get_error_rate(desired_results, _) |> round(_, 2)
```
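For anyone unfamiliar with the base operator under discussion: `|>` simply applies the one-argument function on its right to the value on its left, which is why Pipe.jl's `_` placeholder is needed to target other argument positions. A minimal, self-contained sketch (`double` and `inc` are made-up stand-ins, not functions from any package):

```julia
double(x) = 2x
inc(x) = x + 1

# pipeline style: reads left to right, in application order
a = 3 |> double |> inc

# nested style: the same computation, read inside-out
b = inc(double(3))

a == b  # both are 7
```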
For SISO it's better (for my personal preference); for MIMO it is not.
Julia seems to have already settled towards there being multiple correct ways to do things. Which I am not 100% sure is a good thing.
As I said I would kind of like Pipe and Do blocks moved out of the main language.
Do-blocks have quite a few very helpful use cases, but it has annoyed me a little that they have to use the first input as the function; it doesn't always fit in quite right with the multiple dispatch philosophy (and neither would pandas/D-style UFCS with postfix `data.map(f).sum()` -- I know it's popular, but I don't think it can be combined effectively with multiple dispatch).
Piping can probably be deprecated quite soon, and left to packages to use in DSL's like your Pipe.jl.
Julia seems to have already settled towards there being multiple correct ways to do things. Which I am not 100% sure is a good thing.
It's related to the question of whether or not we can rigorously enforce a community-wide style guide. So far we haven't done much here, but for long-term package interoperability, consistency, and readability I think this will become increasingly important as the community grows. If you're the only person who will ever read your code, go nuts and do whatever you want. If not though, there's value in trading off slightly worse (in your own opinion) readability for the sake of uniformity.
@tkelman @oxinabox
I have yet to find a clear reason why it should not be included in the language, or indeed in the "core" packages. [e.g: Base]
Personally, I think making `|>` a macro might be the answer.
Something like this perhaps? (I'm not a master Julia programmer!)
```julia
macro (|>)(x, y::Union(Symbol, Expr))
    if isa(y, Symbol)
        y = Expr(:call, y)  # assumes y is callable
    end
    push!(y.args, x)
    return eval(y)
end
```
Under Julia v0.3.9, I was unable to define it twice -- once with a symbol, and once with an expression; my [limited] understanding of `Union` is that there is a performance hit from using it, so I'm guessing that would be something to rectify in my toy example code.
Of course, there is a problem with the use syntax for this.
For example, to run the equivalent of `log(2, 10)`, you have to write `@|> 10 log(2)`, which isn't desirable here. My understanding is that you'd have to be able to somehow mark functions/macros as "infixable", as it were, such that you could then write it thus: `10 |> log(2)`. (Correct if wrong!)
Contrived example, I know. I can't think of a good one right now! =)
It's also worth pointing out one area I have not covered in my example... So e.g:
```julia
julia> for e in ([1:10], [11:20] |> zip) println(e) end
(1,11)
(2,12)
(3,13)
(4,14)
(5,15)
(6,16)
(7,17)
(8,18)
(9,19)
(10,20)
```
Again - contrived example, but hopefully you get the point! I did some fiddling, but as of writing this I was unable to fathom how to implement that, myself.
Please see https://github.com/JuliaLang/julia/issues/554#issuecomment-110091527 and #11608.
On Jun 9, 2015, at 9:37 PM, H-225 notifications@github.com wrote:
I have yet to find a clear reason why it should not be included in the language
This is the wrong mental stance for programming language design. The question must be "why?" rather than "why not?" Every feature needs a compelling reason for its inclusion, and even with a good reason, you should think long and hard before adding anything. Can you live without it? Is there a different way to accomplish the same thing? Is there a different variation of the feature that would be better and more general or more orthogonal to the existing features? I'm not saying this particular idea couldn't happen, but there needs to be a far better justification than "why not?" with a few examples that are no better than the normal syntax.
The question must be "why?" rather than "why not?"
+1_000_000
Indeed. See this fairly well known blog post: Every feature starts with -100 points. It needs to make a big improvement to be worth adding to the language.
FWIW, Pyret (http://www.pyret.org/) went through this exact discussion a few months ago. The language supports a "cannonball" notation which originally functioned much the way that people are proposing with `|>`. In Pyret,

```
[list: 1, 2, 3, 5] ^ map(add-one) ^ filter(is-prime) ^ sum() ^ ...
```
So, the cannonball notation desugared into adding arguments to the functions.
It didn't take long before they decided that this syntax was too confusing. Why is `sum()` being called without any arguments? etc. Ultimately, they opted for an elegant currying alternative:

```
[list: 1, 2, 3, 5] ^ map(_, add-one) ^ filter(_, is-prime) ^ sum() ^ ...
```

This has the advantage of being more explicit and simplifies the `^` operator to a simple function.
Yes, that seems much more reasonable to me. It is also more flexible than currying.
@StefanKarpinski I'm a little confused. Did you mean to say more flexible then chaining (not currying)? After all Pyret's solution was to simply use currying, which is more general than chaining.
Maybe, if we modify the `|>` syntax a little bit (I really don't know how hard it is to implement; maybe it conflicts with `|` and `>`), we could set up something flexible and readable.
Defining something like

```julia
foo(x,y) = (y,x)
bar(x,y) = x*y
```

we would have:

```julia
randint(10) |_> log(_,2) |> sum
(1,2) |_,x> foo(_,x) |x,_> bar(_,2) |_> round(_, 2) |> sum |_> log(_, 2)
```
In other words, we would have an operator like `|a,b,c,d>` where `a`, `b`, `c` and `d` would get the returned values of the last expression (in order) and use them in placeholders inside the next one. If there are no variables inside `|>`, it would work as it works now. We could also set a new standard: `f(x) |> g(_, 1)` would get all values returned by `f(x)` and associate them with the `_` placeholder.
@samuela, what I meant was that with currying you can only omit trailing arguments, whereas with the `_` approach, you can omit any arguments and get an anonymous function. I.e. given `f(x,y)`, with currying you can do `f(x)` to get a function that does `y -> f(x,y)`, but with underscores you can do `f(x,_)` for the same thing but also do `f(_,y)` to get `x -> f(x,y)`.
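The distinction can be sketched in current syntax by writing the hypothetical underscore forms as explicit lambdas (`f` here is a made-up example function):

```julia
f(x, y) = 10x + y

# currying can only fix leading arguments:
g = x -> (y -> f(x, y))    # g(1) behaves like "f(1, _)"

# underscore-style partial application can fix either position:
fx = y -> f(1, y)    # what f(1, _) would mean
fy = x -> f(x, 2)    # what f(_, 2) would mean
```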
While I like the underscore syntax, I'm still not satisfied with any proposed answer to the question of how much of the surrounding expression it "captures".
what do you do if a function returns multiple results? Would it have to pass a tuple to the _ position? Or could there be a syntax to split it up on the fly? May be a stupid question, if so, pardon!
@StefanKarpinski Ah, I see what you mean. Agreed.
@ScottPJones the obvious answer is to allow ASCII art arrows: http://scrambledeggsontoast.github.io/2014/09/28/needle-announce/
@simonbyrne That looks even worse than programming in Fortran IV on punched cards, like I did in my misspent youth! Just wondered if some syntax like _1, _2, etc. might allow pulling apart a multiple return, or is that just a stupid idea on my part?
@simonbyrne That's brilliant. Implementing that as a string macro would be an amazing GSoC project.
Why is sum() being called without any arguments?
I think that the implicit argument is also one of the more confusing things about `do` notation, so it would be nice if we could utilise the same convention for that as well (though I realise that it is much more difficult, as it is already baked into the language).
@simonbyrne You don't think it could be done in an unambiguous way? If so, that's something I think is worth breaking (the current `do` notation), if it can be made more logical, more general, and consistent with chaining.
@simonbyrne Yeah, I totally agree. I understand the motivation for the current `do` notation but I feel strongly that it doesn't justify the syntactical gymnastics.
@samuela regarding `map(f, _)` vs just `map(f)`: I agree that some magic desugaring would be confusing, but I do think `map(f)` is something that should exist. It wouldn't require any sugar, just adding a simple method to map, e.g.

```julia
map(f::Base.Callable) = function(x::Any...) map(f, x...) end
```

i.e. map takes a function and then returns a function that works on things that are iterable (more or less).
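A runnable sketch of that idea; `cmap` is a hypothetical stand-in name used here so the example doesn't add a method to Base's `map`:

```julia
# function-in, function-out: cmap(f) returns a mapper over iterables
cmap(f) = (xs...) -> map(f, xs...)

inc_all = cmap(x -> x + 1)
inc_all([1, 2, 3])  # [2, 3, 4]
```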
More generally I think we should lean towards functions that have additional "convenience" methods, rather than some sort of convention that `|>` always maps data to the first argument (or similar).
In the same vein there could be a

```julia
type Underscore end
_ = Underscore()
```
and a general convention that functions should/could have methods that take underscores in certain arguments, and then return functions that take fewer arguments. I'm less convinced that this would be a good idea, as one would need to add 2^n methods for each function that takes n arguments. But it's one approach. I wonder if it would be possible to not have to explicitly add so many methods but rather hook into the method look up, so that if any arguments are of type Underscore then the appropriate function is returned.
Anyway, I definitely think having a version of map and filter that just take a callable and return a callable makes sense, the thing with the Underscore may or may not be workable.
@patrickthebold
I would imagine that `x |> map(f, _)` => `x |> map(f, Underscore())` => `x |> map(f, x)`, as you propose, would be the simplest way to implement `map(f, _)`, right? - just have `_` be a special entity which you'd program for?
Though, I'm uncertain if that would be better than having it automatically inferred by Julia -- presumably using the `|>` syntax -- rather than having to program it yourself. Also, regarding your proposal for `map` -- I kinda like it. Indeed, for the current `|>` that would be quite handy. Though, I imagine it would be simpler and better to just implement automatic inference of `x |> map(f, _)` => `x |> map(f, x)` instead?
@StefanKarpinski Makes sense. Hadn't thought of it quite like that.
Nothing I said would be tied to `|>` in any way. What I meant regarding the `_` would be, for example, to add methods to `<` as such:

```julia
<(_::Underscore, x) = function(z) z < x end
<(x, _::Underscore) = function(z) x < z end
```
But again I think this would be a pain unless there was a way to automatically add the appropriate methods.
Again, the thing with the underscores is separate from adding the convenience method to map as outlined above. I do think both should exist, in some form or another.
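For what it's worth, the idea can be prototyped in current Julia without parser support. This sketch uses `_u` as a stand-in name (plain `_` can only be assigned, not read, in recent Julia) and defines just the two `<` methods from above:

```julia
struct Underscore end
const _u = Underscore()

# each method returns a one-argument closure
Base.:<(::Underscore, x) = z -> z < x
Base.:<(x, ::Underscore) = z -> x < z

lt5 = _u < 5          # z -> z < 5
filter(_u < 5, 1:10)  # [1, 2, 3, 4]
```

As noted, covering every argument position of every function this way would require a combinatorial number of such methods.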
@patrickthebold Such an approach with a user-defined type for underscore, etc. would place a significant and unnecessary burden on the programmer when implementing functions. Having to list out all 2^n of

```julia
f(_, x, y) = ...
f(x, _, y) = ...
f(_, _, y) = ...
...
```

would be very annoying, not to mention inelegant.
Also, your proposition with `map` would I suppose provide a workaround syntax for `map(f)` with basic functions like `map` and `filter`, but in general it suffers from the same complexity issue as the manual underscore approach. For example, for `func_that_has_a_lot_of_args(a, b, c, d, e)` you'd have to go through the grueling process of typing out each possible "currying":

```julia
func_that_has_a_lot_of_args(a, b, c, d, e) = ...
func_that_has_a_lot_of_args(b, c, d, e) = ...
func_that_has_a_lot_of_args(a, b, e) = ...
func_that_has_a_lot_of_args(b, d, e) = ...
func_that_has_a_lot_of_args(a, d) = ...
...
```

And even if you did, you'd still be faced with an absurd amount of ambiguity when calling the function: does `func_that_has_a_lot_of_args(x, y, z)` refer to the definition where `x=a,y=b,z=c` or `x=b,y=d,z=e`, etc.? Julia would discern between them with runtime type information, but for the lay programmer reading the source code it would be totally unclear.
I think the best way to get underscore currying done right is to simply incorporate it into the language. It would be a very straightforward change to the compiler after all. Whenever an underscore appears in a function application, just pull it out to create a lambda. I started looking into implementing this a few weeks ago but unfortunately I don't think I'll have enough free time in the next few weeks to see it through. For someone familiar with the Julia compiler though it would probably take no more than an afternoon to get things working.
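The "pull it out to create a lambda" step can also be prototyped as a macro rather than a compiler change. `@underscore` below is a hypothetical name, and this naive version lifts every `_` in the whole expression into one enclosing lambda, sidestepping the capture question discussed above:

```julia
# Rewrite an expression, replacing each _ with a fresh argument symbol,
# then wrap the result in an anonymous function taking those arguments.
function lift_underscores(ex)
    args = Symbol[]
    function rewrite(e)
        if e === :_
            s = gensym("arg")
            push!(args, s)
            return s
        elseif e isa Expr
            return Expr(e.head, map(rewrite, e.args)...)
        else
            return e
        end
    end
    body = rewrite(ex)
    return :(($(args...),) -> $body)
end

macro underscore(ex)
    esc(lift_underscores(ex))
end

plus2 = @underscore _ + 2       # becomes x -> x + 2
axpy  = @underscore _ * 10 + _  # becomes (x, y) -> x * 10 + y
```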
@samuela Can you clarify what you mean by, "pull it out to create a lambda"? - I'm curious. I too have wondered how that may be implemented.
@patrickthebold
Ah - I see. Presumably you could then use such a thing like this: `filter(_ < 5, [1:10])` => `[1:4]`? Personally, I would find `filter(e -> e < 5, [1:10])` easier to read; more consistent - less hidden meaning, though I grant you, it is more concise. Unless you have an example where it really shines?
@samuela
Also, your proposition with map would I suppose provide a workaround syntax for map(f) with basic functions like map and filter but in general it suffers from the same complexity issue as the manual underscore approach.
I wasn't suggesting that this be done in general, only for `map` and `filter`, and possibly a few other places where it seems obvious. To me, that's how `map` should work: take in a function and return a function. (Pretty sure that's what Haskell does.)
would be very annoying, not to mention inelegant.
I think we are in agreement on that. I'd hope there would be a way to add something to the language to handle method invocations where some arguments are of type Underscore. Upon further thought, I think it boils down to having a special character automatically expand into a lambda, or have a special type that automatically expands into a lambda. I don't feel strongly either way. I can see pluses and minuses to both approaches.
@H-225 yes the underscore thing is just a syntactic convenience. Not sure how common it is, but Scala certainly has it. Personally I like it, but I think it's just one of those style things.
@H-225 Well, in this case I think a compelling and relevant example would be function chaining. Instead of having to write

```julia
[1, 2, 3, 5]
  |> x -> map(addone, x)
  |> x -> filter(isprime, x)
  |> sum
  |> x -> 3 * x
  |> ...
```

one could simply write

```julia
[1, 2, 3, 5]
  |> map(addone, _)
  |> filter(isprime, _)
  |> sum
  |> 3 * _
  |> ...
```
I find myself unknowingly using this underscore syntax (or some slight variant) constantly in languages that support it and only realize how helpful it is when transitioning to work in languages that do not support it.
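For comparison, the explicit-lambda version of that pipeline runs today; this sketch substitutes `iseven` for `isprime` and defines `addone` locally so it needs no packages:

```julia
addone(x) = x + 1

result = [1, 2, 3, 5] |>
    (x -> map(addone, x)) |>     # [2, 3, 4, 6]
    (x -> filter(iseven, x)) |>  # [2, 4, 6]
    sum |>                       # 12
    (x -> 3x)

result  # 36
```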
As far as I know, there are currently at least 3.5 libraries/approaches that attempt to address this problem in Julia: Julia's builtin `|>` function, Pipe.jl, Lazy.jl, and 0.5 for Julia's builtin `do` notation, which is similar in spirit. Not to bash any of these libraries or approaches, but many of them could be greatly simplified if underscore currying was supported by Julia.
@samuela if you'd like to play with an implementation of this idea, you could try out FunctionalData.jl, where your example would look like this:

```julia
@p map [1,2,3,4] addone | filter isprime | sum | times 3 _
```

The last part shows how to pipe the input into the second parameter (the default is argument one, in which case the `_` can be omitted). Feedback very much appreciated!

Edit: the above is simply rewritten to:

```julia
times(3, sum(filter(map([1,2,3,4], addone), isprime)))
```

which uses FunctionalData.map and filter instead of Base.map and filter. The main difference is the argument order; a second difference is the indexing convention (see docs). In any case, Base.map can simply be used by reversing the argument order. `@p` is quite a simple rewrite rule (left-to-right becomes inner-to-outer, plus support for simple currying): `@p map data add 10 | showall` becomes

```julia
showall(map(data, x -> add(x, 10)))
```
Hack may introduce something like this: https://github.com/facebook/hhvm/issues/6455. They're using `$$`, which is off the table for Julia (`$` is already too overloaded).
FWIW, I really like Hack's solution to this.
I like it too, my main reservation being that I'd still kind of like a terser lambda notation that might use `_` for variables / slots, and it would be good to make sure that these don't conflict.

Couldn't one use `__`? What's the lambda syntax you're thinking of? `_ -> sqrt(_)`?
Sure, we could. That syntax already works; it's more about a syntax that doesn't require the arrow, so that you can write something along the lines of `map(_ + 2, v)`, the real issue being how much of the surrounding expression the `_` belongs to.
Doesn't Mathematica have a similar system for anonymous arguments? How do they handle the scope of the binding of those arguments?
https://reference.wolfram.com/language/tutorial/PureFunctions.html, showing the `#` symbol, is what I was thinking of.
Mathematica uses `&` to delimit it.
Rather than doing something as general as a shorter lambda syntax (which could take an arbitrary expression and return an anonymous function), we could get around the delimiter problem by confining the acceptable expressions to function calls, and the acceptable variables / slots to entire parameters. This would give us a very clean multi-parameter currying syntax à la Open Dylan. Because the `_` replaces entire parameters, the syntax could be minimal, intuitive, and unambiguous. `map(_ + 2, _)` would translate to `x -> map(y -> y + 2, x)`. Most non-function-call expressions that you would want to lambdafy would probably be longer and more amenable to `->` or `do` anyway. I do think the trade-off of usability vs generality would be worth it.
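Spelled out with an explicit lambda, the proposed translation would read (the commented form is the hypothetical syntax):

```julia
# map(_ + 2, _)  would translate to:
g = x -> map(y -> y + 2, x)

g([1, 2, 3])  # [3, 4, 5]
```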
@durcan, that sounds promising – can you elaborate on the rule a bit? Why does the first `_` stay inside the argument of `map` while the second one consumes the whole `map` expression? I'm not clear on what "confining the acceptable expressions to function calls" means, nor what "confining acceptable variables / slots to entire parameters" means...
Ok, I think I get the rule, having read some of that Dylan documentation, but I have to wonder about having `map(_ + 2, v)` work but `map(2*_ + 2, v)` not work.
Would it be possible to allow calling any function on Any, so that the value is passed to the function as the first parameter and the parameters passed in the function call on the value are added afterwards? ex.
Is it possible to indicate in a deterministic way what a function will return in order to avoid run time exceptions?