Closed: shelakel closed this issue 3 years ago
The reason this may be desirable in Base is twofold:
1. We may want to encourage pipelining as being the Julian Way -- arguments can be made that it is more readable.
2. Things like Lazy.jl, FunctionalData.jl, and my own Pipe.jl require a macro to wrap the expression they act on -- which makes it less readable.
I feel the answer may lie in having infix macros, and defining `|>` as such.
I'm not certain `|>` (or its cousin, the do block) belongs in core at all. But the tools don't exist to define them outside of the parser.
The ability to have that sort of pipelining syntax seems very nice. Could just that part be added to Base, i.e. `x |> y(f) = y(f, x)`, that Lazy.jl, FunctionalData.jl, and Pipe.jl could use? :+1:
Having looked at code that uses the various implementations of this out in packages, I personally find it unreadable and very much un-Julian. The left-to-right pipeline pun doesn't help readability, it just makes your code stand out as backwards from the rest of the perfectly normal code that uses parentheses for function evaluation. I'd rather discourage a syntax that leads to two different styles where code written in either style looks inside-out and backwards relative to code written in the other. Why not just settle on the perfectly good syntax we already have and encourage making things look more uniform?
@tkelman Personally, I see it from a somewhat utilitarian point of view. Granted, maybe if you're doing something simple then it isn't necessary, but if you're writing a function say, that does something fairly complicated, or long winded, (off the top of my head: data manipulation e.g.), then I think that's where pipeline syntax shines.
I understand what you mean though; it would be more uniform if you had one function-call syntax for everything. Personally though, I think it's better to make it easier to write [complicated] code that can be easily understood. Granted, you have to learn the syntax and what it means, but, IMHO, `|>` is no harder to grasp than how to call a function.
@tkelman I'd look at it from a different point of view. Obviously, there are people who prefer that style of programming. I can see that maybe you'd want to have a consistent style for the source code to Base, but this is only about adding the parser support for their preferred style of programming their Julia applications. Do julians really want to try to dictate or otherwise stifle something other people find beneficial? I've found pipelining stuff together very useful in Unix, so even though I've never used a programming language that enabled it in the language, I'd at least give it the benefit of the doubt.
We do have `|>` as a function piping operator, but there are implementation limitations to how it's currently done that make it pretty slow at the moment.
Piping is great in a unix shell where everything takes text in and text out. With more complicated types and multiple inputs and outputs, it's not as clear-cut. So we have two syntaxes, but one makes a lot less sense in the MIMO case. Parser support for alternate styles of programming or DSL's is not usually necessary since we have powerful macros.
OK, thanks, I was going by @oxinabox's comment:
But the tools don't exist to define them outside of the parser.
Is it understood what would be done to remove the implementation limitations you referred to?
Some of the earlier suggestions could potentially be implemented by making `|>` parse its arguments as a macro instead of as a function. The former command-object piping meaning of `|>` has been deprecated, so this might actually be freed up to do something different with, come 0.5-dev.
However this choice reminds me quite a bit of the special parsing of `~`, which I feel is a mistake for reasons I've stated elsewhere.
Parsing `~` is just insane; it's a function in Base. Using `_`, `_1`, `_2` seems more reasonable (esp. if you raise an error when these variables are defined elsewhere in scope). Still, until we have more efficient anonymous functions, this seems like it's not going to work...
implemented by making |> parse its arguments as a macro instead of as a function
Unless you do that!
Parsing ~ is just insane, it's a function in base
It's a unary operator for the bitwise version. Infix binary `~` parses as a macro, ref https://github.com/JuliaLang/julia/issues/4882, which I think is a strange use of an ascii operator (https://github.com/JuliaLang/julia/pull/11102#issuecomment-98477891).
@tkelman
So we have two syntaxes, but one makes a lot less sense in the MIMO case.
Three syntaxes, kind of: pipe-in, normal function call, and do-blocks. Debatably even four, since macros use a different convention as well.
For me, having read order (i.e. left to right) match application order makes SISO function chains a lot clearer.
I do a lot of code like this (using Iterators.jl and Pipe.jl):

```julia
loaddata(filename) |> filter(s -> 2 <= length(s) <= 15, _) |> take!(150, _) |> map(eval_embedding, _)
results |> get_error_rate(desired_results, _) |> round(_, 2)
```
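For anyone unfamiliar with the base operator under discussion: `|>` simply applies the one-argument function on its right to the value on its left, which is why Pipe.jl's `_` placeholder is needed to target other argument positions. A minimal, self-contained sketch (`double` and `inc` are made-up stand-ins, not functions from any package):

```julia
double(x) = 2x
inc(x) = x + 1

# pipeline style: reads left to right, in application order
a = 3 |> double |> inc

# nested style: the same computation, read inside-out
b = inc(double(3))

a == b  # both are 7
```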
For SISO it's better (for my personal preference); for MIMO it is not.
Julia seems to have already settled towards there being multiple correct ways to do things. Which I am not 100% sure is a good thing.
As I said I would kind of like Pipe and Do blocks moved out of the main language.
Do-blocks have quite a few very helpful use cases, but it has annoyed me a little that they have to use the first input as the function; it doesn't always fit in quite right with the multiple dispatch philosophy (and neither would pandas/D-style UFCS with postfix `data.map(f).sum()` -- I know it's popular, but I don't think it can be combined effectively with multiple dispatch).
Piping can probably be deprecated quite soon, and left to packages to use in DSL's like your Pipe.jl.
Julia seems to have already settled towards there being multiple correct ways to do things. Which I am not 100% sure is a good thing.
It's related to the question of whether or not we can rigorously enforce a community-wide style guide. So far we haven't done much here, but for long-term package interoperability, consistency, and readability I think this will become increasingly important as the community grows. If you're the only person who will ever read your code, go nuts and do whatever you want. If not though, there's value in trading off slightly worse (in your own opinion) readability for the sake of uniformity.
@tkelman @oxinabox
I have yet to find a clear reason why it should not be included in the language, or indeed in the "core" packages. [e.g: Base]
Personally, I think making `|>` a macro might be the answer.
Something like this perhaps? (I'm not a master Julia programmer!)
```julia
macro (|>)(x, y::Union(Symbol, Expr))
    if isa(y, Symbol)
        y = Expr(:call, y)  # assumes y is callable
    end
    push!(y.args, x)
    return eval(y)
end
```
Under Julia v0.3.9, I was unable to define it twice -- once with a symbol, and once with an expression; my [limited] understanding of `Union` is that there is a performance hit from using it, so I'm guessing that would be something to rectify in my toy example code.
Of course, there is a problem with the use syntax for this.
For example, to run the equivalent of `log(2, 10)`, you have to write `@|> 10 log(2)`, which isn't desirable here. My understanding is that you'd have to be able to somehow mark functions/macros as "infixable", as it were, such that you could then write it thus: `10 |> log(2)`. (Correct if wrong!)
Contrived example, I know. I can't think of a good one right now! =)
It's also worth pointing out one area I have not covered in my example... So e.g:
```julia
julia> for e in ([1:10], [11:20] |> zip) println(e) end
(1,11)
(2,12)
(3,13)
(4,14)
(5,15)
(6,16)
(7,17)
(8,18)
(9,19)
(10,20)
```
Again - contrived example, but hopefully you get the point! I did some fiddling, but as of writing this I was unable to fathom how to implement that, myself.
Please see https://github.com/JuliaLang/julia/issues/554#issuecomment-110091527 and #11608.
On Jun 9, 2015, at 9:37 PM, H-225 notifications@github.com wrote:
I have yet to find a clear reason why it should not be included in the language
This is the wrong mental stance for programming language design. The question must be "why?" rather than "why not?" Every feature needs a compelling reason for its inclusion, and even with a good reason, you should think long and hard before adding anything. Can you live without it? Is there a different way to accomplish the same thing? Is there a different variation of the feature that would be better and more general or more orthogonal to the existing features? I'm not saying this particular idea couldn't happen, but there needs to be a far better justification than "why not?" with a few examples that are no better than the normal syntax.
The question must be "why?" rather than "why not?"
+1_000_000
Indeed. See this fairly well known blog post: Every feature starts with -100 points. It needs to make a big improvement to be worth adding to the language.
FWIW, Pyret (http://www.pyret.org/) went through this exact discussion a few months ago. The language supports a "cannonball" notation which originally functioned much the way that people are proposing with `|>`. In Pyret,

```
[list: 1, 2, 3, 5] ^ map(add-one) ^ filter(is-prime) ^ sum() ^ ...
```
So, the cannonball notation desugared into adding arguments to the functions.
It didn't take long before they decided that this syntax was too confusing. Why is `sum()` being called without any arguments? etc. Ultimately, they opted for an elegant currying alternative:

```
[list: 1, 2, 3, 5] ^ map(_, add-one) ^ filter(_, is-prime) ^ sum() ^ ...
```

This has the advantage of being more explicit and simplifies the `^` operator to a simple function.
Yes, that seems much more reasonable to me. It is also more flexible than currying.
@StefanKarpinski I'm a little confused. Did you mean to say more flexible then chaining (not currying)? After all Pyret's solution was to simply use currying, which is more general than chaining.
Maybe, if we modify the `|>` syntax a little bit (I really don't know how hard it is to implement; maybe it conflicts with `|` and `>`), we could set up something flexible and readable.
Defining something like

```julia
foo(x,y) = (y,x)
bar(x,y) = x*y
```

we would have:

```julia
randint(10) |_> log(_,2) |> sum
(1,2) |_,x> foo(_,x) |x,_> bar(_,2) |_> round(_, 2) |> sum |_> log(_, 2)
```
In other words, we would have an operator like `|a,b,c,d>` where `a`, `b`, `c` and `d` would get the returned values of the last expression (in order) and use them in placeholders inside the next one. If there are no variables inside `|>`, it would work as it works now. We could also set a new standard: `f(x) |> g(_, 1)` would get all values returned by `f(x)` and associate them with the `_` placeholder.
@samuela, what I meant was that with currying you can only omit trailing arguments, whereas with the `_` approach, you can omit any arguments and get an anonymous function. I.e. given `f(x,y)`, with currying you can do `f(x)` to get a function that does `y -> f(x,y)`, but with underscores you can do `f(x,_)` for the same thing but also do `f(_,y)` to get `x -> f(x,y)`.
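The distinction can be sketched in current syntax by writing the hypothetical underscore forms as explicit lambdas (`f` here is a made-up example function):

```julia
f(x, y) = 10x + y

# currying can only fix leading arguments:
g = x -> (y -> f(x, y))    # g(1) behaves like "f(1, _)"

# underscore-style partial application can fix either position:
fx = y -> f(1, y)    # what f(1, _) would mean
fy = x -> f(x, 2)    # what f(_, 2) would mean
```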
While I like the underscore syntax, I'm still not satisfied with any proposed answer to the question of how much of the surrounding expression it "captures".
what do you do if a function returns multiple results? Would it have to pass a tuple to the _ position? Or could there be a syntax to split it up on the fly? May be a stupid question, if so, pardon!
@StefanKarpinski Ah, I see what you mean. Agreed.
@ScottPJones the obvious answer is to allow ASCII art arrows: http://scrambledeggsontoast.github.io/2014/09/28/needle-announce/
@simonbyrne That looks even worse than programming in Fortran IV on punched cards, like I did in my misspent youth! Just wondered if some syntax like _1, _2, etc. might allow pulling apart a multiple return, or is that just a stupid idea on my part?
@simonbyrne That's brilliant. Implementing that as a string macro would be an amazing GSoC project.
Why is sum() being called without any arguments?
I think that the implicit argument is also one of the more confusing things about `do` notation, so it would be nice if we could utilise the same convention for that as well (though I realise that it is much more difficult, as it is already baked into the language).
@simonbyrne You don't think it could be done in an unambiguous way? If so, that's something I think is worth breaking (the current `do` notation), if it can be made more logical, more general, and consistent with chaining.
@simonbyrne Yeah, I totally agree. I understand the motivation for the current `do` notation but I feel strongly that it doesn't justify the syntactical gymnastics.
@samuela regarding `map(f, _)` vs just `map(f)`: I agree that some magic desugaring would be confusing, but I do think `map(f)` is something that should exist. It wouldn't require any sugar, just adding a simple method to map, e.g.

```julia
map(f::Base.Callable) = function(x::Any...) map(f, x...) end
```

i.e. map takes a function and then returns a function that works on things that are iterable (more or less).
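A runnable sketch of that idea; `cmap` is a hypothetical stand-in name used here so the example doesn't add a method to Base's `map`:

```julia
# function-in, function-out: cmap(f) returns a mapper over iterables
cmap(f) = (xs...) -> map(f, xs...)

inc_all = cmap(x -> x + 1)
inc_all([1, 2, 3])  # [2, 3, 4]
```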
More generally I think we should lean towards functions that have additional "convenience" methods, rather than some sort of convention that `|>` always maps data to the first argument (or similar).
In the same vein there could be a

```julia
type Underscore end
_ = Underscore()
```
and a general convention that functions should/could have methods that take underscores in certain arguments, and then return functions that take fewer arguments. I'm less convinced that this would be a good idea, as one would need to add 2^n methods for each function that takes n arguments. But it's one approach. I wonder if it would be possible to not have to explicitly add so many methods but rather hook into the method look up, so that if any arguments are of type Underscore then the appropriate function is returned.
Anyway, I definitely think having a version of map and filter that just take a callable and return a callable makes sense, the thing with the Underscore may or may not be workable.
@patrickthebold
I would imagine that `x |> map(f, _)` => `x |> map(f, Underscore())` => `x |> map(f, x)`, as you propose, would be the simplest way to implement `map(f, _)`, right? - just have `_` be a special entity which you'd program for?
Though, I'm uncertain if that would be better than having it automatically inferred by Julia -- presumably using the `|>` syntax -- rather than having to program it yourself. Also, regarding your proposal for `map` -- I kinda like it. Indeed, for the current `|>` that would be quite handy. Though, I imagine it would be simpler and better to just implement automatic inference of `x |> map(f, _)` => `x |> map(f, x)` instead?
@StefanKarpinski Makes sense. Hadn't thought of it quite like that.
Nothing I said would be tied to `|>` in any way. What I meant regarding the `_` would be, for example, to add methods to `<` as such:

```julia
<(_::Underscore, x) = function(z) z < x end
<(x, _::Underscore) = function(z) x < z end
```
But again I think this would be a pain unless there was a way to automatically add the appropriate methods.
Again, the thing with the underscores is separate from adding the convenience method to map as outlined above. I do think both should exist, in some form or another.
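For what it's worth, the idea can be prototyped in current Julia without parser support. This sketch uses `_u` as a stand-in name (plain `_` can only be assigned, not read, in recent Julia) and defines just the two `<` methods from above:

```julia
struct Underscore end
const _u = Underscore()

# each method returns a one-argument closure
Base.:<(::Underscore, x) = z -> z < x
Base.:<(x, ::Underscore) = z -> x < z

lt5 = _u < 5          # z -> z < 5
filter(_u < 5, 1:10)  # [1, 2, 3, 4]
```

As noted, covering every argument position of every function this way would require a combinatorial number of such methods.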
@patrickthebold Such an approach with a user-defined type for underscore, etc. would place a significant and unnecessary burden on the programmer when implementing functions. Having to list out all 2^n of

```julia
f(_, x, y) = ...
f(x, _, y) = ...
f(_, _, y) = ...
...
```

would be very annoying, not to mention inelegant.
Also, your proposition with `map` would I suppose provide a workaround syntax for `map(f)` with basic functions like `map` and `filter`, but in general it suffers from the same complexity issue as the manual underscore approach. For example, for `func_that_has_a_lot_of_args(a, b, c, d, e)` you'd have to go through the grueling process of typing out each possible "currying":

```julia
func_that_has_a_lot_of_args(a, b, c, d, e) = ...
func_that_has_a_lot_of_args(b, c, d, e) = ...
func_that_has_a_lot_of_args(a, b, e) = ...
func_that_has_a_lot_of_args(b, d, e) = ...
func_that_has_a_lot_of_args(a, d) = ...
...
```

And even if you did, you'd still be faced with an absurd amount of ambiguity when calling the function: does `func_that_has_a_lot_of_args(x, y, z)` refer to the definition where `x=a,y=b,z=c` or `x=b,y=d,z=e`, etc.? Julia would discern between them with runtime type information, but for the lay programmer reading the source code it would be totally unclear.
I think the best way to get underscore currying done right is to simply incorporate it into the language. It would be a very straightforward change to the compiler after all. Whenever an underscore appears in a function application, just pull it out to create a lambda. I started looking into implementing this a few weeks ago but unfortunately I don't think I'll have enough free time in the next few weeks to see it through. For someone familiar with the Julia compiler though it would probably take no more than an afternoon to get things working.
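The "pull it out to create a lambda" step can also be prototyped as a macro rather than a compiler change. `@underscore` below is a hypothetical name, and this naive version lifts every `_` in the whole expression into one enclosing lambda, sidestepping the capture question discussed above:

```julia
# Rewrite an expression, replacing each _ with a fresh argument symbol,
# then wrap the result in an anonymous function taking those arguments.
function lift_underscores(ex)
    args = Symbol[]
    function rewrite(e)
        if e === :_
            s = gensym("arg")
            push!(args, s)
            return s
        elseif e isa Expr
            return Expr(e.head, map(rewrite, e.args)...)
        else
            return e
        end
    end
    body = rewrite(ex)
    return :(($(args...),) -> $body)
end

macro underscore(ex)
    esc(lift_underscores(ex))
end

plus2 = @underscore _ + 2       # becomes x -> x + 2
axpy  = @underscore _ * 10 + _  # becomes (x, y) -> x * 10 + y
```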
@samuela Can you clarify what you mean by, "pull it out to create a lambda"? - I'm curious. I too have wondered how that may be implemented.
@patrickthebold
Ah - I see. Presumably you could then use such a thing like this: `filter(_ < 5, [1:10])` => `[1:4]`? Personally, I would find `filter(e -> e < 5, [1:10])` easier to read; more consistent - less hidden meaning, though I grant you, it is more concise. Unless you have an example where it really shines?
@samuela
Also, your proposition with map would I suppose provide a workaround syntax for map(f) with basic functions like map and filter but in general it suffers from the same complexity issue as the manual underscore approach.
I wasn't suggesting that this be done in general, only for `map` and `filter`, and possibly a few other places where it seems obvious. To me, that's how `map` should work: take in a function and return a function. (Pretty sure that's what Haskell does.)
would be very annoying, not to mention inelegant.
I think we are in agreement on that. I'd hope there would be a way to add something to the language to handle method invocations where some arguments are of type Underscore. Upon further thought, I think it boils down to having a special character automatically expand into a lambda, or have a special type that automatically expands into a lambda. I don't feel strongly either way. I can see pluses and minuses to both approaches.
@H-225 yes the underscore thing is just a syntactic convenience. Not sure how common it is, but Scala certainly has it. Personally I like it, but I think it's just one of those style things.
@H-225 Well, in this case I think a compelling and relevant example would be function chaining. Instead of having to write

```julia
[1, 2, 3, 5]
  |> x -> map(addone, x)
  |> x -> filter(isprime, x)
  |> sum
  |> x -> 3 * x
  |> ...
```

one could simply write

```julia
[1, 2, 3, 5]
  |> map(addone, _)
  |> filter(isprime, _)
  |> sum
  |> 3 * _
  |> ...
```
I find myself unknowingly using this underscore syntax (or some slight variant) constantly in languages that support it and only realize how helpful it is when transitioning to work in languages that do not support it.
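For comparison, the explicit-lambda version of that pipeline runs today; this sketch substitutes `iseven` for `isprime` and defines `addone` locally so it needs no packages:

```julia
addone(x) = x + 1

result = [1, 2, 3, 5] |>
    (x -> map(addone, x)) |>     # [2, 3, 4, 6]
    (x -> filter(iseven, x)) |>  # [2, 4, 6]
    sum |>                       # 12
    (x -> 3x)

result  # 36
```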
As far as I know, there are currently at least 3.5 libraries/approaches that attempt to address this problem in Julia: Julia's builtin `|>` function, Pipe.jl, Lazy.jl, and 0.5 for Julia's builtin `do` notation, which is similar in spirit. Not to bash any of these libraries or approaches, but many of them could be greatly simplified if underscore currying was supported by Julia.
@samuela if you'd like to play with an implementation of this idea, you could try out FunctionalData.jl, where your example would look like this:

```julia
@p map [1,2,3,4] addone | filter isprime | sum | times 3 _
```

The last part shows how to pipe the input into the second parameter (the default is argument one, in which case the `_` can be omitted). Feedback very much appreciated!

Edit: the above is simply rewritten to:

```julia
times(3, sum(filter(map([1,2,3,4], addone), isprime)))
```

which uses FunctionalData.map and filter instead of Base.map and filter. The main difference is the argument order; a second difference is the indexing convention (see docs). In any case, Base.map can simply be used by reversing the argument order. `@p` is quite a simple rewrite rule (left-to-right becomes inner-to-outer, plus support for simple currying): `@p map data add 10 | showall` becomes

```julia
showall(map(data, x -> add(x, 10)))
```
Hack may introduce something like this: https://github.com/facebook/hhvm/issues/6455. They're using `$$`, which is off the table for Julia (`$` is already too overloaded).
FWIW, I really like Hack's solution to this.
I like it too, my main reservation being that I'd still kind of like a terser lambda notation that might use `_` for variables / slots, and it would be good to make sure that these don't conflict.

Couldn't one use `__`? What's the lambda syntax you're thinking of? `_ -> sqrt(_)`?
Sure, we could. That syntax already works; it's more about a syntax that doesn't require the arrow, so that you can write something along the lines of `map(_ + 2, v)`, the real issue being how much of the surrounding expression the `_` belongs to.
Doesn't Mathematica have a similar system for anonymous arguments? How do they handle the scope of the binding of those arguments?
https://reference.wolfram.com/language/tutorial/PureFunctions.html, showing the `#` symbol, is what I was thinking of.
Mathematica uses `&` to delimit it.
Rather than doing something as general as a shorter lambda syntax (which could take an arbitrary expression and return an anonymous function), we could get around the delimiter problem by confining the acceptable expressions to function calls, and the acceptable variables / slots to entire parameters. This would give us a very clean multi-parameter currying syntax à la Open Dylan. Because the `_` replaces entire parameters, the syntax could be minimal, intuitive, and unambiguous. `map(_ + 2, _)` would translate to `x -> map(y -> y + 2, x)`. Most non-function-call expressions that you would want to lambdafy would probably be longer and more amenable to `->` or `do` anyway. I do think the trade-off of usability vs generality would be worth it.
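Spelled out with an explicit lambda, the proposed translation would read (the commented form is the hypothetical syntax):

```julia
# map(_ + 2, _)  would translate to:
g = x -> map(y -> y + 2, x)

g([1, 2, 3])  # [3, 4, 5]
```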
@durcan, that sounds promising – can you elaborate on the rule a bit? Why does the first `_` stay inside the argument of `map` while the second one consumes the whole `map` expression? I'm not clear on what "confining the acceptable expressions to function calls" means, nor what "confining acceptable variables / slots to entire parameters" means...
Ok, I think I get the rule, having read some of that Dylan documentation, but I have to wonder about having `map(_ + 2, v)` work but `map(2*_ + 2, v)` not work.
Would it be possible to allow calling any function on Any, so that the value is passed to the function as the first parameter and the parameters passed in the function call on the value are added afterwards? ex.
Is it possible to indicate in a deterministic way what a function will return in order to avoid run time exceptions?