JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

Function chaining #5571

Closed shelakel closed 3 years ago

shelakel commented 10 years ago

Would it be possible to allow calling any function on Any so that the value is passed to the function as the first parameter and the parameters passed to the function call on the value is added afterwards? ex.

sum(a::Int, b::Int) = a + b

a = 1
sum(1, 2) # = 3
a.sum(2) # = 3 or
1.sum(2) # = 3

Is it possible to indicate in a deterministic way what a function will return in order to avoid run time exceptions?

JeffBezanson commented 10 years ago

The . syntax is very useful, so we aren't going to make it just a synonym for function call. I don't understand the advantage of 1.sum(2) over sum(1,2). To me it seems to confuse things.

Is the question about exceptions a separate issue? I think the answer is no, aside from wrapping a function body in try..catch.

shelakel commented 10 years ago

The 1.sum(2) example is trivial (I also prefer sum(1,2)) but it's just to demonstrate that a function isn't owned per se by that type ex. 1 can be passed to a function with the first parameter being a Real, not just to functions that expect the first parameter to be an Int.

Edit: I might have misunderstood your comment. Dot functions will be useful when applying certain design patterns such as the builder pattern commonly used for configuration. ex.

validate_for(name).required().gt(3) 
# vs 
gt(required(validate_for(name)), 3) 

The exceptions I was referring to are due to functions returning non-deterministic results (which is bad practice anyway). An example would be calling a.sum(2).sum(4) where .sum(2) sometimes returns a String instead of an Int but .sum(4) expects an Int. I take it the compiler/runtime is already smart enough to evaluate such circumstances, which would be the same when nesting the function as sum(sum(1, 2), 4), but the feature request would require extending that functionality to enforce type constraints on dot functions.

ssfrr commented 10 years ago

One of the use cases people seem to like is the "fluent interface". It's sometimes nice in OOP APIs when methods return the object, so you can do things like some_obj.move(4, 5).scale(10).display()

For me I think that this is better expressed as function composition, but the |> doesn't work with arguments unless you use anon. functions, e.g. some_obj |> x -> move(x, 4, 5) |> x -> scale(x, 10) |> display, which is pretty ugly.

One option to support this sort of thing would be if |> shoved the LHS as the first argument to the RHS before evaluating, but then it couldn't be implemented as a simple function as it is now.

Another option would be some sort of @composed macro that would add this sort of behavior to the following expression

You could also shift responsibility for supporting this to library designers, where they could define

function move(obj, x, y)
    # move the object
end

move(x, y) = obj -> move(obj, x, y)

so when you don't supply an object it does partial function application (by returning a function of 1 argument) which you could then use inside a normal |> chain.
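A minimal sketch of that library-side pattern, for illustration only. The Shape type and the immutable-update style are assumptions made here so the example is self-contained; they are not part of the proposal itself.

```julia
# Hypothetical type, used only to illustrate the pattern.
struct Shape
    x::Float64
    y::Float64
    size::Float64
end

# Full-arity methods take the object first and return a new object.
move(s::Shape, dx, dy) = Shape(s.x + dx, s.y + dy, s.size)
scale(s::Shape, k) = Shape(s.x, s.y, s.size * k)

# Methods called without the object return a one-argument function
# (partial application), which is what |> expects on its right-hand side.
move(dx, dy) = s -> move(s, dx, dy)
scale(k) = s -> scale(s, k)

some_obj = Shape(0.0, 0.0, 1.0)
result = some_obj |> move(4, 5) |> scale(10)
```

The chain works because dispatch selects the curried method purely by argument count, which is also why the approach gets awkward for functions that accept a variable number of arguments.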

kmsquire commented 10 years ago

Actually, the definition of |> could probably be changed right now to the behavior you're asking for. I'd be for it.

shelakel commented 10 years ago

ssfrr I like the way you think! I was unaware of the function composition |>. I see there's recently been a similar discussion [https://github.com/JuliaLang/julia/issues/4963].

kmsquire I like the idea of extending the current function composition to allow you to specify parameters on the calling function ex. some_obj |> move(4, 5) |> scale(10) |> display. Native support would mean one less closure, but what ssfrr suggested is a viable way for now and as an added benefit it should also be forward compatible with the extended function composition functionality if it gets implemented.

Thanks for the prompt responses :)

kmsquire commented 10 years ago

Actually, @ssfrr was correct--it isn't possible to implement this as a simple function.

jakebolewski commented 10 years ago

What you want are threading macros (ex. http://clojuredocs.org/clojure_core/clojure.core/-%3E). Unfortunate that @-> @->> @-?>> is not viable syntax in Julia.

ssfrr commented 10 years ago

Yeah, I was thinking that infix macros would be a way to implement this. I'm not familiar enough with macros to know what the limitations are.

kmsquire commented 10 years ago

I think this works for @ssfrr's compose macro:

Edit: This might be a little clearer:

import Base.Meta.isexpr
_ispossiblefn(x) = isa(x, Symbol) || isexpr(x, :call)

function _compose(x)
    if !isa(x, Expr)
        x
    elseif isexpr(x, :call) &&    #
        x.args[1] == :(|>) &&     # check for `expr |> fn`
        length(x.args) == 3 &&    # ==> (|>)(expr, fn)
        _ispossiblefn(x.args[3])  #

        f = _compose(x.args[3])
        arg = _compose(x.args[2])
        if isa(f, Symbol)
            Expr(:call, f, arg) 
        else
            insert!(f.args, 2, arg)
            f
        end
    else
        Expr(x.head, [_compose(y) for y in x.args]...)
    end
end

macro compose(x)
    _compose(x)
end

julia> macroexpand(:(@compose x |> f |> g(1) |> h('a',"B",d |> c(fred |> names))))
:(h(g(f(x),1),'a',"B",c(d,names(fred))))

StefanKarpinski commented 10 years ago

If we're going to have this |> syntax, I'd certainly be all for making it more useful than it is right now. Using it just to allow putting the function to apply on the right instead of the left has always seemed like a colossal waste of syntax.

malmaud commented 10 years ago

+1. It's especially important when you are using Julia for data analysis, where you commonly have data transformation pipelines. In particular, Pandas in Python is convenient to use because you can write things like df.groupby("something").aggregate(sum).std().reset_index(), which is a nightmare to write with the current |> syntax.

cdsousa commented 10 years ago

:+1: for this.

(I'd already thought of suggesting the use of the .. infix operator for this (obj..move(4,5)..scale(10)..display), but the operator |> will be nice too)

malmaud commented 10 years ago

Another possibility is adding syntactic sugar for currying, like f(a,~,b) translating to x->f(a,x,b). Then |> could keep its current meaning.

ssfrr commented 10 years ago

Oooh, that would be a really nice way to turn any expression into a function.

Possibly something like Clojure's anonymous function literals, where #(% + 5) is shorthand for x -> x + 5. This also generalizes to multiple arguments with %1, %2, etc., so #(myfunc(2, %1, 5, %2)) is shorthand for x, y -> myfunc(2, x, 5, y)

Aesthetically I don't think that syntax fits very well into otherwise very readable julia, but I like the general idea.

To use my example above (and switching to @malmaud's tilde instead of %), you could do

some_obj |> move(~, 4, 5) |> scale(~, 10) |> display

which looks pretty nice.

This is nice in that it doesn't give the first argument any special treatment. The downside is that used this way we're taking up a symbol.

Perhaps this is another place where you could use a macro, so the substitution only happens within the context of the macro.

StefanKarpinski commented 10 years ago

We obviously can't do this with ~ since that's already a standard function in Julia. Scala does this with _, which we could also do, but there's a significant problem with figuring out what part of the expression is the anonymous function. For example:

map(f(_,a), v)

Which one does this mean?

map(f(x->x,a), v)
map(x->f(x,a), v)
x->map(f(x,a), v)

They're all valid interpretations. I seem to recall that Scala uses the type signatures of functions to determine this, which strikes me as unfortunate since it means that you can't really parse Scala without knowing the types of everything. We don't want to do that (and couldn't even if we wanted to), so there has to be a purely syntactic rule to determine which meaning is intended.

ssfrr commented 10 years ago

Right, I see your point on the ambiguity of how far to go out. In Clojure the whole expression is wrapped in #(...) so it's unambiguous.

In Julia, is it idiomatic to use _ as a don't-care value? Like x, _ = somefunc() if somefunc returns two values and you only want the first one?

To solve that I think we'd need a macro with an interpolation-like usage:

some_obj |> @$(move($, 4, 5)) |> @$(scale($, 10)) |> display

but again, I think it's getting pretty noisy at that point, and I don't think that @$(move($, 4, 5)) gives us anything over the existing syntax x -> move(x, 4, 5), which is IMO both prettier and more explicit.

I think this would be a good application of an infix macro. As with #4498, if whatever rule defines functions as infix applied to macros as well, we could have a @-> or @|> macro that would have the threading behavior.

malmaud commented 10 years ago

Ya, I like the infix macro idea, although a new operator could just be introduced for this use in lieu of having a whole system for inplace macros. For example, some_obj ||> move($,4,5) ||> scale($, 10) |> disp or maybe just keep |> but have a rule that x |> f implicitly transforms into x |> f($): some_obj |> scale($,10) |> disp

meglio commented 10 years ago

Folks, it all really looks ugly: |> ||> etc. So far I have found Julia's syntax to be so clear that the things discussed above don't look so pretty compared to anything else.

In Scala it's probably the worst thing - they have so many operators like ::, :, <<, >> +:: and so on - it just makes any code ugly and unreadable for someone without a few months of experience in using the language.

johnmyleswhite commented 10 years ago

Sorry to hear you don't like the proposals, Anton. It would be helpful if you made an alternative proposal.

meglio commented 10 years ago

Oh sorry, I am not trying to be unkind. And yes - critics without proposals are useless.

Unfortunately I am not a scientist constructing languages so I just do not know what to propose... well, except making methods optionally owned by objects as it is in some languages.

malmaud commented 10 years ago

I like the phrase "scientist constructing languages" - it sounds much more grandiose than numerical programmers sick of Matlab.

I feel that almost every language has a way to chain functions - either by repeated application of . in OO languages, or special syntax just for that purpose in more functional languages (Haskell, Scala, Mathematica, etc.). Those latter languages also have special syntax for anonymous function arguments, but I don't think Julia is really going to go there.

I'll reiterate support for Spencer's proposal - x |> f(a) gets translated into f(x, a), very analogously to how do blocks work (and it reinforces a common theme that the first argument of a function is privileged in Julia for syntactic sugar purposes). x |> f is then seen as short-hand for x |> f(). It's simple, doesn't introduce any new operators, handles the vast majority of cases that we want function chaining for, is backwards-compatible, and fits with existing Julia design principles.

JeffBezanson commented 10 years ago

I also think that is the best proposal here, main problem being that it seems to preclude defining |> for things like I/O redirection or other custom purposes.

ssfrr commented 10 years ago

Just to note, . is not a special function chaining syntax, but it happens to work that way if the function on the left returns the object it just modified, which is something that the library developer has to do intentionally.

Analogously, in Julia a library developer can already support chaining with |> by defining their functions of N arguments to return a function of 1 argument when given N-1 arguments, as mentioned here

That would seem to cause problems if you want your function to support variable number of args, however, so having an operator that could perform the argument stuffing would be nice.

@JeffBezanson, it seems that this operator could be implemented if there was a way to do infix macros. Do you know if there's an ideological issue with that, or is just not implemented?

kmsquire commented 10 years ago

Recently, ~ was special-cased so that it quotes its arguments and calls the macro @~ by default. |> could be made to do the same thing.

Of course, in a few months, someone will ask for <| to do the same...

ssfrr commented 10 years ago

right, I definitely wouldn't want this to be a special case. Handling it in your API design is actually not that bad, and even the variable arguments limitation isn't too much of an issue if you have type annotations to disambiguate.

function move(obj::MyType, x, y, args...)
    # do stuff
    obj
end

move(args...) = obj::MyType -> move(obj, args...)

I think this behavior could be handled by a @composable macro that would handle the 2nd declaration.

The infix macro idea is attractive to me in the situation where it would be unified with declaring infix functions, which is discussed in #4498.

meglio commented 10 years ago

Why are the Julia creators so much against allowing objects to contain their own methods? Where could I read more about that decision? Which thoughts and theory are behind that decision?

ihnorton commented 10 years ago

@meglio a more useful place for general questions is the mailing list or the StackOverflow julia-lang tag. See Stefan's talk and the archives of the users and dev lists for previous discussions on this topic.

porterjamesj commented 10 years ago

Just chiming in, to me the most intuitive thing is to have some placeholder be replaced by the value of the previous expression in the sequence of things you're trying to compose, similar to clojure's as-> macro. So this:

@as _ begin
    3+3
    f(_,y)
    g(_) * h(_,z)
end

would be expanded to:

g(f(3+3,y)) * h(f(3+3,y),z)

You can think of the expression on the previous line "dropping down" to fill the underscore hole on the next line.
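A rough sketch of how such a macro could be written, assuming the block form shown above. The helper name substitute and the use of a gensym'd temporary are choices made for this illustration; this is not the Lazy.jl implementation. Binding each step to a temporary means no step is evaluated more than once.

```julia
# Hypothetical helper: replace every occurrence of the placeholder symbol
# `ph` inside the expression `ex` with `val`.
substitute(ex, ph::Symbol, val) = ex                       # literals etc.
substitute(ex::Symbol, ph::Symbol, val) = ex == ph ? val : ex
substitute(ex::Expr, ph::Symbol, val) =
    Expr(ex.head, [substitute(a, ph, val) for a in ex.args]...)

# Illustrative `@as`-style macro: each line of the block is evaluated once,
# and its value replaces the placeholder on the following line.
macro as(ph, block)
    # Keep only real expressions; the parser also inserts LineNumberNodes.
    steps = [s for s in block.args if !(s isa LineNumberNode)]
    tmp = gensym("as")
    body = Any[:($tmp = $(steps[1]))]
    for step in steps[2:end]
        push!(body, :($tmp = $(substitute(step, ph, tmp))))
    end
    esc(Expr(:block, body..., tmp))
end

# Stand-in definitions so the example from the comment can run.
f(a, b) = a + b
g(a) = 2a
h(a, b) = a - b
y, z = 10, 1

result = @as _ begin
    3 + 3
    f(_, y)
    g(_) * h(_, z)
end
```

Under these stand-in definitions the result matches the nested form g(f(3+3,y)) * h(f(3+3,y),z), but f is only called once.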

I started sketching a tiny something like this last quarter in a bout of finals week procrastination.

We could also support a oneliner version using |>:

@as _ 3+3 |> f(_,y) |> g(_) * h(_,z)

kmsquire commented 10 years ago

@porterjamesj, I like that idea!

JeffBezanson commented 10 years ago

I agree; that is pretty nice, and has an appealing generality.

staticfloat commented 10 years ago

I like @porterjamesj's idea not only because it is a breath of fresh air, but because it seems much more flexible than previous ideas. We're not married to only using the first argument, we have free rein over the choice of intermediate variable, and this also seems like something that we can implement right now without having to add new syntax or special cases to the language.

Note that in Julia, because we don't do much of the obj.method(args...) pattern, and instead do the method(obj, args...) pattern, we tend not to have methods that return the objects they operate on for the express purpose of method chaining. (Which is what jQuery does, and is fantastic in javascript). So we don't save quite as much typing here, but for the purpose of having "pipes" setup between functions, I think this is really nice.

porterjamesj commented 10 years ago

Given that clojure's -> and ->> are just special cases of the above, and fairly common, we could probably implement those pretty easily too. Although the question of what to call them is a bit tricky. Maybe @threadfirst and @threadlast?

cdsousa commented 10 years ago

I like the idea of this being a macro too.

Isn't it better if the expansion, following the example, is something like

tmp = 3+3; tmp = f(tmp, y); return g(tmp) * h(tmp, z)

to avoid multiple calls to the same operation? (Maybe that was already implicit in @porterjamesj's idea)

Another suggestion: would it be possible that the macro expands the shortcuts f to f(_) and f(y) to f(_,y)? Maybe it will be too much, but I think that then we have the option to use a placeholder only when needed... (the shortcuts must, however, be allowed only on standalone function calls, not on expressions like the g(_) * h(_,z) above)

porterjamesj commented 10 years ago

@cdsousa the point about avoiding multiple calls is a good one. The clojure implementation uses sequential let bindings to achieve this; I'm not sure if we can get away with this though because I don't know enough about the performance of our let.

ssfrr commented 10 years ago

So is the @as macro using line breaks and => as split points to decide what's the substitution expression and what's getting substituted?

JeffBezanson commented 10 years ago

let performance is good; now it can be as fast as a variable assignment when possible, and also pretty fast otherwise.

porterjamesj commented 10 years ago

@ssfrr in my toy implementation it just filters out all the linebreak-related nodes that the parser inserts (N.B., I don't really understand all these, it would probably be good to have documentation on them in the manual) and then reduces the substitution over the list of expressions that remains. Using let would be better though, I think.

kmsquire commented 10 years ago

@cdsousa:

Another suggestion: would it be possible that the macro expands the shortcuts f to f(_) and f(y) to f(_,y)

f to f(_) makes sense to me. For the second, I'm of the opinion that explicitly specifying the location is better, since reasonable people could argue that either f(_,y) or f(y,_) is more natural.

Given that clojure's -> and ->> are just special cases of the above, and fairly common, we could probably implement those pretty easily too. Although the question of what to call them is a bit tricky. Maybe @threadfirst and @threadlast?

I think specifying the location explicitly with f(_,y...) or f(y..., _) allows the code to be quite understandable. While the extra syntax (and operators) make sense in Clojure, we don't really have additional operators available, and I think the additional macros would generally make the code less clear.

So is the @as macro using line breaks and => as split points to decide what's the substitution expression and what's getting substituted?

I would think it more natural to use |> as a split point, since it is already used for pipelining

MikeInnes commented 10 years ago

Just so you know, there's an implementation of the threading macro in Lazy.jl, which lets you write, for example:

@>> range() map(x->x^2) filter(iseven)

On the plus side, it doesn't require any language changes, but it gets a bit ugly if you want to use more than one line.

I could also implement @as> in Lazy.jl if there's interest. Lazy.jl now has an @as macro, too.

pao commented 10 years ago

You can also do something like this (though using a Haskell-like syntax) with Monads.jl (note: it needs to be updated to use current Julia syntax). But I suspect that a specialized version for just argument threading should be able to avoid the performance pitfalls the general approach has.

nolta commented 9 years ago

Lazy.jl looks like a very nice package, and actively maintained. Is there a compelling reason this needs to be in Base?

gregid commented 9 years ago

How will function chaining work with functions returning multiple values? What would be the result of chaining eg.:

function foo(a,b)
    a+b, a*b   # x,y respectively
end

and bar(x,z,y) = x * z - y be?

Wouldn't it require a syntax like bar(_1,z,_2) ?
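For comparison, a small sketch of what already works with plain anonymous functions, without any placeholder syntax. The value chosen for z is an assumption made here just so the example runs.

```julia
foo(a, b) = (a + b, a * b)   # returns x, y respectively
bar(x, z, y) = x * z - y
z = 4                        # arbitrary value for illustration

# Index into the returned tuple explicitly...
r1 = foo(2, 3) |> (t -> bar(t[1], z, t[2]))

# ...or destructure the tuple in the anonymous function's argument list.
r2 = foo(2, 3) |> (((x, y),) -> bar(x, z, y))
```

Both forms evaluate foo only once, which a numbered-placeholder syntax such as bar(_1,z,_2) would also need to guarantee.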

oxinabox commented 9 years ago

Throwing in another example:

data = [2.255, 3.755, 6.888, 7.999, 9.001]

The clean way to write log(sum(round(data))) is data|>round|>sum|>log.

But if we wanted to do a base 2 log and round to 3 decimals, we can only use the first form: log(2,sum(round(data,3)))

But ideally we would like to be able to do: data|>round(_,3)|>sum|>log(2,_) (or similar)

oxinabox commented 9 years ago

I have made a prototype for how I suggest it should work. https://github.com/oxinabox/Pipe.jl

It does not solve @gregid's point, but I am working on that now. It also does not handle the need to expand the arguments

It is similar to @one-more-minute 's Lazy.jl threading macros but keeps the |> symbol for readability (personal preference).

I'll slowly make it into a package, perhaps, at some point

shashi commented 9 years ago

One more option is:

data |>   x -> round(x,2)  |> sum |>  x -> log(2,x)

Although longer than log(2,sum(round(data,2))) this notation sometimes helps readability.
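A sketch of the same chain as it might be written in Julia 1.x, where (as an assumption about the reader's version) rounding a vector uses the broadcast form round.(x; digits=2). One detail worth knowing: -> binds more loosely than |>, so without parentheses the body of the first anonymous function swallows the rest of the chain. The result is the same here, but parenthesizing each stage keeps the pipeline structure explicit.

```julia
data = [2.255, 3.755, 6.888, 7.999, 9.001]

# Each stage is parenthesized so it is a separate step of the pipeline.
piped = data |> (x -> round.(x; digits=2)) |> sum |> (x -> log(2, x))

# Without parentheses this parses as
#   data |> (x -> (round.(x; digits=2) |> sum |> (x -> log(2, x))))
# which computes the same value through nested closures.
unparenthesized = data |> x -> round.(x; digits=2) |> sum |> x -> log(2, x)

# The equivalent nested call, for reference.
nested = log(2, sum(round.(data; digits=2)))
```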

oxinabox commented 9 years ago

@shashi that is not bad, didn't think of that, I think generally too verbose to be easily readable

https://github.com/oxinabox/Pipe.jl now does solve @gregid's problem. Though if you ask for both _[1] and _[2], it does this by making multiple calls to the substitution, which I am not certain is the most desirable behaviour.

Tetralux commented 9 years ago

As an outsider, I think the pipeline operator would benefit from adapting F#'s treatment of it. Granted, F# has currying, but some magic could perhaps be done on the back end to have it not require that. Like, in the implementation of the operator, and not the core language.

This would make [1:10] |> map(e -> e^2) result in [1, 4, 9, 16, 25, 36, 49, 64, 81, 100].

Looking back, @ssfrr alluded to this, but the obj argument in their example would be automatically given to map as the second argument in my example, thus saving programmers from having to define their functions to support it.
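Until something like this exists in the operator itself, the effect can be approximated with a curried wrapper. The sketch below uses a hypothetical helper named mapwith (not a Base function); it also uses the range 1:10 directly, on the assumption of Julia 1.x, where [1:10] is a one-element vector holding a range rather than the numbers 1 to 10.

```julia
# Hypothetical helper: a curried form of `map` that waits for the data.
mapwith(f) = xs -> map(f, xs)

squares = (1:10) |> mapwith(e -> e^2)
```

This is the library-side workaround again; the proposal is that |> would insert the data argument so no such wrapper needs to be defined.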

StefanKarpinski commented 9 years ago

What do you propose that it mean?

On Jun 5, 2015, at 5:22 PM, H-225 notifications@github.com wrote:

As an outsider, I think one of the better ways to do this would be to adapt F#'s treatment of it. Granted, F# has currying, but some magic could perhaps be done on the back end to have it not require that. Like, in the implementation of the operator, and not the core language.

This would make [1:10] |> map(e -> e^2) result in [1, 4, 9, 16, 25, 36, 49, 64, 81, 100].

Personally, I think that it is nice and clear without being too verbose.

Obviously, one could write result = map(sqr, [1:10]), but then why have the pipeline operator at all? Perhaps there is something I'm missing?


Tetralux commented 9 years ago

@StefanKarpinski Basically, have the operator work like either:

Perhaps have an interface pattern where any function to be used with the operator takes the data to operate on as either the first or last argument, depending on which of the above is selected to be that pattern. So, for the map function as an example, map would either be map(func, data) or map(data, func).

Is that any clearer?

hayd commented 9 years ago

Lazy.jl looks like a very nice package, and actively maintained. Is there a compelling reason this needs to be in Base?

I think this is the important question here.