This issue seems too broad to be of much practical use.
In cases like this it's best just to give an exhaustive list of suggested changes. Everything is data, so I'm not sure what a data argument is.
The argument order for `convert` is very firmly established, and this function is extremely important, so we're not going to change it. But please feel free to list other examples.
To elaborate a bit, the argument order for `convert` matches `call`; `convert(T, x)` and `T(x)` are related.
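For concreteness, a minimal illustration of that correspondence (my example, using 0.4-era syntax):

```julia
# `convert` puts the target type first, mirroring call syntax:
convert(Float64, 1)  # 1.0
Float64(1)           # 1.0 — constructor-style call, related to convert(Float64, 1)
```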
I'm new to Julia, but I can list examples as I come across them. Anything in the match family, so `ismatch`, `match`, `eachmatch`, `matchall`. `map` and `broadcast` are also not very compatible with chaining, but I can see why the argument order makes sense. I guess by data I mean the argument that's most likely to be chained. This might be a keep-in-mind-for-the-future issue rather than a go-back-and-change-everything issue. It seems like in Julia, function arguments and type arguments tend to come first (perhaps in resonance with `call`), which has the side effect of filling my code with a bunch of underscores (via `@as _ begin`).
Making the function the first argument to `map` is universal. Is there even one language that doesn't do that?
The match functions are more debatable, but Python uses this argument order, as do other OO languages like Ruby, which uses `re.match(string)`.
You are right that Julia's libraries were not designed with this chaining thing in mind. Fully admitting my Lisp bias, I've never wanted many different syntaxes for function calls.
R's apply functions and plyr functions all have data arguments first (with the notable exception of `mapply`).
Edit: not to mention `as.numeric`, `as.character`, and family.
And the notable exception of R's `Map` function.
Another one is `write`.
Well then we appear to be at an impasse. It looks like vast numbers of key functions (`write`, `convert`, `map`, ...) are incompatible with this redesign. As you point out, changing all of these is not really practical. However, these functions are so fundamental that it wouldn't be practical to do things differently in the future either; certainly we can't have half of our I/O functions use `f(obj, io)` and half go the other way.
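For reference, a quick illustration of the existing stream-first convention (using the 0.4-era API):

```julia
io = IOBuffer()
write(io, "hello")   # stream first
print(io, 42)        # same convention
show(io, [1, 2, 3])  # and again
takebuf_string(io)   # collect the accumulated output (0.4-era API)
```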
I think there's an argument to be made for switching the argument order for `match`. At present it's inconsistent with `search`, `replace`, and the rest of the string functions.
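To make the inconsistency concrete (0.4-era signatures):

```julia
match(r"an", "banana")         # pattern first
search("banana", "an")         # string first
replace("banana", "an", "on")  # string first
```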
...but `findin` uses the same order as `match`. The good thing with `match` and co. is that their arguments can easily be swapped and a deprecation added; there's no ambiguity thanks to the `Regex` argument type. That's harder for `findin`, though the function could also be renamed to work around this (cf. https://github.com/JuliaLang/julia/issues/10593).
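A minimal sketch of that swap-plus-deprecation idea (hypothetical; nothing like this was actually merged):

```julia
# hypothetical string-first method that forwards to the existing one:
Base.match(s::AbstractString, re::Regex) = match(re, s)

# in Base itself the implementation would move to the new order, and the
# old signature would then get `@deprecate match(re::Regex, s) match(s, re)`
match("banana", r"an")  # same result as match(r"an", "banana")
```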
I'm all for consistency, though honestly I would rather have changed all functions to search for their first argument in their second one. Maybe that's just me. Anyway, for chaining, it's not clear to me whether you'd more often pass one or the other (both are "data").
Ok, here's a makeshift solution. It allows users to either vectorize or switch around the arguments of functions, or both.
```julia
using Lazy

# convert singletons to a 1-entry vector
function to_array(x)
    if (@> x typeof) <: AbstractArray
        x
    else
        [x]
    end
end

# switch the first and second items in a tuple
function switch_tuple(tuple)
    if (@> tuple length) == 2
        index = [2, 1]
    else
        index = [2; 1; 3:(@> tuple length)]
    end
    tuple[index]
end

# return an expression which suffixes a function and reorders its arguments
function switch(function_symbol::Symbol)
    suffixed_function_string = string(function_symbol) * "_s"
    suffixed_function_symbol = @> suffixed_function_string parse
    quote
        function $suffixed_function_symbol(arguments...)
            $function_symbol((@> arguments switch_tuple)...)
        end
        @> $suffixed_function_string parse
    end
end

# return an expression which suffixes a function
# and maps/broadcasts it over an argument/arguments respectively
function vectorize(function_symbol::Symbol)
    suffixed_function_string = string(function_symbol) * "_v"
    suffixed_function_symbol = @> suffixed_function_string parse
    quote
        function $suffixed_function_symbol(arguments...)
            arguments = map(to_array, arguments)
            if length(arguments) == 1
                map($function_symbol, arguments...)
            else
                broadcast($function_symbol, arguments...)
            end
        end
        @> $suffixed_function_string parse
    end
end

# vectorize some functions
@> :vectorize vectorize eval
@> :switch vectorize eval
@> :eval vectorize eval

# lists of functions to reverse and vectorize, or just vectorize
reverse_and_vectorize = [:ismatch, :write, :convert]
just_vectorize = [:replace]

@> begin
    # first reverse functions in reverse_and_vectorize
    reverse_and_vectorize
    switch_v
    eval_v
    # add in just_vectorize functions
    vcat(just_vectorize)
    # vectorize
    vectorize_v
    eval_v
end

# test
@> ["a", "b"] ismatch_s_v(r"a")
```
See also JuliaLang/julia#8450
The argument switch suffers (I think) from unnecessarily copying the arguments. Is there a way to unpack a tuple in a particular order?
Edit: fixed anonymous function issue
@bramtayl why an anonymous function (bound to a non-const global) for `to_array`?
Oops, I'm still getting used to Julia-style function definitions. See edit. It's there to be able to broadcast singletons.
That's kind of clever. But I don't see how this is a huge improvement over `write(io, _)`, or just using normal syntax. If you know about `write`, it's easy to see what `write(io, _)` does, while `write_s(io)` and `@>` seem pretty obscure to me.
I also think (#8450) that ideally iteration is something you do with an existing function, not something that requires a new definition for each function. Nobody should decide which functions get `_v` versions; you can write `map(f, x)` when needed. Or maybe this could be part of the operator; for example `@.> A write(io, _)` could mean `for x in A; write(io, x); end`. But again I would argue the `for` loop version is intelligible even to people who don't know the language.
The advantage of `write_s` would be that you can write out a chain without naming anything. Of course, names make debugging easier, but they also make code harder to read.
```julia
@> begin
    text
    # a whole bunch of string processing
    write_s(conn)
end
```
Without that, your code would look like this:
```julia
@as _ begin
    text
    # string processing with a bunch of unnecessary _'s
    write(conn, _)
end
```
For me, the greater regularity of reusing the same `write` function, and not needing to set up a definition to make `write_s` exist, make the second version the winner. Maybe others will weigh in.
Given that chaining makes many function arguments implicit (and therefore makes line-local reasoning more difficult), I generally think it makes code harder to read. I also agree that having a single canonical `write` function is more important than accommodating macro-based DSLs.
If I had to `map`, `broadcast`, or `for` loop (!) every time I use a function iteratively (pretty much always) AND had to write code that was riddled with underscores, I'd probably give up and go back to R. Consider the chain above without any help:
```julia
reverse_calls = map(switch, reverse_and_vectorize)
reverse_symbols = map(eval, reverse_calls)
both_symbols = vcat(reverse_symbols, just_vectorize)
vectorize_calls = map(vectorize, both_symbols)
vectorize_symbols = map(eval, vectorize_calls)
```
Useful for debugging, but it doesn't seem likely that any of these items cluttering up the environment will be used again.
or, with underscores:
```julia
@as _ begin
    reverse_and_vectorize
    map(switch, _)
    map(eval, _)
    vcat(_, just_vectorize)
    map(vectorize, _)
    map(eval, _)
end
```
Not even going to bother with for loops.
Yes, obviously there is such a thing as too much chaining. You might argue that the argument switching and vectorization should be done in two separate chains. But chaining also organizes code and clarifies structure.
> If I had to `map`, `broadcast`, or `for` loop (!) every time I use a function iteratively (pretty much always) AND had to write code that was riddled with underscores, I'd probably give up and go back to R.
To me, a short vectorization syntax is a major need and that's what #8450 is supposed to deal with. In your example, the code would be much shorter already, and you might accept suffering a few underscores for chaining if vectorization allowed merging a few lines of the chain.
Also note that R does not provide any native support for chaining, so it's not like this kind of thing couldn't be done in Julia as well.
As @JeffBezanson said before, we seem to be at an impasse. This issue seems primarily focused on code aesthetics and it seems that several other Julia developers don't share your aesthetic sensibilities.
It sounds like there are several specific functions, like `match`, that people would consider changing for consistency. But consistency for the sake of simplifying chaining doesn't seem sufficient to justify making so many breaking changes.
For me the decisive issue when debating these kinds of DSL-specific concerns is this: given that you want alternative surface syntax for writing identical semantics, why not just write an actual DSL that gets translated to Julia code? Why does Julia syntax need to match the syntax of your ideal DSL?
I think people tend to overuse shared-parser DSLs for this kind of use case. If you want truly independent syntax, a separate-parser DSL is the way to go. It has a higher start-up cost for the DSL developer, but completely frees you from having to reach consensus with others about your preferred syntax.
If you play your cards right, you might get armies of @hadley followers switching to Julia in the next few years, all of whom are pretty used to chaining (and the kind of things in DataFramesMeta). I'm certain I couldn't tackle writing a new language. But maybe a package?
I, for one, don't see that as a goal worth pursuing given that I work on Julia in my free time. I'd vastly prefer having a language that can be used for the things that R will never be good at than a language that tries to emulate what R can already do well enough.
The problem with the idioms you're advocating for is that they don't come equipped with any fleshed out solutions to the issues of semantics that have held back work on #8450. The surface syntax of a replacement for vectorization is the least difficult part of what needs to be done to remove vectorization from Julia. The important issue is designing a set of semantics that's amenable to compilation to efficient code. That depends on progress on integrating functions into Julia's type system in such a way that multiple dispatch can operate effectively when using higher-order functions. See Jeff's thesis for some ideas about how this might be done and packages like FastAnonymous.jl for interim improvements.
For most applications, the bottleneck is how long it takes to write the code, not how long it takes to run the code.
That's completely false when you work at scale.
Conceded. I thought the point of Julia was to be the best of both worlds. Otherwise, why not just write in Fortran?
@bramtayl I think julia does give a lot of flexibility (more than I've ever seen elsewhere) to have the best of both worlds (although there are still a lot of rough edges, but those are being worked out), and maybe you can accomplish what you want in a package, with all of the power of multiple dispatch and julia macros behind you...
The problem is that we don't have any means for reaching an agreement about what "best" means. My take on this issue is that many of the people involved in this thread have very substantial disagreements about what good code looks like. I'm skeptical that we can resolve such large disagreements about aesthetics by talking them through.
Ok, I'll just keep the code for personal use only.
I agree that productivity is incredibly important, but I don't see how something like chaining syntax is drastically more productive than our normal syntax. As for vectorization, if I thought writing `map` every time was a good solution, then #8450 would not be an open issue.
I just realized that it's a bit odd for chaining to work on the first argument. In languages with function currying, delayed arguments are added at the end. For example you could write

```julia
x |> map(switch) |> map(eval) |> vcat(_, just_vectorize) |> map(vectorize) |> map(eval)
```

because `map(f)` means `x -> map(f, x)`. Maybe our functions are designed more for this style.
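A minimal sketch of what that currying could look like in Julia (a hypothetical single-argument method, not in Base):

```julia
# hypothetical curried method: map(f) returns a closure awaiting the data
Base.map(f::Function) = x -> map(f, x)

[1, 2, 3] |> map(x -> 2x)  # [2, 4, 6]
```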
Maybe it's worth working for consistency in the other direction then? Is that piping to the last argument or to the second argument?
Yes, that's quite possible. I think it should pipe to the last argument.
Here's an extension to wholesale-vectorize all the functions in a module.
```julia
using Lazy
using DataFrames
using DataFramesMeta

@> :typeof vectorize eval
@> :eval vectorize eval
@> :string vectorize eval
@> :convert switch eval
@> :ismatch switch eval vectorize eval

function get_functions(m::Module)
    df = @> begin
        DataFrame(symbol = @> m names)
        @transform(
            is_function = (@> begin
                :symbol
                eval_v
                typeof_v
                .==(Function)
            end),
            compatible = (@> begin
                :symbol
                string_v
                ismatch_s_v(r"^[A-Za-z]")
                convert_s(Vector{Bool})
            end))
        @where(:is_function & :compatible)
    end
    df[:symbol]
end

@> Base get_functions vectorize_v eval_v
```
Could somebody please explain the use of `_` in the above example? (Again, sorry for the newbie question; it's just that the only thing I can find with Google is about IJulia history variables, and the JuliaLang docs can't seem to find anything that isn't an alphanumeric string...)
`@as` is a macro from Lazy.jl. The example given in the Readme (worth checking out for context) is:
```julia
# @as lets you name the threaded argument
@as _ x f(_, y) g(z, _) == g(z, f(x, y))
```
The benefit of `@as` is that you can specify exactly where you want the previous result to be piped into the next expression. It is needed in particular if there is no consistent method of figuring out where to pipe the previous result (i.e. the first argument, the last argument, etc.). `_` is only a symbol, and `@as ~` would work equally well were it not for interfering with formulas.
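For instance (a small example of my own, not from the Lazy Readme):

```julia
using Lazy

# thread 1:10 through expressions, naming the threaded value _
@as _ 1:10 filter(iseven, _) sum(_)  # 2 + 4 + 6 + 8 + 10 = 30
```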
Jeff's currying example is from some other language, but you can imagine a roughly equivalent function.
> wholesale-vectorize all the functions in a module
I'm really not a fan of this. It's clearly the wrong abstraction: instead of the function and the iteration being treated as orthogonal (which they really are), it doubles the number of definitions in a module without regard for which of the new definitions actually make sense. Concepts should be composed using general mechanisms, not by concatenating names with underscores.
I continue to fail to understand the advantage of `@> m names` over `names(m)`. Isn't this just deliberately obscure?
Yeah, I was trying to use chaining as often as possible for illustrative purposes. Might have gone a bit overboard. Agreed that doubling the number of functions in a module is a little ridiculous, but until #8450 gets sorted out it might be useful, especially if no one else starts using `_v` for something else.
It's also worth noting that the code above can be rewritten with Lazy's `@>>`, which pipes to the last argument. This wouldn't have worked for other string-processing functions like `replace`, though.
```julia
using Lazy
using DataFrames
using DataFramesMeta

@> :vectorize vectorize eval
@> :eval vectorize eval
@> [:typeof, :string, :ismatch] vectorize_v eval_v

function get_functions(m::Module)
    df = @> begin
        DataFrame(symbol = @> m names)
        @transform(
            is_function = (@>> begin
                :symbol
                eval_v
                typeof_v
                .==(Function)
            end),
            compatible = @>> begin
                :symbol
                string_v
                ismatch_v(r"^[A-Za-z]")
                convert(Vector{Bool})
            end)
        @where(:is_function & :compatible)
    end
    df[:symbol]
end

@> Base get_functions vectorize_v eval_v
```
Edit: an extension for multiple packages:
```julia
function make_functions(m::Module)
    quote
        @> $m get_functions switch_v eval_v
        @> $m get_functions vectorize_v eval_v switch_v eval_v
    end
end

@> :make_functions vectorize eval
@> [Base, Lazy] make_functions_v eval_v
```
I have to say that I find this style of coding pretty inscrutable – it doesn't seem like an improvement in terms of readability or writability. But I'm glad that the macro system lets you experiment like this.
I heard from some people that they like threading/piping because it lets them always read code "left to right, top to bottom", and with nesting/composition they have to find where the expression starts and where it continues to.
Some like to reason about code by describing it with phrases, and it's harder to come up with words to describe `print(sum(map(x -> x - 10, map(x -> 2x, A))))` than it is to describe `@>> A map(x->2x) map(x->x-10) sum print`; the latter is pretty straightforward: "I have A, I multiply every element by 2, then I subtract 10 from every element, then I sum it, then I print it."
Some other people have said they get lost in nesting easily, always reading expressions all at once. And some other people just said threading is cooler looking :P
I have no idea how to extend this to work with macro functions, seeing as you can't use the splat operator with them.
> the latter is pretty straightforward: "I have A, I multiply every element by 2, then I subtract 10 from every element, then I sum it, then I print it."
Which also exemplifies one additional usage pattern where this style of appending operations at the end helps: building expressions step by step at the REPL while looking at the output, shell-style (if performance is not your primary concern, obviously).
Seems like this is a dup of #5571?
Yes, I think this discussion can be continued in #5571.
> Making the function the first argument to `map` is universal. Is there even one language that doesn't do that?
- Ruby: `list.map! {|x| x + 1 }`
- Elixir: `Enum.map list, fn(x) -> x + 1 end`
- JS: `list.map(function(x) { x + 1 })`
- Ugly Java: `list.stream().map(x -> x + 1).toArray()`
While all four of those are OO, the argument order kind of mimics the functional style in that the list goes first and the function goes second. Python and Clojure use the other convention, where the function goes first.
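For comparison, Julia follows the function-first convention (a small illustration of my own):

```julia
list = [1, 2, 3]
map(x -> x + 1, list)  # function first, as in Python and Clojure: [2, 3, 4]
```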
It would be convenient to have data arguments consistently as the first argument. This is particularly useful for chaining. A few examples where the argument order is puzzling: `convert`, `ismatch`.