JuliaData / SplitApplyCombine.jl

Split-apply-combine strategies for Julia
Other
144 stars 15 forks source link

Name proposal for lazy operations: past tense #11

Open andyferris opened 5 years ago

andyferris commented 5 years ago

I've been happy to seperate the semantics of greedy-vs-lazy operations into seperate functions, like map vs mapview and group vs groupview. However, the view suffix is a little tiresome.

I'm wondering if we should follow the example set by Base.Broadcast which uses broadcast(...) = materialize(broadcasted(...)), where broadcasted is more-or-less a lazy version of broadcast and materialize is something that behaves a bit like copy when necessary.

That would be something like:

Does anyone have any thoughts or opinions?

nalimilan commented 5 years ago

I'm not sure whether using past tense is explicit enough. Anyway, better discuss that in the base Julia repo?

bramtayl commented 5 years ago

i like this

bramtayl commented 5 years ago

Ok, further thoughts: why not just make everything lazy? Eager methods could just be optimization methods of Base.collect

andyferris commented 5 years ago

While I may somewhat agree with you... I try to follow Base semantics here for things like map. It might be too late to start fiddling with that (when it was discussed with earlier versions of Julia I understand it was felt at the time there may be too much run-time overhead with laziness. Even if the compiler is better at allowing zero-cost abstractions these days, complexity still goes up considerably).

Also, some operations aren't obviously better being lazy. groupview is only partially lazy (full lazy would be much worse!). filtered cannot possibly preserve array-ness. It's a tricky space, I feel.

bramtayl commented 5 years ago

I mean, map internally creates a generator and collects it, so... Being super-lazy opens up options for optimizations. For example, mapping a reduce function over truly lazy groups would enable the groupreduce optimization.

andyferris commented 5 years ago

For example, mapping a reduce function over truly lazy groups would enable the groupreduce optimization.

Unlike most operations where we can just rely on iteration and separation of concerns and still get optimal performance, for this particular case I think it would be necessary to overload the method explicitly. There's worse complications regarding anonymous functions that can't be introspected and the fact that map and broadcast do not work on AbstractDict in the first place. That is, I couldn't figure out a way to make map(g -> reduce(+, g), grouped(by, itr)) or even sum.(grouped(by, itr)) work. I raged a bit on JuliaLang/julia at the time :)

Of course, for something like mapreduce this is trivial, as reduce(op, mapview(f, itr); init = ...) already works out-of-the-box.

bramtayl commented 5 years ago

I think the solution in that case is pretty easy: collect(Generator(Reduce(f), Grouped(by, iter)) would just have to optimized to groupreduce(f, by, iter). I have an implementation of Reduce in JuliennedArrays. Grouped just has to be truly lazy.