Open andyferris opened 5 years ago
I'm not sure whether using past tense is explicit enough. Anyway, better discuss that in the base Julia repo?
i like this
Ok, further thoughts: why not just make everything lazy? Eager methods could just be optimization methods of Base.collect
While I may somewhat agree with you... I try to follow Base
semantics here for things like map
. It might be too late to start fiddling with that (when it was discussed with earlier versions of Julia I understand it was felt at the time there may be too much run-time overhead with laziness. Even if the compiler is better at allowing zero-cost abstractions these days, complexity still goes up considerably).
Also, some operations aren't obviously better being lazy. groupview
is only partially lazy (full lazy would be much worse!). filtered
cannot possibly preserve array-ness. It's a tricky space, I feel.
I mean, map internally creates a generator and collects it, so... Being super-lazy opens up options for optimizations. For example, mapping a reduce function over truly lazy groups would enable the groupreduce optimization.
For example, mapping a reduce function over truly lazy groups would enable the groupreduce optimization.
Unlike most operations where we can just rely on iteration and separation of concerns and still get optimal performance, for this particular case I think it would be necessary to overload the method explicitly. There's worse complications regarding anonymous functions that can't be introspected and the fact that map
and broadcast
do not work on AbstractDict
in the first place. That is, I couldn't figure out a way to make map(g -> reduce(+, g), grouped(by, itr))
or even sum.(grouped(by, itr))
work. I raged a bit on JuliaLang/julia at the time :)
Of course, for something like mapreduce
this is trivial, as reduce(op, mapview(f, itr); init = ...)
already works out-of-the-box.
I think the solution in that case is pretty easy: collect(Generator(Reduce(f), Grouped(by, iter))
would just have to optimized to groupreduce(f, by, iter)
. I have an implementation of Reduce in JuliennedArrays. Grouped just has to be truly lazy.
I've been happy to seperate the semantics of greedy-vs-lazy operations into seperate functions, like
map
vsmapview
andgroup
vsgroupview
. However, theview
suffix is a little tiresome.I'm wondering if we should follow the example set by
Base.Broadcast
which usesbroadcast(...) = materialize(broadcasted(...))
, wherebroadcasted
is more-or-less a lazy version ofbroadcast
andmaterialize
is something that behaves a bit likecopy
when necessary.That would be something like:
mapview
->mapped
.groupview
->grouped
.joined
instead ofjoin
.filtered
for lazyfilter
.product
is a noun, not a verb, and seems fine being lazy.flatten
vsflattened
?splitdims
atBase
andslice
/slices
seems like a possible naming.sliced
could be a lazy version?Does anyone have any thoughts or opinions?