JuliaData / SplitApplyCombine.jl

Split-apply-combine strategies for Julia

Now with the power of superlazy #16

Closed: bramtayl closed this 5 years ago

bramtayl commented 5 years ago

This isn't meant to be a serious PR, but just a proof of concept that super-lazy concepts can be useful here. I've worked to reduce as much as possible to Base iterators (left_outer_join stubbornly remains). I've included several notes below:

bramtayl commented 5 years ago

The dims operations seemed like they overlapped heavily with JuliennedArrays. I've removed them. I was hoping I could convince you to contribute there instead?

mapmany is just flatten(Generator), so I got rid of that too. flatten, product, and mapview (Generator) are all already in Base. To make them work I needed a bit of type piracy to give them superpowers (that's in map.jl); I'm not sure what the Julia gods will think of that. There is a stalled PR for adding indexing to iterators.

In one case I also had to replace eltype with @default_eltype.

What used to be the eager group is now easy to get just by collecting the views. If there are optimizations missed here, we could overload collect again.

I've added some imports from other packages (LightQuery, JuliennedArrays). Looking back now, LightQuery isn't really necessary. Reduce from JuliennedArrays is necessary to get optimizations (I've been hoping for a while that we can get something similar into Base).
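A minimal sketch of two of the points above using only Base: a lazy mapmany written as flatten over a Generator, and an "eager group" obtained by collecting lazy views. The names lazy_mapmany and lazy_group are hypothetical illustrations, not functions from SplitApplyCombine or from this PR.

```julia
# Hypothetical: mapmany expressed as flatten over a lazy Base.Generator.
lazy_mapmany(f, xs) = Iterators.flatten(Base.Generator(f, xs))

# Hypothetical: group indices by key, then expose each group as a lazy view of xs.
function lazy_group(by, xs::AbstractVector)
    inds = Dict{Any, Vector{Int}}()
    for i in eachindex(xs)
        push!(get!(Vector{Int}, inds, by(xs[i])), i)
    end
    return Dict(k => view(xs, v) for (k, v) in inds)
end

# Nothing is materialized until collect is called.
flat = collect(lazy_mapmany(x -> (x, 10x), [1, 2, 3]))   # [1, 10, 2, 20, 3, 30]

# The "eager group" is just the lazy views, collected.
groups = lazy_group(iseven, [1, 2, 3, 4, 5])
eager = Dict(k => collect(v) for (k, v) in groups)      # true => [2, 4], false => [1, 3, 5]
```

Keeping the groups as SubArrays means no element is copied until someone asks for it; collecting is then an explicit, separate step.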

andyferris commented 5 years ago

Thanks for experimenting. :)

General remarks (not so much about the PR, but the ideas):

I generally want my lazy containers to be fully featured and powerful. For example, I think it's reasonable for a lazily mapped array to be an AbstractArray rather than just an iterator (or even an iterator with indexing added). Some containers, like AbstractArrays, have interesting superpowers such as matrix multiplication, and certain operations might depend on the type of container rather than being the same for all Generators. So I feel there might be good reason for the lazily mapped container to end up in the correct part of the type tree.
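A minimal sketch of a lazily mapped container that sits in the AbstractArray part of the type tree, so generic methods such as matrix-vector multiplication work on it unchanged. LazyMapped is a hypothetical name, not an existing package type, and using Base.promote_op to choose the element type is an assumption that relies on inference succeeding for f.

```julia
using LinearAlgebra  # generic matrix-vector multiplication

# Hypothetical lazily mapped array: applies f on each getindex, stores no new data.
struct LazyMapped{T, N, F, A <: AbstractArray{<:Any, N}} <: AbstractArray{T, N}
    f::F
    parent::A
end

function LazyMapped(f, parent::AbstractArray{<:Any, N}) where {N}
    # Assumption: inference gives a usable element type for f.
    T = Base.promote_op(f, eltype(parent))
    return LazyMapped{T, N, typeof(f), typeof(parent)}(f, parent)
end

# The minimal AbstractArray interface: size plus N-dimensional getindex.
Base.size(m::LazyMapped) = size(m.parent)
Base.getindex(m::LazyMapped{T, N}, I::Vararg{Int, N}) where {T, N} = m.f(m.parent[I...])

m = LazyMapped(x -> 2x, [1 2; 3 4])
m isa AbstractMatrix{Int}   # true
m * [1, 1]                  # [6, 14], via the generic AbstractMatrix method
```

Because the type is an AbstractMatrix, it picks up multiplication, sum, printing and so on without any extra methods, which is the kind of superpower a plain Generator cannot get.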

I think it might be OK, in theory, to overload collect on certain kinds of Generator. This does land you in the interesting situation that syntax involving [ and ] might not create Arrays (or even AbstractArrays), when some users might expect that to be guaranteed. I'm not exactly sure.
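A small sketch of that situation: comprehension syntax lowers to collect(Base.Generator(...)), so a collect method for Generators over some container changes what [ ... ] returns. Bag is a hypothetical container used only for illustration; dispatching on a type we own keeps the collect method from being type piracy.

```julia
# Hypothetical container that we own, so the collect method below is not piracy.
struct Bag
    items::Vector{Int}
end
Base.iterate(b::Bag, state...) = iterate(b.items, state...)
Base.length(b::Bag) = length(b.items)

# Comprehensions over a Bag call collect on a Base.Generator{Bag}.
Base.collect(g::Base.Generator{Bag}) = Bag(map(g.f, g.iter.items))

r = [2x for x in Bag([1, 2, 3])]
r isa Bag        # true: the brackets did not produce an Array
r.items          # [2, 4, 6]
```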

What I would really like (but what might get me smitten by the "Julia gods") is to overload Generator(f, x) so that Generator(f, a::AbstractArray) returns a GeneratedArray <: AbstractArray, and so on. Julia constructors are just generic functions (the true constructor is $(Expr(:new, T, x...))), so we can actually overload them. The lowered AST for generator syntax contains a call to Generator, but in theory it would be easy to change that to a more generic factory function and let everyone have at it.
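A sketch of the factory-function idea, assuming generator syntax could lower to a call like generated(f, x) instead of Base.Generator(f, x). Both generated and this GeneratedArray are hypothetical names; the sketch defines its own function rather than overloading Base.Generator, so it stays clear of piracy.

```julia
# Hypothetical lazy array produced when the source is an AbstractArray.
struct GeneratedArray{T, N, F, A <: AbstractArray{<:Any, N}} <: AbstractArray{T, N}
    f::F
    parent::A
end
Base.size(g::GeneratedArray) = size(g.parent)
Base.getindex(g::GeneratedArray{T, N}, I::Vararg{Int, N}) where {T, N} = g.f(g.parent[I...])

# Hypothetical factory: dispatch on the source container to pick the lazy result type.
generated(f, x) = Base.Generator(f, x)   # fallback: a plain iterator
function generated(f, a::AbstractArray{<:Any, N}) where {N}
    T = Base.promote_op(f, eltype(a))    # assumption: inference succeeds for f
    return GeneratedArray{T, N, typeof(f), typeof(a)}(f, a)
end

g = generated(x -> x^2, [1, 2, 3])       # an AbstractVector{Int}
it = generated(x -> x^2, (1, 2, 3))      # a Tuple is not an AbstractArray: plain Generator
```

Each package could then add generated methods for its own container types, which is the "let everyone have at it" part.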

bramtayl commented 5 years ago

One thing I was playing around with was an Inferable{F} function wrapper. Wrapping a function in it gave "permission" for Base to use promote_op and @default_eltype to determine the eltype. Calling Base.Generator and Base.Iterators.product on an Inferable and AbstractArrays then returned the lazy AbstractArray interfaces that are in this package.
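A minimal sketch of the Inferable wrapper described above, covering Base.Generator only (product is left out). Inferable and InferredMap are hypothetical reconstructions from the description, not the code that was actually written. Because the new method dispatches on the locally owned Inferable type, adding it to Base.Generator is not type piracy, and ordinary functions keep getting plain Generators.

```julia
# Hypothetical wrapper: wrapping a function opts it in to inference-based eltypes.
struct Inferable{F}
    f::F
end
(i::Inferable)(args...) = i.f(args...)

# Hypothetical lazy mapped array returned for Inferable functions.
struct InferredMap{T, N, F, A <: AbstractArray{<:Any, N}} <: AbstractArray{T, N}
    f::F
    parent::A
end
Base.size(m::InferredMap) = size(m.parent)
Base.getindex(m::InferredMap{T, N}, I::Vararg{Int, N}) where {T, N} = m.f(m.parent[I...])

# Only Inferable functions get the AbstractArray result; the eltype comes from inference.
function Base.Generator(i::Inferable, a::AbstractArray{<:Any, N}) where {N}
    T = Base.promote_op(i.f, eltype(a))
    return InferredMap{T, N, typeof(i.f), typeof(a)}(i.f, a)
end

lazy = Base.Generator(Inferable(x -> x + 1), [1, 2, 3])   # an AbstractVector{Int}
plain = Base.Generator(x -> x + 1, [1, 2, 3])             # still a Base.Generator
```

The opt-in keeps inference-dependent element types away from code that never asked for them, which is the "permission" aspect of the wrapper.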