andyferris / Dictionaries.jl

An alternative interface for dictionaries in Julia, for improved productivity and performance
Other
282 stars 28 forks source link

Define a divide-and-conquer parallelism API? #14

Open tkf opened 4 years ago

tkf commented 4 years ago

Hi, I just registered a small package SplittablesBase.jl which is used from Transducers.jl. Essentially, you can support parallel reduce and everything that can be derived from it (e.g., map) by defining a single function halve(collection). I wonder if you are interested in defining it.

andyferris commented 4 years ago

@tkf thanks - with the ordering changes in #13 I'm thinking we can do things like ensure tokens have a total ordering and that you can find the first and last token as well as a way of finding the (approximate) midpoint and iterating through sub-intervals. Therefore I imagine implementing this in a generic way should be possible.

I haven't really thought through what order such dependencies should take. So far I've been keeping dependencies out because, well, IMO Julia 2.0 deserves an ordered dictionary in Base as well as enhancements to the AbstractDict interface, and I intened this to be the playground for that.

tkf commented 4 years ago

Actually, I bit the bullet and defined halve for Base.Dict using some implementation details of iterate(::Dict). So, I think a similar trick can be used even for "old" HashDictionary too. I also added halve spec for unordered collection: https://juliafolds.github.io/SplittablesBase.jl/dev/#SplittablesBase.halve Of course, it'd be great if #13 helps implementing this.

I also want to move SplittablesBase.jl to Base too. I think we need something like it for composable data-parallel algorithms. So let's hope both of them get into Base so that we don't have to think about dependency issue :laughing:

Meanwhile, it's possible to implement halve outside of Dictionaries.jl. But from my experience from implementing iterate(::Dict), it requires to touch some internals of the dictionary. So, I think it's better to do it inside Dictionaries.jl. One mid-ground option may be to implement unexported Dictionaries.halve in Dictionaries.jl so that I can define SplittablesBase.halve(::AbstractDictionary) at import-time via @require.

andyferris commented 4 years ago

Yes. I will need approximate token bisection for searchsortedxxx functions when they land, so no worries.

To be clear, yes we can define halve for any concrete type, I was proposing to have it implemented for all tokenizable AbstractDictionary. I'm aiming for all the free stuff you get with AbstractArray by defining axes, IndexStyle, getindex, isassigned, setindex! and similar.

tkf commented 4 years ago

Oh, I see. Yes, it'd be great if there is a way to define halve on abstract types. It was a bit daunting to think that "complex" collections like dictionaries might need halve implementation for each concrete type. Token-based API sounds like a good "middleware" for this.