chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org
Other
1.78k stars 418 forks source link

[Patterns] should it be possible to specify the locality of a distributed map's keys? How? #18964

Open lydia-duncan opened 2 years ago

lydia-duncan commented 2 years ago

This issue is part of a series of issues to design the interface for collections across serial, parallel, and distributed contexts. Our goal is to determine what we think is reasonable to support and what doesn't seem useful. Additional patterns welcome (but it would be best to put them in their own issue for discussion, likely)

This example uses a DistributedSet when setting the keys, but other situations can be imagined

var s = new DistributedSet(...);
...
var m = new DistributedMap(...);
coforall loc in s.locales {
  on loc {
    var localKeys = s.getLocalKeys();
    forall k in localKeys {
       // Would map determine where the key should live based on where the key is first given a value?
       // Or should the map have to do something special to ensure it is placed on that locale?
       // Should the user have control over it at all?
       // Should there be a way to add the entire set with their locales to be used as the keys in one go?
      m.add(k, someVal);
    }
  }
}
mppf commented 2 years ago

I think that distributed maps with different policies for which locales to store the key-value pairs need to be different types.

I can think of several strategies off-hand:

  1. the map load balances new keys (i.e, adds to the locale with the fewest keys at the time of the add)
  2. the map always adds keys on whatever locale the add call is made
  3. the map uses a hashing function to choose which locale

I think all of these are possible and should be implementable. (But not necessarily in the same collection). I would imagine that 2 or 3 are reasonable starting points.

jhh67 commented 2 years ago

For #3 I assume the hashing function can be specified? I'm thinking about k-mer counting in which they use specialized hash functions to partition the k-mers.

mppf commented 2 years ago

Yes I would expect that. That is what we have with distributed associative today (although that has other issues).

vasslitvinov commented 2 years ago

Should there be a way to add the entire set with their locales to be used as the keys in one go?

Knowing the set of keys at the point where a new map or set is created allows for faster creation, so (I suggest) should be provided as one of the ways to create a map or a set. Possibly as an initializer overload, or in other ways.

This is orthogonal to the key distribution strategy.

bradcray commented 2 years ago

@mppf: For options 1 and 2, how are you imagining subsequent accesses to the key will find it again? Some sort of global or globally-distributed directory that must be searched to find the locale in question?

(I've only been imagining case 3 for this reason).

mppf commented 2 years ago

The simplest answer would be that such a data structure would check all of the locales for those keys. I think that such a data structure could exist and be useful. Some sort of distributed directory would also be possible. (One thing about this issue is that I think we're discussing "How does this imagined data structure impact creating a somewhat unified API for collections?" rather than "Should we make this imagined data structure?". Of course we need to draw a line somewhere if we aren't trying to create an API that can support all map-like distributed data structures).