Open vasslitvinov opened 3 years ago
we could also redefine a "layout" to be a special case of a "distribution" ("how are these indices distributed (stored) into this single memory?"), rather than a disjoint concept
This is appealing to me partly because I think dmapped
doesn't flow well in English descriptions of code - I'd rather say "The array declaration specifies a distribution" than "The array declaration specifies a domain map". I think users with some HPC background know what "distributed data" or a "distributed array" are in a general sense but "domain map" is Chapel-specific jargon.
Additionally we have relatively few layouts and so they seem to be more of a corner case than something we need to create the syntax around. IIRC before dmapped
we had a distributed
keyword and I guess I find that more intuitive but I know it's not great to go backwards on these decisions.
Regarding new dmap
it seems to me that it should just be what happens when you create a Block distribution, say. If we need to have newBlockArr
we could just as well have newBlockDist
that returns the dmap
-wrapped class. Along those same lines, the fact that we need newBlockArr
at all is arguably pointing at some insufficiency in the dmapped
syntax. Could we achieve the same idea with something more like an explicit array initialization? E.g. what if you could use new
to construct Chapel arrays/domains more broadly - could we use that strategy to solve this same problem?
const D: domain(2) dmapped Block(boundingBox=Space) = Space;
// could it be this?
const D = new BlockDom(boundingBox=Space, initFrom=Space)
(I know that in the implementation, BlockDom
and _domain(BlockDom)
are different types; but this seems to be an implementation detail that we could paper over in some way for such a new
expression).
we could also redefine a "layout" to be a special case of a "distribution" ("how are these indices distributed (stored) into this single memory?"), rather than a disjoint concept
I like this direction.
There are few things that we may have to wrestle with, but I don't see them as blockers:
distribution.isDistributed
or distribution.distributed
but they seem.... tautological?Along those same lines, the fact that we need newBlockArr at all is arguably pointing at some insufficiency in the dmapped syntax. Could we achieve the same idea with something more like an explicit array initialization? E.g. what if you could use new to construct Chapel arrays/domains more broadly - could we use that strategy to solve this same problem?
To me, the annoyance that those helpers work around is the fact that you need to specify the boundingBox/startIdx etc and it is not about the dmapped
syntax per se. If we were to have mapped
for example that read better in English, the following seems pretty sweet to me:
var arr: [{1..10} mapped Block] int;
whereas
var arr: [{1..10} mapped Block({1..10})] int;
is significantly worse to me.
In general, I use newBlock*
et al. a lot, but if we had a cleaner way of declaring arrays like the above, I would use them.
The directions that @mppf and @e-kayrakli are discussing here definitely align with my own thinking about the problems with dmapped
and some directions forward (though, left on an island, I might come up with slightly different details). In particular:
dmap
dmapped
clunky as a keyword relative to our other keywordsAt some point in the distant past, @vasslitvinov proposed replacing dmapped
with something more like implemented
or stored
and supporting it on other types as well, though I'm having trouble remembering what types (syncs / singles / atomics come to mind?)
w.r.t. {1..10} mapped Block
, it's been proposed several times that we have a version of Block
that infers its bounding box from the first non-empty domain created over it (and similarly Cyclic
would infer its starting index in that way). I think we've even prototyped this a few times, but never successfully gotten it on master, though I can't recall whether that was for good reasons or bad.
I find this convenience feature attractive (particularly to not have to explain the bounding box argument in every Chapel tutorial I give), but it does potentially suggest that if I later re-size my domain from 1..10
to 1..20
it will be re-blocked (which we don't really support today). So there's either a big user education problem looming here ("Why is 99% of my data on the last locale?", or we'd need to decide that an inferred bounding box implied re-blocking on every re-assignment to the domain (which I think some users requested recently anyway—"give us that productivity, execution time cost be damned!").
but it does potentially suggest that if I later re-size my domain from 1..10 to 1..20 it will be re-blocked (which we don't really support today). So there's either a big user education problem looming here ("Why is 99% of my data on the last locale?", or we'd need to decide that an inferred bounding box implied re-blocking on every re-assignment to the domain (which I think some users requested recently anyway—"give us that productivity, execution time cost be damned!").
This sounds reasonable to me. Do we have a case today where we want to resize a distributed domain but keep its distribution's bounding box the same (i.e. the status quo works in favor of the user) ?
Do we have a case today where we want to resize a distributed domain but keep its distribution's bounding box the same (i.e. the status quo works in favor of the user) ?
I think that, in practice, it's pretty rare to re-assign distributed domains at all once they're initialized. More often a number of domains are interacting with a single distribution—say a number of domains represent rows/columns/blocks of a 2D distributed domain where we all want them to share the same distribution, not each individually get its own block distribution based on its indices. So you could imagine a situation where, if I were iterating over rows or columns, I might reassign a single distributed domain representing the current row / column rather than creating it from scratch each time. But in such cases, the domains will typically be created either be created by explicitly sharing a single distribution, or by operations on the original domain. I'm not aware of any key current cases where a single domain has its own distribution and is reassigned multiple times.
While discussing range vs. domain genericity in #17122, @benharsh suggested a view of the generic rectangular domain type as having four generic arguments:
domain(rank, idxType, stridable, domainMap)
We can use this direction to get rid of the dmapped
keyword. First, assume that three out of these four arguments have defaults, like:
domain(param rank, type idxType = int, param stridable = false, domainMap = defaultDist)
Then, allow the domainMap
argument to be a type. An instance of that type may be obtained implicitly when needed to create a domain value.
User code can look like this:
// types
domain(2) // default 2-d domains
domain(2, domainMap=Block) // Block 2-d domains
domain(2, domainMap=?) // 2-d domains with any domain map
domain(2, ?) // all 2-d domains, including different idxTypes and stridability
// variables/fields
var d1: domain(2); // a default 2-d domain
var d2: domain(2, domainMap = Block(...)); // Block-distributed
// domain literals
{1..n} // a default domain
{1..n, Block(...)} // a Block-distributed domain
{1..n, 1..m, Block(...)} // a Block-distributed domain
@vasslitvinov - based on a message you sent to Michelle, Brad and I, it sounds like these have been marked unstable and the issue can have the 2.0 tag removed. Is that accurate?
We have renamed .dist
to .distribution
. For 2.0, the dmapped
keyword will be unstable. Factory functions #22427 will be a stable replacement for dmapped
.
Nothing else needs to be done for 2.0. I removed the 2.0 label.
Personally, I would still like to have a stable replacement for the dmapped
keyword for the fall release and have been wrestling with the question while working on getting rid of dmap
cases. But I'm fine not having the label here for the time being.
I broke the specific question of dmapped
name replacements into #23128 to give it some new, top-level attention and capture and summarize some (hopefully most/all?) of the proposals given on this issue.
Background: the
domain
keyword describes the generic domain type; it can also be used to query the domain value of an array expression.This issue seeks elegant counterparts of these -- for the domain-map type and query.
Right now a domain-map value can be created:
dmapped
clause of a domain type or value, ornew dmap(....)
The domain-map value can be queried using:
dist
method on domain expressionsConsiderations about the
dist
method:domain
is spelled out; however, "distribution", when spelled out, looks too long, so we may not want that