Should `dmapped` be replaced, removed, or stabilized?

bradcray commented 1 year ago

This issue is an offshoot of one particular aspect of https://github.com/chapel-lang/chapel/issues/17908.

The dmapped keyword, like the dmap type have not been particularly popular terms within the language as time has passed. Currently, dmapped is marked as unstable, where the stable way of creating and applying a distribution to a domain is to use factory methods like Block.newDomain({1..n}) or MyBlocDist.newDomain({1..n}).

While using these factory methods are sufficient for creating a distributed domain, I find myself wishing for a dmapped equivalent since, arguably, we could get rid of Chapel's syntax for declaring ranges, (non-distributed) domains, and arrays and simply write range.createRange(1, n) or domain.createDomain(range.createRange(1, n), range.createRange(1, n)) but find benefits in having syntax like 1..n, {1..n, 1..n} or [1..n, 1..n] real appealing rather than expressing everything as a factory method call.

The history of dmapped was that we wanted something that applied equally well to distributions (domain maps targeting multiple locales) or layouts (domain maps targeting a single locale) and so decided to reserve and make up the dmap type and dmapped operator as a way of picking a word that would be special, not likely to conflict with an identifier a user wanted to use, and corresponded to the Chapel concept of a domain map. While those reasons are sound, dmapped as a keyword remains fairly unpopular, leading to its destabilization.

If we were to replace it 1:1 with another word, some options include:

`distributed` (example usage: `{1..n, 1..n} distributed MyDist`)

Con: This approach has the downside of flowing better for distributions than for layouts
Pro: However, it matches the .distribution query that we recently stabilized (replacing the previous .dist query) where we decided to simply think of layouts as degenerate distributions (or, potentially "distributions of indices/elements to memory")
Pro: Reads fairly naturally relative to dmapped
Con: A bit long
Con: May be an identifier that a user would want to use (e.g., a boolean field or flag indicating whether something is or should be distributed?)

`stored` (example usage: `{1..n, 1..n} stored MyDist`)

Pro: Of a reasonable length for a keyword
Pro: Seems less likely to conflict with a user identifier (?)
Pro: More neutral w.r.t. distributions vs. layouts
Neutral: Potentially applicable in other contexts, though potentially slightly clunky (e.g., atomic int stored [processor|network])
Con: Less correspondence with .distribution query

`implemented` (example usage `{1..n, 1..n} implemented MyDist`)

Con: Back to a longer keyword
Pro: Unlikely to conflict with user identifiers
Pro: Potentially applicable to other contexts (e.g., atomic int implemented [processor|network])
Con: Less correspondence with .distribution query

move distribution information to type?

In #17908, Vass noted a BHarsh-inspired proposal to move distribution information into the type rather than in the value field. While this is a possibility—and the distribution type is clearly a property of a domain or array—it has the downside of not being applicable in a value context. That is, there is no closed form way of creating a distributed domain in a type-inferred / anonymous context, like:

const D = {1..n} dmapped MyDist;
var A: [{1..n} dmapped MyDist] real;

For this reason, it doesn't feel like a satisfactory replacement for dmapped to me

get rid of support for `dmapped`-style syntax?

Similarly, we could just retire, and not replace, dmapped, relying on factory methods. While this is sufficient, it seems unattractive to me, personally, for the reasons given at the top of this OP—it becomes the one type in the range/domain/array space that doesn't have a syntax for declaring a complete domain value (which is to say, one that embeds its distribution).

retain `dmapped`?

Of course, we could retain dmapped, though again, the main downside of this is that it has been considered non-intuitive, not very readable/verby ("1..10 domain mapped [by] Block"), and unpopular.

lydia-duncan commented 1 year ago

What about over? E.g. const D = {1..n} over MyDist;

Pros:
- short
- indicates the relationship between the domain and the distribution
- I don't think the word on its own would be used as an identifier in other contexts, it'd probably be used in combination with other words
Cons:
- Doesn't say anything directly about distributions (but does it need to?)
- Reserves an additional keyword (more relevant if we don't buy the pro above)

benharsh commented 1 year ago

Vass noted a BHarsh proposal to move distribution information into the type

I just want to clarify that it wasn't my intention to suggest that we move dmapped-related stuff into the type of a domain as an alternative to dmapped. Rather, I was trying to point out in an unrelated discussion that in practice a domain's distribution is already a part of the domain's runtime and static type, but that we often only speak of domains in terms of rank/idxType/etc.

bradcray commented 1 year ago

I've wordsmithed a little to reduce the implication that you literally proposed this.

jeremiah-corrado commented 7 months ago

In a recent design discussion, @bradcray, @e-kayrakli and I went over the above options (and a few others) for renaming dmapped. Our top proposal was to switch to the keyword mapped.

Note that we also discussed some alternatives to dmapped other than renaming it, but found that they didn't offer sufficiently compelling benefits over the existing syntax (at least that we could think of), and thus didn't warrant making a more significant change to the language.

Some of the points in favor of renaming dmapped to mapped:

nicely conveys the concept of mapping/distributing a set of indices across locales
- doesn't feel out of place when used for a layout: {1..n} mapped new CS()
- doesn't convey the idea that memory will be allocated when creating the domain
short: distributed domain expressions are already long, so we didn't want to introduce a longer keyword
we imagine it's less likely than some other options to collide with identifiers in existing user code

In discussing this with the broader team, the majority of people were in favor of going through with renaming it, pending a broader discussion about how to add new keywords to the language after Chapel 2.0.

stonea commented 7 months ago

Maybe, the subteam's already gone through this but I'd feel somewhat cautious adding mapped as a keyword to the language without there being discussion about it and checking to see if it appears in user codes like CHAMPS, Chop, Arkouda, etc.

It's not a traditional keyword, we're post Chapel 2.0, and it doesn't seem unlikely to me that a user would want to name a variable, proc, or method 'mapped' (dmapped seems much more unique/unlikely to me).

Anyway, it's not something I feel strongly enough to veto things over, but I do think we need to be cautious about restricting currently legal identifiers.

benharsh commented 7 months ago

The main concern I have about the keyword mapped is the potential for new users to conflate this keyword with the more abstract concept of "mapping" in other languages. A possible issue here is that the difference between what users usually associate with the word "map" and our definition of mapped is too large, and it makes Chapel seem "weird(er?)" in some way. Another case might be making the docs harder to search when looking for ways to get the desired behavior, and instead coming across the mapped keyword.

Some examples of what I'm talking about in other languages, where you want to apply a function to every element of something:

Python: map(lambda x: x * 2, numbers)
Rust: myVec.iter().map(|element| element * 2)
Java: list.stream().map(number -> number * 2)
Julia: map((x) -> x * 2, [1, 2, 3])

Once a users is "thinking in Chapel" and understands distributions I think the term makes enough sense. But it does seem like one of those things people will have to highlight and explain to newcomers, probably followed by "here's how you actually do this in Chapel".

vasslitvinov commented 7 months ago

The map-a-function meaning of mapped makes me less excited about this choice. Combined with the concern about introducing a new keyword, I suggest this approach:

"No, we do not like 'dmapped' on aesthetic grounds. However it is in-line with our stability promises and we expect it to be only a minor hindrance to new users, and perhaps none for existing users. It is also great for googling."

bradcray commented 7 months ago

I think it's inevitable that we're going to want to add new keywords to the language, to avoid being stuck without anywhere to go in terms of adding new features and making the language better. To that end, I think we should discuss what the process for doing so will be rather than not considering the language we want.

Another example just came up: Making the "static local variable" feature first-class rather than attribute-based (it doesn't feel at all like a good fit for an attribute to me, but even if you disagree with this specific take, it's inevitable that we'll want to add features via keywords that we don't have today). It'd be a shame to prematurely lock ourselves into the language we had when we had n users for all time if it made 1000*n users miserable for years. I've heard the same sentiment from users—that a little pain is probably worth a lot more than trying to be pure in terms of not changing anything.

That only addresses part of Vass and Andy's reactions and none of BHarsh's, but since it keeps coming up, I wanted to capture this opinion (but it should probably get an issue or discussion of its own). I don't (personally) have an immediate response to BHarsh's concern, but am thinking about it.

e-kayrakli commented 7 months ago

@bradcray -- You know my stance already, but wanted to capture it here.

I completely agree that we should be open to adding new keywords and that that process should be discussed. For this context, however, I am still having hard time convincing myself that the improvement in elegance by finding a better word trumps the churn (mostly the new keyword part and not so much the deprecation part). While dmapped is quirky and feels more disconnected with the related concepts after some recent changes as part of the 2.0 effort, it is not that bad or clearly in violation of general Chapel style (e.g. dMapped).

Probably more introspection than anything else, but if I was asked this question several releases after 2.0, I might have been less concerned about it. The fact that we may introduce a new keyword right after releasing 2.0 doesn't feel great.

Some other concerns that also resonate with me (though I wouldn't consider to be deal-breakers):

mapped is likelier to be a variable name compared to dmapped,
"mapping" as a different concept could be confusing to new comers.

p.s. @lydia-duncan's over suggestion above is completely new to me and arguably doesn't have some of those negative side effects that mapped has.

bradcray commented 7 months ago

p.s. @lydia-duncan's over suggestion above is completely new to me and arguably doesn't have some of those negative side effects that mapped has.

Oh, I'd completely forgotten about that one in last week's discussions. I agree that it's positive (and believe my thumbs-up on it dates from it originally being posted), and generally, prepositions are pretty safe words for a language to reserve. Of course over could also be used as a synonym for done, as in const over = false;. But still...

lydia-duncan commented 7 months ago

I think it's reasonable to say that we're establishing the precedent for how to handle adding new keywords in a post-2.0 world. It's not as though it was outside the realm of possibility that we'd do that when replacing dmapped since dmapped was already a keyword and given that domain maps are such a core concept, I'd rather we move the concept as a whole to a more finalized state sooner rather than waiting an extra release or two just because we had other priorities before 2.0

bradcray commented 7 months ago

Thinking some more about over last night, I found myself wondering whether it was a less-good match for layouts, e.g.:

const D = {1..100, 1..100} over new columnMajorOrder();
var SD: sparse subdomain(D) over [new?] compressedSparseRow();

vs.

const D = {1..100, 1..100} mapped new columnMajorOrder();
var SD: sparse subdomain(D) mapped [new?] compressedSparseRow();

I didn't come to a conclusion, but wanted to capture the thought here.

lydia-duncan commented 7 months ago

I still like it for layouts, personally, but I may be a little biased :)

bradcray commented 7 months ago

Thinking about other preopositional(ish)/"innocuous" English words after being reminded of mapped, using came to mind:

const D = {1..n} using new blockDist({1..n});
const D = {1..100, 1..100} using new columnMajorOrder();
var SD: sparse subdomain(D) using [new?] compressedSparseRow();

e-kayrakli commented 7 months ago

using

Two potential confusions:

using in C++
use in Chapel (we also steered away from implemented because of implements)

jeremiah-corrado commented 7 months ago

I've opened a separate issue to discuss the more general topic of if/how we should add new keywords to the language now that 2.0 is passed.

bradcray commented 7 months ago

Continuing to spitball with the "innocuous words" idea, this time focusing on ones already in the lexer:

const D = {1..n} as new blockDist({1..n});
const D = {1..100, 1..100} as new columnMajorOrder();
const D = {1..1000000} as MyDist;

const D = {1..n} by new blockDist({1..n});
const D = {1..100, 1..100} by new columnMajorOrder();
const D = {1..1000000} by MyDist;

const D = {1..n} in new blockDist({1..n});
const D = {1..100, 1..100} in new columnMajorOrder();
const D = {1..1000000} in MyDist;

const D = {1..n} on new blockDist({1..n});
const D = {1..100, 1..100} on new columnMajorOrder();
const D = {1..1000000} on MyDist;

const D = {1..n} type new blockDist({1..n});
const D = {1..100, 1..100} type new columnMajorOrder();
const D = {1..1000000} type MyDist;

const D = {1..n} with new blockDist({1..n});
const D = {1..100, 1..100} with new columnMajorOrder();
const D = {1..1000000} with MyDist;

jeremiah-corrado commented 7 months ago

I think by actually works quite well, where I kind of read the expressions as: "domain (distributed) by distribution".

Although it may be confusing to reuse the same keyword for two related, but different, concepts. Especially for expressions like:

const D = {1..n by 2, 1..m by 4} by myDist'

bradcray commented 7 months ago

Yeah, I had that same reaction, but have been trying not to set these up for failure or success, just curious how people react. Arguably worse than your case is the fact that you can also say:

const D = {1..n, 1..n} by (2, 4);
// so:
const D = {1..n, 1..n} by (2, 4) by myDist;

I don't think this is ambiguous, but definitely agree it's a mixed metaphor ("applying 'by' to a domain could have two very different meanings).

vasslitvinov commented 7 months ago

with appeals to me. by is a remote second, from Brad's most recent list.

jeremiah-corrado commented 7 months ago

In an offline discussion, Brad, Engin and I went over the feedback from the team about mapped as well as the alternatives discussed above. The subteam's proposal is to replace dmapped with over pending a larger discussion about the process for making breaking changes in the language post-2.0.

See this issue for more on that topic.

Rationale:

The feedback about mapped being potentially confused with a mapping operation did resonate with us. Although other languages most often use map, we can see how users might expect mapped to carry out a mapping operation.

We looked closely at how over and some of the existing keywords would look in the context of today's legal dmapped expressions, considering the english meaning and aesthetics as well as some practicalities like parsing ambiguities. In addition to over, we were in favor of by and with to varying degrees:

over:

// creating distributed domains
var d = {1..n} over new blockDist({1..n});
var d = {1..n} over myBlockDist;

// distributed domain type expressions
type dt = domain(1) over myBlockDist;
type dt = domain(1) over new blockDist({1..n});
proc foo(dom: domain(1) over myBlockDist) { ... }
var dom: domain(1) over myBlockDist = ...

// anonymous iteration
forall i in {1..n} over myBlockDist { ... }
forall i in {1..n} over new blockDist({1..n}) { ... }

by:

```chapel // creating distributed domains var d = {1..n} by new blockDist({1..n}); var d = {1..n} by myBlockDist; // distributed domain type expressions type dt = domain(1) by myBlockDist; type dt = domain(1) by new blockDist({1..n}); proc foo(dom: domain(1) by myBlockDist) { ... } var dom: domain(1) by myBlockDist = ... // anonymous iteration forall i in {1..n} by myBlockDist { ... } forall i in {1..n} by new blockDist({1..n}) { ... } ```

with:

```chapel // creating distributed domains var d = {1..n} with new blockDist({1..n}); var d = {1..n} with myBlockDist; // distributed domain type expressions type dt = domain(1) with myBlockDist; type dt = domain(1) with new blockDist({1..n}); proc foo(dom: domain(1) with myBlockDist) { ... } var dom: domain(1) with myBlockDist = ... // anonymous iteration forall i in {1..n} with myBlockDist { ... } forall i in {1..n} with new blockDist({1..n}) { ... } ```

We also considered how the keywords would look in type expressions that are not yet supported, but could be in the future (particularly if we add support for domain types that are distributed over some generic distribution of a given distribution type):

over:

type d = domain(1) over blockDist(?);
proc foo(dom: domain(1) over blockDist(?)) { ... }
proc foo(arr: [?d over ?di]) { ... }

by:

```chapel type d = domain(1) by blockDist(?); proc foo(dom: domain(1) by blockDist(?)) { ... } proc foo(arr: [?d by ?di]) { ... } ```

with:

```chapel type d = domain(1) with blockDist(?); proc foo(dom: domain(1) with blockDist(?)) { ... } proc foo(arr: [?d with ?di]) { ... } ```

Drawbacks with `by`:

Overall, by had a lot of support. There were a couple of drawbacks that we discussed, related to the fact that it would overload the existing meaning of applying a stride to a range/domain:

asymmetry wrt type expressions: by <distribution> would be legal on both sides of a typed domain declaration, while by <integer> is only legal on the value side.

// would be legal:
var d: domain(1) by myDist = {1..n} by myDist by 2;

// would not be legal:
var d: domain(1) by myDist by 2 = {1..n} by myDist by 2;

This isn't a major drawback considering that the second case is obviously illegal; however it could be a source of confusion. The above code could also be more difficult to reason about if 2 is replaced by a variable (or if myDist had a name that was less distribution related).

associativity question with CSR domains: a CSR representation of a strided domain makes sense, but it's not clear whether a stride applied to a CSR representation makes sense. I.e., are the following domains equivalent?

var d1 = {1..100, 1..100} by (2, 2) by new columnMajorOrder();
var d2 = {1..100, 1..100} by new columnMajorOrder() by (2, 2);

The answer may simply be yes; however, given that it wasn't obvious to us, we thought that users could face similar confusion.

Drawbacks with `with`:

We considered with to be a great option in terms of how it looks/reads; however, we found that it could be confusing in the context of iterating over an anonymous distributed domain, i.e., the first two lines below would have a very different meaning than the third, even though they are visually similar. We saw this as a major drawback.

forall i in {1..n} with myDist {}
forall i in {1..n} with (myDist) {}
forall i in {1..n} with (in myDist) {}

chapel-lang / chapel