chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org
Other
1.79k stars 420 forks source link

'Range' vs. 'domain': concrete / generic by default? #17122

Closed vasslitvinov closed 1 year ago

vasslitvinov commented 3 years ago

This is perhaps a different angle at #13566

Currently range is a concrete type vs. domain is a generic type. This is non-intuitive and a newcomer is likely to bump into it at some point.

This issue proposes making them either both concrete or both generic.

Pro: makes Chapel easier to learn (and easier to use??) Con: makes existing Chapel users unhappier

bradcray commented 3 years ago

I don't disagree with the idea here, and don't think we should get rid of range as a concrete type. However, I also don't think domain has as obvious a concrete definition as range does. So if we were to unify them, I'd probably just make domain an error rather than a concrete type, indicating that it wasn't sufficiently specified (where domain(?) could be used to get the current behavior). This would probably also involve renaming the _domain record to domain which I think would be a good thing.

[edit: I may have been a little off-base here, see next comment]

bradcray commented 3 years ago

In today's range review (take 2):

vasslitvinov commented 2 years ago

There are several directions we can pursue to reconcile the genericity of range and domain. Here are the top contenders:

Proposal 1

Range's stridability is generic by default, domain's stridability is not, domain's rank is generic by default.

Proposal 2

Same as Proposal 1, except domain's stridability is generic by default.

Proposal 3

The opposite of Proposals 1 and 2. The benefit is in making genericity explicit, see also #19120 and #19121.

Background and related issues

Please vote below and/or comment! I plan to prototype the most-voted or most-argued-for option to understand the impact.

vasslitvinov commented 2 years ago

Vote here for the above Proposal 1.

vasslitvinov commented 2 years ago

Vote here for the above Proposal 2.

vasslitvinov commented 2 years ago

Vote here for the above Proposal 3.

mstrout commented 2 years ago

I would like to see genericity be explicit to avoid user confusion. This is related to making it explicit in functions https://github.com/chapel-lang/chapel/issues/19121 and in records/classes https://github.com/chapel-lang/chapel/issues/19120.

bradcray commented 2 years ago

@vasslitvinov : How do associative domains like domain(string) play with each of these proposals?

I voted for proposal 1 because even though it relies on an asymmetry between what is generic between the types, I believe it will require the smallest number of changes to user code, and also would consider it annoying to have to type unit in order to write a multidimensional domain declaration (as in proposal 2), particularly after years of simply writing domain(2), say (though nowadays, I hardly ever provide types for dense rectangular domains).

Proposal 3 is unappealing to me because I don't think domain should default to "1D domain" because it seems very arbitrary given all the things a domain could be (associative, sparse, multidimensional).

To Michelle's comment about making genericity explicit, I don't see proposal 3 as getting us there. To me, what would be required to achieve that outcome would be to change the status quo on this issue: https://github.com/chapel-lang/chapel/issues/18214 which essentially asked "If a generic type doesn't have defaults for all of its fields, should users have to write R(?) or range(?) (or variations that only leave certain fields as generic) rather than interpreting R or range as meaning "generic since not all of the generic fields are bound). The last time I brought it up, the discussion favored the status quo (but apparently not so overwhelmingly that I closed the issue).

vasslitvinov commented 2 years ago

All three proposals play nicely with associative and sparse domains -- because the following aspects remain unaffected:

e-kayrakli commented 2 years ago

and also would consider it annoying to have to type unit in order to write a multidimensional domain declaration (as in proposal 2), particularly after years of simply writing domain(2), say (though nowadays, I hardly ever provide types for dense rectangular domains).

This is a good point. But not something that worries me too deeply because of that parenthetical. Or at least, rule simplicity of Proposal 2 is more appealing than the code simplicity of Proposal 1 to me.

I wouldn't argue too much for Proposal 2 over Proposal 1, though.

bradcray commented 2 years ago

One addendum that occurred to me last night, which makes me less opposed to proposal 2 if I've got my facts right. I believe that code like:

const D: domain(2) = {1..n, 1..n};

would not break under proposal 2 because domain(2) would be treated as a partial instantiation, and then unit would be inferred from the initializer (?). So maybe it's not as bad as I thought in that "yes, it requires fully specified domain types to specify more information, but existing code likely won't break." Split initialization would even make cases like:

const D: domain(2);
D = {1..n, 1..n};

work. Where I worry most would be when there is a field:

record R {
  const D: domain(2);
}

In such cases, my hope would be that split initialization-style interpretations would apply in the initializer context as well such that no changes would be required here either, but I'm not confident about how well that is working in practice these days.

This thread makes me wonder: What would the practical impact be of making none of the generic fields in range and domain have defaults? I.e., int is obviously a good default for idxType, but are we really gaining much from that default given that if it's unbound it'll be inferred from context? So maybe that suggests a proposal 4: Default none of idxType, stridedness, rank.

One other observation: In this thread, each time I see range(unit) or domain(unit) I read it as a typo as though the idxType is uint. This makes me wonder whether unit is a poor identifier in this context and should be replaced with one. I'll mention that on the relevant issue as well once I find my way back to it.

vasslitvinov commented 2 years ago

Notes from a subgroup discussion on 8/09.

Seven people -- almost everyone in the discussion -- liked "Proposal 4" from https://github.com/chapel-lang/chapel/issues/17122#issuecomment-1190580050, namely remove all defaults from the range and domain types.

If users need something convenient for a concrete or partially generic range/domain type, they can use a type alias, including a type alias for a partially instantiated type.

Do we want settle #18214 first, then have ranges and domains follow the same rules? Or do we want our conclusions on domains and ranges to feed into #18214 ?

stridable is a bit unique because we have it mostly for optimization purposes. If we didn't care about efficient loop generation, we'd just have stridable ranges and the unit stride would be determined at execution time when needed.

The choice of the default depends on whether we view the defaults as "the most common case" or "the most flexible case". Today for ranges it is the former.

vasslitvinov commented 2 years ago

To keep in mind: domain is different than our typical generic types including range. That is, domain can be instantiated into a rectangular domain with three (instantiated) generic fields or into an associative domain with two generic fields.

So "making domain fully generic" means something different than saying that about ranges.

I propose consider making domain similar to other generic types in this regard and introduce other type(s) for associative and perhaps sparse domains, rather than having it as a one-off. These other type(s) may (or may not) work in the same way w.r.t. arrays as domains do today, just be called differently.

bradcray commented 2 years ago

I propose consider making domain similar to other generic types in this regard and introduce other type(s) for associative and perhaps sparse domains, rather than having it as a one-off.

I'd prefer not to do that, in part because I'm not convinced domains are broken in this regard in their current state, in part because it feels like a slippery slope where we might then feel the need to say "And there should be multiple generic array types rather than just [?D] ?t (say), that express whether it's a sparse array or associative array, or ...

vasslitvinov commented 1 year ago

In today's module review meeting:

A viewpoint: what trips users the most is stridability. Proposal: remove the default from range.stridable; keep the defaults on range.idxType and range.bounded.

A viewpoint: having some fields defaulted and some others not will be confusing to users, they will keep coming back to the documentation to check which is which. Defaulted generics are "sort-of generics." Proposal: remove all defaults; lean heavily on type aliases for common cases / to avoid exposing users to complexity.

type boundedRange = range(bounds = boundedKind.both, ?);
type simpleRange = range(int, boundedKind.both, strideKind.unit);

Related: user-defined type constructors #18817; meaning of writing a generic type's name #18214 #19120.

A viewpoint: the most important / most likely encountered by users are range literals. They should "just work." Handling range variables etc. is a secondary issue. So, let us make range generic at least over stridability.

A concern: when a function is written to accept a range formal and, upon an upgrade to the next Chapel version, starts to accept unbounded ranges, it will be surprising to the user.

A proposal: remove all defaults, however instantiating range with just one parameter gives a fully-concrete type. For domains, the first parameter must be given to obtain a concrete type, so their behavior does not change. Pro: saves typing, ex. range(int) is a concrete type. Con: gives ranges an unusual generic behavior, unlike any other generic type.

A suggestion for improvement to help with migration of range to generic, as well as in other contexts: when the user defines a record with field(s) of generic type(s), they cannot declare variables of that record's type. If that causes a compilation error, the compiler should trace it back to generic fields of the record.

record R {
  var rng: range;   // assume this is generic now
  var c: MyClass;   // generic because no memory management
}

var r: R;   // error: "did you mean to use a concrete range type for 'R.rng'?
            // did you mean to specify the memory management for 'R.c'?"

The audience was split between keeping the status quo vs. removing some generic defaults. There was some support for removing all generic defaults and more support for introducing aliases.

The latest proposal is a variation on the above: remove (some|all) defaults from range; the name of the type itself should be a more extensive name and the basic case should be just "range", ex.

This, however, will make explicit instantiations more verbose.

bradcray commented 1 year ago

The latest proposal is a variation on the above: remove (some|all) defaults from range; the name of the type itself should be a more extensive name and the basic case should be just "range", ex.

  • generalRange - the generic type
  • range - the concrete & simple type, same as today's range, an instantiation of generalRange This, however, will make explicit instantiations more verbose.

I'm not a fan of this proposal. I think that 1..n by 2 should be able to be described simply a "range(...)" and not a type with a different / more verbose name—particularly given that when using generalRange(), additional arguments are likely to be needed making the type name even longer. Overall, this proposal makes me feel like "generics have failed—we need two types to describe one thing."

vasslitvinov commented 1 year ago

An observation about today's genericity of domains by @benharsh :

proc foo(dom: domain(2)) // any 2D domain with ``idxType=int`` and ``stridable=false``

var D : domain(2); // initializes with DefaultDist by default

// warning: initializing local domain from distributed domain
// I suspect this is a carry-over from the "default-init, then assign" model, perhaps necessary in production
var S : domain(2) = D dmapped Block(D);

In short, a domain type expression in a formal's type expression appears to behave generically, while in other contexts it seems to behave as though it has a default for the hypothetical "distribution field". isGenericType(domain(2)) appears to return false today.

We need to account for domain's genericity over its domain map, with the natural default of default rectangular.

vasslitvinov commented 1 year ago

Another scenario to keep in mind, brought up by @benharsh : the following type expression is ambiguous because it can indicate either rectangular or associative domains (or both?).

domain(idxType=int, ?)
vasslitvinov commented 1 year ago

Here is the leading proposal crystalized from the discussions at the subteam meeting on 4/18. It relies on the proposed general rule that a generic type requires ? [somewhere] within parens unless fully-defaulted. This rule is popular at the moment, however has not been formally accepted.

The proposal

Keep the status quo with range, in particular, keep it fully-defaulted. Generic range types are to be written using ?, ex. range(?), range(stridable=?), etc. Improve compiler error messages to guide users in situations like var r: range = 1..3 by 2;.

Keep the status quo in how we think about the genericity of domain, plus disallow standalone occurrences of domain. The fully-generic domain type is to be written domain(?), consistently with the proposed general rule mentioned above.

For example:

Reflecting on the proposal

Even though the proposal is to keep ranges and domains mostly as-is w.r.t. genericity, we now have a stronger rationale for their behavior.

Note that currently the implementation does not allow us to write partially-generic domain types like domain(2,?) or domain(int,parSafe=?). The initial plan is to hold off addressing this until we can do it in the dyno framework.

The following add-ons are more of Vass's thinking. They are here because they blend well with the subteam discussions.

Add-on 1

Make the domain map the 4th generic parameter of rectangular domain types and the 3rd generic parameter of associative domain types, as a replacement for the dmapped clause. In both cases, this parameter has the default of defaultDist. These parameters can be "set" to a value / type / ?. They can also be queried. https://github.com/chapel-lang/chapel/issues/17908#issuecomment-1513929470

Add-on 2

(Base + Add-on 1) imply the following three de-facto type constructors for domains:

proc type domain.init(rank, idxType=int, stridable=false, domainMap=defaultDist)  // rectangular
proc type domain.init(idxType, parSafe=true, domainMap=defaultDist)  // associative
proc type domain.init(parentDomain) // sparse

Add-on 3

Add a param property domain.kind of the type enum domainKind { rectangular, associative, sparse }.

Add one more type constructor that provides the generic types for "all rectangular domains', ditto associative and sparse:

proc type domain.init(param kind: domainKind)

This property can be queried. We might also allow types like domain(.... kind=xyz ....).

Add-on 4

Uphold the special case we have today where a default rectangular domain value represents a type that is generic over the domain map when:

Other

Re add-on 1: the 4th argument may be a type or a value. If the latter, it is a sugar for the 4th argument being the type of that value. We also want to preserve the sugar we have right now for the dmapped clause.

Re add-on 2: domain(idxType=int) and domain(idxType=int, ?) may (or not) be ambiguous between rectangular and associative constructors. That sounds OK to me.

In order to represent the generic domain types discussed here, we need dedicated representation, rather than our _domain record whose type merely redirects to the implementation class. Perhaps it is already in the works in dyno?

vasslitvinov commented 1 year ago

Notes from the subteam meeting on 4/18.

The utility and frequent use of simpleRange in #22108 made us feel that we need two ways to talk about the same thing (i.e., range(int,both,false)). This, in turn, made us think that we did not get it right [when we made range am undefaulted generic]. Which nudged us back to considering the status quo, i.e., keep range fully defaulted. We will improve compiler messaging to help users in cases like var r: range = 1..3 by 2;

If range is fully defaulted, should we make domain be fully defaulted as well? We have considered this option before and did not like it, primarily because there was no default domain type that was as clear a "winner" as the default range type. Here are our thoughts in this meeting. Should the fully-defaulted domain be rectangular or associative? Perhaps it should be associative. Then, there is no natural default index type. Choosing the default rectangular domain did not result in success.

One thought: having the naked domain be a concrete type of 1-d rectangular domains could be justified by its being the simplest domain, not necessarily the most common. Chapel users will learn very quickly that they need to write more to get to other domains concrete or generic domains. Benefit: same look and feel as range. OTOH this default hardly serves a purpose as we do not expect 1-d domains to be common.

Another thought: domains are already special, for example by having the rectangular / associative / sparse distinction and the dmapped clause. So it makes sense for them to be different w.r.t. genericity, ex., have no generic defaults.

Proposal: the fully-generic domain type must be written domain(?), just like user-defined generic types. Note: we have not decided to introduce this general requirement [without which requiring ? on domain does not make sense], however we are leaning in this direction. Question: any surprises when we introduce this requirement?

Answer: not expecting surprises. When looking through Chapel code in:

modules/packages
test/release/examples
test/studies
CHAMPS
Arkouda

the only uses of domain without parentheses are:

the grep that I used grep '[^.]\bdomain\b[^(]' | sed 's@//.*domain@//xyz@; s@ domain (@ domain(@' | grep '[^.]\bdomain\b[^(]'

Another thought: requiring domain(?) is more consistent / less surprising than, for example, the error when passing a stridable range to a naked range formal.

We still need to decide whether domain(2), for example, is generic over its domain map. Currently it is not, except in one special case (see "Add-on 4" in https://github.com/chapel-lang/chapel/issues/17122#issuecomment-1516693176).

Question: to help decide, how are ranges and domains used in Arkouda and CHAMPS? Answer: see the next comment.

vasslitvinov commented 1 year ago

Question: how do CHAMPS and Arkouda use ranges and domains? Answer: see the tables below. The left column contains the occurrence counts.

Arkouda ranges
7 range(stridable=true)
4 range(stridable=false)
3 range(?) as a formal's type
1 var segRange: range = 1..0;
1 (bool, range) as a return type
Arkouda domains
300 xyz.domain
3 domain(1) as a formal's type
9 domain(1) as an array element type
2 domain(1) dmapped Block(...) as a var's type
1 domain(1) as a const's type
1 domain(t, parSafe=false)
1 domain(string) as a var's type
CHAMPS domains
240 var xyz : domain(1) = a domain literal
328 const xyz : domain(1) = a domain literal
8 var xyz : domain(1, stridable=true) = a strided 1-d domain literal (*)
43 var / const xyz : domain(2) = a 2-d domain literal
2 const xyz: domain(3) = a 3-d domain literal
(**)
120 xyz.domain
26 domain(1) as the formal type
2 domain(1) as the return type
18 const .*: domain(1) dmapped ...
10 var xyz: domain(1);
2 domain(string) as a var's type

(*) in one case out of 8, it is a by expression (**): in the cases above this line in the table the type annotation is unnecessary, i.e., if it were dropped, the inferred type of the variable would be the same

CHAMPS ranges
86 const xyz:range = [low]..#[count];
23 const xyz:range = abc.variablesDom_.dim(0);
9 var xyz:range = [low]..#[count];
3 const xyz:range = [low]..[high];
2 new map(int,range)
2 range as a formal's type
1 var elements_ : [sectionDomain_] range;
vasslitvinov commented 1 year ago

The conclusion from the design sub-team discussions last week is to make almost no changes to ranges and domains w.r.t. their genericity. The only change is to require the generic domain type to be written as domain(?) and any partially-instantiated domain type to be written as domain(..., ?) as in #21410

This is discussed under "The proposal" above in https://github.com/chapel-lang/chapel/issues/17122#issuecomment-1516693176 . It is contingent on accepting the general rule requiring (?) after any non-fully-defaulted generic type. If we end up not accepting this general rule or accept another rule, we will apply that to domains as well.

vasslitvinov commented 1 year ago

We are keeping range fully-defaulted and domain fully-generic for 2.0, as discussed on this issue and elsewhere. I removed the 2.0 tag.

bradcray commented 1 year ago

I think we could go so far as to close this issue. I can't imagine we'll change the decision given the time we've invested in it recently, barring some user complaining about something strenuously and persuasively (in which case I think they should open their own issue to do so).

vasslitvinov commented 1 year ago

I agree. Closing as resolved.