Closed bradcray closed 4 years ago
Note that I chose .lowIdx and .highIdx to avoid the potential ambiguity that .low and .high might refer to values rather than indices. Of course, for ordered collections .indices.low and .indices.high should also work.
The concept and the names suggested make sense to me.
As a half-baked idea, I am wondering whether firstIdx
, lastIdx
makes things more general instead of low/high. I am imagining (vaguely) a structure where indices can be in a funny order. In that scenario first/last might make more sense as "this is the index of the first element my default iterator would be yielding". It may also have more clarity for multidimensional indices maybe?
To be clear, I am not objecting to low/high, but more thinking out loud.
The concept and the names suggested make sense to me.
As a half-baked idea, I am wondering whether
firstIdx
,lastIdx
makes things more general instead of low/high. I am imagining (vaguely) a structure where indices can be in a funny order. In that scenario first/last might make more sense as "this is the index of the first element my default iterator would be yielding". It may also have more clarity for multidimensional indices maybe?To be clear, I am not objecting to low/high, but more thinking out loud.
I would prefer firstIdx
and lastIdx
instead of lo/hi because we already have first
and last
for arrays and list
, which are both indexible containers.
If indices
is always a range, we could get away with only supporting that, and having users rely on indices.low
, indices.high
, indices.first
, and indices.last
.
I don't mind the verbosity of this (and may actually prefer it over lowIdx / highIdx
). More importantly, this puts less on the sequential data structure interfaces.
I actually really like that idea @ben-albrecht. I agree that we should avoid adding too many new methods if possible.
I'm open to the proposal to just support .indices
, particularly since it reduces the amount of work I have to do (though... I've already done it :'( ). I think the main case where it hurts a bit could be loops over heterogeneous tuples, which would have to be written:
for param i in myTuple.indices.low..myTuple.indices.high do
...myTuple(i)...
due to the need to unroll such loops to have their bodies make sense. But hopefully issue #14929 will simplify that case which will lessen the pain.
I feel fine with .indices, .lowIdx, and .highIdx in spirit but would do two things to it specifically:
That leads to .indices()
.lowIndex
and .highIndex
.
I think the main case where it hurts a bit could be loops over heterogeneous tuples
I don't personally see this as a big downside, especially since it is a temporary.
I don't personally see this as a big downside, especially since it is a temporary.
I agree; was just pointing out the only hesitation I could come up with.
use parens on indices at least if it has the potential to be an iterator that makes sense to me)
When you say it, this rationale makes sense to me as well, though I think it makes code look regrettably a little uglier. I.e.:
for i in myTuple.indices() do
rather than:
for i in myTuple.indices do
maybe I just need to get used to it, though...
I agree; was just pointing out the only hesitation I could come up with.
Oops, I forgot something a bit more crucial: This:
for param i in myTuple.indices.low..myTuple.indices.high do
...myTuple(i)...
doesn't actually work because we don't have param
ranges (yet, and won't soon), and the bounds of the heterogeneous tuple's param loop have to be params. So without a .lowIdx
or equivalent, loops over heterogeneous tuples would have to continue to hard-code a 0 or 1 into their code until #14929 became available. But I'm still OK leaving these calls behind for now.
rectangular arrays would likely return their domains and associative arrays would implement indices as an iterator.
On rethinking this, I'm thinking that all arrays could just return their domain... I think I was thinking that .indices
would need to be well-ordered in order to return a domain, but now I can't think why that would be...
use parens on indices at least if it has the potential to be an iterator that makes sense to me)
When you say it, this rationale makes sense to me as well, though I think it makes code look regrettably a little uglier.
I feel similarly, but think parens is the right call if it is an iterator.
What would we do about the map
type? Will it have redundant methods map.keys()
and map.indices()
?
On rethinking this, I'm thinking that all arrays could just return their domain...
In which case I am again OK with .indices
and I have a feeling that is what you are going for...
What would we do about the map type? Will it have redundant methods map.keys() and map.indices()?
One could argue that this method/term only makes sense for numeric indices (vs "keys"). Then it might not apply to associative domains/arrays either.
I am again OK with .indices and I have a feeling that is what you are going for...
Not necessarily, if people prefer the parens. I was mostly just self-code-reviewing and realizing my mis-thought.
Are we talking the lower bound and upper bound of an index here?
@damianmoz we might not be using the same terminology but I'm going to try to clarify.
It's not about the index data type so much as about the collection it is addressing.
E.g. for a tuple, right now tuples are indexed starting from 1, but we want to change that to starting from 0 in this release. This issue is about giving programmers a way of writing code for tuples (and strings, bytes, lists, ...) that doesn't require them to make their code use the fact of tuples starting at 0 or 1.
var tup = (10, 20, 30);
writeln(tup(tup.lowIdx)); // always 10, no matter if tuple indexing starts at 0 or 1
writeln(tup(tup.highIdx)); // always 30, no matter if tuple indexing starts at 0 or 1
for i in tup.indices do writeln(tup(i)); // always outputs 10 20 30, no matter if tuple indexing starts at 0 or 1
}
This functionality will be useful for people trying to write Chapel code that works with both previous 1-based releases and the upcoming 0-based release. Additionally it will be useful for programmers that (for whatever reason) wish to avoid encoding whether tuples start from 0 in their programs.
Specifically for the case of tup.indices
in the above example, that is a simpler and more elegant way of writing it than the current practice of in any case.
for i in 1..tup.size do writeln(tup(i)); // current practice
for i in tup.indices do writeln(tup(i)); // this issue requests
In that specific use case, the feature this issue requests allows the loop to be written in a lower-complexity manner.
Just to summarize my current thinking on this issue: I was swayed by @ben-albrecht's well-received comment proposing that we only support indices
and require .low
/.high
/.first
/.last
to be applied to the result of .indices
rather than creating a short-circuit for it. This has the benefit of keeping the interface smaller and avoiding the need to make that naming decision. I also made progress on #14929 on Friday which was the main case that was calling for direct access to low and high. Also, we can always add more helpers later if/when people want and request them, whereas removing them is harder. Note that there may be reason to do so given that they return param
s in some cases which are not available via ranges (in part because we don't support param ranges today; in part because in cases like strings and bytes, the high generally cannot be a param, where the low can be).
I still feel on the fence about whether the indices
method should take parens or not. I'm going to set up a few comments below for those who are following this to vote on.
"Like" this comment to express support for .indices
(you can also post a follow-up comment rationalizing your vote)
Arguments in favor:
"Like" this comment to express support for .indices()
(you can also post a follow-up comment rationalizing your vote)
Arguments in favor:
.indices
is field-like for current cases, it's not technically a field, so perhaps .indices()
is more appropriate.indices
on other types may not have an O(1) representation, so may either require computation to create the set of indices or an iterator-based implementation to yield themI voted for the no-paren version because it is akin to .domain
on an array.
Still undecided here. If it is always a range, I am in favor of indices
. If it is always an iterator, I am in favor of indices()
. If it can be either (which it sounds like it can be), I am not sure.
I feel like it would be an easy logical leap to extend this method to include other arguments controlling what is returned, so it would make sense to me to have parentheses in case we follow that route later
to include other arguments controlling what is returned
What kinds of arguments do you have in mind?
Dimensional limitations in the case of the array version, maybe an explicit stride?
That doesn't feel in the spirit of "tell me what your indices are" to me, particularly since it doesn't extend to any of the 1D cases, which are the main motivator for this query. Seems like if someone wanted that, they should write something like myArray.indices.dim(1) by 4
.
I voted for .indices
.
for currently planned use-cases (tuples, strings, bytes, arrays, lists), it represents an O(1) operation that returns the indices in closed form (a range or domain), so is arguably "property-like" as opposed to "computation-like"
I see that "O(1)ness" as an important constraint. For everything else, there's an iterator...
I read this multiple times without a clear decision, but I am mildly leaning towards, and voted for indices()
. The arguments that I can make about this:
()
Keep in mind that (without caching the result), string.size
is not an O(1) operation either, yet has no parens. I.e., I don't think there's anything that mandates that paren-less functions that are querying properties that could be stored as a field be O(1); it just makes it an easier choice when they are.
I voted for .indices
, but I don't have a strong preference between the two.
I also found myself wondering if .domain
is a good term for this. I think it's accurate (from the spec: "A domain is a first-class representation of an index set") but I'm not sure that I'd want to pay the performance overheads of our domains (heap allocation, reference counting)
@ronawho - that's an intriguing idea. If we imagine for a moment that we had a different type for 1-D non-strided and anonymous domains... well that could just be basically the range type, right?
Separately though I think part of the proposal is that .indices
return a range
and if the type system distinguishes between domain
and range
and considers them different types, then I'm not sure I'd want to have a .domain
method that returned a range (say). I think you're saying that tuple.domain
could literally return a domain, if our implementation for that kind of domain was tight enough. But even then, AFAIK domains have some different methods than ranges that make them more complex to work with supposing you only cared about 1-D non-strided domains. But, that might be not a big problem in practice and might be addressable with the right methods on domains.
BTW there is also .shape
We could define .shape
or .indices
to be somewhat like .domain
:
Right now .shape
returns a tuple of integers, indicating the number of elements in each dimension a la NumPy's ndarray.shape
. Proposed in #5178 by @ben-albrecht. How much are we attached to that?
If it can be either (which it sounds like it can be), I am not sure.
My indecision was based on the assumption that indices
could sometimes be an iterator rather than always a range. However, reading back through the comments, I have not seen a concrete example of this.
Assuming that indices
will only apply to ordered data structures with numeric indices, are there cases where indices
must be an iterator rather than a range?
Assuming that
indices
will only apply to ordered data structures with numeric indices, are there cases whereindices
must be an iterator rather than a range?
If we apply it to an associative domain/array over integers, it would be an iterator.
One could argue that this method/term only makes sense for numeric indices (vs "keys"). Then it might not apply to associative domains/arrays either.
If we apply it to an associative domain/array over integers, it would be an iterator.
Would associative domain/arrays over integers support indices
though? It would be strange for associative arrays and domains to support an indices
method depending on the domain idxType
.
@ronawho: My thinking was similar, but the opposite: That maybe we could retire the somewhat funny .domain
query on arrays (funny in that it relies on a reserved keyword as a method name) in favor of .indices
over time (or now, if there was support). .indices
has the benefit of being a more user-approachable concept while also not suggesting that we need to create domains for types that don't have them (i.e., I think it's reasonable to have .indices
return a range for a tuple and a domain for an array).
@vasslitvinov: I don't think we should re-define shape for this purpose. We talk about rectangular arrays being assignable to one another if they have the same shape. If shape means "indices" then we have to be careful about that. Besides which, nothing about the word shape
suggests "what specific indices can I use to access you" to me.
@ben-albrecht: If we attached .indices
to things that were irregular and didn't have pre-stored set of indices associated with them, then it might be O(n)—either because one might implement it as an iterator (who'd know?) or because they might run that iterator internally to build up the set of indices and return it as... a collection? a domain? something. But you're right that none of the data structures that currently support .indices
in my PR use an iterator for that purpose.
@vasslitvinov wrote:
If we apply it to an associative domain/array over integers, it would be an iterator.
But that's not right (I mis-spoke in the OP and corrected myself): it returns the pre-existing associative domain itself, identically to how .domain
does today.
@ben-albrecht wrote:
Would associative domain/arrays over integers support indices though?
They do in my PR today (because it was easy / seemed orthogonal).
I voted for indices()
because in an ideal start-from-scratch world I would want this to be the single best way to ask for the indices of a 1D sequence type. It would be applicable to tuples, strings, bytes, lists, arrays, and domains. It would take an optional param rank (default 1) and it would always return a range. D.dim()
where D
was a domain would go away and its uses would be replaced by D.indices()
. Then you could write representation-agnostic generic code such as the following, without having to know whether x
was an array or a string or whatever:
if x.type.rank == 1 && /* if we had it */ isSequenceType(x.type) {
// something with x.indices()
}
@gbtitus: I'm not clear on what the role of the optional param rank argument is in your proposal: What would my3DArr.indices(rank=2) do? What would my2DSparseArr(rank=2) do?
Then you could write representation-agnostic generic code such as the following, without having to know whether x was an array or a string or whatever
I don't see anything about the current proposal that would prevent you from writing this same code, simply without the parens.
@ronawho: My thinking was similar, but the opposite: That maybe we could retire the somewhat funny .domain query on arrays (funny in that it relies on a reserved keyword as a method name) in favor of .indices over time (or now, if there was support). .indices has the benefit of being a more user-approachable concept while also not suggesting that we need to create domains for types that don't have them (i.e., I think it's reasonable to have .indices return a range for a tuple and a domain for an array).
Wouldn't that present problems with forall
and distribution of work? E.g. for a 1-D Block distirbuted array, if .indices
returned a range, and we didn't have .domain
, forall i in MyBlockArr.indices { }
would not run across all the locales on which the array is distributed. How could .indices
and .domain
be unified given that?
Wouldn't that present problems with forall and distribution of work?
Maybe I missed a proposal somewhere along the way. In my proposal and branch, .indices
on an array always returns its domain, even if it's 1D.
Oh, sorry I misunderstood. You were saying we'd simply deprecate .domain
but that .indices
would do literally the same thing for arrays (i.e. return a domain). That sounds reasonable to me, although I don't have a strong feeling about it either way.
Right. I'm not proposing that in this issue, but I think it would make sense in that it would remove functionality that's about to become accessible in two ways (.domain
and .indices
) while also removing the weirdness of supporting a method call that's a reserved word (and one that a user can't define themselves, at least without knowing the hidden secrets of our implementation). And if we were to either make all these types respect .domain
or all of them respect .indices
I think .indices
is the obvious better answer (clearer meaning, doesn't rely on reserved word, doesn't imply things have domains when they don't).
Something bothers me about a paren-less method (x.indices
) returning an iterator. Are there many other instances of this in the standard library?
Also, I wonder if will present a challenge with FCF/FCI (#15058) support, where a user would like to capture the callable iterator. If we used parens, they could do this by omitting the parens:
var method = data.indices;
for i in method() {
...
}
If we decide indices
always return a range or domain then I am strongly in favor of no parens (indices
).,
@bradcray The parens are so you can do:
if /* if we had it */ isSequenceType(x.type) {
for param rank in 1..x.type.rank {
// something with x.indices(rank)
}
}
I don't see anything about the current proposal that would prevent you from writing this same code, simply without the parens.
I was worried about the fact that in the current proposal, for an array indices
(with or without parens) returns a domain, not a range, and what that might imply about one's ability to write the same code no matter what kind of sequence type was involved.
Something bothers me about a paren-less method (x.indices) returning an iterator. Are there many other instances of this in the standard library?
I think the first thing we need before being too bothered by this concern is for someone to come up with a compelling case of a type that would want .indices
to return an iterator (or what I've been saying: have it be implemented as an iterator).
[I apologize if I opened up this Pandora's box myself by mistakenly stating that it would need to be an iterator in the cases of sparse or associative arrays.]
for param rank in 1..x.type.rank { // something with x.indices(rank) }
The problem I see with this approach is that the indices of a multidimensional collection aren't always representable as the cross product of per-dimension index ranges, sparse arrays being a key example. It also has the downside of injecting numbering ("are dimensions 0-based or 1-based?") into an interface that's being designed to avoid the need for such numbering in many cases.
I was worried about the fact that in the current proposal, for an array indices (with or without parens) returns a domain, not a range, and what that might imply about one's ability to write the same code no matter what kind of sequence type was involved.
You're obviously correct in saying that if .indices
returns different types in different situations then a given piece of code won't necessarily be generic across all types. But I also think that's OK because I don't view the primary goal of .indices
as being to support generic programming. Rather, I view its goals as being (1) calling a property that's common across a number of types by a consistent name for familiarity, (2) removing hard-coded assumptions about what indices our built-in types use (e.g., "0-based or 1-based?")
By analogy, .size
doesn't return the same type for all ranges, domains, files, etc. but it's still helpful to have a consistent name for it.
That said, like .size
, I also believe there's also enough commonality across domains and ranges (the main types I expect .indices
to return) that many simple patterns will translate across different scenarios even though the types returned may differ. By analogy, many forall loops over domains, arrays, ranges, or tuples can be written similarly in simple cases regardless of the fact that the index types may turn out to be very different. But it's also easy to write forall loops (or code near .size
queries for that matter) that are sufficiently sensitive to the specific type being used that they'll break in other cases. But I don't see that as an indication that forall loops or .size queries are broken, just that when writing generic-ish code, the author needs to be mindful of what assumptions they're baking into it.
... the indices of a multidimensional collection aren't always representable as the cross product of per-dimension index ranges, ...
and
... has the downside of injecting numbering ("are dimensions 0-based or 1-based?") into an interface that's being designed to avoid the need for such numbering ...
Points taken, and I guess for the latter one would just have to go through x.domain.dims()
to get to the dimensions, so indices
or indices()
wouldn't be involved anyway.
I've changed my vote now, because this query seems more data-like than computation-like to me.
Something bothers me about a paren-less method (x.indices) returning an iterator. Are there many other instances of this in the standard library?
I think the first thing we need before being too bothered by this concern is for someone to come up with a compelling case of a type that would want
.indices
to return an iterator (or what I've been saying: have it be implemented as an iterator).[I apologize if I opened up this Pandora's box myself by mistakenly stating that it would need to be an iterator in the cases of sparse or associative arrays.]
OK. I think that clears things up for me.
To summarize: The current design is that indices
will always be a domain
or range
(unless someone proposes a case for otherwise), therefore I am voting in favor of .indices
.
Thanks to everyone for stepping up to vote and for caucusing. Based on the votes and discussion, let's go with paren-less .indices
(it may not have felt like it, but I was pretty on the fence when I posted the voting options, and the discussion definitely helped me make up my mind as well).
[edited to remove
.lowIdx
and.highIdx
from the proposal for simplicity. In the comments, it was argued that such queries can be made from.indices
when applicable and that this keeps things simpler / more orthogonal; we can always reconsider later if compelling arguments for.lowIdx
and.highIdx
come up].As a Chapel programmer, I'd like to see most standard collection types support the method~s~
.indices
~,.lowIdx
, and.highIdx
~ in order to support programming that's neutral w.r.t. whether 0- vs. 1-based vs. something-else-indexing is used by the type. For example, these would permit me to writefor many collection types.
As a starting point, it would be nice to support these for tuples, strings, bytes, lists, and arrays.
[edit: and enums?]
~Interestingly, I think the implementations and possible uses of these are likely to vary a bit between types.~ I would expect tuples, strings, bytes, and lists (edit: and enums) can return a range for indices whereas ~rectangular~ arrays would likely return their domains ~and associative arrays would implement indices as an iterator~.
~Similarly,
lowIdx
can beparam
for tuples, strings, bytes, and lists, but not arrays; andhighIdx
can beparam
only for tuples.~