Closed MasonProtter closed 1 month ago
I agree that it's a bit of a misnomer. I kind of like that it returns indices though. It's cheap and well defined. Maybe we should have both variants.
About returning chunks of data (i.e. elements), what would typeof(chunk) be for generic A? E.g. what if A is a CuArray? Also a CuArray?
I think that should be up to the implementer, but the default should be a view (so that it's just as cheap as the current version).
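As a small illustration of why a view is as cheap as returning indices, here is a sketch using only Base Julia. The range 1:3 stands in for "the first chunk" and is an assumption for illustration; nothing here is the package's API.

```julia
A = collect(1:6)

# A view wraps the parent array and an index range; no elements are copied.
chunk = view(A, 1:3)

# Writing through the view modifies the parent, which shows the data is shared.
chunk[1] = 100

println(A[1])             # the parent sees the write
println(parent(chunk) === A)
println(chunk isa SubArray)
```

For a plain Array the result is a SubArray. What view returns for other array types is decided by that type's own methods, which is in line with leaving the choice to the implementer.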
We could also have chunk_indices or index_chunks or something like that for the current behaviour.
I agree that the name is not very good: it appears to signify the data of the chunk, not its indices. I actually implemented chunk_indices at some point, but took it back. Independently of the name, I think the option that returns the indices is the easiest for the user, because otherwise, in

    for chunk in chunks(A; n)
        f(chunk)
    end

f is a function that operates on a collection, not on one element of the collection. My impression (and my own use of it) is that one generally has a function that operates on an element and wants to apply it in parallel to a collection of such elements. Returning the indices makes everything very explicit for the user (and less prone to introducing allocations, instabilities, or method errors).
That's why, after experimenting with some names, I ended up not changing the simplest name, chunks, to return something other than the indices, but I agree that the name is not ideal.
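The two styles contrasted above can be sketched side by side. This is only an illustration under stated assumptions: index_ranges, sum_by_indices and sum_by_data are hypothetical names (not part of the package), indexing is assumed to be 1-based, and the accumulator is assumed to be Float64.

```julia
# Hypothetical helper: n contiguous, nearly equal index ranges covering 1:len.
function index_ranges(len::Integer, n::Integer)
    q, r = divrem(len, n)
    return [((i - 1) * q + min(i - 1, r) + 1):(i * q + min(i, r)) for i in 1:n]
end

# Style 1: each task receives indices; f acts on a single element A[i].
function sum_by_indices(f, A; n=2)
    tasks = map(index_ranges(length(A), n)) do inds
        Threads.@spawn begin
            s = 0.0                  # assumed accumulator type
            for i in inds
                s += f(A[i])
            end
            s
        end
    end
    return sum(fetch, tasks)
end

# Style 2: each task receives a view of the data; the reduction acts on the sub-collection.
function sum_by_data(f, A; n=2)
    tasks = map(index_ranges(length(A), n)) do inds
        chunk = view(A, inds)
        Threads.@spawn sum(f, chunk; init=0.0)
    end
    return sum(fetch, tasks)
end

x = collect(1.0:10.0)
println(sum_by_indices(abs2, x; n=3))
println(sum_by_data(abs2, x; n=3))
```

Both compute the same result. The difference is only in what the loop body sees: an index range in the first case, a sub-collection in the second.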
At the same time, except for the breaking aspect of the changes, I wouldn't mind if we had both behaviors in different functions (or just using eachindex).
I guess from my point of view, f is typically map or reduce or whatever, so I do think of it as operating on the (sub)collection.
I do not disagree. I think the name is not good.
Also, I think that most users of this package will end up using OhMyThreads instead, so in the long run having chunks(eachindex(),n) and chunk_indices here will be ok.
I'm just afraid of releasing another breaking change. From what I've checked none of the dependents (which are only ~10 by now) updated to the new interface.
A non-breaking change could be done by adding chunk_indices and then mass-PRing the packages? (I can help with that.)
We have to mass-PR packages for deprecating the old syntax anyway, so that's one alternative before releasing 3.0.
There are ~10 packages that depend on it, that's simple. I'm worried about the now various discourse threads that propose using this package, and the possible breaking of other non-public scripts.
Speaking of misnomers, Chunk should probably also be called ChunkIterator, similar to the PartitionIterator that Iterators.partition returns, or, in light of this thread, maybe even ChunkIndicesIterator.
(Even if we don't like the "Iterator" part, it should at least be the plural Chunks.)
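For reference, the Base precedent mentioned here can be inspected directly. The snippet uses only Base Julia. Note that Iterators.partition takes a chunk size rather than a number of chunks, so it is an analogy for the naming and not a drop-in replacement.

```julia
A = [:a, :b, :c, :d, :e]

# Iterators.partition is lazy; the object it returns is a PartitionIterator.
it = Iterators.partition(A, 2)
println(it isa Base.Iterators.PartitionIterator)

# Partitioning the array yields pieces of the data ...
data_parts = collect(it)
println(data_parts)

# ... while partitioning its indices yields index ranges.
index_parts = collect(Iterators.partition(eachindex(A), 2))
println(index_parts)
```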
To summarize, the current proposal is: (1) rename chunks to chunks_indices; (2) have chunks return (views into) data rather than indices; (3) rename the Chunk type to ChunkIndicesIterator.
Chunk was an internal name up to 2.4.3, and I made it public just because OhMyThreads is using it, but it is not even documented. So that change is essentially non-breaking.
getchunk should be an internal function. It is documented and exported now, but I do not really see any use of it outside the package, and having it exported just limits our implementation flexibility.
See 3.x release :)
I feel like chunks is a bit of a misnomer. If I do [x for x in chunks([:a, :b, :c, :d, :e]; n=2)], it seems much more natural and also more useful to me that the result would be [[:a, :b, :c], [:d, :e]] rather than [1:1:3, 4:1:5].
This way instead of writing
we could just simply write
The current behaviour could then be emulated by doing
Any thoughts on this? @lmiq @carstenbauer? I think doing it like this would potentially make it easier to do things like support Dict and Dictionary or other more exotic containers.
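A minimal sketch of the behaviour described in this comment, assuming a hypothetical data_chunks function written in plain Base Julia. The name, the signature and the 1-based indexing are assumptions for illustration and not the package's API. It returns views into the data, and passing eachindex(A) instead of A recovers index ranges.

```julia
# Hypothetical helper: n contiguous, nearly equal index ranges covering 1:len.
function index_ranges(len::Integer, n::Integer)
    q, r = divrem(len, n)
    return [((i - 1) * q + min(i - 1, r) + 1):(i * q + min(i, r)) for i in 1:n]
end

# Hypothetical data-returning variant: each chunk is a view into A (no copies).
data_chunks(A::AbstractVector; n::Integer) =
    [view(A, r) for r in index_ranges(length(A), n)]

A = [:a, :b, :c, :d, :e]

# Chunks of data, as proposed.
by_data = [x for x in data_chunks(A; n=2)]
println(by_data)

# Index-based behaviour emulated by chunking the indices instead of the data.
by_index = data_chunks(eachindex(A); n=2)
println(by_index)
```

The sketch only covers vectors. Supporting Dict or other containers would need a different notion of "contiguous piece", which is left open here.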