Open mhoemmen opened 2 years ago
Is the ask specifically for a `reshape` that changes size, or also for the case of a `reshape` with new extents but same backing storage? The latter actually seems vaguely reasonable, possibly as a constructor on the new shape that consumes an existing mdarray or free function, while an actual resizing method seems quite nasty.
Hi Tom! :-) I'm not quite sure what they are asking, so I'm preemptively writing up the discussion of why we're not supporting the size-changing thing.
@trws wrote:
> ... or also for the case of a `reshape` with new extents but same backing storage?
They definitely want that. Did you have something in mind other than just `mdspan{existing_backing_storage, new_extents}` plus error checking?
@mhoemmen wrote:
> They definitely want that. Did you have something in mind other than just `mdspan{existing_backing_storage, new_extents}` plus error checking?
Essentially that but maybe as a move constructor only? That would give the effect of the free function you mentioned but in a form that people are somewhat used to being wary of.
Either of those approaches seems like a reasonable way to go about a reshape or restride; a fully static input and output could even be checked at compile time. We don't do exactly that because of the way our interfaces work, but we allow swapping out "layouts" in an analogous way as long as they are both valid for the backing storage, as well as allowing an extra 0-length dimension to exist projected out. Both features get a pretty significant amount of use, especially in codes that project their index space differently for different parts of the calculation.
Thanks @trws ! :-) There are a few different operations being discussed here. I'll list some of them.

1. Change the extents of an `mdarray` in place, possibly with reallocation. This would use a member function.
2. Take and consume an existing `mdarray`. Return a new `mdarray` with different layout and/or extents. Reuse the input `mdarray`'s storage, if possible. This would be either a new kind of `mdarray` constructor taking `mdarray&&` and (a layout mapping or extents), or a nonmember function taking the same parameters and returning `mdarray`.
3. Take an existing `mdspan` `in`, and view its data handle as a `mdspan` `out` using new layout and/or extents. Precondition: new layout and/or extents would not result in `out` viewing any elements other than what `in` views. This would be a nonmember function that takes an `mdspan` and (a layout mapping or extents), and returning `mdspan`. It would just invoke `mdspan`'s constructor, and add a combination of compile-time and run-time error checking.
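To make operation (3) concrete, here is a minimal sketch of a view-only reshape with a run-time precondition check. It deliberately avoids the real `mdspan` API: `view2d` is a hypothetical stand-in for a rank-2, row-major `mdspan`, and `reshape_view` is an illustrative name, not a proposed interface. The check assumes a contiguous row-major layout, so "no new elements viewed" reduces to comparing element counts, and it ignores overflow in the product of the extents.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-in for a rank-2, row-major mdspan of double.
struct view2d {
    double* data;      // plays the role of the data handle
    std::size_t rows;
    std::size_t cols;

    std::size_t size() const { return rows * cols; }

    double& operator()(std::size_t i, std::size_t j) const {
        assert(i < rows && j < cols);
        return data[i * cols + j];
    }
};

// Operation (3): view the same data handle with new extents.
// Precondition (checked at run time): `out` must not view any element
// that `in` does not view. For this contiguous layout, that means the
// new element count may not exceed the old one.
view2d reshape_view(view2d in, std::size_t new_rows, std::size_t new_cols) {
    if (new_rows * new_cols > in.size()) {
        throw std::invalid_argument(
            "reshape_view: new extents would view elements outside the input view");
    }
    return view2d{in.data, new_rows, new_cols};
}
```

Given a 2 x 3 view over six doubles, `reshape_view(in, 3, 2)` returns a view over the same pointer, while `reshape_view(in, 4, 2)` throws. With fully static extents, the same comparison could be done in a `static_assert` instead.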
I would oppose (1) outright. I think (2) is an interesting idea. I don't think (3) needs to be part of the Standard Library, as it's not general enough.
Yup, sounds right @mhoemmen. (1) is right out, and (2) is mainly what I was thinking. Having a way to get an mdspan from an mdarray in a manner similar to (3) might be interesting though, perhaps via optional parameter(s) to the view member. Going from span to span doesn't seem like it would buy much.
Why mdarray omits reshape / resize
Some users have asked why mdarray does not have a "reshape" or "resize" function. This would change the extents of an existing mdarray "in place," like `vector::resize`. We should add text to P1684 explaining why mdarray does not have these functions. Here is a draft of this text.

Reasons not to add a reshape / resize member function to mdarray
- [...] `std::array`. Thus, the reshape operation's preconditions would depend on the container type, making it more difficult to write generic code. mdarray would also need a new concept to know whether the container has a `resize` method (or some other way to resize).
- [...] `vector` has the same problem.)
- [...] `vector`, then resizing / reshaping would double-initialize the new elements. This is one justification for the `std::string::resize_and_overwrite` member function in C++23.
- C++ arguably should have standardized a dynamically sized, non-resize-able array type, to fill the gap between `array` (fixed compile-time size, O(`size()`) move cost) and `vector` (arbitrarily resize-able, O(1) move cost). mdarray is the multidimensional analogy of this missing type in the C++ Standard Library.

What about a nonmember function?
We could consider a nonmember `reshape` function that can consume and recycle an existing mdarray's storage, if reuse is possible. This would permit changing the extents, even if all the input extents are static. After return, the input mdarray would be in that uncomfortable moved-from state, but mdarray already has move constructors and move assignment, so this is not a new discomfort.
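As a rough illustration of such a consuming `reshape`, the sketch below uses a hypothetical `matrix` class as a stand-in for a rank-2 mdarray backed by `std::vector<double>`; none of these names come from P1684. The function is a friend because it must reach the private container in order to recycle it.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a rank-2, row-major mdarray. Like mdarray,
// it does not expose its container after construction.
class matrix {
public:
    matrix(std::size_t rows, std::size_t cols)
        : storage_(rows * cols), rows_(rows), cols_(cols) {}

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }
    const double* data() const { return storage_.data(); }

    double& operator()(std::size_t i, std::size_t j) {
        assert(i < rows_ && j < cols_);
        return storage_[i * cols_ + j];
    }

    // Consuming reshape: takes the input by rvalue reference, recycles its
    // container, and leaves the input in a valid but empty moved-from state.
    friend matrix reshape(matrix&& in, std::size_t new_rows, std::size_t new_cols) {
        std::vector<double> storage = std::move(in.storage_);
        in.rows_ = 0;
        in.cols_ = 0;
        // No reallocation when the element count is unchanged; otherwise
        // the vector decides whether its existing capacity can be reused.
        storage.resize(new_rows * new_cols);
        return matrix(std::move(storage), new_rows, new_cols);
    }

private:
    matrix(std::vector<double>&& storage, std::size_t rows, std::size_t cols)
        : storage_(std::move(storage)), rows_(rows), cols_(cols) {}

    std::vector<double> storage_;
    std::size_t rows_;
    std::size_t cols_;
};
```

Reshaping a 2 x 3 `matrix` into 3 x 2 with `reshape(std::move(a), 3, 2)` keeps the same allocation, so `data()` on the result equals `data()` on the input before the call.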
Note that there's no way for users to define `reshape` in a way that can reuse the container's storage. This is because mdarray does not expose access to the underlying container after mdarray construction. (This was a P1684R2 change.)

What about a different container?
Above, I pointed out that C++ arguably should have standardized a dynamically sized, non-resize-able array type, to fill the gap between `array` (fixed compile-time size, O(`size()`) move cost) and `vector` (arbitrarily resize-able, O(1) move cost). mdarray is the multidimensional analogy of this missing type in the C++ Standard Library.

I do agree that it's convenient to be able to reshape an existing container in place. I've certainly written Matlab code that adds a new column to a matrix now and then. However, there's no need for this reshape-able container to be named "mdarray." In general, we shouldn't necessarily be shy in defining new container types. The fundamental vocabulary type is the view, not the container. Different container types communicate with algorithms through mdspan, which is the common interface between containers and algorithms-on-views-of-data. Algorithms that operate on views-of-data should take mdspan. Containers should define a way to get an mdspan viewing the container's data.
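The "view as vocabulary type" point can be sketched with two unrelated containers feeding one algorithm. Everything here is hypothetical: `const_view2d` stands in for a rank-2, row-major `mdspan` of `const double`, and `fixed_matrix2x2`, `growable_matrix`, and `trace` are illustrative names only.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a read-only, rank-2, row-major mdspan.
struct const_view2d {
    const double* data;
    std::size_t rows;
    std::size_t cols;
    double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// An algorithm written against the view only; it knows nothing about containers.
double trace(const_view2d m) {
    const std::size_t n = m.rows < m.cols ? m.rows : m.cols;
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += m(i, i);
    }
    return sum;
}

// Container 1: fixed compile-time size.
struct fixed_matrix2x2 {
    std::array<double, 4> storage;
    const_view2d view() const { return {storage.data(), 2, 2}; }
};

// Container 2: grows by appending rows; a separate type, not a resizable mdarray.
class growable_matrix {
public:
    explicit growable_matrix(std::size_t cols) : cols_(cols) {}

    void append_row(const std::vector<double>& row) {
        assert(row.size() == cols_);  // sketch-level precondition
        storage_.insert(storage_.end(), row.begin(), row.end());
    }

    const_view2d view() const {
        return {storage_.data(), cols_ == 0 ? 0 : storage_.size() / cols_, cols_};
    }

private:
    std::vector<double> storage_;
    std::size_t cols_;
};
```

Both containers expose `view()`, so `trace(f.view())` and `trace(g.view())` run the same code path. Note that a view obtained from `growable_matrix` is invalidated by a later `append_row` that reallocates, which is one reason to keep the growable type separate.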
Forcing mdarray to be the dynamically reshape-able container could prevent optimizations. For example, a dynamically reshape-able container need not even use contiguous storage. It might store add-ons as separate allocations, and then have a "pack" operation that transforms it into a single contiguous allocation. This could reduce intermediate reallocation costs.
This is analogous to the "string builder" vs. string distinction.
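A minimal sketch of that builder idea, under the assumption that add-ons are whole columns: the hypothetical `column_builder` below keeps each appended column as its own allocation, and `pack` copies everything into one contiguous, column-major buffer in a single pass. The names and the column-major choice are illustrative, not taken from any proposal.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical "matrix builder": appending a column never moves or copies
// the columns already stored, because each column is a separate allocation.
class column_builder {
public:
    explicit column_builder(std::size_t rows) : rows_(rows) {}

    void append_column(std::vector<double> column) {
        if (column.size() != rows_) {
            throw std::invalid_argument("append_column: wrong column length");
        }
        columns_.push_back(std::move(column));
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return columns_.size(); }

    // "Pack": produce one contiguous allocation in column-major order,
    // so element (i, j) lands at index i + j * rows().
    std::vector<double> pack() const {
        std::vector<double> packed;
        packed.reserve(rows_ * columns_.size());
        for (const auto& column : columns_) {
            packed.insert(packed.end(), column.begin(), column.end());
        }
        assert(packed.size() == rows_ * columns_.size());
        return packed;
    }

private:
    std::size_t rows_;
    std::vector<std::vector<double>> columns_;
};
```

The packed buffer is what a fixed-extent container or a view would then wrap, much as a string builder hands off a finished string.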