Open mhoemmen opened 3 years ago
Do I interpret correctly that a standard conformant implementation is allowed to return for example a mdspan with a layout_stride when the input mdspan is of layout_left and the slicing arguments to submdspan in fact are all full_extent except the last index/index range?
What would be the use case of allowing this implementation freedom and thus lose type information that tells us and the compiler that the returned mdspan corresponds to a contiguous memory area?
The big downside I see with such a standard specification is that the resulting run-time stride value of the first dimension risks inhibiting auto-vectorization of the compiler. It also prevents people from exploiting the contiguous memory property (without a runtime test) for getting rid of nested for loops when all you want is to apply some element-wise operation. Thus, users would not be able to trust that the mdspan they get after a submdspan operation is efficient. Without this trust, it seems users need to create hacky work-arounds to ensure that their code is efficient.
As a side point, to aid the latter use case of exploiting contiguous memory property further it would be nice having a single-dimension vectorizing operator()(Index_T) that for mdspan:s with layout_left or layout_right makes it possible to iterate over the whole multi-dimensional mdspan as if it was a single dimension vector, just like in e.g. Matlab.
Do I interpret correctly that a standard conformant implementation is allowed to return for example a mdspan with a layout_stride when the input mdspan is of layout_left and the slicing arguments to submdspan in fact are all full_extent except the last index/index range?
That is not correct. Please refer to P0009R14.
As Mark said: we clarified that in the last revision. Regarding the iterate over the whole array as a single dimension vector, we explicitly decided against that. You can still do this by doing something like this:
template<class MDS>
void foo(MDS a) {
auto ptr = a.data();
auto acc = a.accessor();
for(size_t i = 0; i<a.size(); i++)
do_something_with_element(acc(ptr,i));
}
Assuming a
is contiguous.
The reason not to have a convenience function on mdspan itself is that it can lead to highly unexpected results.
For example assuming we had that operator, look at the following thing:
void foo(mdspan<int, dextents<2>, layout_left> a, mdspan<int, dextents<2>, layout_right> b) {
for(int i=0; i<a.size(); i++)
a(i) = b(i);
}
Would you expect this to be a simple assignment? If so you are mistaken, since this actually would represent a transpose (assuming the operator is going through memory in order). The problem is that we got layouts and thus you get unintuitive results when working on the data as a vector.
Do I interpret correctly that a standard conformant implementation is allowed to return for example a mdspan with a layout_stride when the input mdspan is of layout_left and the slicing arguments to submdspan in fact are all full_extent except the last index/index range?
That is not correct. Please refer to P0009R14.
So now I see that the wording was clarified in R14. Great news indeed! It would have been a deal-breaker for many in adopting mdspan and probably hard to fix in later standards due to ABI issues.
mdspan looks set to become a really great addition to C++ standard and it is truly impressive work by all you contributors!
acc(ptr,i)
Thanks for the very clear explanation. I now understand the issue, although I cannot help but feel a bit sad that the layout flexibility prevents this common and convenient syntax which would have contributed to the feeling of using a language intended for numerical computing.
In many applications I suspect people will stick with a single layout policy (layout_left compliant with BLAS) and only rarely change layout while elementwise operations will be very common; changing layout often without careful control of the code in all the affected places would anyway risk serious performance regressions when the innermost loop suddenly jumps with a non-unit runtime stride after a layout change. But I understand that people may have different opinions about the best trade-off here and there is also the heterogenous computing argument that is brought up in the standard paper.
Do I understand correctly that the accessor-based approach you provided above as solution would suffer from the same problem that if I have two or more different mdspans I want to do elementwise operations on and I change the layout of one of them then I have a bug? So people would anyway, just like for the operator()(Index_T) approach, have to guard this kind of code with static assertions on the layout type if they intend to treat the layout as a parameter that you can change without careful code inspection. Element-wise operations are very common so this seems like a real problem for the particular use case of “often switching between different layout types in the same code base” even as now when the operator()(Index_T) method is absent.
Have other short-hand notations to deal with this been considered that makes the code “feel” more like intended for numerical computations while still preventing inadvertent bugs? I’m e.g. thinking of compact syntaxes to obtain a 1D span/mdspan out of a mdspan with layout_left or layout_right while catching “hidden” erroneous transposes as compile-time errors: If my_mdspan is layout_left my_mdspan(lvec) => 1D span my_mdspan(rvec) => static assert fires (or remove it from overload set via a concept constraint)
If my_mdspan is layout_right my_mdspan(lvec) => static assert fires (or remove it from overload set via a concept constraint) my_mdspan(rvec) => 1D span
Or alternatively introducing methods lvec() and rvec() instead of tag based dispatch. (I realize you can do this by manually creating a new span/mdspan, explicitly taking the data pointer out of the mdspan and creating a new span out of it, so this is obviously syntactic sugar plus also bug capturing, but the importance of this should not be underestimated)
I suppose eventually a more generalized reshape functionality is also needed.
It is certainly possible that someone could propose such a convenience thing, but it gets fairly iffy fairly fast in terms of corner cases which would lead to subtle bugs. Note that in general it is more common than you think that we have multiple different layouts in our codes, and furthermore that the best layouts for each data structure are architecture dependent. One particular area where that happens is batched BLAS for example. But also all kinds of finite element codes where it for example depends on the arch whether you wont to interleave vector data per element or leave the vector data per element consecutive.
[mdspan.submdspan] in P0009r12 only specifies the behavior of the returned
mdspan
, without specifying its template parameters. That is, the "see below" insubmdspan
's return type doesn't point to anything (there's no "Returns" clause). That's OK for the layout (we don't even want to specify that), but probably not OK for the element type and accessor.Please see the somewhat related issue https://github.com/kokkos/mdspan/issues/63 .