RFC: add API to enforce a specified dimensionality

Previous discussion: https://github.com/data-apis/array-api/issues/97 Resolution was to ignore the case of data-dependent (unknown) dimensionalities.

I have a suggestion how to implement it by adding a new function to the API with a specific interface.

This allows to support variable dimensionality in a large number of practical cases. Additionally, it allows to partially control dimension sizes that are currently unknown.

Example 1:

x, [d1, d2, d3] = xp.enforce_shape(x, [None, None, None])

enforces dimensionality to be 3, will fail for inputs of other dimensionalities. If dimensionality is not yet known - will fail in runtime at this operation. Elements d1, d2 and d3 are integers or symbols, not Nones.

Returned x is either the same tensor, or tensor with attached information about shape. Main reason for returning a tensor is that operation could not be cut out of the graph if resulting shape is not used to obtain result.

Example 2:

x, [d1, d2, d3] = xp.enforce_shape(x, [d1_input, None, 4])

expects input to have 3 dimensions, first axis matches to (maybe) symbol d1_input, last axis of length 4.

Example 3:

x, [d1, d2, (axes, n_elements), d3] = xp.enforce_shape(x, [1, None, ..., 3])

expects input to have 3 or more dimensions. First axis is of length 1, second of any length, last of length 3.

Note here, that for ellipsis a tuple with two objects is returned: axes corresponds to x.shape[2:-1], and n_elements is a product of elements in axes.

axes is either a tuple of Union[int, Symbol] (if dimensionality is known) or a Symbol for a tuple of unknown size (if dimensionality is unknown), same type as used to represent unknown shape.

n_elements is either int (if size of all axes is known) or symbol for axis.

Signature

Tensor, List[Union[int, None, AxisSymbol, ellipsis]] -> Tuple[Tensor, List[Union[int, AxisSymbol, EllipsisTuple]]] 
EllipsisTuple = Tuple[TupleSymbol, AxisSymbol] 
              | Tuple[Tuple[SizeOrSymbol, ...], SizeOrSymbol] 
SizeOrSymbol = Union[int, AxisSymbol]

I see this as a good intermediate point: sufficient to deal with multiple practical cases while requiring framework developers to maintain very little.

Downstream package dev perspective

there are cases when a dimensionality should be enforced (e.g. 1d/2d). This is covered by proposed function
there are cases when axes can be split into operation-important and non-important. There is a generic path to deal with these cases by first reducing to a fixed dimensionality, and converting back afterwards. Example: last dimension should be rgb, should convert to hsl. Assuming we have a function that implements conversion for 2d array.

x_rgb, [(axes, n_elements), last_axis] = xp.enforce_shape(x_rgb, [..., None])
x_rgb_2d = xp.reshape(x_rgb, (n_elements, last_axis))
x_hsv_2d = convert_rgb_to_hsv(x_rgb_2d) # or any other function that works with 'fixed' dimensionality
result = xp.reshape(x_hsv_2d, axes + (last_axis,))

if there is a need to e.g. specify last or first dimension, now it is possible with graph-based frameworks too. Standard way is to add validations like tf.assert(...) and bind them in the graph, which can't be done in a cross-framework manner

Frameworks perspective

While the function is novel, it does not introduce any new entities: symbols for axes and symbols for shapes (like TensorShape) should exist anyway. Function behavior overlaps with and extends tf.set_shape by supporting ellipsis.

For frameworks without data-dependent shapes, this is straightforward to implement based on already exposed shape property.

Hi @arogozhnikov, thanks for your interesting proposal. I have a few questions:

Based on our prior discussion (#97) we landed on the decision that ndim must be statically known but the sizes along each dimension (shape) can be unknown, meaning len(x.shape) == x.ndim always. Therefore, I fail to see why we need enforce_shape. To me introducing it would be cumbersome, ex: what happens if x.ndim is not equal to len([None, None, None])?
In your Example 2, allowing d1 to appear on both sides implies that array libraries must support symbolic operations, which I think is out of scope and also an overkill.

Perhaps I missed something obvious...?

Hi @leofang

In your Example 2, allowing d1 to appear on both sides implies that array libraries must support symbolic operations, which I think is out of scope and also an overkill.

Proposal does not introduce any new requirement for any backend. Maybe you were confused by using same variable for input and output? I've just renamed input variable.

If backend is define-by-run (all shapes known), d1_input can be a number or None. Returned axis is a number. As I mention, for non-graph frameworks, this can be implemented as a plain python function, no obstacles for them.

If backend is symbolic, d1_input can be a number, a symbol or None. Symbol support can be marked as optional, but I expect that computations on shapes are too common to ignore the case. Returned axis is a number or symbol.

we landed on the decision that ndim must be statically known but the sizes along each dimension (shape) can be unknown

That's what I mention in the first line.

This drops support for any dynamic shapes in graph computations (which is still quite a lot). If you check how may people talk/ask/comment about ellipsis in einops, you'll see how wide is it. Building something that can work with eager tf, but not symbolic is a so-so solution, same for mxnet and torch.fx. In case API gets traction, you'll get this request immediately
Now why was this decision (allow only known dimensionality) was taken: previously, there was no cross-framework API design that could work for static and dynamic shapes. In this proposal I show that there is a minimalistic solution to this technical problem that isn't too complex and that is sufficient for a number of downstream scenarios, thus decision to ignore dynamic shapes should be reevaluated.

Thanks @arogozhnikov, interesting proposal. I have to admit I find it a little hard to wrap my head around - probably because I have worked with and on eager libraries a lot, and don't have concrete use cases in mind where I've needed something like this.

there are cases when axes can be split into operation-important and non-important. There is a generic path to deal with these cases by first reducing to a fixed dimensionality, and converting back afterwards. Example: last dimension should be rgb, should convert to hsl. Assuming we have a function that implements conversion for 2d array.

Let me try to unpack this a little.

There, x_rgb is an N-dimensional array, we don't know what N is, and the same is (n1, n2, ..., n_N, 3).
And prod(n1, n2, ..., n_N) == n_elements.
To figure out n_elements you introduce shape_components (is this a typo, should it also be enforce_shape?).
Isn't n_elements also calculable as xp.prod(xp.shape(x_rgb)[:-1])? I think yes, and that happens to be easy here because of [..., None] - but if that expression gets more complex, it's harder to express without enforce_shape.

If you have more real-world code example to point at, that would help me I think.

On a more philosophical level - we don't have APIs that produce arrays with unknown dimensionality, so do we actually need to be able to deal with that? I may be missing something here, but it seems to me like we should also want/need APIs that produce such arrays in order to need a function to parse those shapes?

To figure out n_elements you introduce shape_components

There are two parts: n_elements (to combine these axes) and axes (to reconstruct original shape, that's also important).

is this a typo, should it also be enforce_shape?

Thanks, this was a typo (changed name while writing proposal), just corrected that, now it is enforce_shape.

Isn't n_elements also calculable as xp.prod(xp.shape(x_rgb)[:-1])

That's right, it should be calculable like that, but for unknown shapes (symbols) xp.shape may contain Nones. And if dimensionality is unknown (number of components in tuple), it may return None as standard says nothing about it (and some frameworks do just return None in their current interface). This prevents reshape-containing manipulation on such tensors.

happens to be easy here because of [..., None] - but if that expression gets more complex, it's harder to express without enforce_shape

For cases I want to address here, you only need to compute prod(x.shape[start:end]), and that's enough (indices start and end may be positive or negative). Again, problem is just this can't be computed with a current standard.

Additionally, currently there is no way to enforce minimal or exact dimensionality. E.g. if first axis is batch, and the last axis is channel, but between them there may be >= 0 axes, there is no way to express this requirement. Expression x.shape[1:-1] does not fail for 1d tensors. Proposed enforce_shape incorporates this (get necessary components, enforce minimal or exact dimensionality, bake check into graph).

On a more philosophical level - we don't have APIs that produce arrays with unknown dimensionality, so do we actually need to be able to deal with that?

You mean - do you need to deal with arbitrary shapes? I am thinking from einops perspective (probably the first adoption would come from lib devs). Einops gets tensors from user, and users define tensors in whatever framework, and can pass tensors of any dimensionality, symbols too, so that's an important piece for me.

If you have more real-world code example to point at, that would help me I think.

Some common cases:

checking dimensionality (and maybe some components) is common, just too common. Currently standard proposes using .shape, x that will not work for graph frameworks. assert x.ndim == 2 is meaningless for symbolic tracing.
Einsum with ellipsis can be can be implemented with this functionality for any framework. Like 'b ... c, c c_out -> b ... c_out '
attention that works over an arbitrary dimensionality (e.g. input can be text, image, or video. This translates to making query to 3d, 4d, 5d tensors). Usually users squash these tensors to 3d and then reshape back, but with proposal that can be done automatically.
picking elements according to criteria (like hard attention). E.g. for every image tensor in a batch we want to pick the vector with highest norm. (input: 'b ... c', output: 'b c')
other functions that require shape (reshape as tensor, repeat as tensor). For symbolic frameworks this currently won't work.

Now, to be fair, there is a number of cases that this proposal won't address. Like, if you have ndimage, and want to make subsampling / upsampling over one specific axis, this is doable with tools currently in a standard. But if one wants to do this for every axis, unfortunately I see no simple universal tool for it right now.

In other words, when axes of variable dimensionality should be manipulated, it's hard to put into symbolic tracing without specific functionality. A bit hard to do this without some kind of generic ifelse operation or dimensionality-dependent dispatching.

and axes (to reconstruct original shape, that's also important).

Agreed, that matters too.

Proposed enforce_shape incorporates this (get necessary components, enforce minimal or exact dimensionality, bake check into graph).

For minimal dimensionality, I've never been too happy with np.atleast_2d & co. But probably that has to do with it also using np.asarray to coerce to ndarray. If enforce_shape would be shorthand for:

if x.ndim < 2:   # example for atleast_2d
    x = reshape_to_2d_by_prepending_axes(x)

then that does seem useful.

You mean - do you need to deal with arbitrary shapes? I am thinking from einops perspective (probably the first adoption would come from lib devs). Einops gets tensors from user, and users define tensors in whatever framework, and can pass tensors of any dimensionality, symbols too, so that's an important piece for me.

Dealing with arbitrary shapes - kind of, but perhaps not exactly. You must as a library author indeed anticipate getting array inputs which can have any dimensionality. I think the difference between "arbitrary shape" and "unknown shape" here is whether input_array.ndim is an integer or unknown.

Einsum with ellipsis can be can be implemented with this functionality for any framework.

@leofang is planning to propose a linalg.einsum function for the standard soon, so that should have synergy here.

In other words, when axes of variable dimensionality should be manipulated, it's hard to put into symbolic tracing without specific functionality. A bit hard to do this without some kind of generic ifelse operation or dimensionality-dependent dispatching.

Thanks for the examples @arogozhnikov. My intuition here is that your proposed function has two goals here:

Make it easier to deal with arbitrary-but-known-dimensionality inputs to array-consuming libraries,
Expand support for delayed/graph-based execution from "unknown axis size is allowed" to "unknown dimensionality is allowed",

(2) is probably a superset of (1). I have to think more about the implications here, and whether the API surface added for (2) also makes sense in eager libraries, or if that's then always a "do-nothing function call".

It'd be great to get some input from maintainers of MXNet, Dask, and TensorFlow here - I think those are the main beneficiaries of (2).

If enforce_shape would be shorthand for: [reshape_to_2d_by_prepending_axes] then that does seem useful.

Hmm, didn't think about it, that's a good point. I can imagine adding one more constant like xp.OptionalAxis:

x, [d1, d2] = xp.enforce_shape(x, [xp.OptionalAxis, None])

with a restriction that all OptionalAxis should be on the left.

My intuition here is that your proposed function has two goals here:

Make it easier to deal with arbitrary-but-known-dimensionality inputs to array-consuming libraries,

yes

Expand support for delayed/graph-based execution from "unknown axis size is allowed" to "unknown dimensionality is allowed",

That's correct.

Additionally: known dimensionality with unknown components still precludes a number of operations. E.g. einops operation i j k -> i (j k) requires knowing dimensions or having symbols. Nones returned by xp.shape are not sufficient.

So the goal is to have a function that can address points above and deal with all scenarios (known shape, unknown components, unknown number of dimensions).

It'd be great to get some input from maintainers of MXNet, Dask, and TensorFlow here - I think those are the main beneficiaries of (2).

Completely agree. Symbolic tracing libs are the ones to benefit, but also the ones who need to make additional steps, so their input is critical.

data-apis / array-api