`FDataGrid.__call__` any (compatible) shape?

ego-thales commented 10 months ago

Hi,

Currently, FDataGrid.__call__ only takes shapes (n_samples, dim_domain) or (dim_domain,) (or () when dim_domain == 1). I think it would be natural to allow any shape + (dim_domain,) to allow for multiple dimension evaluation without going through the trouble of flattening before and unravelling after.

What do you think? Élie

vnmabus commented 10 months ago

Sorry, I have trouble understanding the proposed behaviour, can you show an example?

ego-thales commented 10 months ago

For example, suppose you have fd with dim_domain = 3, dim_codomain = 4. If I want to evaluate fd on an edge of its domain (which is a 2D surface of size, say, n*m), I would do the following:

Generate eval_points, the matrix of all points of the edge, which would have shape (n, m, 3),
Simply call fd(eval_points) and get res with shape (len(fd), n, m, dim_codomain).
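
For reference, the flatten-then-unravel workaround that this proposal would make unnecessary can be sketched with NumPy alone. Here `evaluate_flat` is a hypothetical stand-in for the current evaluation (it is not scikit-fda code) and `evaluate_any_shape` is an illustrative wrapper name; the sizes `n_samples = 5` and `dim_codomain = 4` are assumptions chosen for the demo.

```python
import numpy as np


def evaluate_flat(points):
    """Hypothetical stand-in for the current evaluation.

    Maps (n_points, dim_domain) -> (n_samples, n_points, dim_codomain).
    """
    n_samples, dim_codomain = 5, 4  # assumed sizes, for illustration only
    coord_sum = points.sum(axis=-1)  # (n_points,)
    scale = np.arange(1, n_samples + 1)[:, None, None]
    offset = np.arange(dim_codomain)[None, None, :]
    return scale * coord_sum[None, :, None] + offset


def evaluate_any_shape(evaluate, eval_points, dim_domain):
    """Evaluate on points of shape (*points_shape, dim_domain)."""
    eval_points = np.asarray(eval_points)
    if eval_points.shape[-1] != dim_domain:
        raise ValueError("The last axis must have length dim_domain.")
    points_shape = eval_points.shape[:-1]
    # Flatten all leading axes into a single list of points.
    flat = eval_points.reshape(-1, dim_domain)
    res = evaluate(flat)  # (n_samples, n_points, dim_codomain)
    # Unravel the points axis back into points_shape.
    return res.reshape(res.shape[0], *points_shape, res.shape[-1])


n, m = 6, 7
eval_points = np.random.default_rng(0).random((n, m, 3))
res = evaluate_any_shape(evaluate_flat, eval_points, dim_domain=3)
print(res.shape)  # (5, 6, 7, 4)
```

The result at index `[i, a, b]` is the value of sample `i` at the point `eval_points[a, b]`, which is the behaviour requested for `fd(eval_points)`.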

Is it a bit more understandable?

It is essentially a situation I faced in implementing #561.

eliegoudout commented 7 months ago

For future reference, in case we revisit this, let me make things clearer.

The proposal is that:

__call__ accepts in.shape = (any_shape, dim_domain) and returns out.shape = (n_samples, any_shape, dim_codomain),
Furthermore, in the special case dim_domain = 1, the corner case in.shape = () could be allowed and interpreted as in.shape = (1,) (unsure if it's a good idea though...).
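
A minimal sketch of how these two rules could be checked, assuming nothing about the library internals: `split_eval_points` and `expected_out_shape` are hypothetical helper names, and the scalar corner case is implemented exactly as described above (a 0-d input is treated as shape `(1,)` when `dim_domain == 1`).

```python
import numpy as np


def split_eval_points(eval_points, dim_domain):
    """Return (any_shape, flat_points) for input shaped (*any_shape, dim_domain)."""
    points = np.asarray(eval_points, dtype=float)
    if points.ndim == 0:
        # Corner case: a scalar is read as shape (1,) when dim_domain == 1.
        if dim_domain != 1:
            raise ValueError("A scalar is only accepted when dim_domain == 1.")
        points = points.reshape(1)
    if points.shape[-1] != dim_domain:
        raise ValueError("The last axis must have length dim_domain.")
    any_shape = points.shape[:-1]
    return any_shape, points.reshape(-1, dim_domain)


def expected_out_shape(n_samples, any_shape, dim_codomain):
    """Output shape under the proposal: (n_samples, *any_shape, dim_codomain)."""
    return (n_samples, *any_shape, dim_codomain)


any_shape, flat = split_eval_points(np.zeros((6, 7, 3)), dim_domain=3)
print(any_shape, flat.shape)                 # (6, 7) (42, 3)
print(expected_out_shape(5, any_shape, 4))   # (5, 6, 7, 4)

scalar_shape, scalar_flat = split_eval_points(0.5, dim_domain=1)
print(scalar_shape, scalar_flat.shape)       # () (1, 1)
```

With the scalar interpretation, `any_shape` is empty, so the output for a single scalar point would have shape `(n_samples, dim_codomain)`.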

We can also discuss the best position for any_shape in the input and output shape tuples.

vnmabus commented 7 months ago

So, I am toying with generalizing the functions in https://github.com/GAA-UAM/scikit-fda/tree/feature/ndfunction, and I am starting to modify the evaluation. What I want to achieve is:

All function classes (FDataGrid, FDataBasis, etc.) represent arrays of functions (following an NDFunction protocol), no longer limited to the 1D case. Thus we have an arbitrary shape, which is the shape of the array of functions itself.
All the functions in the array receive an array as input (so we have an additional property input_shape) and return another array (output_shape).
Apart from that, you can evaluate several points at a time (and your suggestion is that they can also have arbitrary shape; let's call it points_shape).
This is further complicated by the aligned and grid parameters (which maybe should be split into separate functions, at least the second one).
We can complicate it further by considering broadcasting possibilities.

This is a bit difficult to reason about, so I would appreciate any suggestions given all these constraints.

vnmabus commented 7 months ago

So, for the aligned non-grid case (the easiest to reason about), we would have:

In-shape: we need to include points_shape (arbitrary) and input_shape (determined by the functional object). I would say that the natural order here is (points_shape, input_shape), as it coincides both with NumPy broadcasting order and with some interfaces such as those in SciPy's interpolation module, e.g. LinearNDInterpolator. It can also be interpreted as passing a list of points in the normal case. A small problem with allowing arbitrary input and output shapes is that input_shape=() and input_shape=(1,) are no longer equivalent. In the first case, if you pass as in-shape (10, 1), the last 1 would be interpreted as part of points_shape and thus adds an additional dimension to the returned points. In the second case, if you pass as in-shape (10,), that would be an error. I would like to hear your opinion about this.
Out-shape: we need to include shape, points_shape and output_shape. If we agree that in-shape must be (points_shape, input_shape), it also makes sense to return (points_shape, output_shape) in that order. That leaves two possibilities, namely placing shape at the leftmost position or at the rightmost one. We were placing it at the left, as the leading dimension was also shape in the internal representation of functions, but that can be changed if there are strong reasons. This would leave out-shape as (shape, points_shape, output_shape).
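
The shape bookkeeping for the aligned case, including the difference between input_shape=() and input_shape=(1,), can be illustrated with plain tuples. `aligned_out_shape` is a hypothetical helper written for this thread, not part of any existing API; the values shape=(5,) and output_shape=(4,) are arbitrary.

```python
def aligned_out_shape(shape, input_shape, output_shape, in_shape):
    """Out-shape for aligned, non-grid evaluation.

    in_shape is read as (*points_shape, *input_shape) and the result is
    (*shape, *points_shape, *output_shape).
    """
    in_shape = tuple(in_shape)
    k = len(input_shape)
    split = len(in_shape) - k
    if split < 0 or in_shape[split:] != tuple(input_shape):
        raise ValueError(
            f"in-shape {in_shape} does not end with input_shape {tuple(input_shape)}"
        )
    points_shape = in_shape[:split]
    return (*shape, *points_shape, *output_shape)


shape, output_shape = (5,), (4,)

# input_shape=(): the trailing 1 belongs to points_shape and is kept.
print(aligned_out_shape(shape, (), output_shape, (10, 1)))    # (5, 10, 1, 4)

# input_shape=(1,): the trailing 1 is consumed as the input dimension.
print(aligned_out_shape(shape, (1,), output_shape, (10, 1)))  # (5, 10, 4)

# input_shape=(1,) with in-shape (10,) is rejected.
try:
    aligned_out_shape(shape, (1,), output_shape, (10,))
except ValueError as exc:
    print("error:", exc)
```

The first two calls show why the two input shapes stop being interchangeable once points_shape is arbitrary: the same in-shape (10, 1) yields outputs of different rank.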

vnmabus commented 7 months ago

Now, the unaligned case has exactly the same out-shape, but shape has to be present in in-shape too. The most natural way would be to have matching input and output, and so in-shape should be (shape, points_shape, input_shape).

It would have been great to be able to discern between aligned and unaligned from the shape of the evaluation points alone, without the need for an aligned keyword parameter, but I do not see how that could be possible given that points_shape is arbitrary.
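
The ambiguity can be made concrete with a small shape-only sketch. Both helpers below are hypothetical names introduced for illustration; they encode the in-shape and out-shape conventions proposed in this thread for the aligned and unaligned cases.

```python
def aligned_out_shape(shape, input_shape, output_shape, in_shape):
    """Aligned: in_shape = (*points_shape, *input_shape)."""
    in_shape = tuple(in_shape)
    split = len(in_shape) - len(input_shape)
    if split < 0 or in_shape[split:] != tuple(input_shape):
        raise ValueError("in-shape does not end with input_shape")
    return (*shape, *in_shape[:split], *output_shape)


def unaligned_out_shape(shape, input_shape, output_shape, in_shape):
    """Unaligned: in_shape = (*shape, *points_shape, *input_shape)."""
    in_shape = tuple(in_shape)
    s = len(shape)
    if in_shape[:s] != tuple(shape):
        raise ValueError("in-shape does not start with shape")
    # After stripping the leading shape, the rest is parsed as in the aligned case.
    return aligned_out_shape(shape, input_shape, output_shape, in_shape[s:])


shape, input_shape, output_shape = (5,), (3,), (4,)
in_shape = (5, 7, 3)

# The same in-shape is valid under both readings, with different results.
print(aligned_out_shape(shape, input_shape, output_shape, in_shape))    # (5, 5, 7, 4)
print(unaligned_out_shape(shape, input_shape, output_shape, in_shape))  # (5, 7, 4)
```

Because points_shape may legitimately start with the same lengths as shape, an in-shape of (5, 7, 3) cannot by itself tell the two modes apart, which is why an explicit keyword still seems necessary.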

Also, in this proposed API there is no discussion about broadcasting at all. We should probably discuss if/when broadcasting should be allowed and how.

GAA-UAM / scikit-fda

`FDataGrid.__call__` any (compatible) shape? #562
