Open ego-thales opened 10 months ago
Sorry, I have trouble understanding the proposed behaviour, can you show an example?
For example, suppose you have `fd` with `dim_domain = 3, dim_codomain = 4`.
If I want to evaluate `fd` on an edge of its domain (which is a 2D surface of size, say, `n*m`), I would do the following:
- build `eval_points`, the matrix of all points of the edge, which would have shape `(n, m, 3)`,
- call `fd(eval_points)` and get `res` with shape `(len(fd), n, m, dim_codomain)`.

Is it a bit more understandable?
It is essentially a situation I faced in implementing #561.
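The shapes in this example can be checked with a small NumPy sketch. It does not use scikit-fda: `fd` here is a hypothetical stand-in (one linear map from R^3 to R^4 per sample), and the face `z = 0` of the unit cube and the sizes `n = 5`, `m = 7` are assumptions chosen only for illustration.

```python
import numpy as np

n_samples, dim_domain, dim_codomain = 2, 3, 4
n, m = 5, 7

# Hypothetical stand-in for `fd`: each sample is a linear map R^3 -> R^4.
rng = np.random.default_rng(0)
coefs = rng.normal(size=(n_samples, dim_domain, dim_codomain))


def fd(points):
    """Evaluate every sample at points of shape (any_shape, dim_domain)."""
    points = np.asarray(points)
    # (..., 3) @ (3, 4) -> (..., 4); stacking adds the leading sample axis.
    return np.stack([points @ c for c in coefs])


# Points of the face z = 0 of the unit cube, arranged as an n x m grid.
x, y = np.meshgrid(np.linspace(0, 1, n), np.linspace(0, 1, m), indexing="ij")
eval_points = np.stack([x, y, np.zeros_like(x)], axis=-1)

res = fd(eval_points)
print(eval_points.shape)  # (5, 7, 3)
print(res.shape)          # (2, 5, 7, 4)
```

No flattening or unravelling is needed on the caller's side: the grid layout of the evaluation points is preserved in the result.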
For future reference, in case we revisit this, let me make things clearer.
The proposed version is that:
- `__call__` accepts `in.shape = (any_shape, dim_domain)` and returns `out.shape = (n_samples, any_shape, dim_codomain)`,
- when `dim_domain = 1`, the corner case `in.shape = ()` could be allowed and interpreted as `in.shape = (1,)` (unsure if it's a good idea though...).

We can also discuss the best position for `any_shape` in the input and output shape tuples.
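A minimal sketch of this contract, written as a wrapper around an evaluator that only accepts a flat list of points. The names `call_any_shape` and `evaluate_flat` are hypothetical and not part of scikit-fda; the flat evaluator is assumed to map `(n_points, dim_domain)` to `(n_samples, n_points, dim_codomain)`.

```python
import numpy as np


def call_any_shape(evaluate_flat, points, dim_domain):
    """Evaluate at points of shape (any_shape, dim_domain).

    `evaluate_flat` is a hypothetical evaluator that only accepts shape
    (n_points, dim_domain) and returns (n_samples, n_points, dim_codomain).
    """
    points = np.asarray(points, dtype=float)
    if points.ndim == 0 and dim_domain == 1:
        # Corner case: treat in.shape == () as in.shape == (1,).
        points = points.reshape(1)
    if points.ndim == 0 or points.shape[-1] != dim_domain:
        raise ValueError("last axis of the points must have size dim_domain")

    any_shape = points.shape[:-1]
    flat = points.reshape(-1, dim_domain)  # flatten before
    out = evaluate_flat(flat)
    n_samples, _, dim_codomain = out.shape
    # Unravel after: restore any_shape between the sample and codomain axes.
    return out.reshape((n_samples, *any_shape, dim_codomain))


def evaluate_flat(flat):
    # Illustrative evaluator: two samples with f_s(t) = (s + 1) * t,
    # so dim_domain = dim_codomain = 1.
    return np.stack([(s + 1) * flat for s in range(2)])


print(call_any_shape(evaluate_flat, np.zeros((5, 7, 1)), 1).shape)  # (2, 5, 7, 1)
print(call_any_shape(evaluate_flat, 3.0, 1).shape)                  # (2, 1)
```

With this placement of `any_shape`, the scalar corner case yields an empty `any_shape`, so the output reduces to `(n_samples, dim_codomain)`.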
So, I am toying with generalizing the functions in https://github.com/GAA-UAM/scikit-fda/tree/feature/ndfunction, and I am starting to modify the evaluation. What I want to achieve is:
- `FDataGrid`, `FDataBasis`, etc. represent arrays of functions (following an NDFunction protocol), no longer limited to the 1D case. Thus we have an arbitrary `shape`, the shape of the array of functions themselves.
- [...] (`input_shape`) and return another array (`output_shape`).
- [...] (`points_shape`).
- [...] `aligned` and `grid` parameters (which maybe should be split into separate functions, at least the second one).

This is a bit difficult to reason about, so I would appreciate any suggestions given all these constraints.
So, for the aligned non-grid case (the easiest to reason about), we would have:
- `points_shape` (arbitrary) and `input_shape` (determined by the functional object). I would say that the natural order here is `(points_shape, input_shape)`, as it coincides both with NumPy broadcasting order and with some interfaces such as those in SciPy's interpolation module, e.g. `LinearNDInterpolator`. It can also be interpreted as passing a list of points in the normal case.

  A small problem with allowing arbitrary input and output shapes is that `input_shape=()` and `input_shape=(1,)` are no longer equivalent. In the first case, if you pass as in-shape `(10, 1)`, the last `1` would be interpreted as part of `points_shape` and thus add an additional dimension to the returned points. In the second case, if you pass as in-shape `(10,)`, that would be an error. I would like to hear your opinion about this.
- `shape`, `points_shape` and `output_shape`. If we agree that in-shape must be `(points_shape, input_shape)`, it also makes sense to return `(points_shape, output_shape)` in that order. That leaves two possibilities, namely placing `shape` at the leftmost position or at the rightmost one. We were placing it at the left, as the leading dimension was also `shape` in the internal representation of functions, but that can be changed if there are strong reasons. This would leave out-shape as `(shape, points_shape, output_shape)`.

Now, the unaligned case has exactly the same out-shape, but `shape` has to be present in in-shape too. The most natural way would be to have matching input and output, and so in-shape should be `(shape, points_shape, input_shape)`.
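The shape bookkeeping discussed above can be sketched in plain Python on tuples. The helper names `points_shape_of` and `out_shape_of` are hypothetical, and the sketch assumes `shape` is placed leftmost in the output, as proposed; it only computes shapes and evaluates nothing.

```python
def points_shape_of(in_shape, *, shape, input_shape, aligned=True):
    """Extract points_shape from the shape of the evaluation points.

    Aligned:   in-shape is (points_shape, input_shape).
    Unaligned: in-shape is (shape, points_shape, input_shape).
    """
    in_shape, shape, input_shape = tuple(in_shape), tuple(shape), tuple(input_shape)
    lead = () if aligned else shape
    stop = len(in_shape) - len(input_shape)
    if stop < len(lead):
        raise ValueError("in-shape has too few dimensions")
    if in_shape[:len(lead)] != lead:
        raise ValueError("in-shape must start with shape in the unaligned case")
    if in_shape[stop:] != input_shape:
        raise ValueError("in-shape must end with input_shape")
    return in_shape[len(lead):stop]


def out_shape_of(in_shape, *, shape, input_shape, output_shape, aligned=True):
    """Out-shape is (shape, points_shape, output_shape) in both cases."""
    points_shape = points_shape_of(
        in_shape, shape=shape, input_shape=input_shape, aligned=aligned
    )
    return tuple(shape) + points_shape + tuple(output_shape)


# input_shape=() versus input_shape=(1,) with the same in-shape (10, 1):
print(out_shape_of((10, 1), shape=(3,), input_shape=(), output_shape=()))    # (3, 10, 1)
print(out_shape_of((10, 1), shape=(3,), input_shape=(1,), output_shape=()))  # (3, 10)

# Unaligned case: shape must also lead the in-shape.
print(out_shape_of((3, 10, 2), shape=(3,), input_shape=(2,),
                   output_shape=(4,), aligned=False))                        # (3, 10, 4)
```

With `input_shape=(1,)`, an in-shape of `(10,)` raises `ValueError` in this sketch, matching the error case described above. The same in-shape `(3, 10, 2)` is also valid in the aligned case, where it gives `points_shape == (3, 10)`, which illustrates why the two cases cannot be told apart from the points alone.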
It would have been great to be able to discern between aligned and unaligned from the shape of the evaluation points alone, without the need of an `align` keyword parameter, but I do not see how that could be possible given that `points_shape` is arbitrary.
Also, in this proposed API there is no discussion about broadcasting at all. We should probably discuss if/when broadcasting should be allowed and how.
Hi,
Currently, `FDataGrid.__call__` only takes shapes `(n_samples, dim_domain)` or `(dim_domain,)` (or `()` when `dim_domain == 1`). I think it would be natural to allow `anyshape + (dim_domain,)` to allow for multiple dimension evaluation without going through the trouble of flattening before and unravelling after.
What do you think? Élie