How should we handle padding?

faymanns commented 6 months ago

Currently the padding behavior is not consistent. Bellow is a survey of the current behavior:

setting spline.coeffs requires a padded array
getting spline.coeffs returns a padded array
setting spline.knots requires an unpadded array which is then padded automatically
getting spline.knots returns a padded array

This is inconsistent in two ways:

Different behavior between spline.coeffs and spline.knots
Different behavior when setting and getting spline.knots

We should choose one default behavior for all for bullet points above. IMO this should be that the user has to take care of the padding when setting or getting spline.coeffs and spline.knots. This is already the case except for the setting of spline.knots.

To make the user's life easier we could provide some convenience functions for padding and perhaps even include a constructor parameter in the spline class that can change the default behavior to a specific type of padding. If we do include the extra parameter in the constructor, it should also take care of the padding for spline.coeffs.

vuhlmann commented 6 months ago

Excellent points, thanks for flagging this.

I would prefer the padding to be done automatically under the hood so that users get the "intuitive" result when creating an open spline (= the ends points are neatly tied in). This would also be consistent with the behaviour of other spline interpolation libraries.

We could maybe have a "boundary" argument in the constructor for open curves that is set to "mirror" by default and takes care of the padding as it is currently implemented in my version. The two other options could be "none" (= zero padding) and "periodic".

Any thoughts on this?

faymanns commented 6 months ago

Great, I think I agree with you.

One question I still have is about the padding of coefficients. I understand that you can mirror or do edge padding on the knots. Can we just use the same padding strategy for the coefficients? For interpolating splines this is trivial since the knots and coefficients are the same, but for the others it is not obvious to me if we can use the same strategy. I tried it on a B3 spline and the result looks reasonable. @vuhlmann can you confirm that we can use the same padding approach for coefficients and knots?

Another thing I would change is to make the padding/boundary keyword argument accept a function instead of string. We can provide implementations for most sensible defaults and it allows the user to use custom padding functions.

edwardando commented 6 months ago

After discussion we propose:

In the Spline(M) constructor we have a keyword pad that follows numpy.pad(). If pad=None then we disactivate padding completely such that coeff or knot arrays should be M+2*pad_size rather than just M when being set, and are returned in full.

faymanns commented 6 months ago

I discovered a potential problem with this approach. If you set the coefficients, then extract the knots and set those exact knots as the knots of your spline, you get a different spline:

import matplotlib.pyplot as plt
import numpy as np
import splinebox

M = 4
spline = splinebox.Spline(M=M, basis_function=splinebox.B3(), closed=False)
data = np.random.rand(M, 2)

# This line would become spline.coeffs = data once we implement the changes proposed above 
spline.coeffs = np.pad(knots, ((1, 1), (0, 0)), mode="edge")

plt.scatter(knots[:, 1], knots[:, 0])

t = np.linspace(0, M - 1, 1000)

yx = spline.eval(t)
plt.plot(yx[:, 1], yx[:, 0], label="set coeffs")

# This line becomes knots = spline.knots once we implement the changes proposed above.
knots = spline.knots[1:-1]
spline.knots = knots

yx = spline.eval(t)
plt.plot(yx[:, 1], yx[:, 0], label="set knots")

plt.show()

This gives the following plot. As you can see, the two spline are close but not identical. Perhaps it's not a problem but we should be aware of it. output

vuhlmann commented 5 months ago

Sorry for not replying earlier! Getting back to the points above:

can you confirm that we can use the same padding approach for coefficients and knots

No, this is incorrect. Mirroring the knots does not result in mirrored control points, and mirroring the control points does not result in mirroring the knots. The behaviour we want is mirrored knots. I created an example to illustrate the issue here (points to interpolate as red circles, knots as blue crosses): https://github.com/uhlmanngroup/splinebox/blob/clean_for_collab/test_knots_ctrlpts.ipynb

make the padding/boundary keyword argument accept a function instead of string

100%, excellent suggestion.

If pad=None then we disactivate padding completely such that coeff or knot arrays should be M+2*pad_size rather than just M when being set, and are returned in full.

I'm not sure I understand what we mean by "we desactivate padding completely"? We cannot decide not to pad - not doing anything is effectively equivalent to zero padding.

faymanns commented 5 months ago

Mirroring the knots does not result in mirrored control points, and mirroring the control points does not result in mirroring the knots.

I understand that they are not the same. I just think it is not ideal that we end up with the following situation for a spline with n knots: spline.knots = np.arange(n) will work no problem, but spline.coeffs = np.arange(n) will throw an error because the you need padding. This seems inconsistent from the user's perspective, but I don't know how to solve this problem.

The behaviour we want is mirrored knots.

That seems very reasonable. However, the knots are not mirrored anymore when they are returned from the spline. Here is an example of that:

import numpy as np
import splinebox

spline = splinebox.Spline(4, splinebox.B3())
spline.knots = np.array([1, 2, 3, 4])
print(spline.knots)

This prints [0.88596491 1. 2. 3. 4. 3.28070175] instead of [1. 1. 2. 3. 4. 4.], as one might expect. I understand that this is because of the filtering. As discussed above, it is probably best to just return the unpadded array.

If we run print(spline.coeffs), we get [1.15789474 0.68421053 2.10526316 2.89473684 4.31578947 3.84210526].

For me, the ideal solution would be if spline.coeffs returns an unpadded array and you can call spline.coeffs = [0.68421053 2.10526316 2.89473684 4.31578947] and the first and last value (i.e. 1.15789474 and 3.84210526) are computed internally. However, I am not sure that is possible and even if it is possible it would need to generalize to arbitrary padding functions.

I'm not sure I understand what we mean by "we desactivate padding completely"?

I simply mean that the user has to take care of all of the padding, i.e. spline.knots = ... requires you to provide a padded array. Same for spline.coeffs = ... and they both return padded arrays.

faymanns commented 4 months ago

After discussing this with Virginie we decided to no do any padding for the control_points since there is no obvious/intuitive way to do it. This means the setter of the knots attribute will take an unpadded array as input and pad it with the padding function provided. The getter of the knots returns an unpadded array. Getter and setter of the control_points attribute will accept and return padded arrays.

faymanns commented 4 months ago

Note: the tangents of hermite splines should be treated like control points and therefore should not be padded.

faymanns commented 4 months ago

Implemented in b22f551.

EPFL-Center-for-Imaging / splinebox

How should we handle padding? #11