GAA-UAM / scikit-fda

Functional Data Analysis Python package
https://fda.readthedocs.io
BSD 3-Clause "New" or "Revised" License
287 stars 51 forks source link

Fix FDataIrregular's __getitem__ #614

Closed pcuestas closed 3 weeks ago

pcuestas commented 3 weeks ago

FDataIrregular's __getitem__ had incorrect computation of the start_indices of the new object.

This could perhaps be considered a hotfix.

Added test to check correction (previous version did not pass this test). The previous version was incorrect unless all samples in the new FDataIrregular object had the same number of measurements; this is why it has been hard to notice the mistake.

vnmabus commented 3 weeks ago

I am honestly unable to see the difference. Could you please explain, maybe using an example?

PS: A hotfix is not necessary as irregular functional data is not yet in master.

pcuestas commented 3 weeks ago

I am honestly unable to see the difference. Could you please explain, maybe using an example?

Why the previous code does not work

Short explanation

The change in line 1412 is not equivalent because the differences between consecutive elements are shifted to the right when I concatenate an initial element 0.

Longer explanation

By definition, in the start_indices array, start_indices[k] contains the first index of the k-th observation's start in the points and values arrays. Therefore, start_indices[k+1] - start_indices[k] must be the numbers of measurements of the k-th observation.

In this code, chunk_sizes contains the number of measurements per observation. So cumsum = np.cumsum(chunk_sizes) is such that cumsum[k+1] - cumsum[k] contains the number of measurements of the k+1-th observation (and cumsum[0] the measurements number of the 0-th) so the resulting ammounts are shifted.

An example

Consider chunk_sizes = [1, 2] (that is, the 0-th observation has 1 measurement and the 1-st observation has 2). Then:

>>> np.cumsum(chunk_sizes) - chunk_sizes[0]
array([0, 2])
>>> np.concatenate([[0], np.cumsum(chunk_sizes)])[:-1]
array([0, 1])

These are not equivalent and the first is wrong, because the 1-st observation should start in position 1 and not 2.

PS: A hotfix is not necessary as irregular functional data is not yet in master.

Oh, OK. Sorry, I thought it was.